mirror of https://github.com/borgbackup/borg.git (synced 2026-02-20 00:10:35 -05:00)

add --json-lines option to diff command

parent eeda4650ae
commit b2dea4422e
5 changed files with 221 additions and 46 deletions

@@ -231,11 +231,16 @@ Standard output
 *stdout* is different and more command-dependent than logging. Commands like :ref:`borg_info`, :ref:`borg_create`
 and :ref:`borg_list` implement a ``--json`` option which turns their regular output into a single JSON object.

+Some commands, like :ref:`borg_list` and :ref:`borg_diff`, can produce *a lot* of JSON. Since many JSON implementations
+don't support a streaming mode of operation, which is pretty much required to deal with this amount of JSON, these
+commands implement a ``--json-lines`` option which generates output in the `JSON lines <http://jsonlines.org/>`_ format,
+which is simply a number of JSON objects separated by new lines.
+
 Dates are formatted according to ISO 8601 in local time. No explicit time zone is specified *at this time*
 (subject to change). The equivalent strftime format string is '%Y-%m-%dT%H:%M:%S.%f',
 e.g. ``2017-08-07T12:27:20.123456``.

-The root object at least contains a *repository* key with an object containing:
+The root object of '--json' output will contain at least a *repository* key with an object containing:

 id
     The ID of the repository, normally 64 hex characters
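Note on the timestamp format documented in this hunk: a frontend can parse these values with the standard library. This is a minimal sketch that assumes the strftime string stays exactly as quoted in the docs; `parse_borg_timestamp` is a hypothetical helper name, not part of borg.

```python
from datetime import datetime

# Format string quoted in the documentation hunk above.
BORG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_borg_timestamp(text):
    """Parse a borg JSON timestamp (hypothetical helper).

    The result is a naive datetime: the docs say no explicit time zone
    is emitted, so the value has to be interpreted as local time.
    """
    return datetime.strptime(text, BORG_TIME_FORMAT)


ts = parse_borg_timestamp("2017-08-07T12:27:20.123456")
print(ts.isoformat())
```

Because the documentation marks the missing time zone as subject to change, a real frontend would probably want to tolerate an offset suffix as well; that case is not handled here.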
@@ -439,12 +444,7 @@ The same archive with more information (``borg info --last 1 --json``)::
 File listings
 +++++++++++++

-Listing the contents of an archive can produce *a lot* of JSON. Since many JSON implementations
-don't support a streaming mode of operation, which is pretty much required to deal with this amount of
-JSON, output is generated in the `JSON lines <http://jsonlines.org/>`_ format, which is simply
-a number of JSON objects separated by new lines.
-
-Each item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
+Each archive item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
 Refer to the *borg list* documentation for the available keys and their meaning.

 Example (excerpt) of ``borg list --json-lines``::
@@ -452,6 +452,78 @@ Example (excerpt) of ``borg list --json-lines``::
     {"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.023407", "size": 0}
     {"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux/baz", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.585407", "size": 0}

+Archive Differencing
+++++++++++++++++++++
+
+Each archive difference item (file contents, user/group/mode) output by :ref:`borg_diff` is represented by an *ItemDiff* object.
+The properties of an *ItemDiff* object are:
+
+path:
+    The filename/path of the *Item* (file, directory, symlink).
+
+changes:
+    A list of *Change* objects describing the changes made to the item in the two archives. For example,
+    there will be two changes if the contents of a file are changed and its ownership is changed.
+
+The *Change* object can contain a number of properties depending on the type of change that occurred.
+If a 'property' is not required for the type of change, it is not output.
+The possible properties of a *Change* object are:
+
+type:
+    The **type** property is always present. It identifies the type of change and will be one of these values:
+
+    - *modified* - file contents changed.
+    - *added* - the file was added.
+    - *removed* - the file was removed.
+    - *added directory* - the directory was added.
+    - *removed directory* - the directory was removed.
+    - *added link* - the symlink was added.
+    - *removed link* - the symlink was removed.
+    - *changed link* - the symlink target was changed.
+    - *mode* - the file/directory/link mode was changed. Note: this can also indicate a change from one
+      item type to a different type (file/directory/link), for example when a file is deleted and replaced
+      with a directory of the same name.
+    - *owner* - user and/or group ownership changed.
+
+size:
+    If **type** == '*added*' or '*removed*', then **size** provides the size of the added or removed file.
+
+added:
+    If **type** == '*modified*' and chunk ids can be compared, then **added** and **removed** indicate the amount
+    of data 'added' and 'removed'. If chunk ids cannot be compared, then the **added** and **removed** properties are
+    not provided and the only information available is that the file contents were modified.
+
+removed:
+    See **added** property.
+
+old_mode:
+    If **type** == '*mode*', then **old_mode** and **new_mode** provide the mode and permissions changes.
+
+new_mode:
+    See **old_mode** property.
+
+old_user:
+    If **type** == '*owner*', then **old_user**, **new_user**, **old_group** and **new_group** provide the user
+    and group ownership changes.
+
+old_group:
+    See **old_user** property.
+
+new_user:
+    See **old_user** property.
+
+new_group:
+    See **old_user** property.
+
+
+Example (excerpt) of ``borg diff --json-lines``::
+
+    {"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
+    {"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
+    {"path": "file4", "changes": [{"type": "added", "size": 0}]}
+    {"path": "file3", "changes": [{"type": "removed", "size": 0}]}
+
+
 .. _msgid:

 Message IDs

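Note on consuming the *ItemDiff* records documented above: each output line is an independent JSON object, so a consumer can decode one line at a time. The sketch below assumes the record shape produced by `print_json_output` in this commit (a `path` plus a flat `changes` list); the sample lines are copied from the usage example, and `iter_item_diffs` and `summarize` are hypothetical helper names.

```python
import json

# Sample lines in the shape emitted by `borg diff --json-lines`.
SAMPLE = """\
{"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
{"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
{"path": "file4", "changes": [{"type": "added", "size": 0}]}
{"path": "file3", "changes": [{"type": "removed", "size": 0}]}
"""


def iter_item_diffs(lines):
    """Yield one decoded ItemDiff dict per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(lines):
    """Map each path to the list of change types reported for it."""
    return {d["path"]: [c["type"] for c in d["changes"]] for d in iter_item_diffs(lines)}


summary = summarize(SAMPLE.splitlines())
print(summary)
```

Only the `type` key is read unconditionally here, since the documentation states it is the one property that is always present on a *Change* object.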
@@ -16,6 +16,7 @@ Examples
     $ echo "something" >> file2
     $ borg create ../testrepo::archive2 .

+    $ echo "testing 123" >> file1
     $ rm file3
     $ touch file4
     $ borg create ../testrepo::archive3 .
@@ -26,11 +27,18 @@ Examples
        +135 B    -252 B file2

     $ borg diff testrepo::archive2 archive3
+        +17 B      -5 B file1
     added           0 B file4
     removed         0 B file3

     $ borg diff testrepo::archive1 archive3
-    [-rw-r--r-- -> -rwxr-xr-x] file1
+        +17 B      -5 B [-rw-r--r-- -> -rwxr-xr-x] file1
        +135 B    -252 B file2
     added           0 B file4
     removed         0 B file3
+
+    $ borg diff --json-lines testrepo::archive1 archive3
+    {"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
+    {"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
+    {"path": "file4", "changes": [{"type": "added", "size": 0}]}
+    {"path": "file3", "changes": [{"type": "removed", "size": 0}]}

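Note on the streaming rationale for `--json-lines`: a frontend can read the child process's stdout line by line instead of buffering the whole output. The sketch below is an assumption about how such a frontend might look, not code from borg. To stay runnable without a repository, it spawns a small Python child that prints two records; the real command line from the example above is shown only in a comment. `stream_json_lines` is a hypothetical helper name.

```python
import json
import subprocess
import sys


def stream_json_lines(argv):
    """Run argv and lazily yield one decoded JSON object per stdout line."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                yield json.loads(line)


# Records the stand-in child process will print, one JSON object per line.
RECORDS = [
    {"path": "file4", "changes": [{"type": "added", "size": 0}]},
    {"path": "file3", "changes": [{"type": "removed", "size": 0}]},
]

# Stand-in for: ["borg", "diff", "--json-lines", "testrepo::archive1", "archive3"]
CHILD_SCRIPT = "import json, sys\nfor r in json.loads(sys.argv[1]):\n    print(json.dumps(r))"
DEMO_ARGV = [sys.executable, "-c", CHILD_SCRIPT, json.dumps(RECORDS)]

paths = [record["path"] for record in stream_json_lines(DEMO_ARGV)]
print(paths)
```

The helper deliberately ignores the exit status: the test suite in this commit expects `borg diff` to exit with code 1 when only a chunker-params warning was issued, so a non-zero status alone would not indicate that the output is unusable.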
@@ -1149,8 +1149,13 @@ class Archiver:
     def do_diff(self, args, repository, manifest, key, archive):
         """Diff contents of two archives"""

-        def print_output(diff, path):
-            print("{:<19} {}".format(diff, path))
+        def print_json_output(diff, path):
+            print(json.dumps({"path": path, "changes": [j for j, str in diff]}))
+
+        def print_text_output(diff, path):
+            print("{:<19} {}".format(' '.join([str for j, str in diff]), path))
+
+        print_output = print_json_output if args.json_lines else print_text_output

         archive1 = archive
         archive2 = Archive(repository, key, manifest, args.archive2,
@@ -1167,7 +1172,7 @@ class Archiver:

         diffs = Archive.compare_archives_iter(archive1, archive2, matcher, can_compare_chunk_ids=can_compare_chunk_ids)
         # Conversion to string and filtering for diff.equal to save memory if sorting
-        diffs = ((path, str(diff)) for path, diff in diffs if not diff.equal)
+        diffs = ((path, diff.changes()) for path, diff in diffs if not diff.equal)

         if args.sort:
             diffs = sorted(diffs)
@@ -3709,6 +3714,8 @@ class Archiver:
                                help='Override check of chunker parameters.')
         subparser.add_argument('--sort', dest='sort', action='store_true',
                                help='Sort the output lines by file path.')
+        subparser.add_argument('--json-lines', action='store_true',
+                               help='Format output as JSON Lines. ')
         subparser.add_argument('location', metavar='REPO::ARCHIVE1',
                                type=location_validator(archive=True),
                                help='repository location and ARCHIVE1 name')

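Note on the output selection in `do_diff`: each change now travels as a `(dict, text)` pair, and the two printers pick one half of every pair. The following standalone sketch mirrors that idea outside of borg. The function names `format_json` and `format_text` are hypothetical, and the pre-formatted text uses the literal `'0 B'` in place of borg's `format_file_size` result.

```python
import json


def format_json(changes, path):
    """Build one JSON Lines record from the dict half of each change pair."""
    return json.dumps({"path": path, "changes": [data for data, _text in changes]})


def format_text(changes, path):
    """Build the human-readable line from the text half of each change pair."""
    return "{:<19} {}".format(" ".join(text for _data, text in changes), path)


# One change pair, shaped like the tuples returned by the *_diff helpers.
# '0 B' stands in for format_file_size(0); that helper is not reproduced here.
changes = [({"type": "added", "size": 0}, "added {:>13}".format("0 B"))]

# Same selection idiom as in do_diff, driven by a plain boolean here.
json_lines = True
formatter = format_json if json_lines else format_text
print(formatter(changes, "file4"))
print(format_text(changes, "file4"))
```

Keeping both representations in one tuple means the comparison logic runs once, at the cost of always building the text form even when only JSON is printed.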
@@ -418,27 +418,31 @@ class ItemDiff:
         self._numeric_owner = numeric_owner
         self._can_compare_chunk_ids = can_compare_chunk_ids
         self.equal = self._equal(chunk_iterator1, chunk_iterator2)
+        changes = []
+
+        if self._item1.is_link() or self._item2.is_link():
+            changes.append(self._link_diff())
+
+        if 'chunks' in self._item1 and 'chunks' in self._item2:
+            changes.append(self._content_diff())
+
+        if self._item1.is_dir() or self._item2.is_dir():
+            changes.append(self._dir_diff())
+
+        if not (self._item1.get('deleted') or self._item2.get('deleted')):
+            changes.append(self._owner_diff())
+            changes.append(self._mode_diff())
+
+        # filter out empty changes
+        self._changes = [ch for ch in changes if ch]
+
+    def changes(self):
+        return self._changes

     def __repr__(self):
         if self.equal:
             return 'equal'
-
-        changes = []
-
-        if self._item1.is_link() or self._item2.is_link():
-            changes.append(self._link_string())
-
-        if 'chunks' in self._item1 and 'chunks' in self._item2:
-            changes.append(self._content_string())
-
-        if self._item1.is_dir() or self._item2.is_dir():
-            changes.append(self._dir_string())
-
-        if not (self._item1.get('deleted') or self._item2.get('deleted')):
-            changes.append(self._owner_string())
-            changes.append(self._mode_string())
-
-        return ' '.join((x for x in changes if x))
+        return ' '.join(str for d,str in self._changes)

     def _equal(self, chunk_iterator1, chunk_iterator2):
         # if both are deleted, there is nothing at path regardless of what was deleted
@@ -461,46 +465,52 @@ class ItemDiff:

         return True

-    def _link_string(self):
+    def _link_diff(self):
         if self._item1.get('deleted'):
-            return 'added link'
+            return ({"type": 'added link'}, 'added link')
         if self._item2.get('deleted'):
-            return 'removed link'
+            return ({"type": 'removed link'}, 'removed link')
         if 'source' in self._item1 and 'source' in self._item2 and self._item1.source != self._item2.source:
-            return 'changed link'
+            return ({"type": 'changed link'}, 'changed link')

-    def _content_string(self):
+    def _content_diff(self):
         if self._item1.get('deleted'):
-            return ('added {:>13}'.format(format_file_size(self._item2.get_size())))
+            sz = self._item2.get_size()
+            return ({"type": "added", "size": sz}, 'added {:>13}'.format(format_file_size(sz)))
         if self._item2.get('deleted'):
-            return ('removed {:>11}'.format(format_file_size(self._item1.get_size())))
+            sz = self._item1.get_size()
+            return ({"type": "removed", "size": sz}, 'removed {:>11}'.format(format_file_size(sz)))
         if not self._can_compare_chunk_ids:
-            return 'modified'
+            return ({"type": "modified"}, "modified")
         chunk_ids1 = {c.id for c in self._item1.chunks}
         chunk_ids2 = {c.id for c in self._item2.chunks}
         added_ids = chunk_ids2 - chunk_ids1
         removed_ids = chunk_ids1 - chunk_ids2
         added = self._item2.get_size(consider_ids=added_ids)
         removed = self._item1.get_size(consider_ids=removed_ids)
-        return ('{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
-                                     format_file_size(-removed, precision=1, sign=True)))
-
-    def _dir_string(self):
+        return ({"type": "modified", "added": added, "removed": removed},
+                '{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
+                                     format_file_size(-removed, precision=1, sign=True)))
+
+    def _dir_diff(self):
         if self._item2.get('deleted') and not self._item1.get('deleted'):
-            return 'removed directory'
+            return ({"type": 'removed directory'}, 'removed directory')
         if self._item1.get('deleted') and not self._item2.get('deleted'):
-            return 'added directory'
+            return ({"type": 'added directory'}, 'added directory')

-    def _owner_string(self):
+    def _owner_diff(self):
         u_attr, g_attr = ('uid', 'gid') if self._numeric_owner else ('user', 'group')
         u1, g1 = self._item1.get(u_attr), self._item1.get(g_attr)
         u2, g2 = self._item2.get(u_attr), self._item2.get(g_attr)
         if (u1, g1) != (u2, g2):
-            return '[{}:{} -> {}:{}]'.format(u1, g1, u2, g2)
+            return ({"type": "owner", "old_user": u1, "old_group": g1, "new_user": u2, "new_group": g2},
+                    '[{}:{} -> {}:{}]'.format(u1, g1, u2, g2))

-    def _mode_string(self):
+    def _mode_diff(self):
         if 'mode' in self._item1 and 'mode' in self._item2 and self._item1.mode != self._item2.mode:
-            return '[{} -> {}]'.format(stat.filemode(self._item1.mode), stat.filemode(self._item2.mode))
+            mode1 = stat.filemode(self._item1.mode)
+            mode2 = stat.filemode(self._item2.mode)
+            return ({"type": "mode", "old_mode": mode1, "new_mode": mode2}, '[{} -> {}]'.format(mode1, mode2))

     def _content_equal(self, chunk_iterator1, chunk_iterator2):
         if self._can_compare_chunk_ids:

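Note on the `added`/`removed` numbers in `_content_diff`: they come from set differences over chunk ids, not from a byte-level comparison. The sketch below illustrates that calculation in isolation. The `Chunk` tuple and the `size_of` helper are stand-ins invented for the example, and the assumption that `get_size(consider_ids=...)` sums the sizes of the chunks whose id is in the given set is mine, not something this diff shows.

```python
from collections import namedtuple

# Hypothetical stand-in for a chunk list entry: an id plus a size in bytes.
Chunk = namedtuple("Chunk", "id size")


def size_of(chunks, consider_ids):
    """Sum the sizes of the chunks whose id is in consider_ids (assumed semantics)."""
    return sum(c.size for c in chunks if c.id in consider_ids)


def content_change(chunks1, chunks2):
    """Return a 'modified' change dict built from chunk id set differences."""
    chunk_ids1 = {c.id for c in chunks1}
    chunk_ids2 = {c.id for c in chunks2}
    added_ids = chunk_ids2 - chunk_ids1      # only present in the second archive
    removed_ids = chunk_ids1 - chunk_ids2    # only present in the first archive
    return {
        "type": "modified",
        "added": size_of(chunks2, added_ids),
        "removed": size_of(chunks1, removed_ids),
    }


old = [Chunk("a", 10), Chunk("b", 5)]
new = [Chunk("a", 10), Chunk("c", 17)]
change = content_change(old, new)
print(change)
```

This also shows why the numbers are omitted when chunk ids cannot be compared: if the two archives were chunked with different parameters, identical data can produce different ids, so the set differences would overstate the change.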
@@ -4060,9 +4060,87 @@ class DiffArchiverTestCase(ArchiverTestCaseBase):
             if are_hardlinks_supported():
                 assert 'input/hardlink_target_replaced' not in output

+        def do_json_asserts(output, can_compare_ids):
+            def get_changes(filename, data):
+                chgsets = [j['changes'] for j in data if j['path'] == filename]
+                assert len(chgsets) < 2
+                # return a flattened list of changes for given filename
+                return [chg for chgset in chgsets for chg in chgset]
+
+            # convert output to list of dicts
+            joutput = [json.loads(line) for line in output.split('\n') if line]
+
+            # File contents changed (deleted and replaced with a new file)
+            expected = {'type': 'modified', 'added': 4096, 'removed': 1024} if can_compare_ids else {'type': 'modified'}
+            assert expected in get_changes('input/file_replaced', joutput)
+
+            # File unchanged
+            assert not any(get_changes('input/file_unchanged', joutput))
+
+            # Directory replaced with a regular file
+            if 'BORG_TESTS_IGNORE_MODES' not in os.environ:
+                assert {'type': 'mode', 'old_mode': 'drwxr-xr-x', 'new_mode': '-rwxr-xr-x'} in \
+                    get_changes('input/dir_replaced_with_file', joutput)
+
+            # Basic directory cases
+            assert {'type': 'added directory'} in get_changes('input/dir_added', joutput)
+            assert {'type': 'removed directory'} in get_changes('input/dir_removed', joutput)
+
+            if are_symlinks_supported():
+                # Basic symlink cases
+                assert {'type': 'changed link'} in get_changes('input/link_changed', joutput)
+                assert {'type': 'added link'} in get_changes('input/link_added', joutput)
+                assert {'type': 'removed link'} in get_changes('input/link_removed', joutput)
+
+                # Symlink replacing or being replaced
+                assert any(chg['type'] == 'mode' and chg['new_mode'].startswith('l') for chg in
+                           get_changes('input/dir_replaced_with_link', joutput))
+                assert any(chg['type'] == 'mode' and chg['old_mode'].startswith('l') for chg in
+                           get_changes('input/link_replaced_by_file', joutput))
+
+                # Symlink target removed. Should not affect the symlink at all.
+                assert not any(get_changes('input/link_target_removed', joutput))
+
+            # The inode has two links and the file contents changed. Borg
+            # should notice the changes in both links. However, the symlink
+            # pointing to the file is not changed.
+            expected = {'type': 'modified', 'added': 13, 'removed': 0} if can_compare_ids else {'type': 'modified'}
+            assert expected in get_changes('input/empty', joutput)
+            if are_hardlinks_supported():
+                assert expected in get_changes('input/hardlink_contents_changed', joutput)
+            if are_symlinks_supported():
+                assert not any(get_changes('input/link_target_contents_changed', joutput))
+
+            # Added a new file and a hard link to it. Both links to the same
+            # inode should appear as separate files.
+            assert {'type': 'added', 'size': 2048} in get_changes('input/file_added', joutput)
+            if are_hardlinks_supported():
+                assert {'type': 'added', 'size': 2048} in get_changes('input/hardlink_added', joutput)
+
+            # check if a diff between non-existent and empty new file is found
+            assert {'type': 'added', 'size': 0} in get_changes('input/file_empty_added', joutput)
+
+            # The inode has two links and both of them are deleted. They should
+            # appear as two deleted files.
+            assert {'type': 'removed', 'size': 256} in get_changes('input/file_removed', joutput)
+            if are_hardlinks_supported():
+                assert {'type': 'removed', 'size': 256} in get_changes('input/hardlink_removed', joutput)
+
+            # Another link (marked previously as the source in borg) to the
+            # same inode was removed. This should not change this link at all.
+            if are_hardlinks_supported():
+                assert not any(get_changes('input/hardlink_target_removed', joutput))
+
+            # Another link (marked previously as the source in borg) to the
+            # same inode was replaced with a new regular file. This should not
+            # change this link at all.
+            if are_hardlinks_supported():
+                assert not any(get_changes('input/hardlink_target_replaced', joutput))
+
         do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a'), True)
         # We expect exit_code=1 due to the chunker params warning
         do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1b', exit_code=1), False)
+        do_json_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a', '--json-lines'), True)

     def test_sort_option(self):
         self.cmd('init', '--encryption=repokey', self.repository_location)