add --json-lines option to diff command

Robert Blenis 2021-03-09 18:08:52 -05:00
parent eeda4650ae
commit b2dea4422e
5 changed files with 221 additions and 46 deletions

View file

@ -231,11 +231,16 @@ Standard output
*stdout* is different and more command-dependent than logging. Commands like :ref:`borg_info`, :ref:`borg_create`
and :ref:`borg_list` implement a ``--json`` option which turns their regular output into a single JSON object.
Some commands, like :ref:`borg_list` and :ref:`borg_diff`, can produce *a lot* of JSON. Since many JSON implementations
don't support a streaming mode of operation, which is pretty much required to deal with this amount of JSON, these
commands implement a ``--json-lines`` option which generates output in the `JSON lines <http://jsonlines.org/>`_ format,
which is simply a sequence of JSON objects, one per line.
Dates are formatted according to ISO 8601 in local time. No explicit time zone is specified *at this time*
(subject to change). The equivalent strftime format string is '%Y-%m-%dT%H:%M:%S.%f',
e.g. ``2017-08-07T12:27:20.123456``.
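As a hedged illustration of the date format described above, a consumer could parse such timestamps with Python's ``datetime.strptime``. The helper name ``parse_borg_timestamp`` is hypothetical and not part of borg. The result is a naive datetime, since the output carries no time zone.

```python
from datetime import datetime

# strftime/strptime format string quoted in the text above.
BORG_TIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_borg_timestamp(value):
    """Parse a timestamp string into a naive (local time) datetime.

    Hypothetical helper for illustration only; not part of borg.
    """
    return datetime.strptime(value, BORG_TIME_FORMAT)


ts = parse_borg_timestamp('2017-08-07T12:27:20.123456')
print(ts.year, ts.microsecond)
```

Because no time zone is emitted, callers that need aware datetimes would have to attach the local zone themselves.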
The root object at least contains a *repository* key with an object containing:
The root object of '--json' output will contain at least a *repository* key with an object containing:
id
The ID of the repository, normally 64 hex characters
@ -439,12 +444,7 @@ The same archive with more information (``borg info --last 1 --json``)::
File listings
+++++++++++++
Listing the contents of an archive can produce *a lot* of JSON. Since many JSON implementations
don't support a streaming mode of operation, which is pretty much required to deal with this amount of
JSON, output is generated in the `JSON lines <http://jsonlines.org/>`_ format, which is simply
a number of JSON objects separated by new lines.
Each item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
Each archive item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
Refer to the *borg list* documentation for the available keys and their meaning.
Example (excerpt) of ``borg list --json-lines``::
@ -452,6 +452,78 @@ Example (excerpt) of ``borg list --json-lines``::
{"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.023407", "size": 0}
{"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux/baz", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.585407", "size": 0}
Archive Differencing
++++++++++++++++++++
Each archive difference item (file contents, user/group/mode) output by :ref:`borg_diff` is represented by an *ItemDiff* object.
The properties of an *ItemDiff* object are:
path:
The filename/path of the *Item* (file, directory, symlink).
changes:
A list of *Change* objects describing the changes made to the item between the two archives. For example,
there will be two changes if both the contents of a file and its ownership are changed.
The *Change* object can contain a number of properties depending on the type of change that occurred.
If a property is not required for the type of change, it is not output.
The possible properties of a *Change* object are:
type:
The **type** property is always present. It identifies the type of change and will be one of these values:
- *modified* - file contents changed.
- *added* - the file was added.
- *removed* - the file was removed.
- *added directory* - the directory was added.
- *removed directory* - the directory was removed.
- *added link* - the symlink was added.
- *removed link* - the symlink was removed.
- *changed link* - the symlink target was changed.
- *mode* - the file/directory/link mode was changed. Note that this can also indicate a change from one
item type (file/directory/link) to a different type, for example when a file is deleted and replaced
with a directory of the same name.
- *owner* - user and/or group ownership changed.
size:
If **type** == '*added*' or '*removed*', then **size** provides the size of the added or removed file.
added:
If **type** == '*modified*' and chunk ids can be compared, then **added** and **removed** indicate the amount
of data 'added' and 'removed'. If chunk ids cannot be compared, then the **added** and **removed** properties are
not provided and the only information available is that the file contents were modified.
removed:
See **added** property.
old_mode:
If **type** == '*mode*', then **old_mode** and **new_mode** provide the mode and permissions changes.
new_mode:
See **old_mode** property.
old_user:
If **type** == '*owner*', then **old_user**, **new_user**, **old_group** and **new_group** provide the user
and group ownership changes.
old_group:
See **old_user** property.
new_user:
See **old_user** property.
new_group:
See **old_user** property.
Example (excerpt) of ``borg diff --json-lines``::
{"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
{"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
{"path": "file4", "changes": [{"type": "added", "size": 0}]}
{"path": "file3", "changes": [{"type": "removed", "size": 0}]}
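A minimal sketch of how a frontend might consume this output, assuming each line is one *ItemDiff* object with the ``path`` and ``changes`` properties described above. The function name ``summarize_diff`` and the sample lines are illustrative assumptions and not part of borg.

```python
import json


def summarize_diff(lines):
    """Map each path to the list of change types reported for it.

    `lines` is any iterable of strings, e.g. a file object wrapping the
    stdout of a diff run with JSON lines output. Hypothetical helper.
    """
    summary = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between objects
        item = json.loads(line)
        # "type" is the only property documented as always present.
        summary[item["path"]] = [change["type"] for change in item["changes"]]
    return summary


# Sample input in the ItemDiff shape; values are for illustration only.
sample = [
    '{"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, '
    '{"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}',
    '{"path": "file4", "changes": [{"type": "added", "size": 0}]}',
]

for path, types in summarize_diff(sample).items():
    print(path, types)
```

Processing one line at a time keeps memory use flat, which is the reason for using the JSON lines format in the first place.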
.. _msgid:
Message IDs

View file

@ -16,6 +16,7 @@ Examples
$ echo "something" >> file2
$ borg create ../testrepo::archive2 .
$ echo "testing 123" >> file1
$ rm file3
$ touch file4
$ borg create ../testrepo::archive3 .
@ -26,11 +27,18 @@ Examples
+135 B -252 B file2
$ borg diff testrepo::archive2 archive3
+17 B -5 B file1
added 0 B file4
removed 0 B file3
$ borg diff testrepo::archive1 archive3
[-rw-r--r-- -> -rwxr-xr-x] file1
+17 B -5 B [-rw-r--r-- -> -rwxr-xr-x] file1
+135 B -252 B file2
added 0 B file4
removed 0 B file3
$ borg diff --json-lines testrepo::archive1 archive3
{"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
{"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
{"path": "file4", "changes": [{"type": "added", "size": 0}]}
{"path": "file3", "changes": [{"type": "removed", "size": 0}]}

View file

@ -1149,8 +1149,13 @@ class Archiver:
def do_diff(self, args, repository, manifest, key, archive):
"""Diff contents of two archives"""
def print_output(diff, path):
print("{:<19} {}".format(diff, path))
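def print_json_output(diff, path):
# each diff entry is a (json_dict, text) pair; keep only the JSON part
print(json.dumps({"path": path, "changes": [j for j, _ in diff]}))
def print_text_output(diff, path):
# each diff entry is a (json_dict, text) pair; keep only the text part
print("{:<19} {}".format(' '.join(text for _, text in diff), path))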
print_output = print_json_output if args.json_lines else print_text_output
archive1 = archive
archive2 = Archive(repository, key, manifest, args.archive2,
@ -1167,7 +1172,7 @@ class Archiver:
diffs = Archive.compare_archives_iter(archive1, archive2, matcher, can_compare_chunk_ids=can_compare_chunk_ids)
# Conversion to string and filtering for diff.equal to save memory if sorting
diffs = ((path, str(diff)) for path, diff in diffs if not diff.equal)
diffs = ((path, diff.changes()) for path, diff in diffs if not diff.equal)
if args.sort:
diffs = sorted(diffs)
@ -3709,6 +3714,8 @@ class Archiver:
help='Override check of chunker parameters.')
subparser.add_argument('--sort', dest='sort', action='store_true',
help='Sort the output lines by file path.')
subparser.add_argument('--json-lines', action='store_true',
help='Format output as JSON Lines.')
subparser.add_argument('location', metavar='REPO::ARCHIVE1',
type=location_validator(archive=True),
help='repository location and ARCHIVE1 name')

View file

@ -418,27 +418,31 @@ class ItemDiff:
self._numeric_owner = numeric_owner
self._can_compare_chunk_ids = can_compare_chunk_ids
self.equal = self._equal(chunk_iterator1, chunk_iterator2)
changes = []
if self._item1.is_link() or self._item2.is_link():
changes.append(self._link_diff())
if 'chunks' in self._item1 and 'chunks' in self._item2:
changes.append(self._content_diff())
if self._item1.is_dir() or self._item2.is_dir():
changes.append(self._dir_diff())
if not (self._item1.get('deleted') or self._item2.get('deleted')):
changes.append(self._owner_diff())
changes.append(self._mode_diff())
# filter out empty changes
self._changes = [ch for ch in changes if ch]
def changes(self):
return self._changes
def __repr__(self):
if self.equal:
return 'equal'
changes = []
if self._item1.is_link() or self._item2.is_link():
changes.append(self._link_string())
if 'chunks' in self._item1 and 'chunks' in self._item2:
changes.append(self._content_string())
if self._item1.is_dir() or self._item2.is_dir():
changes.append(self._dir_string())
if not (self._item1.get('deleted') or self._item2.get('deleted')):
changes.append(self._owner_string())
changes.append(self._mode_string())
return ' '.join((x for x in changes if x))
return ' '.join(text for _, text in self._changes)
def _equal(self, chunk_iterator1, chunk_iterator2):
# if both are deleted, there is nothing at path regardless of what was deleted
@ -461,46 +465,52 @@ class ItemDiff:
return True
def _link_string(self):
def _link_diff(self):
if self._item1.get('deleted'):
return 'added link'
return ({"type": 'added link'}, 'added link')
if self._item2.get('deleted'):
return 'removed link'
return ({"type": 'removed link'}, 'removed link')
if 'source' in self._item1 and 'source' in self._item2 and self._item1.source != self._item2.source:
return 'changed link'
return ({"type": 'changed link'}, 'changed link')
def _content_string(self):
def _content_diff(self):
if self._item1.get('deleted'):
return ('added {:>13}'.format(format_file_size(self._item2.get_size())))
sz = self._item2.get_size()
return ({"type": "added", "size": sz}, 'added {:>13}'.format(format_file_size(sz)))
if self._item2.get('deleted'):
return ('removed {:>11}'.format(format_file_size(self._item1.get_size())))
sz = self._item1.get_size()
return ({"type": "removed", "size": sz}, 'removed {:>11}'.format(format_file_size(sz)))
if not self._can_compare_chunk_ids:
return 'modified'
return ({"type": "modified"}, "modified")
chunk_ids1 = {c.id for c in self._item1.chunks}
chunk_ids2 = {c.id for c in self._item2.chunks}
added_ids = chunk_ids2 - chunk_ids1
removed_ids = chunk_ids1 - chunk_ids2
added = self._item2.get_size(consider_ids=added_ids)
removed = self._item1.get_size(consider_ids=removed_ids)
return ('{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
format_file_size(-removed, precision=1, sign=True)))
def _dir_string(self):
return ({"type": "modified", "added": added, "removed": removed},
'{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
format_file_size(-removed, precision=1, sign=True)))
def _dir_diff(self):
if self._item2.get('deleted') and not self._item1.get('deleted'):
return 'removed directory'
return ({"type": 'removed directory'}, 'removed directory')
if self._item1.get('deleted') and not self._item2.get('deleted'):
return 'added directory'
return ({"type": 'added directory'}, 'added directory')
def _owner_string(self):
def _owner_diff(self):
u_attr, g_attr = ('uid', 'gid') if self._numeric_owner else ('user', 'group')
u1, g1 = self._item1.get(u_attr), self._item1.get(g_attr)
u2, g2 = self._item2.get(u_attr), self._item2.get(g_attr)
if (u1, g1) != (u2, g2):
return '[{}:{} -> {}:{}]'.format(u1, g1, u2, g2)
return ({"type": "owner", "old_user": u1, "old_group": g1, "new_user": u2, "new_group": g2},
'[{}:{} -> {}:{}]'.format(u1, g1, u2, g2))
def _mode_string(self):
def _mode_diff(self):
if 'mode' in self._item1 and 'mode' in self._item2 and self._item1.mode != self._item2.mode:
return '[{} -> {}]'.format(stat.filemode(self._item1.mode), stat.filemode(self._item2.mode))
mode1 = stat.filemode(self._item1.mode)
mode2 = stat.filemode(self._item2.mode)
return ({"type": "mode", "old_mode": mode1, "new_mode": mode2}, '[{} -> {}]'.format(mode1, mode2))
def _content_equal(self, chunk_iterator1, chunk_iterator2):
if self._can_compare_chunk_ids:

View file

@ -4060,9 +4060,87 @@ class DiffArchiverTestCase(ArchiverTestCaseBase):
if are_hardlinks_supported():
assert 'input/hardlink_target_replaced' not in output
def do_json_asserts(output, can_compare_ids):
def get_changes(filename, data):
chgsets = [j['changes'] for j in data if j['path'] == filename]
assert len(chgsets) < 2
# return a flattened list of changes for given filename
return [chg for chgset in chgsets for chg in chgset]
# convert output to list of dicts
joutput = [json.loads(line) for line in output.split('\n') if line]
# File contents changed (deleted and replaced with a new file)
expected = {'type': 'modified', 'added': 4096, 'removed': 1024} if can_compare_ids else {'type': 'modified'}
assert expected in get_changes('input/file_replaced', joutput)
# File unchanged
assert not any(get_changes('input/file_unchanged', joutput))
# Directory replaced with a regular file
if 'BORG_TESTS_IGNORE_MODES' not in os.environ:
assert {'type': 'mode', 'old_mode': 'drwxr-xr-x', 'new_mode': '-rwxr-xr-x'} in \
get_changes('input/dir_replaced_with_file', joutput)
# Basic directory cases
assert {'type': 'added directory'} in get_changes('input/dir_added', joutput)
assert {'type': 'removed directory'} in get_changes('input/dir_removed', joutput)
if are_symlinks_supported():
# Basic symlink cases
assert {'type': 'changed link'} in get_changes('input/link_changed', joutput)
assert {'type': 'added link'} in get_changes('input/link_added', joutput)
assert {'type': 'removed link'} in get_changes('input/link_removed', joutput)
# Symlink replacing or being replaced
assert any(chg['type'] == 'mode' and chg['new_mode'].startswith('l') for chg in
get_changes('input/dir_replaced_with_link', joutput))
assert any(chg['type'] == 'mode' and chg['old_mode'].startswith('l') for chg in
get_changes('input/link_replaced_by_file', joutput))
# Symlink target removed. Should not affect the symlink at all.
assert not any(get_changes('input/link_target_removed', joutput))
# The inode has two links and the file contents changed. Borg
# should notice the changes in both links. However, the symlink
# pointing to the file is not changed.
expected = {'type': 'modified', 'added': 13, 'removed': 0} if can_compare_ids else {'type': 'modified'}
assert expected in get_changes('input/empty', joutput)
if are_hardlinks_supported():
assert expected in get_changes('input/hardlink_contents_changed', joutput)
if are_symlinks_supported():
assert not any(get_changes('input/link_target_contents_changed', joutput))
# Added a new file and a hard link to it. Both links to the same
# inode should appear as separate files.
assert {'type': 'added', 'size': 2048} in get_changes('input/file_added', joutput)
if are_hardlinks_supported():
assert {'type': 'added', 'size': 2048} in get_changes('input/hardlink_added', joutput)
# check if a diff between non-existent and empty new file is found
assert {'type': 'added', 'size': 0} in get_changes('input/file_empty_added', joutput)
# The inode has two links and both of them are deleted. They should
# appear as two deleted files.
assert {'type': 'removed', 'size': 256} in get_changes('input/file_removed', joutput)
if are_hardlinks_supported():
assert {'type': 'removed', 'size': 256} in get_changes('input/hardlink_removed', joutput)
# Another link (marked previously as the source in borg) to the
# same inode was removed. This should not change this link at all.
if are_hardlinks_supported():
assert not any(get_changes('input/hardlink_target_removed', joutput))
# Another link (marked previously as the source in borg) to the
# same inode was replaced with a new regular file. This should not
# change this link at all.
if are_hardlinks_supported():
assert not any(get_changes('input/hardlink_target_replaced', joutput))
do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a'), True)
# We expect exit_code=1 due to the chunker params warning
do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1b', exit_code=1), False)
do_json_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a', '--json-lines'), True)
def test_sort_option(self):
self.cmd('init', '--encryption=repokey', self.repository_location)