the statically allocated COMPR_BUFFER was the right size for chunks,
but not for the archive item, which could get larger if you have
many millions of files/dirs.
NOATIME support needed checking, and the flag file had UF_NODUMP set and thus was not in the backup archive.
Note: I have just duplicated the has_noatime function instead of refactoring it to be global,
to avoid merge conflicts in case we cherry-pick the test improvements from master.
removed the pointless platform check.
first, test the input file with the same checks we expect to succeed
on the extracted file. skip sparse archiving / extraction testing if the input
file checks fail - likely we have a problem with the OS or the FS then.
use the same condition for the input file as later for the extracted file.
the test preparation sparseness assertion failed on cygwin / ntfs, because the
input file uses ~40MB in blocks vs. total_len ~80MB.
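
a minimal sketch of such a shared check, assuming the condition is "allocated blocks cover less than the apparent size". the helper names are hypothetical, and st_blocks is assumed to be counted in 512-byte units (it is not available on all platforms):

```python
import os


def looks_sparse(size, blocks, expected_size):
    # the file must have the expected apparent size, but occupy fewer
    # bytes on disk (blocks are assumed to be 512-byte units)
    return size == expected_size and blocks * 512 < expected_size


def file_looks_sparse(path, expected_size):
    # apply the very same condition to the input file and, later,
    # to the extracted file
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks, expected_size)
```

with the cygwin / ntfs numbers from above (~40MB in blocks, ~80MB total), this condition holds, while a stricter "much smaller" check would not.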
do not ignore bad placeholders and just return the empty string,
this could have bad consequences, e.g. with --prefix '{invalidplaceholder}':
a typo in the placeholder name would cause the prefix to be the empty string.
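
a minimal sketch of the stricter behaviour, not the actual implementation: PlaceholderError and replace_placeholders are hypothetical names used for illustration only.

```python
class PlaceholderError(Exception):
    """Raised for an unknown or malformed placeholder (hypothetical name)."""


def replace_placeholders(text, data):
    # str.format_map raises KeyError for unknown placeholder names;
    # turn that into a hard error instead of substituting ''.
    try:
        return text.format_map(data)
    except (KeyError, ValueError, IndexError) as exc:
        raise PlaceholderError(
            "invalid placeholder %s in %r" % (exc, text)) from None
```

with this, a typo like '{invalidplaceholder}' aborts the operation instead of silently producing an empty prefix.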
in OpenSSL 1.1, the cipher context is opaque, its members cannot
be accessed directly. we only used this for ctx.iv to determine
the current IV (counter value).
now, we just remember the original IV, count the AES blocks we
process and then compute iv = iv_orig + blocks.
that way, it works on OpenSSL 1.0.x and >= 1.1 in the same way.
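
the computation can be sketched like this. it assumes the IV is a 16-byte big-endian counter and that the counter advances by one per 16-byte AES block; both function names are made up for illustration:

```python
AES_BLOCK_SIZE = 16


def num_aes_blocks(length):
    # number of AES blocks needed to process `length` bytes (rounded up)
    return (length + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE


def current_iv(iv_orig, blocks):
    # iv = iv_orig + blocks, treating the IV as a big-endian integer
    # that wraps around at 2**128 (assumption, see above)
    n = (int.from_bytes(iv_orig, "big") + blocks) % (1 << 128)
    return n.to_bytes(16, "big")
```

since only iv_orig and a block count are kept, nothing needs to be read back from the cipher context.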
found out that xfs is doing stuff behind the scenes: it is pre-allocating 16MB
to prevent fragmentation (in my case, the value depends on various factors).
fixed the test so it just checks that the extracted sparse file uses less (not
necessarily much less) space than a non-sparse file would use.
another problem showed up when i tried to verify the holes in the sparse file
via SEEK_HOLE, SEEK_DATA:
after the few bytes of real data in the file, there was another 16MB
preallocated space.
So I ended up checking just the hole at the start of the file.
tested on: ext4, xfs, zfs, btrfs
we need a list of valid item metadata keys. using a list stored in the repo manifest
is more future-proof than the hardcoded ITEM_KEYS in the source code.
keys that are in union(item_keys_from_repo, item_keys_from_source) are considered valid.
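
a rough sketch of that rule. the key set below is an illustrative subset only and the helper names are hypothetical:

```python
# illustrative subset of the keys hardcoded in the source code
ITEM_KEYS = frozenset(["path", "mode", "uid", "gid", "mtime", "chunks"])


def valid_item_keys(item_keys_from_repo):
    # union of the keys known to this source code and the keys
    # stored in the repo manifest (which may be missing -> None)
    return ITEM_KEYS | frozenset(item_keys_from_repo or ())


def has_only_valid_keys(item, item_keys):
    # an item metadata dict qualifies if all of its keys are known
    return isinstance(item, dict) and set(item) <= item_keys
```

a key written by a newer version and recorded in the manifest is then accepted even if this source code does not know it yet.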
when trying to resync and skip invalid data, borg tries to qualify a byte sequence as
valid-looking msgpacked item metadata dict (or not) before even invoking msgpack's unpack.
besides the previously hard-to-understand code, there were 2 issues:
- a missing check for map16 - this type is what msgpack uses if the dict has more than
15 items (could happen in future, not for 1.0.x).
- missing checks for str8/16/32 - str16 is what msgpack uses if the bytestring has more than 31 bytes
(borg does not have that long key names, thus this wasn't causing any harm)
this misqualification (valid data considered invalid) could lead to a wrong resync, skipping valid items.
added more comments and tests.
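
the qualification described above can be sketched as follows, using the type bytes from the msgpack format (fixmap 0x80-0x8f, map16 0xde, fixstr 0xa0-0xbf, str8/16/32 0xd9/0xda/0xdb). this is a simplified illustration, not necessarily the exact code; pack_short_key is a hypothetical helper to keep the example free of dependencies:

```python
def pack_short_key(name):
    # msgpack fixstr encoding for a bytestring of up to 31 bytes
    assert len(name) <= 31
    return bytes([0xa0 | len(name)]) + name


def valid_msgpacked_dict(d, keys_serialized):
    """cheaply check whether bytes d look like a msgpacked item dict"""
    if len(d) == 0:
        return False
    if d[0] & 0xf0 == 0x80:    # fixmap (up to 15 items)
        offs = 1
    elif d[0] == 0xde:         # map16 (up to 2**16-1 items)
        offs = 3
    else:                      # not a map type we expect here
        return False
    if len(d) <= offs:
        return False
    # the first dict key must be a bytestring ...
    if d[offs] & 0xe0 == 0xa0:                # fixstr (up to 31 bytes)
        pass
    elif d[offs] in (0xd9, 0xda, 0xdb):       # str8, str16, str32
        pass
    else:
        return False
    # ... and must be one of the expected (serialized) key names
    return any(d[offs:].startswith(k) for k in keys_serialized)
```

only if this check passes is the (more expensive) real unpacking attempted at that offset.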
They are extracted correctly, for a little while at least, since chown()
*resets* all capabilities on the chowned file. Which I find curious,
since chown() is a privileged syscall. Probably a safeguard for
sysadmins who are unaware of capabilities.
The solution is to set the xattrs last, after chown()ing files.
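
A minimal sketch of that ordering follows. The function name is hypothetical, and the chown/setxattr callables are injectable only so the order can be shown without privileges; os.setxattr is Linux-only.

```python
import os


def restore_owner_and_xattrs(path, uid, gid, xattrs,
                             chown=os.chown, setxattr=None):
    if setxattr is None:
        setxattr = os.setxattr  # only available on Linux
    # 1. change ownership first, since chown() may reset capabilities
    chown(path, uid, gid)
    # 2. set the xattrs last, so e.g. security.capability survives
    for name, value in xattrs.items():
        setxattr(path, name, value)
```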
(Remote)Repository.close() is not a public API anymore, but a private
API. It shall not be used from within classes other than Repository
or its tests. The proper way is to use a context manager now. However,
for RPC/Remote compatibility with Borg 1.0 it is kept and unchanged.
Repositories are no longer opened by __init__; this is now done
by using the repository as a context manager. (This SHOULD be compatible both ways
with remote, since opening the repo is handled by a RepositoryServer method)
Decorators @with_repository() and @with_archive simplify
context manager handling and remove unnecessary indentation.
Backported to 1.0-maint
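
A simplified sketch of the pattern, not the real classes: this toy Repository only tracks an "opened" flag, and the decorator signature is an assumption made for illustration.

```python
import functools


class Repository:
    def __init__(self, path):
        self.path = path
        self.opened = False  # __init__ does not open anything

    def __enter__(self):
        self.opened = True   # opening happens on entering the context
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False         # never swallow exceptions

    def close(self):         # private API, do not call from outside
        self.opened = False


def with_repository():
    # decorator factory: the wrapped function gets an opened repository
    def decorator(func):
        @functools.wraps(func)
        def wrapper(path, *args, **kwargs):
            with Repository(path) as repository:
                return func(repository, *args, **kwargs)
        return wrapper
    return decorator


@with_repository()
def do_info(repository):
    return "%s is open: %s" % (repository.path, repository.opened)
```

The decorated function body needs no "with" block of its own, which is where the saved indentation comes from.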
[1]
This worked incidentally because OSes tend to return at least one page
worth of data when EOF is not reached. Increasing WINDOW_SIZE beyond
the page size might have led to data loss.
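
To illustrate why relying on a single read is fragile, here is a sketch of a reader that keeps reading until the requested amount is there or EOF is hit. It is shown in Python for brevity and is an assumption about the general technique, not the actual fix; ShortReader is a made-up test double that returns short reads.

```python
class ShortReader:
    """File-like object that returns at most `limit` bytes per read()."""

    def __init__(self, data, limit):
        self.data = data
        self.limit = limit
        self.pos = 0

    def read(self, n):
        chunk = self.data[self.pos:self.pos + min(n, self.limit)]
        self.pos += len(chunk)
        return chunk


def read_full(f, n):
    # a short read does not mean EOF: only an empty result does
    parts = []
    remaining = n
    while remaining > 0:
        data = f.read(remaining)
        if not data:
            break  # real EOF
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)
```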
[2]
If read() of the passed Python object returned something not-bytes,
PyBytes_Size returns -1 (ssize_t) which becomes a very large number for
memcpy()'s size_t.
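
The conversion can be reproduced with ctypes, which does no overflow checking on integer types. The checked_length helper is a hypothetical guard, shown only as one possible way to reject non-bytes before any size is used:

```python
import ctypes


def as_size_t(n):
    # reinterpret a (possibly negative) integer as an unsigned size_t
    return ctypes.c_size_t(n).value


def checked_length(obj):
    # refuse anything that is not bytes instead of trusting its "size"
    if not isinstance(obj, bytes):
        raise TypeError("read() must return bytes, got %s"
                        % type(obj).__name__)
    return len(obj)
```

On a platform with a 64-bit size_t, as_size_t(-1) is 2**64 - 1, far more than any buffer holds.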