this was left over from times when we either used mock from stdlib
or pypi mock. but as we only use pypi mock now, the indirection is
not needed any more.
if one used --last (or, as of recently, gave an archive name), verify_chunks (old method name) was
not called because it requires that all archives have been checked.
the problem was that the final manifest.write() and repository.commit() were also done in that method,
so all other repair work did not get committed in that case.
I moved these calls to a separate finish() method.
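a minimal sketch of the idea, not the actual checker code: everything except finish(), manifest.write() and repository.commit() is a hypothetical stand-in (FakeRepository, FakeManifest, ArchiveChecker.check and its arguments are made up for illustration).

```python
class FakeRepository:
    """Hypothetical stand-in that only records whether commit() was called."""
    def __init__(self):
        self.committed = False

    def commit(self):
        self.committed = True


class FakeManifest:
    """Hypothetical stand-in that only records whether write() was called."""
    def __init__(self):
        self.written = False

    def write(self):
        self.written = True


class ArchiveChecker:
    def __init__(self, repository, manifest):
        self.repository = repository
        self.manifest = manifest
        self.chunks_verified = False

    def verify_chunks(self):
        # Only meaningful after ALL archives have been checked.
        self.chunks_verified = True

    def finish(self):
        # Persist repair work, independent of whether verify_chunks() ran.
        self.manifest.write()
        self.repository.commit()

    def check(self, repair=False, last=None):
        # ... per-archive checks / repairs would happen here ...
        if last is None:
            self.verify_chunks()
        if repair:
            self.finish()
```

with this split, a partial check (last=1) skips verify_chunks() but still commits the repairs.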
New null and lz4 compression.
Giving -C 0 now uses null compression, not zlib level 0 any more
(null has almost zero overhead while zlib-level0 still had to package everything into zlib frames).
Giving -C 10 uses new lz4 compression, super fast compression and even faster decompression.
See borg create --help (and --compression argument).
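The zlib level 0 overhead can be seen with the stdlib alone. This is only a sketch: the "null" side is modelled as a plain pass-through, which is an assumption (the real null compressor may add a small type header), and lz4 is left out because it is not in the stdlib.

```python
import zlib

data = b"x" * 1000

# zlib level 0 does not compress, but still wraps the data in a zlib stream
# (header, stored-block framing, checksum), so the output is longer than the input.
stored = zlib.compress(data, 0)

# "null" compression modelled as a pass-through (assumption, see above).
null = bytes(data)

zlib_overhead = len(stored) - len(data)
null_overhead = len(null) - len(data)
```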
fix some issues, clean up, optimize:
CNULL: always return bytes
LZ4: deal with getting memoryviews
Compressor: give bytes to detect(), avoid memoryviews
for lz4, always use same COMPR_BUFFER, avoid memory management costs.
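a rough sketch of the bytes-vs-memoryview points above. the ID byte, the class layout and the method signatures are hypothetical, not the real compressor code. memoryview objects have no startswith(), which is one reason to hand real bytes to detect().

```python
class CNull:
    ID = b"\x00"  # hypothetical one-byte type marker

    @classmethod
    def detect(cls, header: bytes) -> bool:
        return header.startswith(cls.ID)

    def compress(self, data) -> bytes:
        return self.ID + bytes(data)

    def decompress(self, data) -> bytes:
        # Slicing a memoryview yields a memoryview, so convert explicitly
        # to make sure callers always get bytes back.
        return bytes(data[len(self.ID):])


class Compressor:
    def __init__(self, compressors):
        self.compressors = compressors

    def decompress(self, data) -> bytes:
        header = bytes(data[:1])  # give detect() bytes, never a memoryview
        for cls in self.compressors:
            if cls.detect(header):
                return cls().decompress(data)
        raise ValueError("no decompressor found for this data")
```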
check --chunker-params CHUNK_MAX_EXP upper limit
This fix may not be perfect yet, but it is probably better than nothing.
A comment by Ernest0x (see https://github.com/jborg/attic/issues/232 ):
@ThomasWaldmann your patch did the job.
attic check --repair did the repairing and attic delete deleted the archive.
Thanks.
That said, however, I am not sure if the best place to put the check is where
you put it in the patch. For example, the check operation uses a custom msgpack
unpacker class named "RobustUnpacker", which it does try to check for correct
format (see the comment: "Abort early if the data does not look like a
serialized dict"), but it seems it does not catch my case. The relevant code
in 'cache.py', on the other hand, uses msgpack's Unpacker class.
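To illustrate what "does not look like a serialized dict" can mean, here is a small sketch based only on the msgpack format's type bytes for maps. It is not the RobustUnpacker code, and the function name is made up.

```python
def looks_like_msgpack_dict(data: bytes) -> bool:
    """Return True if the first byte announces a msgpack map."""
    if not data:
        return False
    first = data[0]
    # msgpack map markers: fixmap 0x80..0x8f, map 16 = 0xde, map 32 = 0xdf
    return 0x80 <= first <= 0x8F or first in (0xDE, 0xDF)
```

A check like this only inspects the outer type marker; it says nothing about whether the contents of the map are valid.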
it was silently failing until recently. and it can't work the way it is on RemoteRepository.
it's still active (and now even really working) for the (local) Repository tests.
the old code blows up with an integer OverflowError when the cache file goes beyond 2GiB size.
the new code just reuses the Repository implementation as a local temporary key/value store.
still an issue: the place where the temporary RepositoryCache is stored (usually /tmp) might not be
able to cope with the cache size and fill up.
if you copy data from a fuse mount, the cache size is the copied deduplicated data size.
so, if you have lots of data to extract (more than your /tmp can hold), better not use fuse!
besides fuse mounts, this also affects attic check and cache sync (in these cases, only the
metadata size counts, but even that can go beyond 2GiB for some people).
also: add some benchmarking output showing singlethread, multithread and
multithread-with-gil-releasing-chunker performance.
this changeset may improve multithreading performance a little, by about 3%
(but that might be close to the measurement accuracy).
always use archiver.print_error, so it goes to sys.stderr
always say "Error: ..." for errors
for rc != 0 always say "Exiting with failure status ..."
catch all exceptions subclassing Exception, so we can log them in same way and set exit_code=1
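a minimal sketch of these rules. only print_error, the "Error: ..." prefix, the "Exiting with failure status ..." text and exit_code=1 come from the list above; the Archiver.run wrapper and the exact message layout are assumptions.

```python
import sys


class Archiver:
    def __init__(self):
        self.exit_code = 0

    def print_error(self, msg, *args):
        msg = msg % args if args else msg
        self.exit_code = 1
        # All errors go to sys.stderr and share the same prefix.
        print("Error: " + msg, file=sys.stderr)

    def run(self, func):
        try:
            func()
        except Exception as e:  # anything subclassing Exception is logged the same way
            self.print_error("%s: %s", type(e).__name__, e)
        if self.exit_code:
            print("Exiting with failure status %d" % self.exit_code, file=sys.stderr)
        return self.exit_code
```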
- use power-of-2 sizes / n bit hash mask so one can give them more easily
- chunker api: give seed first, so we can give *chunker_params after it
- fix some tests that aren't possible with 2^N
- make sparse file extraction zero detection flexible for variable chunk max size
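a small sketch of the power-of-2 / n-bit mask idea. the helper name, the parameter order inside chunker_params and the example values are assumptions for illustration only.

```python
def chunker_sizes(min_exp, max_exp, hash_mask_bits):
    """Turn exponents into byte sizes and an n-bit hash mask."""
    min_size = 1 << min_exp                # 2 ** min_exp
    max_size = 1 << max_exp                # 2 ** max_exp
    hash_mask = (1 << hash_mask_bits) - 1  # n low bits set
    return min_size, max_size, hash_mask


class Chunker:
    """Hypothetical signature: seed first, so *chunker_params can follow it."""
    def __init__(self, seed, min_exp, max_exp, hash_mask_bits):
        self.seed = seed
        self.min_size, self.max_size, self.hash_mask = chunker_sizes(
            min_exp, max_exp, hash_mask_bits)


chunker_params = (10, 23, 16)  # example values only
chunker = Chunker(0, *chunker_params)
```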
regular files are most common, more than directories. fifos are rare.
was no big issue, the calls are cheap, but also no big issue to just fix the order.
they are rare, so it's pointless to check for them first.
seen stat.S_ISSOCK in profiling results with a high call count.
was no big issue, that call is cheap, but also no big issue to just fix the order.
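the reordering amounts to testing the most common file types first. a sketch with the stdlib stat module (the function name and the returned labels are made up):

```python
import stat


def classify(mode):
    """Check file types roughly in order of how often they occur."""
    if stat.S_ISREG(mode):    # regular files: most common
        return "file"
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISFIFO(mode):   # rare
        return "fifo"
    if stat.S_ISSOCK(mode):   # rare, so checked late
        return "socket"
    return "other"
```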
Re-synchronize chunks cache with repository.
If present, uses a compressed tar archive of known backup archive
indices, so it only needs to fetch infos from repo and build a chunk
index once per backup archive.
If out of sync, the tar gets rebuilt from known + fetched chunk infos,
so it has complete and current information about all backup archives.
Finally, it builds the master chunks index by merging all indices from
the tar.
Note: compression (esp. xz) is very effective in keeping the tar
relatively small compared to the files it contains.
Use python >= 3.3 to get better compression with xz,
there's a fallback to bz2 or gz when xz is not supported.
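The tar-of-indices idea can be sketched with the stdlib tarfile module. This is not the real cache sync code: the function names, the JSON serialization of an index, and merging by summing reference counts are all assumptions made for the example.

```python
import io
import json
import tarfile


def pick_tar_compression():
    """Prefer xz, then bz2, then gz, depending on what this Python supports."""
    for name, module in (("xz", "lzma"), ("bz2", "bz2"), ("gz", "zlib")):
        try:
            __import__(module)
        except ImportError:
            continue
        return name
    return ""  # plain, uncompressed tar as a last resort


def write_index_tar(indices):
    """Pack {archive_id: {chunk_id: refcount}} into an in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:" + pick_tar_compression()) as tar:
        for archive_id, index in sorted(indices.items()):
            payload = json.dumps(index).encode()
            info = tarfile.TarInfo(name=archive_id)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def merge_index_tar(blob):
    """Build the master chunk index by merging all per-archive indices."""
    master = {}
    # "r:*" lets tarfile detect the compression that was used when writing.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for member in tar:
            index = json.loads(tar.extractfile(member).read().decode())
            for chunk_id, count in index.items():
                master[chunk_id] = master.get(chunk_id, 0) + count
    return master
```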
if we have an OS file handle, we can directly read to the final destination - one memcpy less.
if we have a Python file object, we get a Python bytes object as read result (can't save the memcpy here).
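a loose Python-level analogy only (the actual change is presumably in C code, which this does not show): reading into a preallocated buffer avoids the intermediate bytes object that read() creates. the function names are made up.

```python
def read_via_bytes(f, dest, n):
    """read() returns a new bytes object, which then has to be copied."""
    data = f.read(n)
    dest[:len(data)] = data  # the extra copy
    return len(data)


def read_into_dest(f, dest, n):
    """readinto() fills the destination buffer directly."""
    return f.readinto(memoryview(dest)[:n])
```

read_into_dest expects a file object that supports readinto(), e.g. one opened with open(path, "rb", buffering=0).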
a lot of speedup for:
"list <repo>", "delete <repo>" list, "prune" - esp. for slow connections to remote repositories.
the previous method used metadata from the archive itself, which is (in total) rather large.
so if you had many archives and a slow (remote) connection, it was very slow.
but there is a much easier way: just use the archives list from the repository manifest - we already
have it anyway and it also has name, id and timestamp for all archives - and that's all we need.
I defined an ArchiveInfo namedtuple that has the same element names as the attribute names
of the Archive object, so as long as name, id, ts is enough, it can be used in its place.
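a sketch of such a namedtuple. the field names name, id, ts are from the text above; the layout of the manifest's archives mapping and the helper function are assumptions.

```python
from collections import namedtuple

ArchiveInfo = namedtuple("ArchiveInfo", "name id ts")


def list_archive_infos(manifest_archives):
    """manifest_archives: hypothetical mapping name -> {"id": ..., "time": ...}."""
    infos = (
        ArchiveInfo(name=name, id=entry["id"], ts=entry["time"])
        for name, entry in manifest_archives.items()
    )
    # Sorting needs no archive metadata from the repo, only the manifest entries.
    return sorted(infos, key=lambda info: info.ts)
```

code that only reads .name, .id and .ts works the same with an ArchiveInfo as with a full Archive object.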
Making much better use of the CPU by dispatching all CPU intensive stuff
(hashing, crypto, compression) to N crypter threads (N == logical cpu count ==
4 for a dual-core CPU with hyperthreading).
I/O intensive stuff also runs in separate threads: the MainThread does the
filesystem traversal, the reader thread reads and chunks the files, the writer
thread writes to the repo. This way, we don't need to sit idle waiting for I/O;
instead, the I/O thread will block and another thread will get dispatched and use
the time. This applies to read as well as to write/fsync I/O wait time
(access time + data transfer).
There's one more thread, the "delayer". We need it to handle a race condition
related to the computation of the compressed size (which is only possible after
hashing/compression/encryption has finished). This "csize" makes all this code
quite a bit more complicated than it would be if we did not need it.
Although there is the GIL issue for Python code, we can still make good use of
multithreading as I/O operations and C code (that releases the GIL) can run in
parallel.
All threads are connected via Python Queues (which are intended for this and
thread safe). The Cache.chunks datastructure is also updated by threadsafe
code.
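A heavily simplified sketch of such a queue-connected pipeline follows. It is not the real implementation: the traversal thread, encryption, the "delayer" and the csize handling are left out, and all function names, the STOP sentinel and the dict used as "repo" are assumptions.

```python
import hashlib
import os
import queue
import threading
import zlib

STOP = object()  # sentinel telling a thread that no more work will arrive


def reader(chunks, work_q, n_workers):
    for chunk in chunks:
        work_q.put(chunk)
    for _ in range(n_workers):  # one sentinel per worker thread
        work_q.put(STOP)


def crypter(work_q, write_q):
    while True:
        chunk = work_q.get()
        if chunk is STOP:
            write_q.put(STOP)
            break
        # hashlib and zlib are C code, so this work can overlap across threads.
        chunk_id = hashlib.sha256(chunk).hexdigest()
        write_q.put((chunk_id, zlib.compress(chunk, 6)))


def writer(write_q, n_workers, store):
    stopped = 0
    while stopped < n_workers:
        item = write_q.get()
        if item is STOP:
            stopped += 1
            continue
        chunk_id, data = item
        store[chunk_id] = data  # only this thread touches the store


def run_pipeline(chunks, n_workers=None):
    n_workers = n_workers or os.cpu_count() or 1
    work_q = queue.Queue(maxsize=16)
    write_q = queue.Queue(maxsize=16)
    store = {}
    threads = [threading.Thread(target=reader, args=(chunks, work_q, n_workers))]
    threads += [threading.Thread(target=crypter, args=(work_q, write_q))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=writer, args=(write_q, n_workers, store)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store
```

The bounded queues keep a fast reader from buffering unlimited data while the workers or the writer are busy.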
A little benchmark
------------------
Both runs use compression (zlib level 6) and encryption on a Haswell/SSD laptop:
Without multithreading code:
Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
User time (seconds): 13.78
System time (seconds): 0.40
Percent of CPU this job got: 83%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.98
With multithreading code:
Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
User time (seconds): 24.08
System time (seconds): 1.16
Percent of CPU this job got: 249%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.11
It's unclear to me why it uses much more "User time" (I'm not even sure that
measurement is correct). But the overall runtime "Elapsed" dropped
significantly and it makes better use of all cpu cores (not just 83% of one).
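The relation between the numbers above can be checked with a little arithmetic. This assumes the "Percent of CPU" figure is (user + system) / elapsed, truncated to a whole percent, which matches both runs.

```python
def cpu_percent(user, system, elapsed):
    """CPU share as a truncated percentage (assumed formula, see above)."""
    return int(100 * (user + system) / elapsed)


single = cpu_percent(13.78, 0.40, 16.98)  # without multithreading
multi = cpu_percent(24.08, 1.16, 10.11)   # with multithreading
speedup = round(16.98 / 10.11, 2)         # wall clock ratio
```

This gives 83 and 249 percent, and a wall clock speedup of about 1.68x.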