Re-synchronize chunks cache with repository.
If present, it uses a compressed tar archive of known backup archive
indices, so it only needs to fetch info from the repo and build a chunk
index once per backup archive.
If out of sync, the tar gets rebuilt from known + fetched chunk info,
so it has complete and current information about all backup archives.
Finally, it builds the master chunks index by merging all indices from
the tar.
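The merge step described above can be sketched roughly as follows. This is a simplified stand-in, not borg's actual implementation: the per-archive index layout (chunk id -> (refcount, size)) is a hypothetical simplification chosen for illustration.

```python
def merge_chunk_indices(indices):
    """Merge per-archive chunk indices into one master chunks index.

    `indices` is an iterable of dicts mapping chunk id -> (refcount, size);
    this layout is a simplified assumption, not borg's real on-disk format.
    Refcounts are summed across archives, sizes are taken as-is.
    """
    master = {}
    for idx in indices:
        for chunk_id, (refcount, size) in idx.items():
            if chunk_id in master:
                old_refcount, old_size = master[chunk_id]
                master[chunk_id] = (old_refcount + refcount, old_size)
            else:
                master[chunk_id] = (refcount, size)
    return master
```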
Note: compression (esp. xz) is very effective in keeping the tar
relatively small compared to the files it contains.
Use Python >= 3.3 to get better compression with xz; there is a
fallback to bz2 or gzip when xz is not supported.
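The xz-with-fallback behavior could be implemented along these lines, using the stdlib tarfile module (which supports xz via "w:xz" since Python 3.3 and raises CompressionError when a codec is unavailable). The function name is just for illustration.

```python
import tarfile

def open_cache_tar(path, mode="w"):
    """Open the archive-index cache tar for writing, preferring xz,
    then falling back to bz2 and finally gzip (hypothetical helper)."""
    for comp in ("xz", "bz2", "gz"):
        try:
            return tarfile.open(path, "%s:%s" % (mode, comp))
        except tarfile.CompressionError:
            continue  # codec not available in this Python build
    return tarfile.open(path, mode)  # last resort: uncompressed
```

Reading back is simpler: tarfile's "r:*" mode auto-detects whichever compression was used to write the cache.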
Making much better use of the CPU by dispatching all CPU-intensive stuff
(hashing, crypto, compression) to N crypter threads (N == logical CPU count ==
4 for a dual-core CPU with hyperthreading).
I/O-intensive stuff also runs in separate threads: the MainThread does the
filesystem traversal, the reader thread reads and chunks the files, and the
writer thread writes to the repo. This way, we don't sit idle waiting for I/O:
when an I/O thread blocks, another thread gets dispatched and uses the time.
This applies to read as well as write/fsync I/O wait time
(access time + data transfer).
There's one more thread, the "delayer". We need it to handle a race condition
related to the computation of the compressed size (which is only possible after
hashing/compression/encryption has finished). This "csize" makes all this code
considerably more complicated than it would otherwise be.
Although Python code is subject to the GIL, we can still make good use of
multithreading, as I/O operations and C code (which releases the GIL) can run
in parallel.
All threads are connected via Python Queues (which are intended for this and
are thread-safe). The Cache.chunks data structure is also updated by
thread-safe code.
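The thread layout above can be sketched as a minimal queue-connected pipeline. This is an illustrative skeleton, not borg's actual code: zlib compression stands in for the whole hash/compress/encrypt stage (zlib is C code that releases the GIL, which is what lets N crypter threads run in parallel), and the writer just collects results instead of writing to a repo.

```python
import os
import queue
import threading
import zlib

NUM_CRYPTERS = os.cpu_count() or 1  # N == logical CPU count

def crypter(inq, outq):
    """CPU-intensive stage (stand-in for hashing/crypto/compression)."""
    while True:
        item = inq.get()
        if item is None:          # sentinel: shut down, tell the writer
            outq.put(None)
            break
        chunk_id, data = item
        outq.put((chunk_id, zlib.compress(data, 6)))

def writer(outq, results, n_crypters):
    """I/O stage (the real thing would write chunks to the repo)."""
    finished = 0
    while finished < n_crypters:  # wait for one sentinel per crypter
        item = outq.get()
        if item is None:
            finished += 1
        else:
            results[item[0]] = item[1]

inq, outq, results = queue.Queue(), queue.Queue(), {}
crypters = [threading.Thread(target=crypter, args=(inq, outq))
            for _ in range(NUM_CRYPTERS)]
for t in crypters:
    t.start()
wt = threading.Thread(target=writer, args=(outq, results, NUM_CRYPTERS))
wt.start()
# the "reader" role: feed chunked data into the pipeline
for i in range(8):
    inq.put((i, b"x" * 1000))
for _ in crypters:
    inq.put(None)
for t in crypters:
    t.join()
wt.join()
```

Bounding the queues (queue.Queue(maxsize=...)) would add backpressure so a fast reader cannot run arbitrarily far ahead of the crypters.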
A little benchmark
------------------
Both runs are with compression (zlib, level 6) and encryption, on a Haswell/SSD laptop:
Without multithreading code:
Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
User time (seconds): 13.78
System time (seconds): 0.40
Percent of CPU this job got: 83%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.98
With multithreading code:
Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
User time (seconds): 24.08
System time (seconds): 1.16
Percent of CPU this job got: 249%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.11
It's unclear to me why it uses much more "User time" (I'm not even sure that
measurement is correct). But the overall "Elapsed" runtime dropped
significantly, and it makes better use of all CPU cores (not just 83% of one).