Commit graph

17 commits

Author SHA1 Message Date
Thomas Waldmann
91d2cfa671 Merge branch 'master' into multithreading 2015-08-15 21:45:52 +02:00
Thomas Waldmann
69456e07c4 cache sync: change progress output to separate lines
printing without \n plus sys.stdout.flush() didn't work as expected.
2015-08-09 19:02:35 +02:00
Thomas Waldmann
35b0f38f5c cache sync: show progress indication
sync can take quite long, so show what we are doing.
2015-08-09 01:14:37 +02:00
Thomas Waldmann
a1e039ba21 reimplement the chunk index merging in C
the python code could take a rather long time and likely most of it was converting stuff from python to C and back.
2015-08-06 23:32:53 +02:00
Thomas Waldmann
e4a41c8981 fix Traceback when running check --repair, attic issue #232
This fix is maybe not perfect yet, but maybe better than nothing.

A comment by Ernest0x (see https://github.com/jborg/attic/issues/232 ):

@ThomasWaldmann your patch did the job.
attic check --repair did the repairing and attic delete deleted the archive.
Thanks.

That said, however, I am not sure if the best place to put the check is where
you put it in the patch. For example, the check operation uses a custom msgpack
unpacker class named "RobustUnpacker", which it does try to check for correct
format (see the comment: "Abort early if the data does not look like a
serialized dict"), but it seems it does not catch my case. The relevant code
in 'cache.py', on the other hand, uses msgpack's Unpacker class.
2015-07-15 13:32:05 +02:00
Thomas Waldmann
b2f460d591 fix filenames used for locking, update docs about locking 2015-07-13 23:20:46 +02:00
Thomas Waldmann
e4c519b1e9 new locking code
exclusive locking by atomic mkdir fs operation
on top of that, shared (read) locks and exclusive (write) locks using a json roster.
2015-07-13 13:55:28 +02:00
Thomas Waldmann
434dac0e48 move locking code to own module, same for locking tests
fix imports, no other changes.
2015-07-12 23:41:52 +02:00
TW
4b81f380f8 Merge pull request #88 from ThomasWaldmann/py3style
style and cosmetic fixes, no semantic changes
2015-07-11 18:39:42 +02:00
Thomas Waldmann
0580f2b4eb style and cosmetic fixes, no semantic changes
use simpler super() syntax of python 3.x

remove fixed errors/warnings' codes from setup.cfg flake8 configuration

fix file exclusion list for flake8
2015-07-11 18:31:49 +02:00
Thomas Waldmann
a59211f295 use borg-tmp as prefix for temporary files / directories
also: remove some unused temp dir. code
2015-07-11 17:22:12 +02:00
Thomas Waldmann
4736f5b9d0 Merge branch 'master' into multithreading 2015-06-27 22:24:51 +02:00
Thomas Waldmann
a3f4d19515 speed up chunks cache sync, fixes #18
Re-synchronize chunks cache with repository.

If present, uses a compressed tar archive of known backup archive
indices, so it only needs to fetch infos from repo and build a chunk
index once per backup archive.

If out of sync, the tar gets rebuilt from known + fetched chunk infos,
so it has complete and current information about all backup archives.

Finally, it builds the master chunks index by merging all indices from
the tar.

Note: compression (esp. xz) is very effective in keeping the tar
            relatively small compared to the files it contains.

Use python >= 3.3 to get better compression with xz,
there's a fallback to bz2 or gz when xz is not supported.
2015-05-31 19:17:01 +02:00
Thomas Waldmann
926454c0d8 explicitely specify binary mode to open binary files
on POSIX OSes, it doesn't make a difference, but it is cleaner and also good for portability.
2015-05-31 17:57:45 +02:00
Thomas Waldmann
240a27a227 multithreaded "create" operation
Making much better use of the CPU by dispatching all CPU intensive stuff
(hashing, crypto, compression) to N crypter threads (N == logical cpu count ==
4 for a dual-core CPU with hyperthreading).

I/O intensive stuff also runs in separate threads: the MainThread does the
filesystem traversal, the reader thread reads and chunks the files, the writer
thread writes to the repo. This way, we don't need to sit idle waiting for I/O,
but the I/O thread will block and another thread will get dispatched and use
the time. This applies for read as well as for write/fsync I/O wait time
(access time + data transfer).

There's one more thread, the "delayer". We need it to handle a race condition
related to the computation of the compressed size (which is only possible after
hashing/compression/encryption has finished). This "csize" makes all this code
quite more complicated than if we would not need it.

Although there is the GIL issue for Python code, we can still make good use of
multithreading as I/O operations and C code (that releases the GIL) can run in
parallel.

All threads are connected via Python Queues (which are intended for this and
thread safe). The Cache.chunks datastructure is also updated by threadsafe
code.

A little benchmark
------------------

Both is with compression (zlib level 6) and encryption on a haswell/ssd laptop:

Without multithreading code:

    Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
    User time (seconds): 13.78
    System time (seconds): 0.40
    Percent of CPU this job got: 83%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.98

With multithreading code:

    Command being timed: "borg create /extra/attic/borg::1 /home/tw/Desktop/"
    User time (seconds): 24.08
    System time (seconds): 1.16
    Percent of CPU this job got: 249%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.11

It's unclear to me why it uses much more "User time" (I'm not even sure that
measurement is correct). But the overall runtime "Elapsed" significantly
dropped and it makes better use of all cpu cores (not just 83% of one).
2015-05-25 22:37:15 +02:00
Thomas Waldmann
5e98400a5a fix all references to package name
use relative imports if possible
reorder imports (1. stdlib 2. dependencies 3. borg 4. borg.testsuite)
2015-05-22 19:21:41 +02:00
Thomas Waldmann
78bfc58b47 rename package directory to borg 2015-05-22 17:48:54 +02:00
Renamed from attic/cache.py (Browse further)