one of the biggest issues with borg < 1.0 was that it had a default target chunk
size of 64kiB, thus it created a lot of chunks, a huge chunk management overhead
(high RAM and disk usage).
the items metadata stream is usually not that big (compared to the file content data) -
it is just file and dir names and other metadata.
if we use too rough granularity there (and big minimum chunk size), we usually will get no deduplication.
as soon as one target segment is full, it is a good time to commit it and remove the source segments
that are already completely unused (because they were transferred int the target segment).
so, for compact_segments(save_space=True), the additional space needed should be about 1 segment size.
note: we can't just do that at the end of one source segment as this might create very small
target segments, which is not wanted.
this was making us require mock, which is really a test component and
shouldn't be part of the runtime dependencies. furthermore, it was
making the imports and the code more brittle: it may have been
possible that, through an environment variable, backups could be
corrupted because mock libraries would be configured instead of real
once, which is a risk we shouldn't be taking.
finally, this was used only to build docs, which we will build and
commit to git by hand with a fully working borg when relevant.
see #384.
without this, there would be a solid 20 seconds here without any sort
of output on the console, regardless of the verbosity level. this
makes nice incremental messages telling the user that borg is not
stalled (or waiting for a lock, for that matter)
the "processing files" message is a little clunky, as we somewhat
abuse the cache to figure out if we are just starting... but it helps
if there are problems reading the actual files: it tells us the
initialization is basically complete and we're going ahead with the
reading of all the files.
instead of applying this only to usage generation, use it as a generic
mechanism to disable loading of Cython code.
it may be incomplete: there may be other places where Cython code is
loaded that is not checked, but that is sufficient to build the usage
docs. the environment variable used is documented as such in the
docs/usage.rst.
we also move the check to a helper function and document it
better. this has the unfortunate side effect of moving includes
around, but I can't think of a better way.
this is such a crude hack it is totally embarrassing....
the proper solution would probably be to move the `build_parser()`
function out of `Archiver` completely, but this is such an undertaking
that i doubt it is worth doing since we're looking at switching to
click anyways.
the main problem in moving build_parser() out is that it references
`self` all the time, so it *needs* an archiver context that it can
reuse. we could make the function static and pass self in there by
hand, but it seems like almost a worse hack... and besides, we would
need to load the archiver in order to do that, which would break usage
all over again...
the logging level varies: most is logging.info(), in some place
logging.warning() or logging.error() are used when the condition is
clearly an error or warning. in other cases, we keep using print, but
force writing to sys.stderr, unless we interact with the user.
there were 77 calls to print before this commit, now there are 7, most
of which in the archiver module, which interacts directly with the
user. in one case there, we still use print() only because logging is
not setup properly yet during argument parsing.
it could be argued that commands like info or list should use print
directly, but we have converted them anyways, without ill effects on
the unit tests
unit tests still use print() in some places
this switches all informational output to stderr, which should help
with, if not fixjborg/attic#312 directly
e.g.:
- setting any security.* key is expected to fail with EACCES if one is not root.
- issue #162 on our issue tracker: user was root, but due to some specific scenario
involving docker and selinux, setting security.selinux key fails even when running as root
not sure if it is the best solution to silently ignore this, but some lines below this change
failure to do a chown is also silently ignored (happens e.g. when restoring a file not owned
by the current user as a non-root user).
if we use {} as default for item.get(), we do not need the "if" as iteration over an empty dict won't do anything.
also fixes too deep indentation the original code had.
if one used --last (or since shortly: gave an archive name), verify_chunks (old method name) was
not called because it requires all archives having been checked.
the problem was that also the final manifest.write() and repository.commit() was done in that method,
so all other repair work did not get committed in that case.
I moved these calls that to a separate finish() method.
This fix is maybe not perfect yet, but maybe better than nothing.
A comment by Ernest0x (see https://github.com/jborg/attic/issues/232 ):
@ThomasWaldmann your patch did the job.
attic check --repair did the repairing and attic delete deleted the archive.
Thanks.
That said, however, I am not sure if the best place to put the check is where
you put it in the patch. For example, the check operation uses a custom msgpack
unpacker class named "RobustUnpacker", which it does try to check for correct
format (see the comment: "Abort early if the data does not look like a
serialized dict"), but it seems it does not catch my case. The relevant code
in 'cache.py', on the other hand, uses msgpack's Unpacker class.