added try/finally (the code in between was just indented, no
other code changes) to make sure it sets self.index back to None,
even if the code crashes e.g. due to an IntegrityError caused
by an incomplete segment caused by a disk full condition.
also, in prepare_txn, create an empty in-memory index if transaction_id
is None, which is required by the Repository.check code to work correctly.
If the index is not empty there, it will miscalculate segment usage
(self.segments).
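a minimal sketch of the try/finally pattern described above. the class, the method name `do_work` and the `fail` switch are hypothetical; only the idea (always reset self.index to None) comes from this change.

```python
class IntegrityError(Exception):
    """stand-in for the repository's integrity error."""


class RepositorySketch:
    def __init__(self):
        self.index = None

    def do_work(self, fail=False):
        self.index = {}  # pretend the in-memory index was loaded
        try:
            if fail:
                # e.g. an incomplete segment after a disk full condition
                raise IntegrityError("incomplete segment")
            # ... the unchanged, merely indented code would live here ...
        finally:
            # runs on success and on failure alike
            self.index = None


repo = RepositorySketch()
try:
    repo.do_work(fail=True)
except IntegrityError:
    pass
```

after the failed call, repo.index is None again, so later code does not see a stale index.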
this is so that e.g. cron jobs do not hang indefinitely if yes() is called,
but it will just default to "no" if no tty is connected.
if you need to enforce a "yes" answer (which is not recommended for
the security critical questions), you can use the environment:
BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=Y
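a rough sketch of that behaviour. `yes_sketch` and its parameters are hypothetical and not the real yes() signature; accepting only the value "Y" is also an assumption.

```python
import os
import sys


def yes_sketch(prompt, env_var=None, default=False, stdin=None):
    """hypothetical stand-in for yes(); not the project's real signature."""
    stdin = sys.stdin if stdin is None else stdin
    # an explicit override from the environment wins, nothing is asked
    if env_var is not None and os.environ.get(env_var) == "Y":
        return True
    # no tty connected (cron job, pipe): never block, use the default ("no")
    if not stdin.isatty():
        return default
    print(prompt, end=" ", file=sys.stderr, flush=True)
    return stdin.readline().strip().lower() in ("y", "yes")
```

with a non-tty stdin this returns the default without waiting for input.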
this is needed for tools like borgweb (or in general: when the rc value / exit status should
be logged for later review or directly seen on screen).
this is off by default, so the output is less verbose (and also does not fail tests which
count lines).
parse_args concentrates on only processing arguments, including pre and post processing.
this needs to be called before run(), which is now receiving the return value of parse_args.
this was done so we can have the parsed args outside of the run function, e.g. in main().
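the split could look roughly like this. `ArchiverSketch`, its options and the trailing-slash post-processing are made up for illustration; only the parse_args() / run(args) / main() split is taken from this change.

```python
import argparse


class ArchiverSketch:
    def parse_args(self, argv):
        """only argument processing, including pre and post processing."""
        parser = argparse.ArgumentParser(prog="sketch")
        parser.add_argument("-v", "--verbose", action="store_true")
        parser.add_argument("paths", nargs="*")
        args = parser.parse_args(argv)
        # example post-processing step (hypothetical)
        args.paths = [p.rstrip("/") for p in args.paths]
        return args

    def run(self, args):
        """receives the return value of parse_args()."""
        return 0


def main(argv):
    archiver = ArchiverSketch()
    args = archiver.parse_args(argv)  # parsed args are available here ...
    rc = archiver.run(args)           # ... and are handed on to run()
    return args, rc
```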
subclasses of "Error": do not show traceback
(this is used when a failure is expected and has rather trivial reasons and usually
does not need debugging)
subclasses of "ErrorWithTraceback": show a traceback
(this is for severe and rather unexpected stuff, like consistency / corruption issues
or stuff that might need debugging)
I reviewed all the Error subclasses to check whether they fit into one class or the other.
Also: fixed docstring typo, docstring formatting
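a small sketch of how the two base classes can drive the output. the `show_traceback` attribute and the `report` helper are assumptions for illustration, as is making ErrorWithTraceback a subclass of Error.

```python
import traceback


class Error(Exception):
    """expected failure with a rather trivial reason: no traceback."""
    show_traceback = False


class ErrorWithTraceback(Error):
    """severe or unexpected failure that might need debugging."""
    show_traceback = True


def report(exc):
    """return the text that would be shown to the user for exc."""
    if getattr(exc, "show_traceback", True):
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "%s: %s" % (type(exc).__name__, exc)
```

exceptions that are not Error subclasses at all fall back to showing a traceback in this sketch.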
the reason for a slow msgpack can be:
- pip install: missing compiler (gcc)
- pip install: missing compiler parts (e.g. gcc-c++)
- pip install: cached wheel package that was built while the compiler wasn't present
- distribution package: badly built msgpack package
there were 2 issues:
lock used another pid and tid than release because daemonize() made it different processes/threads.
solved by just determining PID only once and not using TID any more.
the other issue was that the repo needed an explicit closing.
just to avoid rounding / precision issues with floating point computations on py < 3.3
I used 2 hardcoded "full second" values on the input file and check if they get restored
correctly.
All normal informational output is now logged at INFO level.
To actually see normal output, the logger is configured to level INFO also.
Log levels:
WARNING is for warnings, ERROR is for (fatal) errors, DEBUG is for debugging.
logging levels must only be used according to these semantics and must not be
(ab)used to let something show up (although the logger would usually hide it)
or hide something (although the logger would usually show it).
Controlling the amount of output shown on INFO level:
--verbose, --progress, --stats are currently used for this.
more such flags might be added later as needed.
if they are set, more output is logged (at INFO level).
also: change strange setup_logging return value
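a sketch of the idea: the logger is set to INFO so normal output is visible, and a flag only controls how much is logged at INFO, never the level itself. the function names, the logger name and the message texts are hypothetical.

```python
import logging


def setup_logging(level=logging.INFO, stream=None):
    """configure a logger so that INFO output is actually shown."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("logging-sketch")
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False
    return logger


def process(paths, verbose, logger):
    for path in paths:
        if verbose:
            # more output, but still normal output: INFO, not DEBUG
            logger.info(path)
    logger.info("done: %d files", len(paths))
```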
this way the version can be discovered by scripts without having to
parse the output of 'help'.
it is removed from the 'help' output itself because it is prettier
without the complete version number, and then the description can be
reused elsewhere as well without needing the version number
0 = success (logged as INFO)
1 = warning (logged as WARNING)
2 = (fatal and abrupt) error (logged as ERROR)
please use the EXIT_(SUCCESS,WARNING,ERROR) constants from helpers module.
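for illustration, the mapping between return codes and log levels could be sketched like this. the constant names and values are from this change; `log_exit_code` and the message text are hypothetical, as is treating unknown codes as errors.

```python
import logging

EXIT_SUCCESS = 0  # success (logged as INFO)
EXIT_WARNING = 1  # warning (logged as WARNING)
EXIT_ERROR = 2    # fatal and abrupt error (logged as ERROR)

_LEVEL_FOR_RC = {
    EXIT_SUCCESS: logging.INFO,
    EXIT_WARNING: logging.WARNING,
    EXIT_ERROR: logging.ERROR,
}


def log_exit_code(logger, rc):
    """log the return code at the matching level and return that level."""
    level = _LEVEL_FOR_RC.get(rc, logging.ERROR)  # unknown rc: treat as error
    logger.log(level, "terminating with rc %d", rc)
    return level
```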
those can now support both file sizes (in SI/decimal format, powers of 10) and memory sizes (in binary format, powers of 2)
tests still fail because the result is always displayed as floats
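a sketch of the two formats side by side. `format_size`, its parameters and the unit lists are hypothetical and not the project's helper; note that it always formats as a float, which is the behaviour the tests trip over.

```python
def format_size(num, binary=False, precision=2):
    """format a byte count in decimal (powers of 10) or binary (powers of 2) units."""
    if binary:
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        base, units = 1000, ["B", "kB", "MB", "GB", "TB"]
    value = float(num)
    for unit in units[:-1]:
        if abs(value) < base:
            return "%.*f %s" % (precision, value, unit)
        value /= base
    # anything larger stays in the biggest unit
    return "%.*f %s" % (precision, value, units[-1])
```

for example, 1500 bytes is "1.50 kB" as a file size, while 1536 bytes is "1.50 KiB" as a memory size.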
this has never worked as intended as the function was not using the computed "fields[1]" value at all.
plus there were type issues after that was fixed.
if the borg.exe binary is not available in PATH, binary tests are skipped.
source tests are run without forking (for better speed, esp. on travis).
binary tests need forking the binary, of course.
for source tests, some tests check for an exception to happen.
for a forked binary, we of course can only check the exit code, which is non-zero in that case.
it seems it is possible that the chunks files are copied but *not*
converted. this may have happened here because the conversion was
interrupted, although the specific scenario is still unclear (but it
did happen during manual tests here). therefore reproducing this
problem seems to be difficult, hence the lack of tests for this
specific issue.
since we consider the header replacement code to be safe, that we
always convert shouldn't pose any additional threat to the existing
borg chunk cache.
this resolves bug #something where the index file could not be
converted, completely breaking conversion.
it seems that, during some refactoring, the index conversion code was
completely dropped. this was missed by the unit tests because the repo
can still be opened by the constructor even though the index is
invalid, so tests need improvements there.
this function was over-coupling the cache system and the statistics
module. they are now almost decoupled insofar as the cache system has
its own rendering system now that is called separately.
furthermore, we have a much more flexible formatting system that is
used coherently between --progress and --stats
the degenerate case here is if we want to change the label in the
statistics summary: in this case we need to override the default
__str__() representation to insert our own label.
i saw the errors in my ways: __format__ is only to customize the
"format mini-language", what comes after ":" in a new string
format. unfortunately, we cannot easily refer to individual fields in
there, short of re-implementing a new formatting language, which seems
silly.
instead, we use properties to extract human-readable versions of the
statistics. more intuitive and certainly a more common pattern than
the exotic __format__().
also add unit tests to prove this all works
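the property approach, sketched with made-up names (`StatsSketch`, `human_size`, the `*_fmt` properties): individual fields can be referenced from an ordinary format string, which __format__() alone does not easily allow.

```python
def human_size(num):
    """tiny decimal size formatter, only for this sketch."""
    for unit in ("B", "kB", "MB", "GB"):
        if num < 1000:
            return "%.2f %s" % (num, unit)
        num /= 1000.0
    return "%.2f TB" % num


class StatsSketch:
    def __init__(self, osize, csize):
        self.osize = osize  # original size in bytes
        self.csize = csize  # compressed size in bytes

    @property
    def osize_fmt(self):
        return human_size(self.osize)

    @property
    def csize_fmt(self):
        return human_size(self.csize)

    def __str__(self):
        # properties are reachable as attributes inside the format string
        return "{0.osize_fmt} O {0.csize_fmt} C".format(self)
```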
we stop enforcing a minimum width for fields, it changes only on
logarithmic boundaries, so not a big problem. string conversion is
implicit
this gives us a little more width for the path
we use the new get_terminal_size() function, with a fallback for
Python 3.2. we default to 80 columns.
then we generate the stats bit and fill the rest with the path, as
previously, but with a possibly larger field.
note that this works with resizes in my test (uxterm)
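roughly, the line is built like this. `progress_line` is a hypothetical helper; the sketch assumes Python 3.3+ for shutil.get_terminal_size() and leaves out the Python 3.2 fallback.

```python
import shutil


def progress_line(stats, path, columns=None):
    """stats first, then the path fills whatever width is left."""
    if columns is None:
        # falls back to 80 columns if the size cannot be determined
        columns = shutil.get_terminal_size((80, 24)).columns
    space = columns - len(stats) - 1  # one blank between stats and path
    if space <= 0:
        return stats[:columns]
    if len(path) > space:
        if space > 3:
            path = path[:space - 3] + "..."
        else:
            path = path[:space]
    return "%s %s" % (stats, path.ljust(space))
```

since the size is queried on each call, a resized terminal is picked up by the next line that gets printed.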
the --stats output would be slightly garbled by --progress, because of
the \r that is output at the last line...
example:
initializing cache
reading files cache
processing files
------------------------------------------------------------------------------ s/twotone
Archive name: 2015-10-15-test
a single -v flag shouldn't flood the console with all the files in the
path specified, it makes -v basically useless
this way, -v can also be used with --progress to have nicer output:
initializing cache
reading files cache
processing files
5.20 GB O 2.66 GB C 25.13 MB D 27576 N baz/...
as it was, surrogates were not always removed, for example
we may also want to output at different levels or control if we want
to print unchanged files and so on
without this, there would be a solid 20 seconds here without any sort
of output on the console, regardless of the verbosity level. this
makes nice incremental messages telling the user that borg is not
stalled (or waiting for a lock, for that matter)
the "processing files" message is a little clunky, as we somewhat
abuse the cache to figure out if we are just starting... but it helps
if there are problems reading the actual files: it tells us the
initialization is basically complete and we're going ahead with the
reading of all the files.
the old name still works, but emits a deprecation warning suggesting the new name.
this is a followup to 4fd06e2634, which added "-x" (as seen in "du").
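one way to keep an old option name alive with argparse, as a sketch only: `--old-name`, `--new-name`, the `flag` destination and the `DeprecatedAlias` action are placeholders, not the real option names or the real implementation.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """acts like store_true, but warns and suggests the new name."""

    def __init__(self, option_strings, dest, new_name=None, **kwargs):
        self.new_name = new_name
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            "%s is deprecated, please use %s" % (option_string, self.new_name),
            DeprecationWarning)
        setattr(namespace, self.dest, True)


parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("--new-name", dest="flag", action="store_true",
                    default=False)
parser.add_argument("--old-name", dest="flag", action=DeprecatedAlias,
                    new_name="--new-name", default=False)
```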
instead, we perform the equivalent of `cp -al` on the repository to
keep a backup, and then rewrite the files, breaking the hardlinks as
necessary.
it has to be confirmed that the rest of Borg will also break hardlinks
when operating on files in the repository. if Borg operates in place
on any files of the repository, it could jeopardize the backup, so
this needs to be verified. I believe that most files are written to a
temporary file and moved into place, however, so the backup should be
safe.
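a sketch of both halves, assuming a filesystem that supports hardlinks. `hardlink_copy` and `rewrite_breaking_link` are hypothetical helpers, not the functions used by the upgrade code.

```python
import os
import tempfile


def hardlink_copy(src, dst):
    """rough equivalent of `cp -al src dst`: new directories, hardlinked files."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))


def rewrite_breaking_link(path, data):
    """write to a temporary file and move it into place.

    the move gives `path` a new inode, so the hardlinked backup copy
    keeps the old content.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
```

a write that opens the file in place instead would change the backup copy too, which is the risk described above.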
the rationale behind the backup copy is that we want to be extra
careful with user's data by default. the old behavior is retained
through the `--inplace`/`-i` commandline flag. plus, this way we don't
need to tell users to go through extra steps (`cp -a`, in particular)
before running the command.
also, it can take a long time to do the copy of the attic repository
we wish to work on. since `cp -a` doesn't provide progress
information, the new default behavior provides a nicer user experience
of giving an overall impression of the upgrade progress, while
retaining compatibility with Attic by default (in a separate
repository, of course).
this makes the upgrade command much less scary to use and hopefully
will convert drones to the borg collective.
the only place where the default inplace behavior is retained is in
the header_replace() function, to avoid breaking the cache conversion
code and to keep API stability and semantic coherence ("replace" by
default means in place).
we use forking mode always and either execute python with the archiver module or the "borg.exe" binary.
the cmd fixture alternates between 'python' and 'binary' mode and calls exec_cmd accordingly.
Instead of "realistic data", I chose the test data to be either all-zero (all-ascii-zero to be precise)
or all-random and benchmark them separately.
So we can better determine the cause (deduplication or storage) in case we see some performance regression.
"help" is benchmarked to see the minimum runtime when it basically does nothing.
also:
- refactor archiver execution core functionality into exec_cmd() so it can be used more flexibly
- tox: usually we want to skip benchmarks, only run them if requested manually
- install pytest-benchmark - run tox with "-r" to have it installed into your .tox envs
this was a remnant of when i was writing the converter/upgrader code,
and was destined to be a general progress message in the migration
process. i removed a more technical, internal debugging message in
exchange
instead of applying this only to usage generation, use it as a generic
mechanism to disable loading of Cython code.
it may be incomplete: there may be other places where Cython code is
loaded that is not checked, but that is sufficient to build the usage
docs. the environment variable used is documented as such in the
docs/usage.rst.
we also move the check to a helper function and document it
better. this has the unfortunate side effect of moving includes
around, but I can't think of a better way.
this is such a crude hack it is totally embarrassing....
the proper solution would probably be to move the `build_parser()`
function out of `Archiver` completely, but this is such an undertaking
that i doubt it is worth doing since we're looking at switching to
click anyways.
the main problem in moving build_parser() out is that it references
`self` all the time, so it *needs* an archiver context that it can
reuse. we could make the function static and pass self in there by
hand, but it seems like almost a worse hack... and besides, we would
need to load the archiver in order to do that, which would break usage
all over again...
this is an unfortunate rewrite of the manpage creation code mentioned
in #208. ideally, this would be rewritten into a class that can
generate both man pages and .rst files.
while SSH options can be specified through `~/.ssh/config`, some users
may want to use a completely different SSH command for their backups,
without overriding their $PATH variable. it may also be easier to do
ad-hoc configuration and tests that way.
plus, the POLA tells us that users expect something like this to be
supported by commands that talk to ssh. it is supported by rsync, git
and so on.
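a sketch of such an override. the variable name `EXAMPLE_RSH` and the `ssh_command` helper are placeholders for illustration, not the names used by this change.

```python
import os
import shlex


def ssh_command(host, env=None, env_var="EXAMPLE_RSH"):
    """build the argv for the remote connection, honouring an override."""
    env = os.environ if env is None else env
    # the override may carry options, so split it like a shell would
    base = shlex.split(env.get(env_var, "ssh"))
    return base + [host]
```

with the variable unset this gives ["ssh", host]; with it set to e.g. "ssh -p 2222", the extra options are passed along without touching $PATH or `~/.ssh/config`.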
the reasoning behind this is that we may need to test a
RemoteRepository setup outside of the main archiver routines, which
the current default location makes impossible
by moving the umask and remote_path remotes into the RemoteRepository
the (reasonable) defaults are available regardless of the (currently
obscure) initialisation routine, and make unit tests easier to develop
and support
Code shared by read() and iter_objects() was moved into _read().
Compared to read()'s previous state, this improved:
- fixed size check to avoid read with negative size
- exception handler for struct unpack
- checking for short read
- more precise exception messages
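the checks listed above, sketched on a simplified entry format. the header layout (`<IB`: total entry size, tag), `MAX_SIZE` and `read_entry` are assumptions for this example and not the real on-disk format.

```python
import struct


class IntegrityError(Exception):
    """stand-in for the repository's integrity error."""


HEADER = struct.Struct("<IB")  # hypothetical layout: total entry size, tag
MAX_SIZE = 1 << 20             # hypothetical upper bound


def read_entry(fd):
    """read one entry from fd and return (tag, payload)."""
    raw = fd.read(HEADER.size)
    try:
        size, tag = HEADER.unpack(raw)
    except struct.error as err:
        raise IntegrityError("invalid header: %s" % err)
    # reject too big and too small sizes, so we never read a negative length
    if size > MAX_SIZE or size < HEADER.size:
        raise IntegrityError("invalid size %d" % size)
    length = size - HEADER.size
    payload = fd.read(length)
    if len(payload) != length:
        raise IntegrityError(
            "short read: expected %d bytes, got %d" % (length, len(payload)))
    return tag, payload
```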
it seems the file cache does *not* have the ATTIC magic header (nor
does it have one in borg), so we don't need to edit the file - we just
copy it like a regular file.
while i'm here, simplify the cache conversion loop: it's no use
splitting the copy and the edition since the latter is so fast, just
do everything in one loop, which makes it much easier to read.
convert is too generic for the Attic conversion: we may have other
converters, from other, more foreign systems that will require
different options and different upgrade mechanisms that convert could
never cover appropriately. we are more likely to use an approach
similar to "git fast-import" instead here, and have the conversion
tools be external tools that feed standard data into borg during
conversion.
upgrade seems like a more natural fit: Attic could be considered like
a pre-historic version of Borg that requires invasive changes for borg
to be able to use the repository. we may require such changes in the
future of borg as well: if we make backwards-incompatible changes to
the repository layout or data format, it is possible that we require
such changes to be performed on the repository before it is usable
again. instead of scattering those conversions all over the code, we
should simply have assertions that check the layout is correct and
point the user to upgrade if it is not.
upgrade should eventually automatically detect the repository format
or version and perform appropriate conversions. Attic is only the
first one. we still need to implement an adequate API for
auto-detection and upgrade, only the seeds of that are present for now.
of course, changes to the upgrade command should be thoroughly
documented in the release notes and an eventual upgrade manual.
we separate the conversion and the copy in order to be able to copy
arbitrary files from attic without converting them. this allows us to
copy the config file cleanly without attempting to rewrite its magic
number
this greatly simplifies the display of those objects, as the
__format__() parameter allows for arbitrary display of the internal
fields of both objects
this will allow us to display those summaries without having to pass a
label to the string representation. we can also print the objects
directly without formatting at all.
- issue #234: handle exception when config file is empty or is really not a borg cache config
- there was an unused %s in the Exception string
- error msg was wrong when version check failed - this IS a borg cache, but not of expected version
the heuristics i used are the following:
1. if we are prompting the user, use print on stderr (input() may
produce some stuff on stdout, but it's outside the scope of this
patch). we do not want those prompts to end up on the standard
output in case we are piping stuff around
2. if the command is primarily producing output for the user on the
console (`list`, `info`, `help`), we simply print on the default
file descriptor.
3. everywhere else, we use the logging module with varying levels of
verbosity, as appropriate.
the logging level varies: most is logging.info(), in some places
logging.warning() or logging.error() are used when the condition is
clearly an error or warning. in other cases, we keep using print, but
force writing to sys.stderr, unless we interact with the user.
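the three heuristics, sketched with hypothetical helper names (`ask`, `show`, `report`) and a made-up logger name; the optional stream parameters only exist to make the sketch easy to try out.

```python
import logging
import sys

logger = logging.getLogger("output-sketch")


def ask(question, err=None):
    """heuristic 1: prompts go to stderr, so piped stdout stays clean."""
    print(question, end=" ", file=err if err is not None else sys.stderr)


def show(lines, out=None):
    """heuristic 2: primary command output (list, info, help) goes to stdout."""
    for line in lines:
        print(line, file=out if out is not None else sys.stdout)


def report(message, problem=False):
    """heuristic 3: everything else goes through logging at a fitting level."""
    level = logging.WARNING if problem else logging.INFO
    logger.log(level, message)
    return level
```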
there were 77 calls to print before this commit, now there are 7, most
of which are in the archiver module, which interacts directly with the
user. in one case there, we still use print() only because logging is
not setup properly yet during argument parsing.
it could be argued that commands like info or list should use print
directly, but we have converted them anyways, without ill effects on
the unit tests
unit tests still use print() in some places
this switches all informational output to stderr, which should help
with, if not fix, jborg/attic#312 directly
it only checked for too big sizes, but not for too small ones.
that made it die with a ValueError and not raise the appropriate IntegrityError
that gets handled in check() and triggers the repair attempt for the segment.
not sure where the problem is:
it seems to announce it supports st_mtime_ns, but if one uses it and
has a file with ...123ns, it gets restored as ...000ns.
Then I tried setting st_mtime_ns_round to -3, but it still failed with +1000ns difference.
Maybe rounding is incorrect and it should be truncating?
Issue with granularity could be in python, in netbsd (netbsd platform code), in ffs filesystem, ...
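a small arithmetic sketch of how rounding and truncating can end up 1000ns apart. this is one possible explanation, not a confirmed diagnosis; the timestamp is made up and the helpers are hypothetical.

```python
def round_ns(ns, ndigits=-3):
    """round a nanosecond timestamp to the nearest microsecond."""
    return round(ns, ndigits)


def truncate_ns(ns, unit=1000):
    """drop everything below one microsecond."""
    return ns - ns % unit


original = 1444000000123456789    # made-up mtime in nanoseconds
restored = truncate_ns(original)  # what a microsecond-granular store might return
difference = round_ns(original) - restored
```

here the rounded value of the original is one microsecond above the truncated one, so a comparison that rounds the expected value but gets a truncated value back would see a 1000ns difference.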
for vagrant testing on misc. platforms, we can't know the group /
we can't have the same group everywhere.
but the OS won't let us set setgid bit if the file does not have our group.
on netbsd, the created file somehow happens to have group "wheel",
but vagrant is not in group wheel.