due to borg's architecture, breaking the repo lock needs first creating a repository object.
this would usually try to get a lock and then block if there already is one.
thus I added a flag to open without trying to create a lock.
--progress isn't a "toggle" anymore, in that it will never disable progress information: always enable it.
example:
$ borg create ~/test/borg2::test$(date +%s) . ; echo ^shows progress
reading files cache
processing files
^shows progress
$ borg create ~/test/borg2::test$(date +%s) . < /dev/null; echo ^no progress
reading files cache
processing files
^no progress
$ borg create --progress ~/test/borg2::test$(date +%s) . < /dev/null; echo ^progress forced
reading files cache
processing files
^progress forced
$ borg create --no-progress ~/test/borg2::test$(date +%s) . ; echo ^no progress
reading files cache
processing files
^no progress
we introduce a ToggleAction that can be used for other options, but
right now is just slapped in there near the code, which isn't that
elegant. inspired by:
http://stackoverflow.com/questions/11507756/python-argparse-toggle-flags
note that this is supported out of the box by click:
http://click.pocoo.org/5/options/#boolean-flagsfixes#398
this was making us require mock, which is really a test component and
shouldn't be part of the runtime dependencies. furthermore, it was
making the imports and the code more brittle: it may have been
possible that, through an environment variable, backups could be
corrupted because mock libraries would be configured instead of real
once, which is a risk we shouldn't be taking.
finally, this was used only to build docs, which we will build and
commit to git by hand with a fully working borg when relevant.
see #384.
We also add --keep-tag-files to keep in the archive the root directory and the
tag/exclusion file in the archive.
This is taken from a attic PR (and adapted for borg):
commit f61e22cacc90e76e6c8f4b23677eee62c09e97ac
Author: Yuri D'Elia <yuri.delia@eurac.edu>
Date: Mon Dec 15 12:27:43 2014 +0100
Add a new --exclude-if-present command-line flag to ``borg create``. If
specified, directories containing the specified tag file will be excluded from
the backup. The flag can be repeated to ignore more than a single tag file,
irregardless of the contents.
This is taken from a attic PR (and adapted for borg):
commit 3462a9ca90388dc5d8b4fa4218a32769676b3623
Author: Yuri D'Elia <yuri.delia@eurac.edu>
Date: Sun Dec 7 19:15:17 2014 +0100
- can create 0-byte files now
- frees space early (avoids running out of disk space at repo init time)
- creates multiple reserve files, so we do not only reserve some space,
but also some inodes
- only print output if there is an error RC
- if make_files makes us run out of space, that is not interesting, just start
a new iteration from scratch
added try/finally (the code in between was just indented, no
other code changes) to make sure it sets self.index back to None,
even if the code crashes e.g. due to an IntegrityError caused
by an incomplete segment caused by a disk full condition.
also, in prepare_txn, create an empty in-memory index if transaction_id
is None, which is required by the Repository.check code to work correctly.
If the index is not empty there, it will miscalculate segment usage
(self.segments).
this is so that e.g. cron jobs do not hang indefinitely if yes() is called,
but it will just default to "no" if not tty is connected.
if you need to enforce a "yes" answer (which is not recommended for
the security critical questions), you can use the environment:
BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=Y
this is needed for tools like borgweb (or in general: when the rc value / exit status should
be logged for later review or directly seen on screen).
this is off by default, so the output is less verbose (and also does not fail tests which
counts lines).
parse_args concentrates on only processing arguments, including pre and post processing.
this needs to be called before run(), which is now receiving the return value of parse_args.
this was done so we can have the parsed args outside of the run function, e.g. in main().
subclasses of "Error": do not show traceback
(this is used when a failure is expected and has rather trivial reasons and usually
does not need debugging)
subclasses of "ErrorWithTraceback": show a traceback
(this is for severe and rather unexpected stuff, like consistency / corruption issues
or stuff that might need debugging)
I reviewed all the Error subclasses whether they fit into the one or other class.
Also: fixed docstring typo, docstring formatting
the reason for a slow msgpack can be:
- pip install: missing compiler (gcc)
- pip install: missing compiler parts (e.g. gcc-c++)
- pip install: cached wheel package that was built while the compiler wasn't present
- distribution package: badly built msgpack package
there were 2 issues:
lock used another pid and tid than release because daemonize() made it different processes/threads.
solved by just determining PID only once and not using TID any more.
the other issue was that the repo needed and explicit closing.
just to avoid rounding / precision issues with floating point computations on py < 3.3
I used 2 hardcoded "full second" values on the input file and check if they get restored
correctly.
All normal informational output is now logged at INFO level.
To actually see normal output, the logger is configured to level INFO also.
Log levels:
WARNING is for warnings, ERROR is for (fatal) errors, DEBUG is for debugging.
logging levels must only be used according to this semantics and must not be
(ab)used to let something show up (although the logger would usually hide it)
or hide something (although the logger would usually show it).
Controlling the amount of output shown on INFO level:
--verbose, --progress, --stats are currently used for this.
more such flags might be added later as needed.
if they are set, more output is logged (at INFO level).
also: change strange setup_logging return value
this way the version can be discovered by scripts without having to
part the output of 'help'.
it is removed from the 'help' output itself because it is prettier
without the complete version number, and then the description can be
reused elsewhere as well without needing the version number
0 = success (logged as INFO)
1 = warning (logged as WARNING)
2 = (fatal and abrupt) error (logged as ERROR)
please use the EXIT_(SUCCESS,WARNING,ERROR) constants from helpers module.
those can now support both file sizes (in SI/decimal format, powers of 10) and memory sizes (in binary format, powers of 2)
tests still fail because the result is always displayed as floats
this has never worked as intended as the function was not using the computed "fields[1]" value at all.
plus there were type issues after that was fixed.
if the borg.exe binary is not available in PATH, binary tests are skipped.
source tests are run without forking (for better speed, esp. on travis).
binary tests need forking the binary, of course.
for source tests, some tests check for an exception to happen.
for a forked binary, we of course can only check the exit code, which is non-zero in that case.
it seems it is possible that the chunks files are copied but *not*
converted. this may have happened here because the conversion was
interrupted, although the specific scenario is still unclear (but it
did happen during manual tests here). therefore reproducing this
problem seems to be difficult, hence the lack of tests for this
specific issue.
since we consider the header replacement code to be safe, that we
always convert shouldn't pose any additional threat to the existing
borg chunk cache.
this resolves bug #something where the index file could not be
converted, completely breaking conversion.
it seems that, during some refactoring, the index conversion code was
completely dropped. this was missed by the unit tests because the repo
can still be opened by the constructor even though the index is
invalid, so tests need improvements there.
this function was over-coupling the cache system and the statistics
module. they are now almost decoupled insofar as the cache system has
its own rendering system now that is called separately.
furthermore, we have a much more flexible formatting system that is
used coherently between --progress and --stats
the degenerate case here is if we want to change the label in the
statistics summary: in this case we need to override the default
__str__() representation to insert our own label.
i saw the errors in my ways: __format__ is only to customize the
"format mini-language", what comes after ":" in a new string
format. unfortunately, we cannot easily refer to individual fields in
there, short of re-implementing a new formatting language, which seems
silly.
instead, we use properties to extract human-readable versions of the
statistics. more intuitive and certainly a more common pattern than
the exotic __format__().
also add unit tests to prove this all works
we stop enforcing a minimum width for fields, it changes only on
logarithmic boundaries, so not a big problem. string conversion is
implicit
this gives us a little more width for the path
we use the new get_terminal_size() function, with a fallback for
Python 3.2. we default to 80 columns.
then we generate the stats bit and fill the rest with the path, as
previously, but with a possibly larger field.
note that this works with resizes in my test (uxterm)
the --stats output would be slightly garbled by --progress, because of
the \r that is output at the last line...
example:
initializing cache
reading files cache
processing files
------------------------------------------------------------------------------ s/twotone
Archive name: 2015-10-15-test
a single -v flag shouldn't flood the console with all the files in the
path specified, it makes -v basically useless
this way, -v can also be used with --progress to have nicer output:
initializing cache
reading files cache
processing files
5.20 GB O 2.66 GB C 25.13 MB D 27576 N baz/...
as it was, surrogates were not always removed, for example
we may also want to output at different levels or control if we want
to print unchanged files and so on
without this, there would be a solid 20 seconds here without any sort
of output on the console, regardless of the verbosity level. this
makes nice incremental messages telling the user that borg is not
stalled (or waiting for a lock, for that matter)
the "processing files" message is a little clunky, as we somewhat
abuse the cache to figure out if we are just starting... but it helps
if there are problems reading the actual files: it tells us the
initialization is basically complete and we're going ahead with the
reading of all the files.
the old name still works, but emits a deprecation warning suggesting the new name.
this is a followup to 4fd06e2634, which added "-x" (as seen in "du").
instead, we perform the equivalent of `cp -al` on the repository to
keep a backup, and then rewrite the files, breaking the hardlinks as
necessary.
it has to be confirmed that the rest of Borg will also break hardlinks
when operating on files in the repository. if Borg operates in place
on any files of the repository, it could jeoperdize the backup, so
this needs to be verified. I believe that most files are written to a
temporary file and moved into place, however, so the backup should be
safe.
the rationale behind the backup copy is that we want to be extra
careful with user's data by default. the old behavior is retained
through the `--inplace`/`-i` commandline flag. plus, this way we don't
need to tell users to go through extra steps (`cp -a`, in particular)
before running the command.
also, it can take a long time to do the copy of the attic repository
we wish to work on. since `cp -a` doesn't provide progress
information, the new default behavior provides a nicer user experience
of giving an overall impression of the upgrade progress, while
retaining compatibility with Attic by default (in a separate
repository, of course).
this makes the upgrade command much less scary to use and hopefully
will convert drones to the borg collective.
the only place where the default inplace behavior is retained is in
the header_replace() function, to avoid breaking the cache conversion
code and to keep API stability and semantic coherence ("replace" by
defaults means in place).
we use forking mode always and either execute python with the archiver module or the "borg.exe" binary.
the cmd fixture alternates between 'python' and 'binary' mode and calls exec_cmd accordingly.
Instead of "realistic data", I chose the test data to be either all-zero (all-ascii-zero to be precise)
or all-random and benchmark them separately.
So we can better determine the cause (deduplication or storage) in case we see some performance regression.
"help" is benchmarked to see the minimum runtime when it basically does nothing.
also:
- refactor archiver execution core functionality into exec_cmd() so it can be used more flexibly
- tox: usually we want to skip benchmarks, only run them if requested manually
- install pytest-benchmark - run tox with "-r" to have it installed into your .tox envs
this was a remnant of when i was writing the converter/upgrader code,
and was destined to be a general progress message in the migration
process. i removed a more technical, internal debugging message in
exchange
instead of applying this only to usage generation, use it as a generic
mechanism to disable loading of Cython code.
it may be incomplete: there may be other places where Cython code is
loaded that is not checked, but that is sufficient to build the usage
docs. the environment variable used is documented as such in the
docs/usage.rst.
we also move the check to a helper function and document it
better. this has the unfortunate side effect of moving includes
around, but I can't think of a better way.
this is such a crude hack it is totally embarrassing....
the proper solution would probably be to move the `build_parser()`
function out of `Archiver` completely, but this is such an undertaking
that i doubt it is worth doing since we're looking at switching to
click anyways.
the main problem in moving build_parser() out is that it references
`self` all the time, so it *needs* an archiver context that it can
reuse. we could make the function static and pass self in there by
hand, but it seems like almost a worse hack... and besides, we would
need to load the archiver in order to do that, which would break usage
all over again...
this is an unfortunate rewrite of the manpage creation code mentionned
in #208. ideally, this would be rewritten into a class that can
generate both man pages and .rst files.
while SSH options can be specified through `~/.ssh/config`, some users
may want to use a completely different SSH command for their backups,
without overriding their $PATH variable. it may also be easier to do
ad-hoc configuration and tests that way.
plus, the POLA tells us that users expects something like this to be
supported by commands that talk to ssh. it is supported by rsync, git
and so on.
the reasoning behind this is that we may need to test a
RemoteRepository setup outside of the main archiver routines, which
the current default location makes impossible
by moving the umask and remote_path remotes into the RemoteRepository
the (reasonable) defaults are available regardless of the (currently
obscure) initialisation routine, and make unit tests easier to develop
and support