I checked it: copying the index.* and hints.* files in advance is not needed, because open() and close() do not modify them.
also: fix unicode exception with encoded filename
because Repository.__init__ normally opens and locks the repo, and the upgrader just
inherited from (borg) Repository, it created a lock file there before the "backup copy"
was made.
No big problem, but a bit unclean.
Fixed it to not lock at the beginning, then make the copy, then lock.
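A minimal sketch of the fixed ordering. It assumes the repository is a plain directory and that a single file named `lock` represents the lock. `Upgrader`, `acquire_lock()` and `upgrade()` are hypothetical names, not borg's actual code:

```python
import os
import shutil
import tempfile


class Upgrader:
    """Hypothetical upgrader; only the lock / backup-copy ordering is modelled."""

    def __init__(self, path, lock=True):
        self.path = path
        # assumption: a single file named "lock" marks the repo as locked
        self.lock_file = os.path.join(path, "lock")
        if lock:
            self.acquire_lock()

    def acquire_lock(self):
        # mode "x" fails with FileExistsError if the lock file already exists
        with open(self.lock_file, "x"):
            pass

    def release_lock(self):
        os.remove(self.lock_file)

    def upgrade(self, backup_path):
        # 1. make the backup copy while the repo is still unlocked,
        #    so no lock file ends up in the copy
        shutil.copytree(self.path, backup_path)
        # 2. only now lock the original repo and convert it
        self.acquire_lock()
        try:
            pass  # ... the actual conversion would happen here ...
        finally:
            self.release_lock()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = os.path.join(tmp, "repo")
        os.mkdir(repo)
        Upgrader(repo, lock=False).upgrade(os.path.join(tmp, "repo.backup"))
        print(os.listdir(os.path.join(tmp, "repo.backup")))  # -> []
```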
For 0.29 we worked towards a "silent by default" behaviour, so interactive usage will include -v more frequently in the future.
But I noticed that this conflicts with the progress display. This would not be a problem if users deliberately chose which one
of --verbose or --progress they want to see, but before this fix, the progress display was activated magically when
a tty was detected. So, to counteract this magic, users would have needed to use --no-progress.
That's backwards imho, so I removed the magic again and users have to give --progress when they want
to see a progress indicator. Or (alternatively) they give --verbose when they want to see the long file list.
From https://github.com/borgbackup/borg/pull/480 discussion:
Did you try 1024 (Linux cache block size) or 4096 (internal sector size of bigger
HDDs, also used in msgpack fallback.py as lower bound, see link)?
I've tested different values - 512 and 1024 are slightly better than 4096 in my case.
read_size      ls -laR
1              75.57 sec
64             27.81 sec
512            27.40 sec
1024           27.20 sec
4096           30.15 sec
0 (default)    442.96 sec
OK, maybe we should go for 1024 then. That happens to be smaller than the MTU size, so in case someone works on NFS
(or another network FS), we will have fewer reads, fewer network packets and less latency.
Single-shot unpacker read buffer decreased from (default) 1 MB to 512 bytes.
"ls -alR" on ~100k files backup mounted with fuse went from ~7min to 30 seconds.
as soon as one target segment is full, it is a good time to commit it and remove the source segments
that are already completely unused (because they were transferred into the target segment).
so, for compact_segments(save_space=True), the additional space needed should be about 1 segment size.
note: we can't just do that at the end of one source segment as this might create very small
target segments, which is not wanted.
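A toy simulation of this strategy, tracking how much extra disk space is in use at each step. `compact()` is hypothetical, not borg's compact_segments; it assumes every source segment occupies a full segment on disk and ignores headers and indexes:

```python
def compact(sources, segment_size):
    """Toy model of compaction with save_space=True.

    sources: one list of live object sizes per source segment; every source
    segment is assumed to occupy segment_size bytes on disk (live + deleted data).
    Returns (target_segments, peak_extra_space).
    """
    remaining = list(range(len(sources)))  # indices of sources still on disk
    targets, current = [], []              # committed targets, open target
    baseline = len(sources) * segment_size
    peak_extra = 0

    def disk_usage():
        return (len(remaining) * segment_size
                + sum(map(sum, targets)) + sum(current))

    for idx, objects in enumerate(sources):
        for size in objects:
            if sum(current) + size > segment_size:
                # target segment is full: commit it, then delete every source
                # segment whose live objects have all been transferred already
                targets.append(current)
                current = []
                remaining = [i for i in remaining if i >= idx]
            current.append(size)
            peak_extra = max(peak_extra, disk_usage() - baseline)
    if current:
        targets.append(current)  # final commit; all sources can go now
    return targets, peak_extra


print(compact([[4, 4], [4], [4, 4], [4]], segment_size=10))
# -> ([[4, 4], [4, 4], [4, 4]], 8)
```

In this model, deleting the source segments only at the very end would instead need extra space equal to all live data (24 here).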
removed --log-level due to overlap with how --verbose works now.
for consistency, added --info as alias to --verbose (as the effect is
setting INFO log level).
also added --debug which sets DEBUG log level.
note: there are no messages emitted at DEBUG level yet.
WARNING is the default (because we want mostly silent behaviour,
except if something serious happens), so we don't need --warning
as an option.
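A sketch of how such options can be wired up with argparse. It assumes all of them write to the same `log_level` destination, so the last one given wins. This is an illustration, not a copy of borg's parser:

```python
import argparse
import logging


def build_parser():
    parser = argparse.ArgumentParser(prog="borg")
    # -v / --verbose / --info are aliases: all of them select INFO
    parser.add_argument("-v", "--verbose", "--info", dest="log_level",
                        action="store_const", const=logging.INFO,
                        default=logging.WARNING,
                        help="show informational messages (INFO log level)")
    parser.add_argument("--debug", dest="log_level",
                        action="store_const", const=logging.DEBUG,
                        default=logging.WARNING,
                        help="show debug messages (DEBUG log level)")
    # no --warning option: WARNING is already the default
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--info"])
    logging.basicConfig(level=args.log_level)
    print(logging.getLevelName(args.log_level))  # -> INFO
```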
this was also the loop body of hashindex_merge, but we also need it to be callable from Cython/Python code.
this saves some cycles, esp. if the key is already present in the index.
The read_msgpack and write_msgpack functions were only used in one place
each. Since msgpack is read and written in lots of places, having
functions with these generic names is confusing. Also, the helpers
module is quite a mess, so reducing its size seems to be a good idea.
the problem here was that we do not just have changed and unchanged items,
but also a lot of items besides regular files which we just back up "as is" without
determining whether they are changed or not. thus, we can't support changed/unchanged
in a way users would expect them to work.
the A/M/U status only applies to the data content of regular files (compared to the index).
for all items, we ALWAYS save the metadata, there is no changed / not changed detection there.
thus, I replaced this with a --filter option where you can just specify which
status chars you want to see listed in the output.
E.g. --filter AM will only show regular files with A(dded) or M(odified) state, but nothing else.
Not giving --filter defaults to showing all items no matter what status they have.
Output is emitted via the logger at INFO level, so it won't show up unless the logger is set to INFO (or DEBUG) level.
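A minimal sketch of the filtering. It assumes items arrive as (status, path) pairs and that non-regular files use other status chars (the "d" for directories below is an assumption). `filter_items` is a hypothetical name:

```python
import logging

logger = logging.getLogger("borg.create")  # hypothetical logger name


def filter_items(items, status_filter=None):
    """Yield (and log) the (status, path) pairs selected by --filter.

    status_filter: the string given to --filter, e.g. "AM";
    None means --filter was not given, so every item is shown.
    """
    for status, path in items:
        if status_filter is None or status in status_filter:
            logger.info("%s %s", status, path)  # only visible at INFO level
            yield status, path


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    items = [("A", "new.txt"), ("M", "changed.txt"),
             ("U", "same.txt"), ("d", "somedir")]
    print(list(filter_items(items, "AM")))
    # -> [('A', 'new.txt'), ('M', 'changed.txt')]
```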
BUCKET_UPPER_LIMIT: 90% load degrades hash table performance severely,
so I lowered that to 75% (which is a usual value - java uses 75%, python uses 66%).
I chose the higher of the two values because we also should not consume too much
memory, considering that RAM usage is already rather high.
MIN_BUCKETS: I can't explain why, but benchmarks showed that choosing 2^N as
table size severely degrades performance (by 3 orders of magnitude!). So a prime
start value improves this a lot, even if we later still use the grow-by-2x algorithm.
hashindex_resize: removed the hashindex_get() call as we already know that the values
come at key + key_size address.
hashindex_init: do not calloc X*Y elements of size 1, but rather X elements of size Y.
Makes the code simpler, not sure if it affects performance.
The tests needed fixing as the resulting hashtable blob is now of course different due
to the above changes, so its sha hash changed.
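A sketch of the resize policy described above. MIN_BUCKETS = 1031 is just a prime picked for illustration. The sketch does not try to explain the power-of-two slowdown; it only shows that sizes derived from a prime start never become powers of two:

```python
BUCKET_UPPER_LIMIT = 0.75  # grow once more than 75% of the buckets are in use
MIN_BUCKETS = 1031         # assumption: a prime start size (1031 is prime)


def upper_limit(num_buckets):
    """Largest number of entries allowed before the table has to grow."""
    return int(num_buckets * BUCKET_UPPER_LIMIT)


def buckets_needed(num_entries):
    """Table size after inserting num_entries, using the grow-by-2x rule."""
    num_buckets = MIN_BUCKETS
    while num_entries > upper_limit(num_buckets):
        # doubling keeps the size at 1031 * 2**k, which is never a power of two
        num_buckets *= 2
    return num_buckets


for n in (0, 773, 774, 5000):
    print(n, buckets_needed(n))
```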
print_verbose is now simply logger.info() and is always displayed if
log level allows it. this affects only the `prune` and `mount`
commands which were the only users of the --verbose option. the
additional display is which archives are kept and pruned and a single
message when the filesystem is mounted.
files iteration in create and extract is now printed through a
separate function which will later be controlled through a topical
flag.
due to borg's architecture, breaking the repo lock first requires creating a repository object.
this would usually try to get a lock and then block if there already is one.
thus I added a flag to open without trying to create a lock.
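A minimal sketch of such a flag. It uses a hypothetical `RepoSketch` class where a single file named `lock` stands in for the real lock. Unlike borg, a conflicting lock makes it fail instead of block:

```python
import os
import tempfile


class RepoSketch:
    """Hypothetical stand-in for Repository; only the lock handling is modelled."""

    def __init__(self, path, lock=True):
        self.path = path
        # assumption: a single file named "lock" represents the repo lock
        self.lock_path = os.path.join(path, "lock")
        if lock:
            # normal open: take the lock (fails here if it is already held)
            with open(self.lock_path, "x"):
                pass

    def break_lock(self):
        # remove a (stale) lock, e.g. one left behind by a crashed process
        if os.path.exists(self.lock_path):
            os.remove(self.lock_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "lock"), "w").close()   # stale lock
        RepoSketch(tmp, lock=False).break_lock()        # lock=True would fail here
        print(os.path.exists(os.path.join(tmp, "lock")))  # -> False
```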
--progress isn't a "toggle" anymore: it will never disable progress information, it always enables it.
example:
$ borg create ~/test/borg2::test$(date +%s) . ; echo ^shows progress
reading files cache
processing files
^shows progress
$ borg create ~/test/borg2::test$(date +%s) . < /dev/null; echo ^no progress
reading files cache
processing files
^no progress
$ borg create --progress ~/test/borg2::test$(date +%s) . < /dev/null; echo ^progress forced
reading files cache
processing files
^progress forced
$ borg create --no-progress ~/test/borg2::test$(date +%s) . ; echo ^no progress
reading files cache
processing files
^no progress
we introduce a ToggleAction that can be used for other options, but
right now is just slapped in there near the code, which isn't that
elegant. inspired by:
http://stackoverflow.com/questions/11507756/python-argparse-toggle-flags
note that this is supported out of the box by click:
http://click.pocoo.org/5/options/#boolean-flags
fixes #398
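The ToggleAction idea from the linked StackOverflow question can be sketched like this (a hedged illustration; borg's own class may differ in details). Using `default=None` lets the code tell "not given" apart from an explicit --progress / --no-progress:

```python
import argparse


class ToggleAction(argparse.Action):
    """--foo sets True, --no-foo sets False; both share one destination."""

    def __init__(self, option_strings, dest, default=None, **kwargs):
        # nargs=0: the flags take no value on the command line
        super().__init__(option_strings, dest, nargs=0, default=default, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, not option_string.startswith("--no-"))


parser = argparse.ArgumentParser()
parser.add_argument("--progress", "--no-progress", dest="progress",
                    action=ToggleAction, default=None,
                    help="enable / disable the progress display")
```

On Python 3.9+ the standard library also ships `argparse.BooleanOptionalAction`, which generates a --foo/--no-foo pair in a similar way.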
this was making us require mock, which is really a test component and
shouldn't be part of the runtime dependencies. furthermore, it was
making the imports and the code more brittle: it may have been
possible that, through an environment variable, backups could be
corrupted because mock libraries would be configured instead of the real
ones, which is a risk we shouldn't be taking.
finally, this was used only to build docs, which we will build and
commit to git by hand with a fully working borg when relevant.
see #384.
We also add --keep-tag-files to keep the root directory (of the excluded directory) and the
tag/exclusion file in the archive.
This is taken from an attic PR (and adapted for borg):
commit f61e22cacc90e76e6c8f4b23677eee62c09e97ac
Author: Yuri D'Elia <yuri.delia@eurac.edu>
Date: Mon Dec 15 12:27:43 2014 +0100
Add a new --exclude-if-present command-line flag to ``borg create``. If
specified, directories containing the specified tag file will be excluded from
the backup, regardless of the tag file's contents. The flag can be repeated to
specify more than a single tag file.
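A rough sketch of the exclusion logic with os.walk, assuming tag files are matched by name only. `walk_excluding` and the `.nobackup` tag name are hypothetical, and --keep-tag-files is modelled as "keep the tagged directory and its tag file, skip the rest":

```python
import os


def walk_excluding(root, tag_files, keep_tag_files=False):
    """Yield paths to back up, skipping directories that contain a tag file.

    tag_files: names given via (repeated) --exclude-if-present.
    keep_tag_files: still yield the tagged directory and its tag file(s).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        tags = [name for name in filenames if name in tag_files]
        if tags:
            dirnames[:] = []  # prune: do not descend into the tagged directory
            if keep_tag_files:
                yield dirpath
                for name in sorted(tags):
                    yield os.path.join(dirpath, name)
            continue
        yield dirpath
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Only the file name is checked, so the tag file's contents never matter.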
This is taken from an attic PR (and adapted for borg):
commit 3462a9ca90388dc5d8b4fa4218a32769676b3623
Author: Yuri D'Elia <yuri.delia@eurac.edu>
Date: Sun Dec 7 19:15:17 2014 +0100
- can create 0-byte files now
- frees space early (avoids running out of disk space at repo init time)
- creates multiple reserve files, so we do not only reserve some space,
but also some inodes
- only print output if there is an error RC
- if make_files makes us run out of space, that is not interesting, just start
a new iteration from scratch
added try/finally (the code in between was just indented, no
other code changes) to make sure it sets self.index back to None,
even if the code crashes e.g. due to an IntegrityError caused
by an incomplete segment caused by a disk full condition.
also, in prepare_txn, create an empty in-memory index if transaction_id
is None, which is required by the Repository.check code to work correctly.
If the index is not empty there, it will miscalculate segment usage
(self.segments).
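A minimal sketch of the try/finally pattern, with a hypothetical `ReplayingRepo` class and a stand-in `IntegrityError`. A `None` in the segment list plays the role of an incomplete segment:

```python
class IntegrityError(Exception):
    """Hypothetical stand-in for the error raised on an incomplete segment."""


class ReplayingRepo:
    """Hypothetical stand-in; only the index cleanup is modelled."""

    def __init__(self):
        self.index = None

    def replay_segments(self, segments):
        self.index = {}  # temporary in-memory index
        try:
            for segment in segments:
                if segment is None:  # stand-in for a truncated segment (disk full)
                    raise IntegrityError("incomplete segment")
                self.index.update(segment)
            return dict(self.index)
        finally:
            # runs on success and on exceptions, so self.index is always reset
            self.index = None
```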
this is so that e.g. cron jobs do not hang indefinitely if yes() is called;
instead, it will just default to "no" if no tty is connected.
if you need to enforce a "yes" answer (which is not recommended for
the security critical questions), you can use the environment:
BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=Y
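A sketch of this behaviour. The signature is made up for illustration and is much simpler than borg's yes(); only BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=Y comes from the text above:

```python
import os
import sys


def yes(prompt, stdin=None):
    """Ask a yes/no question; answer "no" if no tty is connected."""
    stdin = sys.stdin if stdin is None else stdin
    if os.environ.get("BORG_CHECK_I_KNOW_WHAT_I_AM_DOING") == "Y":
        return True  # "yes" enforced via the environment
    if not stdin.isatty():
        return False  # e.g. a cron job: do not wait for input, just say "no"
    sys.stderr.write(prompt + " [yN] ")
    sys.stderr.flush()
    return stdin.readline().strip().lower() in ("y", "yes")
```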
this is needed for tools like borgweb (or in general: when the rc value / exit status should
be logged for later review or directly seen on screen).
this is off by default, so the output is less verbose (and it also does not break tests that
count lines).
parse_args concentrates on only processing arguments, including pre and post processing.
this needs to be called before run(), which is now receiving the return value of parse_args.
this was done so we can have the parsed args outside of the run function, e.g. in main().
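A minimal sketch of the split, with a hypothetical one-command parser (the `--stats` flag and `command` argument are only for illustration):

```python
import argparse
import sys


class Archiver:
    """Argument parsing is kept separate from running the command."""

    def parse_args(self, argv):
        parser = argparse.ArgumentParser(prog="borg")
        parser.add_argument("--stats", action="store_true")
        parser.add_argument("command")
        args = parser.parse_args(argv)
        # ... any pre/post-processing of the arguments would also go here ...
        return args

    def run(self, args):
        # receives the already parsed arguments; no parsing in here
        print(f"running {args.command} (stats={args.stats})")
        return 0


def main(argv=None):
    archiver = Archiver()
    args = archiver.parse_args(sys.argv[1:] if argv is None else argv)
    # args is available here in main(), outside of run()
    return archiver.run(args)
```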
subclasses of "Error": do not show traceback
(this is used when a failure is expected and has rather trivial reasons and usually
does not need debugging)
subclasses of "ErrorWithTraceback": show a traceback
(this is for severe and rather unexpected stuff, like consistency / corruption issues
or stuff that might need debugging)
I reviewed all the Error subclasses to check which of the two classes each one fits into.
Also: fixed docstring typo, docstring formatting
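A sketch of the two classes and of a top-level handler that decides whether to show a traceback. The `traceback` class attribute, `report()` and the two example subclasses are assumptions for illustration:

```python
import traceback as tb


class Error(Exception):
    """Expected failure with a rather trivial cause: show only the message."""
    traceback = False


class ErrorWithTraceback(Error):
    """Severe / unexpected failure (e.g. corruption): also show a traceback."""
    traceback = True


class RepositoryDoesNotExist(Error):   # hypothetical example subclass
    pass


class IntegrityError(ErrorWithTraceback):  # hypothetical example subclass
    pass


def report(exc):
    """Return the text a top-level handler would print for exc."""
    # exceptions outside this hierarchy are unexpected: show a traceback too
    if getattr(exc, "traceback", True):
        return "".join(tb.format_exception(type(exc), exc, exc.__traceback__))
    return f"{type(exc).__name__}: {exc}"
```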
the reason for a slow msgpack can be:
- pip install: missing compiler (gcc)
- pip install: missing compiler parts (e.g. gcc-c++)
- pip install: cached wheel package that was built while the compiler wasn't present
- distribution package: badly built msgpack package
there were 2 issues:
lock used a different pid and tid than release, because daemonize() made them run in different processes/threads.
solved by determining the PID only once and not using the TID any more.
the other issue was that the repo needed an explicit closing.
just to avoid rounding / precision issues with floating point computations on py < 3.3,
I used 2 hardcoded "full second" values on the input file, and the test checks whether they get restored
correctly.
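A minimal sketch of the timestamp check, with made-up whole-second values. In the real test the values are set on the input file and compared after extraction; this only shows the os.utime / os.stat round trip:

```python
import os
import tempfile

# whole seconds: exact as floats, so no rounding / precision issues
ATIME = 1200000000
MTIME = 1300000000


def timestamps_round_trip(path):
    """Set the two whole-second timestamps and check they read back exactly."""
    os.utime(path, (ATIME, MTIME))
    st = os.stat(path)
    return st.st_atime == ATIME and st.st_mtime == MTIME


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(timestamps_round_trip(path))
    finally:
        os.remove(path)
```

On Python 3.3+ `os.stat()` also provides `st_atime_ns` / `st_mtime_ns` as integers, which avoid float precision issues altogether.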
All normal informational output is now logged at INFO level.
To actually see normal output, the logger is configured to level INFO also.
Log levels:
WARNING is for warnings, ERROR is for (fatal) errors, DEBUG is for debugging.
logging levels must only be used according to these semantics and must not be
(ab)used to let something show up (although the logger would usually hide it)
or hide something (although the logger would usually show it).
Controlling the amount of output shown on INFO level:
--verbose, --progress, --stats are currently used for this.
more such flags might be added later as needed.
if they are set, more output is logged (at INFO level).
also: change strange setup_logging return value
this way the version can be discovered by scripts without having to
parse the output of 'help'.
it is removed from the 'help' output itself because it is prettier
without the complete version number, and then the description can be
reused elsewhere as well without needing the version number
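A sketch with argparse's built-in "version" action. The version string and description are hypothetical placeholders:

```python
import argparse

__version__ = "0.29.0"                # hypothetical version string
DESCRIPTION = "Deduplicated backups"  # hypothetical; contains no version number


def build_parser():
    parser = argparse.ArgumentParser(prog="borg", description=DESCRIPTION)
    # prints "borg <version>" to stdout and exits with status 0
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    return parser
```

Running `borg --version` with this parser would print `borg 0.29.0`, while DESCRIPTION stays free of version numbers and can be reused elsewhere.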
0 = success (logged as INFO)
1 = warning (logged as WARNING)
2 = (fatal and abrupt) error (logged as ERROR)
please use the EXIT_(SUCCESS,WARNING,ERROR) constants from helpers module.
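A sketch of the constants plus a helper that logs the final rc at the matching level. `EXIT_LOG_LEVELS`, `report_exit_code()` and the message text are hypothetical:

```python
import logging

EXIT_SUCCESS = 0  # success (logged as INFO)
EXIT_WARNING = 1  # warning (logged as WARNING)
EXIT_ERROR = 2    # (fatal and abrupt) error (logged as ERROR)

# map each exit code to the log level used to report it
EXIT_LOG_LEVELS = {
    EXIT_SUCCESS: logging.INFO,
    EXIT_WARNING: logging.WARNING,
    EXIT_ERROR: logging.ERROR,
}


def report_exit_code(rc):
    """Log the final rc at the matching level and return it unchanged."""
    level = EXIT_LOG_LEVELS.get(rc, logging.ERROR)  # unknown codes count as errors
    logging.getLogger("borg").log(level, "terminating with rc %d", rc)
    return rc
```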
those can now support both file sizes (in SI/decimal format, powers of 10) and memory sizes (in binary format, powers of 2)
tests still fail because the result is always displayed as floats
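A sketch of a formatter supporting both conventions (hypothetical `format_size`, not borg's actual helpers). It always prints a fixed number of decimals:

```python
def format_size(value, binary=False, precision=2):
    """Format a byte count with SI (powers of 10) or binary (powers of 2) prefixes."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB", "PiB"] if binary
             else ["B", "kB", "MB", "GB", "TB", "PB"])
    for unit in units[:-1]:
        if abs(value) < base:
            break
        value /= base
    else:
        unit = units[-1]  # divided past the second-to-last unit
    return f"{value:.{precision}f} {unit}"


print(format_size(1500))               # -> 1.50 kB
print(format_size(1536, binary=True))  # -> 1.50 KiB
```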