_Chunk is a namedtuple of (meta, data), create chunks using Chunk(data, **meta).
This does not yet have any visible functionality, meta is always empty dict right now.
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.
borg recreate has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
with Borg 1.x archives. (Or to experiment with chunker-params for
specific use cases
It is interrupt- and resumable.
Chunks are not freed on-the-fly.
Rationale:
Makes only sense when rechunkifying, but logic on which new chunks to
free what input chunks is complicated and *very* delicate.
Future TODOs:
- Refactor tests using py.test fixtures
-- would require porting ArchiverTestCase to py.test: many changes,
this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
-- with the target possibly in another Repo
(better than "cp" due to full integrity checking, and deduplication
at the target)
- Detect and skip (unless --always-recompress) already recompressed chunks
Fixes#787#686#630#70 (and probably some I overlooked)
Also see #757 and #770
refactorings:
- introduced concept of default answer:
if the answer string is in the defaultish sequence, the return value of yes() will be the default.
e.g. if just pressing <enter> when asked on the console or if an empty string or "default" is
in the environment variable for overriding.
if an environment var has an invalid value and no retries are enabled: return default
if retries are enabled, next retry won't use the env var again, but either ask via input().
- simplify:
only one default - this should be a SAFE default as it is used in some special conditions
like EOF or invalid input with retries disallowed.
no isatty() magic, the "yes" shell command exists, so we could receive input even if it is not from a tty.
- clean:
separate retry flag from retry_msg
also: make check in Lock.close more precise, check for "is not None".
note: a lot of blocks were just indented to be under the "with" statement,
in one case a block had to be moved into a function.
this was also the loop contents of hashindex_merge, but we also need it callable from Cython/Python code.
this saves some cycles, esp. if the key is already present in the index.
due to borg's architecture, breaking the repo lock needs first creating a repository object.
this would usually try to get a lock and then block if there already is one.
thus I added a flag to open without trying to create a lock.
this was making us require mock, which is really a test component and
shouldn't be part of the runtime dependencies. furthermore, it was
making the imports and the code more brittle: it may have been
possible that, through an environment variable, backups could be
corrupted because mock libraries would be configured instead of real
once, which is a risk we shouldn't be taking.
finally, this was used only to build docs, which we will build and
commit to git by hand with a fully working borg when relevant.
see #384.
this is so that e.g. cron jobs do not hang indefinitely if yes() is called,
but it will just default to "no" if not tty is connected.
if you need to enforce a "yes" answer (which is not recommended for
the security critical questions), you can use the environment:
BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=Y
subclasses of "Error": do not show traceback
(this is used when a failure is expected and has rather trivial reasons and usually
does not need debugging)
subclasses of "ErrorWithTraceback": show a traceback
(this is for severe and rather unexpected stuff, like consistency / corruption issues
or stuff that might need debugging)
I reviewed all the Error subclasses whether they fit into the one or other class.
Also: fixed docstring typo, docstring formatting
without this, there would be a solid 20 seconds here without any sort
of output on the console, regardless of the verbosity level. this
makes nice incremental messages telling the user that borg is not
stalled (or waiting for a lock, for that matter)
the "processing files" message is a little clunky, as we somewhat
abuse the cache to figure out if we are just starting... but it helps
if there are problems reading the actual files: it tells us the
initialization is basically complete and we're going ahead with the
reading of all the files.
this greatly simplifies the display of those objects, as the
__format__() parameter allows for arbitrary display of the internal
fields of both objects
this will allow us to display those summaries without having to pass a
label to the string representation. we can also print the objects
directly without formatting at all.
- issue #234: handle exception when config file is empty is really not a borg cache config
- there was a unused %s in the Exception string
- error msg was wrong when version check failed - this IS a borg cache, but not of expected version
the heuristics i used are the following:
1. if we are prompting the use, use print on stderr (input() may
produce some stuff on stdout, but it's outside the scope of this
patch). we do not want those prompts to end up on the standard
output in case we are piping stuff around
2. if the command is primarily producing output for the user on the
console (`list`, `info`, `help`), we simply print on the default
file descriptor.
3. everywhere else, we use the logging module with varying levels of
verbosity, as appropriate.
the logging level varies: most is logging.info(), in some place
logging.warning() or logging.error() are used when the condition is
clearly an error or warning. in other cases, we keep using print, but
force writing to sys.stderr, unless we interact with the user.
there were 77 calls to print before this commit, now there are 7, most
of which in the archiver module, which interacts directly with the
user. in one case there, we still use print() only because logging is
not setup properly yet during argument parsing.
it could be argued that commands like info or list should use print
directly, but we have converted them anyways, without ill effects on
the unit tests
unit tests still use print() in some places
this switches all informational output to stderr, which should help
with, if not fixjborg/attic#312 directly
added a check that compares the size of the new chunk with the stored size of the
already existing chunk in storage that has the same id_hash value.
raise an exception if there is a size mismatch.
this could happen if:
- the stored size is somehow incorrect (corruption or software bug)
- we found a hash collision for the id_hash (for sha256, this is very unlikely)