If `borg compact` was interrupted after deleting repository objects but
before writing the updated chunk index, the still-existing cache/chunks.*
kept claiming the deleted objects were present. A later `borg create` would
trust that stale index, skip re-uploading the affected chunks and silently
create an archive with dangling object references that extracts to zero bytes.
Invalidate all cached chunk indexes via delete_chunkindex_cache() before the
first object is deleted, so an interruption is conservative: the next client
rebuilds the index from actual repository contents and re-uploads any deleted
data. The post-deletion save_chunk_index() still writes a fresh, valid index.
Add a regression test covering both compact paths (default and --stats) that
interrupts compaction right before save_chunk_index() and verifies no cached
chunk index survives and a later create+extract reproduces the original bytes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FilesCacheMixin initialized _newest_cmtime to 0, but _write_files_cache()
only treats None as "no file was chunked this run" (falling back to a
max_time_ns cutoff that keeps all current entries).
When a backup reuses all files from the cache without chunking anything,
_newest_cmtime stayed at 0, so the race-protection cutoff became the unix
epoch and every current (age == 0) entry was discarded. The next backup
then had to re-read, chunk and hash all files again.
Initialize _newest_cmtime to None to match the documented contract in
_write_files_cache(), and make the comparisons in _build_files_cache()
None-safe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In modern borg these were just a pass-through repository wrapper (there
is no RepositoryCache), with one variant doing inline decryption and
returning (csize, plaintext) tuples. Drop both and make all consumers
use the raw repository directly:
- fuse.py: ItemCache / FuseOperations / FuseBackend now take the raw
repository + repo_objs and decrypt via repo_objs.parse(ROBJ_DONTCARE),
matching hlfuse.py. The csize value was discarded at both call sites.
- mount_cmds.py: drop the cache_if_remote wrapper around FuseOperations.
- archive.py (rebuild_archives / check): drop the pass-through wrapper;
robust_iterator now uses self.repository directly.
- repository.py: delete the RepositoryNoCache class and cache_if_remote.
- repository_test.py: remove TestCacheIfRemote and orphaned imports.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Preloading was used only by extract and export-tar, and the modern
borgstore-based Repository.preload() was already a no-op. Remove the
preload calls from both commands and the now-dead supporting code:
- DownloadPipeline.preload_item_chunks / Archive.preload_item_chunks
- hlids_preloaded tracking
- is_preloaded parameter from DownloadPipeline.fetch_many and
Repository.get_many
- the no-op Repository.preload()
Also: remove preload support from borg.legacy
With no remaining callers, drop the legacy-side preload machinery:
LegacyRemoteRepository:
- preload_ids / chunkid_to_msgids state and the pop_preload_msgid helper
- is_preloaded parameter and handling in call_many() (get requests now
always go through the normal send path; MAX_INFLIGHT pipelining of
regular calls is unchanged)
- is_preloaded from get_many() and the preload() method
LegacyRepository:
- is_preloaded from get_many() and the no-op preload() stub
Both legacy repo classes now match the modern Repository interface.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On omniOS the test data now lives on ZFS (via TMPDIR=/var/tmp), and ZFS
rejects the year-2261 os.utime() with EOVERFLOW. Treat that as an
unsupported-filesystem condition and skip, rather than failing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
With TMPDIR=/var/tmp/borg-ci on omniOS, platformdirs' user_runtime_dir
fallback yields /var/tmp/borg-ci/runtime-<uid>/borg. Add it to the
accepted values, like the existing NetBSD CI entry.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On omniOS /tmp is swap-backed tmpfs (small, RAM-bound), so the pip/cargo
build temps and the pytest temp tree quickly exhaust it ("no space left on
device"). Point TMPDIR at disk-backed /var/tmp instead, mirroring what the
NetBSD job already does.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix PackWriter.flush() to use max_count == 1 (not len == 1) for the pack_id hack,
so final partial packs under max_count > 1 correctly use SHA256. Add covering test.
Move sha256 import to module level in repository_test.
PackWriter buffers (chunk_id, cdata) pairs and flushes as pack files via borgstore.
At N=1 pack_id == chunk_id; UNKNOWN_INT32 (0xFFFFFFFF) placeholders in the index
are replaced by real pack location fields after flush() via update_pack_info().
Update test_chunkindex_add to expect UNKNOWN_INT32 sentinels from add().
Enable the pdf output format on Read the Docs (the LaTeX build config
already existed in docs/conf.py) and add a "Downloads" line to the left
sidebar that links the offline formats (PDF, HTML zip, ePub). The links
are populated from the Read the Docs addons data, reusing the same
mechanism as the version selector, so they are version-correct and hidden
when unavailable. The line is left-aligned with the boxes above and the
table of contents below, with separators above and below it.
Also drop the stale 'resources' entry from latex_appendices (the page was
removed in #2088); it broke the now-enabled PDF build with a doctree
KeyError.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RepositoryServer stored self.permissions but never read it: open() builds
a LegacyRepository without any permissions, and legacy (borg 1.x / v1)
repositories have no permission system at all. Remove the dead __init__
parameter and stop forwarding args.permissions from do_serve.
The --permissions CLI option stays - it applies to the non-legacy
"borg serve --rest" path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remote access is restricted via an SSH forced command in authorized_keys
that hardcodes the restriction, e.g.
command="borg serve --rest --restrict-to-path=/srv/repos"
get_args() merges the forced command with the client's intended command
(SSH_ORIGINAL_COMMAND), copying only allowlisted options from the client.
For legacy serve the repo path travels inside the RPC protocol, so the
server enforces restrictions against it. But a rest:// repo passes the
repo as "--backend FILE:<path>" on the command line, and "backend" was in
neither allow- nor denylist, so under a forced command the client's
--backend was dropped: args.backend ended up None and do_serve_rest failed
with "requires --backend" - restrictions for rest were effectively broken.
Add "backend" to the allowlist so the client chooses which repo while the
forced command pins the restriction and the rest mode; do_serve_rest then
validates the client backend against restrict_to_paths/repositories via
check_rest_restrictions. The --rest mode flag stays out of the allowlist
so a forced legacy serve cannot be flipped to rest by the client.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PathNotAllowed lived in borg.legacy.remote, but borg serve --rest
(non-legacy) now also raises it via check_rest_restrictions, which made
non-legacy code import from the legacy package just for an exception.
It is a generic "repository path not allowed" error, so move it next to
the other cross-cutting Error subclasses in helpers/errors.py and
re-export it from helpers. Pure relocation; exit code stays 83.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A rest:// repository is now served by "borg serve --rest" spawned over ssh
rather than borgstore's "borgstore-server-rest".
CI: chmod o+x $HOME so the rest test's ssh user (sftpuser) can run borg
The rest repo test starts "borg serve --rest" over ssh as sftpuser, which runs
the borg under test from the tox venv under the runner $HOME.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make `borg serve` able to be the server-side component of a rest:// repository,
selected with a new --rest option. Plain `borg serve` (no option) keeps serving
legacy borg-1.x repos and stays command-line compatible with borg 1.x.
- serve_cmd.py: add --rest and --backend. With --rest, serve the given
--backend FILE:<path> on stdio via borgstore.server.rest.serve(); honor
--restrict-to-path/--restrict-to-repository (validated against the FILE path)
and --permissions (mapped via borg_permissions). Without --rest, run the legacy
RepositoryServer as before.
- repository.py: for rest:// locations, build the borgstore REST backend with a
command that runs `borg serve --rest --backend FILE:<path>` (locally via
sys.executable, or over ssh reusing borgstore's ssh_cmd / BORG_REMOTE_PATH),
instead of borgstore's hardcoded `borgstore-server-rest`. So a remote only needs
borg installed. Extracted the permissions string->dict mapping into the reusable
borg_permissions().
- tests: unit tests for the rest serve command builder. The existing
remote_archiver (rest:///) suite now runs against `borg serve --rest`.
- docs: changelog + quickstart updated.
Legacy serve and the legacy ssh client are unchanged (client still spawns plain
`borg serve`).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The existing --from-borg1 transfer tests only use a local v1 repo, so they
exercise LegacyRepository but never the ssh path (LegacyRemoteRepository +
borg serve / RepositoryServer) that this branch preserves.
Add test_transfer_from_borg1_ssh: extract the repo12.tar.gz borg 1.2 repo and
transfer from it via --other-repo=ssh://__testsuite__/<abspath> --from-borg1.
The __testsuite__ host makes the legacy client spawn a local "borg serve"
(no real ssh), driving the full client -> serve -> LegacyRepository chain, then
asserts all archives transferred. Local/non-win32 only, like the sibling tests.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
borg.remote no longer fit its name: it held the legacy-only borg serve server
plus generic repository cache wrappers used by current repos. Split by purpose
and remove the module:
- Move RepositoryServer into borg.legacy.remote (it only serves legacy v1 ssh
repositories). It reuses the exception classes (PathNotAllowed,
InvalidRPCMethod, UnexpectedRPCDataFormatFromClient) and BORG_VERSION / MSGID
constants already defined there; open() uses the module-level LegacyRepository.
serve_cmd.py now imports RepositoryServer from ..legacy.remote.
- Move RepositoryNoCache and cache_if_remote into borg.repository (they wrap a
Repository and are used by Archive.check and mount of current repos).
archive.py and mount_cmds.py import them from ..repository now.
- Move the cache_if_remote tests into repository_test.py; delete remote_test.py.
- Delete src/borg/remote.py; fix the stale BUFSIZE comment in constants.py.
Pure relocation, no behavior change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
borg.legacy.remote.cache_if_remote (and the RepositoryCache / RepositoryNoCache
classes it returns) are dead code: nothing imports or calls them. Every
cache_if_remote consumer (archive check, mount, tests) uses the non-legacy
borg.remote version, and legacy repos never reach it (Archive.check rejects
legacy repos). The trio was copied wholesale during the borg.legacy split (#9556).
Delete RepositoryNoCache, RepositoryCache and cache_if_remote, plus the imports
that only they used (shutil, struct, tempfile, xxhash.xxh64, compress.Compressor,
helpers.safe_unlink). LegacyRemoteRepository and the rest of the module are
unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After removing the modern RemoteRepository, cache_if_remote always returned
RepositoryNoCache in production (the only RepositoryCache path was the removed
isinstance(RemoteRepository) check; force_cache=True was used only by a test).
Delete the vestigial RepositoryCache class and simplify cache_if_remote: drop
the pack/unpack/force_cache parameters and the LZ4/xxh64 cache-file machinery,
keep building the decrypted_cache -> transform closure, and always return
RepositoryNoCache. Remove the imports that only RepositoryCache used.
Replace the RepositoryCache tests with a focused test of the surviving
cache_if_remote path (plain passthrough and decrypted (csize, plaintext) tuples).
The legacy copy in borg/legacy/remote.py is intentionally left untouched (its
RepositoryCache is still used for LegacyRemoteRepository).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The modern client/server transport (RemoteRepository served by `borg serve`
over an msgpack RPC protocol) is now redundant for current (borg 2) repos:
its functionality is replaced by rest:// (which can tunnel over ssh to a
remote borgstore REST server).
Remove the modern RemoteRepository (both ssh:// and socket://) entirely.
Legacy v1 (borg 1.x) repos remain reachable over ssh:// via the separate
LegacyRemoteRepository client, and `borg serve` / RepositoryServer is kept,
trimmed to the legacy-only path, so a remote borg2 can still serve a v1 repo
for `borg transfer --from-borg1`.
Details:
- remote.py: delete RemoteRepository, SleepingBandwidthLimiter and the `api`
decorator; trim RepositoryServer to legacy-only (drop modern _rpc_methods,
socket serving, non-legacy open() branch); keep cache_if_remote /
RepositoryCache / RepositoryNoCache (used by all repos).
- get_repository(): non-legacy ssh:// now raises a clear "use rest://" error;
socket:// route and the global --socket option removed.
- parseformat: drop the socket:// scheme (now an invalid location).
- borg serve: keep the command (serves legacy v1 ssh only); update epilog.
- borg version: drop modern remote query; keep legacy ssh path.
- update isinstance/import sites (cache, archive, fuse/hlfuse, analyze/compact,
archiver __init__ -> LegacyRemoteRepository.RPCError).
- tests/docs updated; obsolete socket serve test removed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BLAKE3 is generally faster and provides a more modern construction for
keyed hashing (using its internal keyed mode instead of the construction
used for BLAKE2b).
Key types changed:
- authenticated-blake2 -> authenticated-blake3
- {keyfile,repokey}-blake2-aes-ocb -> {keyfile,repokey}-blake3-aes-ocb
- {keyfile,repokey}-blake2-chacha20-poly1305 -> {keyfile,repokey}-blake3-chacha20-poly1305
This also fixes the slightly unusual way how we used blake2b,
it is only supported for importing borg 1.x repos.
New repos either use HMAC-SHA256 or BLAKE3.
working with r1beta5 (from 2024) is just too much pain.
the system packages only have python 3.10.
if one install python 3.11 from HaikuPorts, it has no ssl support.
if one also installs openssl3 from HaikuPorts, creating a venv fails...
also: rust toolchains issues, thread-local storage ("TLS") issues, as seen in #9463.
thus: no haiku CI until they release next beta and cross-platform-actions have it.
RepoObj.extract_crypted_data() / parse_meta() / parse() unpacked the
fixed-size object header without first checking the input length, and
guarded the meta/data sizes only with assert. A too-short object (e.g. a
truncated or malicious repository object) therefore raised an uncaught
struct.error, and a header claiming more meta/data than present raised
AssertionError.
Callers handle repository corruption by catching IntegrityError, so these
unintended exception types escaped that handling and aborted the operation
with a traceback instead of a clean "corrupted object" report.
Validate the input length before unpacking the header and turn the size
consistency checks into IntegrityError.
Add regression tests for too-short objects and inconsistent meta/data sizes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add three new fields to ChunkIndexEntry and update all call sites:
- pack_id (32 bytes): identifies the pack file containing the chunk
- obj_offset (uint32): byte offset of the chunk within the pack
- obj_size (uint32): stored (compressed) size of the chunk on disk
At N=1 (one chunk per pack), chunk_id == pack_id, obj_offset == 0,
and obj_size == pack_size. All sites use chunk_id as the ChunkIndex
key and extract pack_id as a separate variable with an N=1 comment.
compact_cmd.py: use obj_size (stored size) in repository_size sum.
cache.py: preserve pack fields when serializing the chunk index cache.
repository.py: populate pack_id/obj_size from borgstore object info.
archive.py: extract pack_id on its own line, obj_size=0 for now.
hashindex.pyx: update add(), namedtuple, format string, and docstring.
hashindex.pyi: add new fields to ChunkIndexEntry and CIE type alias.
testsuite/hashindex_test.py: update all ChunkIndexEntry constructions.
- Add an epilog to the main ArgumentParser in src/borg/archiver/__init__.py.
- Import process_epilog and use it to format and list additional help topics: patterns, match-archives, placeholders, compression.
- Add test_main_help_epilog to help_cmd_test.py.
- Fixes#3432
- Add --from-borg1 option to borg repo-list command.
- Add allow_v1 argument to @with_repository decorator.
- If allow_v1 is True and v1_legacy is requested, allow version 1 repositories and load Manifest with RepoObj1 object class.
- Support LegacyRemoteRepository in Manifest class.
- Add test_repo_list_from_borg1.
info.size is the on-disk pack file size, which equals the chunk size only
when N=1 (one chunk per pack). Extract it into a named variable with a
comment so the assumption is visible and easy to fix when N>1 is introduced.
Corruption at offset 123 lands inside meta_encrypted (header is 49 bytes),
causing extract_crypted_data to return a shifted slice whose first byte is
a random AES-OCB ciphertext byte. When that byte equals 0x02 (PlaintextKey
type) key detection silently selects the wrong key, leading to a flaky
IntegrityError in rebuild_archives.
Move the insertion point to offset 250, which is safely inside data_encrypted
for any realistic manifest size, so key detection always reads the correct
type byte and the corruption is caught by AEAD authentication instead.
_common.py had a hard-coded version check that only allowed v3.
Now that repository.py creates v4 repos, every archiver command
failed to open the repo. Extend the guard to (3, 4).
The --other-repo check (v1 or v3 for borg transfer source) is
intentionally left unchanged.
Wrap each pack file in a 13-byte header (magic + version + blob_len) so
packs are self-identifying and the [len][blob] unit extends to N>1 without
a format revision. Bump version 3->4: packs/ and 49-byte ObjHeader are
incompatible with version-3 readers. Fix test_extra_chunks chunk_id mismatch.
Introduces pack_id as the borgstore storage key (N=1: pack_id == chunk_id).
Chunks move from data/ to packs/ with single-level directory sharding (256
subdirs). check_object() validates the header chunk_id against the pack
filename. Adds packs/ to ns_config with levels=[1] and to the permissions
maps for no-delete and write-only modes.
Stores chunk_id unencrypted in the per-blob header so borg check can
rebuild the chunk_id -> pack location index without decryption. AEAD
uses chunk_id as additional data, making key-free recovery circular
without an explicit plaintext copy.
Header layout: OBJ_MAGIC(8) + version(1) + chunk_id(32) + meta_size(4)
+ data_size(4) = REPOOBJ_HEADER_SIZE = 49 bytes.
- always have a starting line with FILE_ID repoid
- store repkeys content-addressed, name is sha256(content)
- search by repo id on load
- add keyfile_format / keyfile_parse / is_keyfile helpers
When borg auto-selects the key file path (no BORG_KEY_FILE override),
the keyfile is now named sha256(file_contents).hexdigest() instead of
a location-derived name with optional .2/.3/... collision suffixes.
- On key change-passphrase, the old keyfile is securely erased and the
new file is written under its new hash-based name.
- borg key import likewise uses the hash-based name.
- Existing legacy-named keyfiles continue to work: _find_key_in_keys_dir
scans by file content, not filename.
- BORG_KEY_FILE still honors the explicit path verbatim.
Removes test_repo_create_keyfile_same_path_creates_new_keys which
tested the now-removed .N collision-suffix behavior.
SHA-256-named pack files make per-blob xxh64 verification redundant.
AEAD decryption already catches corruption on the client side.
Header shrinks from 24 to 8 bytes per object.
Use BORG_OBJ (8 bytes) as the blob magic and refer to it as OBJ_MAGIC
throughout so the literal and its length appear in one place. Update
the inline blob diagram to the pipe-separated notation Thomas suggested.
Rename PackIndex->ChunkIndex, fix pack path to use pack_id, drop
levels_config detail, fix "keyed MAC"->"ID hash" in Pack ID section,
document chunk_id duplication across unencrypted header and encrypted_meta.
The v2 repo format was only used by early borg2 alphas/betas,
before borgstore was introduced.
For borg transfer, we only need to support reading v1 repos
of borg 1.x.
remove forward-looking N>1 references, hardcoded offsets, and stale "currently";
use borgstore vocabulary, medium-sized index files, and simplified recovery prose
for packs, this needs to get implemented differently to perform well.
processing needs to be pack-after-pack and the index needs to be
updated correctly and carefully, e.g. considering interruptions
of repo-compress.
- `src/borg/__init__.py`: The `setuptools_scm` fallback version `0.1.dev1`
was incorrectly bypassing the assertion check (as `0` and `1` are valid
integers), which hid the intended helpful error message when building
from source without tags. Added an explicit check for the `0.1.dev` prefix.
- `src/borg/version.py`: `parse_version` and `format_version` have been
updated to correctly understand `setuptools_scm`'s `.dev` or `dev` prefixes.
These dev releases are now properly encoded in the version tuple as `-9`
(which logically makes them older than alphas `-4`, betas `-3`, rcs `-2`,
and final `-1`), and correctly reformatted to `.dev` strings.
(cherry picked from commit 232ccabfa3)
... or rather that the slashdot hack doesn't impact pattern matching at all. Add a note to `borg help patterns`.
(cherry picked from commit ae1440ed7e)
Include all changes from dependabot PR #9603 plus fix the broken
virtualenv pin: tox 4.52.1 requires virtualenv>=21.1, but PR #9603
kept virtualenv==20.39.1 (20.x). Bumped virtualenv to 21.3.2.
Fixes: https://github.com/borgbackup/borg/pull/9603
When using the slashdot hack (e.g. `borg create ARCHIVE rootfs/./`),
the source directory's metadata was being excluded instead of archived as
the archive root. This happened because `create_helper` treated the
slashdot target directory the same as its parent directories (which should
be stripped), rather than recognizing it as the root of the archive.
Added a new condition in `create_helper` to detect when the current path
matches the strip prefix target exactly (`path + "/" == strip_prefix`) and
archive it as `"."` (the archive root) instead of excluding it.
The @pytest.mark.skipif(not fs_supports_sparse(), ...) decorator on
test_chunkify_sparse was commented out and is not needed because the
zeros.startswith(result) fix in FileReader.read() detects zero-filled
slices as CH_ALLOC regardless of sparse FS support, and ChunkerFixed
with sparse=True gracefully falls back when SEEK_HOLE/SEEK_DATA is
not available.
The test_sparsemap tests were failing on Linux CI because SEEK_HOLE/
SEEK_DATA naturally coalesces adjacent ranges of the same type (data
or hole), but the tests compared against the raw per-block sparse maps
which list each block separately.
Add a coalesce_sparse_map() helper that merges adjacent ranges with
the same is_data flag, and compare sparsemap() output against the
coalesced expected map instead of the raw per-block map.
When FileReader.read() sliced a large CH_DATA block (read at 1MB
granularity) into smaller block_size chunks (e.g. 4096 bytes), zero-filled
slices were returned as CH_DATA with zero bytes instead of CH_ALLOC.
Add a zeros.startswith(result) check before returning a CH_DATA chunk,
converting all-zero slices to CH_ALLOC. This ensures sparse-aware
consumers correctly identify allocated-but-zero regions regardless of
whether the file was read with sparse=True or sparse=False.
- Update fixed_test.py expectations for non-sparse chunking.
- Enable `sparse=True` in interaction_test.py and reader_test.py where zero detection is required.
- Catch `ValueError` in _build_fmap to support `BytesIO` seeking.
When a generator for get_many() or call_many() is destroyed early (for example, if a BackupError occurs during extraction and aborts fetching preloaded chunks), a GeneratorExit is raised inside call_many().
Previously, call_many() lacked a try/finally block, so it failed to mark the abandoned msgids in self.ignore_responses. When the remote server eventually sent the data, it was indefinitely cached in self.responses and self.chunkid_to_msgids, causing a memory leak.
This fix wraps the request loop in try/finally to guarantee that all pending waiting_for message IDs, as well as any unrequested preloaded chunk IDs in calls, are properly added to ignore_responses.
For example, this memory leak could be triggered when extracting files:
- by permission errors or other OSErrors with the extracted file
- if the archived file had all-zero replacement chunks or inconsistent size
The previous code performed allocations and buffer acquisitions before the
`try` block. If a later allocation or buffer acquisition failed, execution did
not enter the `finally` block, so resources acquired earlier in the setup path
could leak.
Move allocation and buffer acquisition into the guarded block, initialize raw
output pointers to `NULL`, and only call `PyMem_Free` or `PyBuffer_Release`
for resources that were actually acquired.
The previous code performed allocations and buffer acquisitions before the
`try` block. If a later allocation or buffer acquisition failed, execution did
not enter the `finally` block, so resources acquired earlier in the setup path
could leak.
Move allocation and buffer acquisition into the guarded block, initialize raw
output pointers to `NULL`, and only call `PyMem_Free` or `PyBuffer_Release`
for resources that were actually acquired.
The `test_extract_restores_append_flag` test leaves append-only
tempfiles around on macOS and FreeBSD that cannot be removed cleanly,
this was previously just ignored by the cleanup func but those files
occasionally caused lots of warning output on subsequent test runs.
Fixed by attempting to clear flags and retry whenever the cleanup
function fails.
The move to platformdirs and its current usage _does_ honor XDG_*
variables on macOS if they are set. Tests were set up to assume this to
be untrue and the docs matched that.
This commit adds tests asserting that XDG_* variables are used when they
are present on macOS, with default locations still in ~/Library.
prune: add --json option, fixes#9222
Enable programmatic extraction of prune/keep decisions via
structured JSON output, instead of parsing log message text.
Follows the repo-list --json pattern: outputs a single JSON object
with repository, encryption, and archives array. Each archive
includes pruned (bool), rule, and rule_number fields.
This adds a runtime warning when running under MSYS2/Git Bash without the necessary environment variables to disable automatic path translation. The documentation is also updated to explain this behavior and how to mitigate it.
- add `--tags TAG [TAG ...]` option to `borg create` to tag newly created archives.
- validate the tags exactly like `borg tag` does, including checking that any special tags starting with `@` are known `SPECIAL_TAGS`.
- add `test_create_tags` and `test_create_invalid_tags` to ensure proper behavior.
- Added `--hostname` and `--username` command-line options to `borg create`
- Updated Archive to capture and store these explicit values, falling back to system defaults
- Added `test_explicit_hostname_and_username` to verify the functionality
test_with_lock previously relied on a hardcoded timeout
(`time.sleep(4)`) to ensure the first background command acquired
the repository lock before the second command tried to get it. On
extremely slow CI runners, this was sometimes too short, allowing
the second command to falsely acquire the lock.
This commit replaces the arbitrary sleep with true synchronization:
- The first command now blocks indefinitely using `sys.stdin.read()`.
- The test deterministically waits for lock acquisition by reading
`p1.stdout.readline()` which guarantees the lock is held.
- After the second command correctly fails, the first command is
smoothly unblocked and terminated by passing `input=""` to
`p1.communicate()`.
Python's `os.truncate()` on Windows relies on `SetEndOfFile()`, which does
not initialize the extended disk space with zeroes. This means that
trailing sparse holes simply leave uninitialized garbage data at the end
of the file.
During sparse file extraction, when the very last chunk is a sparse hole,
the VDL (Valid Data Length) is not properly advanced by `os.truncate()`.
As a result, reading from the end of the file fetches random disk garbage
instead of zeroes, causing spurious test failures at boundaries (like
2MB or 8MB) depending on what was in the uninitialized disk sectors.
Fix this by tracking trailing holes and manually writing a single `b"\0"`
byte at the end of the file before truncating on Windows. Writing explicit
data forces NTFS to officially advance the VDL and securely zero-fill the
preceding hole space.
Re-enable `test_sparse_file` on Windows.
This adds the `--paths-from-shell-command` option to the `create` command, enabling the use of shell-specific features like pipes and redirection when specifying input paths. Includes related test coverage.
Borg's ArgumentParser (in borg.helpers.argparsing) now subclasses
jsonargparse's ArgumentParser and pre-sets two defaults that every
borg parser uses:
formatter_class = RawDescriptionHelpFormatter
add_help = False
The old code worked around argparse's flat namespace by appending
_maincommand / _midcommand / _subcommand suffixes to every common
option's dest (e.g. log_level_subcommand), then resolving them with
CommonOptions.resolve() after parsing. This polluted config key names
and env var names (BORG_LOG_LEVEL_SUBCOMMAND instead of BORG_LOG_LEVEL).
jsonargparse nests subcommand arguments automatically, so the workaround
is no longer needed. Each parser level now registers common options with
their clean dest name. flatten_namespace() is updated to a two-pass
depth-first walk so the most-specific (innermost) value wins naturally:
borg --info create --debug → log_level = "debug" (subcommand wins)
borg --info create → log_level = "info" (top-level fills gap)
For append-action options (--debug-topic) values from all levels are
merged (outer + inner) to preserve the accumulation behaviour.
Previously, ArgparsePatternAction and ArgparsePatternFileAction
appended recursion roots directly to args.paths. This mixed
CLI positional paths with paths derived from patterns
(e.g., using the `R` root path command in a pattern file),
complicating downstream argument parsing and future jsonargparse
integration.
This commit introduces `args.pattern_roots` as a dedicated list
for these accumulated root paths:
- All argparse definition sites now initialize `pattern_roots=[]` alongside `paths=[]`
- ArgparsePatternAction and ArgparsePatternFileAction write directly to `args.pattern_roots`
- The build_matcher utility accepts both `include_paths` and `pattern_roots` and concatenates them internally
- `create_cmd` iterations explicitly concatenate both lists before processing
This ensures `args.paths` strictly reflects exactly what the
user provided positionally, paving the way for a clean
jsonargparse implementation without regressions in pattern behavior.
there are a lot of files in src_dir (due to the __pycache__ subdir).
for tests that do not need that, we can use a much smaller set of files,
now provided by the backup_files fixture.
Fixes#9448.
borg mount forks into a background daemon, so coverage was missing the process that actually handles the FUSE requests, leaving fuse.py and hlfuse.py at 0%.
Enable coverage patches:
patch = ["subprocess", "_exit"]
This lets coverage follow spawned subprocesses and still record data for paths that terminate via os._exit().
moved --junitxml parameter to the tox configuration.
haiku: add coverage params to pytest invocation (tox not used there).
vm_tests: add test_results and coverage uploads.
hard-code coverage.xml as coverage filename
"*/borg/fuse.py" - suspect, let's try what happens if we do not omit.
"*/borg/support/*" - directory does not exist anymore.
"*/borg/hash_sizes.py" - does not exist anymore.
That way we should get macOS coverage visible in PRs.
The macOS (intel) job will only run less frequently on merges (likely this runs on older hardware at github).
time: the nominal ts, used for prune, list, sorting, ...
start: operation start time (informative)
end: operation end time (informative)
Often, "time" is the same as "start" (normal borg create).
But it can make sense to have a different "time":
- borg create --timestamp=...
- borg recreate --timestamp=...
- borg recreate (will keep "time" as in original archive)
- borg transfer (will keep "time" as in original archive)
recreate and transfer produce new archives, "start" and "end"
will reflect the recreate/transfer operation.
Also: remove start_monotonic. start and end are just what the
clock shows (including tz), so should be ok to compute duration
from that, even for dst switching times.
Not as bad as it sounds:
32bit platforms with 64bit time_t will still work.
As of 2026, this is pretty much any platform that can run borg reasonably well.
Add utcfromtimestampns() helper that converts nanosecond timestamps to
datetime objects using integer arithmetic (timedelta) instead of floating
point division. This avoids precision loss and potential overflow on 32bit
platforms with old glibc.
Use it in safe_timestamp() and timestamp() instead of datetime.fromtimestamp().
Needed to tweak the timestamps in repo12.tar.gz/test_meta/*.json +/- 1us.
That way, right below the docs version number that is currently
being displayed, it is easier to find for users.
Also: hide the default readthedocs-flyout (bottom right)
Co-authored-by: Junie <junie@jetbrains.com>
Co-authored-by: Junie <junie@jetbrains.com>
similar to what borg delete does.
also:
- remove "uncommitted" counter, we do not use commits anymore
- always call manifest.write() if we deleted something
here it was not a problem currently, because format_archive(archive_info) does not load the archive from the repo, but only uses the given archive_info contents.
format_item() can trigger lazy loading of archive metadata (e.g. hostname,
username, size) from the repository. Previously it was called after
archive.delete(), which caused Archive.DoesNotExist for pruned archives.
Fix: call formatter.format_item() early, before any deletion takes place.
Also added a test.
The efficiency difference between `meta.extend(bytes(N))` and `meta = meta + bytes(N)` stems from how Python manages memory and objects during these operations.
- **`bytearray.extend()`**: This is an **in-place** operation. If the current memory block allocated for the `bytearray` has enough extra capacity (pre-allocated space), Python simply writes the new bytes into that space and updates the length. If it needs more space, it uses `realloc()`, which can often expand the existing memory block without moving the entire data set to a new location.
- **Concatenation (`+`)**: This creates a **completely new** `bytearray` object. It allocates a new memory block large enough to hold the sum of both parts, copies the contents of `meta`, copies the contents of `bytes(N)`, and then reassigns the variable `meta` to this new object.
- **`bytearray.extend()`**: In the best case (when capacity exists), it is **O(K)**, where K is the number of bytes being added. In the worst case (reallocation), it is **O(N + K)**, but Python uses an over-allocation strategy (growth factor) that amortizes this cost, making it significantly faster on average.
- **Concatenation (`+`)**: It is always **O(N + K)** because it must copy the existing `N` bytes every single time. As the `bytearray` grows larger (e.g., millions of items in a backup), this leads to **O(N²)** total time complexity across multiple additions, as you are repeatedly copying an ever-growing buffer.
- Concatenation briefly requires memory for **both** the old buffer and the new buffer simultaneously before the old one is garbage collected. This increases the peak memory usage of the process.
- `extend()` is more memory-efficient as it minimizes the need for multiple large allocations and relies on the underlying memory manager's ability to resize buffers efficiently.
In the context of `borg mount`, where `meta` can grow to be many megabytes or even gigabytes for very large repositories, using concatenation causes a noticeable slowdown as the number of archives or files increases, whereas `extend()` remains performant.
Add _BORG_BENCHMARK_CPU_TEST environment variable (following the existing
pattern of _BORG_BENCHMARK_CRUD_TEST) that reduces:
- timeit iterations from 100 to 1 (10 to 1 for compression)
- KDF iterations from 5 to 1
- random data buffer from 10MB to 100KB
Set this env var in test_benchmark_cpu and test_benchmark_cpu_json so
they complete quickly in CI while still exercising the full code path.
Fixes#9414
Signed-off-by: edvatar <88481784+toroleapinc@users.noreply.github.com>
guess the assert was meant to make sure that we do not have backslashes as path separators, but did not consider that on linux a backslash can be part of a filename (without being a path separator).
requests wants < 6, but something else installs >= 6,
triggering this warning on stderr that breaks our tests:
/home/runner/work/borg/borg/.tox/py311-pyfuse3/lib/python3.11/site-packages/requests/__init__.py:113:
RequestsDependencyWarning: urllib3 (2.6.3) or chardet (6.0.0dev0)/charset_normalizer (3.4.4) doesn't match a supported version!
Remove the handwritten bash and zsh shell completion scripts now that
auto-generated completions via borg completion bash/zsh (powered by
shtab, #9172) are tested and working. Fish completions are kept as
shtab does not yet support fish.
Replace string-matching tests with focused behavior tests: script size
sanity, shell syntax validation (bash -n / zsh -n), and tests that
invoke the custom preamble functions in bash (sortby key dedup,
filescachemode mutual exclusivity, archive name and aid: prefix
completion against a real repository).
Add --json-lines flag to 'borg benchmark crud' that outputs
each measurement as a JSON object (one per line) for easy
machine parsing. Also improve test coverage to validate both
human-readable and JSON-lines output formats.
Add --json flag to 'borg benchmark cpu' that outputs all benchmark
results as a single JSON object for easy machine parsing. Size values
use integers (bytes) in JSON and format_file_size() for human-readable
text output. Also add tests for both plain-text and JSON output formats.
- 2.0.x: mark as beta (not yet stable release)
- 1.2.x: no new releases, critical fixes may still be backported
- Keep 1.4.x as supported, 1.1.x and below as unsupported
If multiple environment variables for the same passphrase context are
provided (e.g., both BORG_PASSPHRASE and BORG_PASSCOMMAND), Borg now
terminates with an error instead of silently choosing one.
This prevents the issue where an old BORG_PASSPHRASE in the environment
could override a newly intended BORG_PASSCOMMAND or BORG_PASSPHRASE_FD.
When running as a Pyinstaller-made binary, sys.executable points to the
borg binary itself. Invoking it with "-m borg" resulted in an incorrect
command line (e.g., "borg -m borg ..."), which confused the argument
parser in the subprocess.
This change checks sys.frozen to determine the correct invocation:
- If frozen: [sys.executable, ...args]
- If not frozen: [sys.executable, "-m", "borg", ...args]
The check_python() function verified that the Python runtime supported
'follow_symlinks' for os.stat, os.utime, and os.chown. This check is no
longer necessary because:
1. Borg now requires Python >= 3.10.
2. On POSIX systems (Linux, macOS, *BSD, Haiku, OmniOS), support for these
operations relies on the *at syscalls (fstatat, etc.), which have been
implemented in standard libc for well over a decade (e.g., FreeBSD 8.0+,
NetBSD 6.0+, Solaris 11+).
3. On Windows (MSYS2/MinGW), Python has supported follow_symlinks for
os.stat since Python 3.2. The removed check specifically inspected only
os.stat on Windows, avoiding the problematic os.utime/os.chown checks.
Any platform capable of running Python 3.10 will inherently support these
standard file operations.
ci.yml already has timeout-minutes on every job, but these three
workflows had no timeout configured. Without an explicit timeout,
GitHub Actions defaults to 6 hours, wasting CI minutes if a job
gets stuck.
Added timeouts consistent with ci.yml:
- codeql-analysis.yml: 20 min (builds from source + analysis)
- backport.yml: 5 min (simple checkout + PR creation)
- black.yaml: 5 min (matches ci.yml ruff lint job)
Fixes#9298
if an already existing fs directory has the correct (as archived) mtime,
we have already extracted it in a previous borg extract run and we do not
need and should not call restore_attrs for it again.
if the directory exists, but does not have the correct mtime, restore_attrs
will be called and its attributes will be extracted (and mtime set to
correct value).
Enable JUnit XML generation for `native_tests` and `windows_tests` to allow Codecov to process test analytics.
Upload the generated `test-results.xml` using `codecov/codecov-action`.
The upload step uses `if: !cancelled()` to ensure results are uploaded even if tests fail (to analyze failures), but skipped if the workflow is explicitly cancelled.
Originally brought up in PR #8752 by @katia-sentry, but missed the !cancelled() check.
Also: upgrade to codecov-action@v5.
This allows users to compare file content efficiently without reading the
full file data, by exposing a hash of the chunk IDs and the relevant
conditions for valid comparisons, like chunker params, chunker seed/key,
id key, key type, etc.
This is based on PR #5167 by @hrehfeld, code + discussion, with some changes:
- the conditions hash now includes more relevant input params
- returning a single value that is composed of 2 parts
- tests (including new buzhash64)
Example output (different files in same archive):
1e88bfb02d0a5320-a539587200c33b857f9827d01fcb7dabacf30501c83929e7308668d43f4a6302 file1
1e88bfb02d0a5320-9ed78a4c14d0506d9ae75d914cca90db64655ddea22647dd1c89f19e2fc080ae file2
The fingerprint has 2 parts:
First part: same hash, indicates same chunking / chunk id generation params,
meaning that the second part is valid to be compared.
Second part: different hash, because file content is different.
same hash here would mean same content.
A pre-existing directory might be a btrfs subvolume that was created by
the user ahead of time when restoring several nested subvolumes from a
single archive.
If the archive item to be extracted is a directory and there is already
a directory at the destination path, do not remove (and recreate) it,
but just use it.
That way, btrfs subvolumes (which look like directories) are not deleted.
Fix originally contributed by @intelfx in #7866, but needed more work,
so I thought more about the implications and added a test.
Note:
In the past, we first removed (empty) directories, then created a fresh
one, then called restore_attrs for that. That produced correct metadata,
but only for the case of an EMPTY exisiting directory. If the existing
directory was not empty, the simply os.rmdir we tried did not work
anyway and did not remove the existing directory.
Usually we extract to an empty base directory, thus encountering this
edge case is mostly limited to continuing a previous extraction.
In that case, calling restore_attrs again on a directory that already has
existing attrs should be harmless, because they are identical.
This implementation should be good enough for our usecase (paths) and has no external dependencies.
There is also a wcwidth library which might be even better, but would add another dependency.
This commit implements a comprehensive approach to Windows path compatibility
by standardizing on forward slashes (/) for all internal path representations
while maintaining cross-platform archive compatibility.
Core Strategy:
- All internal paths now use forward slashes as separators on all platforms
- Boundary normalization: backslashes converted to forward slashes at entry
points on Windows (filesystem paths only, not user patterns)
- Literal backslashes from POSIX archives replaced with % on Windows extraction
Key Changes:
Path Handling (helpers/fs.py):
- Added slashify(): converts backslashes to forward slashes on Windows
- Added percentify(): replaces backslashes with % for POSIX-to-Windows extraction
- Updated make_path_safe() to check for Windows-style .. patterns
- Changed get_strip_prefix() to use posixpath.normpath instead of os.path.normpath
- Updated remove_dotdot_prefixes() to use forward slashes consistently
Pattern Matching (patterns.py):
- Replaced os.path with posixpath throughout for consistent separator handling
- Updated PathFullPattern, PathPrefixPattern, FnmatchPattern, ShellPattern
- All pattern matching now uses / as separator regardless of platform
- Removed platform-specific os.sep usage
Archive Operations (archive.py, item.pyx):
- Applied slashify() to paths during archive creation on Windows
- Added percentify/slashify encoding/decoding for symlink targets
- Ensures archived paths always use forward slashes
Command Line (archiver/create_cmd.py, extract_cmd.py):
- Replaced os.path.join/normpath with posixpath equivalents
- Added slashify() for stdin-provided paths on Windows
- Updated strip_components to use / separator
- Changed PathSpec to FilesystemPathSpec for proper path handling
Repository (repository.py, legacyrepository.py):
- Replaced custom _local_abspath_to_file_url() with Path.as_uri()
Documentation (archiver/help_cmd.py):
- Clarified that all archived paths use forward slashes
- Added note about Windows absolute paths in archives (e.g., C/Windows/System32)
- Documented backslash-to-percent replacement for POSIX archives on Windows
Impact:
- Windows users can now create and extract archives with consistent path handling
- Cross-platform archives remain compatible
- Pattern matching works identically on all platforms
Seen this on the macOS arm64 runner:
ImportError: dlopen(/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so, 0x0002): tried: '/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (no such file), '/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))
Consolidate key backup documentation into `borg key export` and reference
it from Quickstart and FAQ to avoid duplication and inconsistency.
Clarify that while `repokey` or `authenticated` mode stores the key in the
repo, a separate backup is still recommended to protect against repository
corruption or data loss.
The single-file borg.exe needs unpacking each time it is invoked.
borg-dir/borg.exe is alread unpacked.
Also, macOS is slow when a "new" binary is first invoked, so
this should help there even more.
fuse2 was a bit misleading. it meant "our 2nd fuse implementation",
but could be misunderstood to refer to fuse v2.
hlfuse.py now means highlevel fuse, as opposed to the lowlevel fuse in fuse.py.
Updated mount_cmds_test.py to work with both llfuse/pyfuse3 and mfusepy
by checking for either implementation in skip conditions.
mfusepy: 2 test fails due to hardlink implementation differences
fixes#9182
- install OS fuse support packages as indicated by the tox env.
on the macOS runners, we do not have any fuse support.
on the linux runners, we may have fuse2 or fuse3.
on FreeBSD, we have fuse2.
- install fuse python library for binary build
- first build/upload binaries, then run tests (including binary tests).
early uploading makes inspection of a malfunctioning binary possible.
- for now, use llfuse, as there is an issue with pyinstaller and pyfuse3.
Also:
- remove || true - this just hides errors, not what we want.
emit only a warning, but let compaction complete.
after that, borg check --repair can fix the hints successfully.
likely this code won't be used in master branch as we only read from
legacy repos, but I ported this fix from 1.4-maint nevertheless.
This is the result of a longer discussion with Antigravity AI and me:
Detailed Explanation: Why Converting AssertionError to Warning is Correct
=========================================================================
PROBLEM OVERVIEW
----------------
The assertion `assert segments[segment] == 0` in compact_segments() was causing
borg compact to crash when segment reference counts in the hints file didn't
match the actual repository state. This typically occurred after index corruption
or repository recovery scenarios.
ROOT CAUSE ANALYSIS
-------------------
The crash happens due to a fundamental mismatch between two data structures:
1. self.segments (loaded from hints file)
- Contains reference counts for each segment
- Persisted to disk in the hints file
- Represents the "last known state"
2. self.index (loaded from index file)
- Contains mappings of object IDs to (segment, offset) pairs
- Can be corrupted or lost
- When corrupted, triggers auto-recovery
The Problem Scenario:
1. Repository has valid data with consistent hints.N and index.N
2. Index file gets corrupted (crash, disk error, etc.)
3. Borg detects corruption and auto-recovers:
- Loads hints.N (with old reference counts)
- Rebuilds index by replaying segments
- Commits the rebuilt index
4. State is now inconsistent IF segments were deleted/lost:
- self.segments[X] = 10 (from old hints, assumes segment X exists)
- Segment X was actually deleted/lost
- self.index has 0 entries for segment X (rebuilt from remaining segments)
5. During compact_segments():
- Tries to iterate objects in segment X
- Segment X doesn't exist (was deleted/lost)
- OR: segment X exists but objects aren't in index (superseded)
- segments[X] is never decremented
- segments[X] remains 10 instead of becoming 0
- Assertion fails!
WHY THE FIX IS CORRECT
----------------------
1. Hints are Advisory, Not Authoritative
The hints file is an optimization to avoid scanning all segments. It's
explicitly designed to be rebuildable from scratch by scanning segments.
Therefore, incorrect hints should not cause a fatal error.
2. Self-Healing Behavior
By converting the assertion to a warning and allowing compaction to proceed:
- Compaction completes successfully
- New hints are written with correct reference counts
- Repository is automatically healed
- No manual intervention required
3. Data Safety is Preserved
The fix does NOT compromise data integrity because:
- Compaction first copies all live data from segments to new segments
- Only after all live data is safely copied are segments marked for deletion
- The index determines what's "live" (authoritative source of truth)
- Segments are deleted only when they contain no live data (per index)
- The refcount warning indicates stale hints, not actual data loss risk
- After compaction, new hints are written with correct counts
4. Consistent with Design Philosophy
Borg already handles many corruption scenarios gracefully:
- Missing hints → regenerated from segments
- Corrupted index → rebuilt from segments
- Missing segments → detected and handled
This fix extends that philosophy to hint/index mismatches.
5. Alternative Solutions are Worse
Other approaches considered:
a) Crash and require manual intervention
- Current behavior, user-hostile
- Requires expert knowledge to fix
b) Automatically run check --repair
- Too aggressive, may hide real problems
- User should decide when to repair
c) Refuse to compact
- Leaves repository in degraded state
- Prevents normal operations
VERIFICATION
------------
The fix has been verified with test cases that reproduce both scenarios:
1. test_missing_segment_in_hints
- Simulates missing segment files
- Verifies compact succeeds and updates hints correctly
2. test_index_corruption_with_old_hints
- Simulates the root cause: corrupted index with old hints
- Verifies compact succeeds despite reference count mismatch
3. test_subtly_corrupted_hints_without_integrity
- Existing test updated to expect warning instead of crash
- Verifies repository remains consistent after compaction
OPERATIONAL IMPACT
------------------
After this fix:
1. Users experiencing this crash can now run `borg compact` successfully
2. The warning message alerts them to the inconsistency
3. They can optionally run `borg check --repair` for peace of mind
4. Repository continues to function normally
The warning message provides enough information for debugging while not
blocking normal operations.
CONCLUSION
----------
Converting the assertion to a warning is the correct fix because:
- It aligns with Borg's design philosophy of graceful degradation
- It enables self-healing behavior
- It preserves data safety
- It improves user experience
- It's consistent with how other corruption scenarios are handled
The assertion was overly strict for a data structure (hints) that is
explicitly designed to be advisory and rebuildable.
we can't monkeypatch stuff in Cython/C code, so we
go over python module attribute lookup.
that way, we can more easily test some functions that
internally do id<->name lookups.
I could not find the root cause of this issue, but it is likely a minor
problem with ctime and doesn't affect borg usage much.
So I rather like to have CI on netbsd not failing because of this.
The test fails on these platforms.
I could not find the root cause of this issue, but it is likely a minor
problem with ctime and doesn't affect borg usage much.
So I rather like to have CI on freebsd/netbsd not failing because of this.
Also: add is_netbsd and is_openbsd to platformflags.
so that pytest options are centrally managed in tox configuration.
let tox build venv and install requirements.
tox does this anyway, so we save some time if we
do not need the venv for other purposes also
(like e.g. building binaries).
Also:
- default XDISTN to "auto". XDISTN is still used by Vagrantfile.
- some other optimisations, like less package manager calls.
- use XDISTN=1 for haiku
- fix freebsd binary build condition
use borg diff --sort-by=spec1,spec2,spec2 for enhanced sorting.
remove legacy --sort behaviour (sort by path), this was deprecated
since 1.4.2.
Co-authored-by: Daniel Rudolf <github.com@daniel-rudolf.de>
This is a port of #9005 to master branch.
- grant id-token and attestations permissions to posix_tests job
- add actions/attest-build-provenance@v1 step for built artifacts
This publishes SLSA-style provenance for our tag builds (only when binaries
are produced) so users can verify the origin of downloaded borg binaries.
bad:
- no *BSD testing and FreeBSD binary building on gh
- binaries not signed by me, because they are built on gh
good:
- for linux intel/amd64 and arm64, built on ubuntu
- for macOS intel and arm64, built on a relatively recent macOS
- I can get rid of that ancient macOS VM I used for building.
- the source code distribution (sdist) is still made locally on
my machine and thus signed with my signature (*.asc).
preserve UF_COMPRESSED and SF_DATALESS when restoring flags,
get-modify-set in macOS set_flags, keeping system-managed read-only flags.
(cherry picked from commit 83571aa00d)
This flag needs to be set BEFORE writing to the file.
But "borg extract" sets the flags last (to support IMMUTABLE),
thus the compression flag would not work as expected.
(cherry picked from commit 56dda84162)
Linux platform only.
(cherry picked from commit 9214197a2c)
set_flags: if getting the flags fails, better give up than
corrupting them.
Thanks to Earnestly for the feedback on IRC.
(cherry picked from commit 9c600a9571)
when borg mount is used without -f/--foreground (so that the FUSE
borg process was started daemonized in the background), it did not
display the rc of the main process, even when --show-rc was used.
now it does display the rc of the main process.
note that this is rather a consistency fix than being super useful,
because the main "action" happens in the background daemon process,
not in the main process.
Previously when running borg in a systemd service (and similar when piping to
a file and co.), these problems occurred:
- The carriage return both made it so that journald interpreted the output as
binary, therefore not printing the text, while also not buffering
correctly, so that log output was only available every once in a while
in the form [40k blob data]. This can partially be worked around by
using `journalctl -a` to view the logs, which at least prints the text,
though only sporadically.
- The path was getting truncated to a short length, since the default
get_terminal_size returns a column width of 80, which isn't relevant
when printing to e.g. journald.
This commit fixes this by introducing a new code path for when stream is
not a tty, which always prints the full paths and ends lines with a linefeed.
This is based on unfinished PR #8939 by @infinisil, thanks for your suggestion!
Forward port of PR #9055 to master.
The VM was used for local macOS testing and
also for building a macOS intel fat binary.
We also do macOS CI testing on GitHub and I
recently added binary building on GitHub for
Apple Silicon and Intel.
The macOS 10 VM was very outdated, super slow
and a pain to use. I didn't succeed in building
a recent macOS vagrant VM, so we'll just use
GitHub from now on...
The original markup included a paragraph element wrapping a block-level pre element, which is invalid per HTML’s content model (a p can only contain phrasing content; pre is flow content).
The fix separated text and pre blocks into valid sibling elements, ensuring no pre is nested inside a p.
we only read from borg 1.x legacy repos, we must not
try to "fix" them (users can use borg1 check --repair).
had to remove some tests that relied on this "feature".
2 fixes:
- add code to update/verify the HashHeader integrity hash. this code was
missing and led to FileIntegrityError on the borg 1.x repo index.
- when reading a non-compact borg 1.x hash table from disk (like the borg
repo index), only add the "used" buckets to the in-memory hashtable,
but not the unused/tombstone buckets.
The corruption described in #9022 was happening like this:
- borg failed to read the repo index, because the integrity check failed
- due to open_index(..., auto_recover=True), it tried to "fix" it by
writing an empty hash table to disk. borg 1.x usually then rebuilt the
index, but somehow this wasn't happening for the user in #9022.
Borg2 documentation mentions the support for the s3 backend however,
borg was missing the parsing bits for an s3 repo.
This updates the Location parser to parse the s3 url using the same
logic as borgstore.
Note: borgstore should be installed with the s3 dependencies in order
for the s3 backend to work.
Signed-off-by: Mike Mason <github@mikemrm.com>
Implemented handling of POSIX access and default ACLs in tar files.
New keys, `SCHILY.acl.access` and `SCHILY.acl.default`, are used
to store these ACLs in the tar PAX headers.
control how borg detects whether a file has changed while it was backed up, valid modes are ctime, mtime or disabled.
ctime is the safest mode and the default.
mtime can be useful if ctime does not work correctly for some reason
(e.g. OneDrive files change their ctime without the user changing the file).
disabled (= disabling change detection) is not recommended as it could lead to
inconsistent backups. Only use if you know what you are doing.
the stuff in Python stdlib "random.Random" is not cryptographically strong
and the stuff in Python stdlib "secrets" can't be seeded and does not
offer shuffle.
the previous approach had cryptographic strength randomness, but a precise
50:50 0/1 bit distribution per bit position in the table was not assured.
now this is always the case due to the way how the table is constructed.
That way we can feed lots of entropy into the table creation.
The bh64_key is derived from the id_key (NOT the crypt_key), thus
it will create the same key for related repositories (even if they
use different encryption/authentication keys). Due to that, it will
also create the same buzhash64 table, will cut chunks at the same
points and deduplication will work amongst the related repositories.
Only compare the main version number, e.g. 1.1.1 (first 3 elements
of the version tuple).
Without this change, it would not accept 1.1.1rc1 because that is
not "<= (1, 1, 1)" in that simplistic version comparison.
Separated `chunker_test` into two dedicated test modules: `fixed_test` (for `ChunkerFixed`) and `buzhash_test` (for `Chunker`). Updated imports and adjusted references accordingly.
Moved the `ChunkerFixed` implementation from `chunker` to a new `fixed` module for better modularity. Updated imports and type hints.
Removed now empty chunkers.chunker module.
Moved the `ChunkerFailing` implementation from `chunker` to a new `failing` module for better modularity. Updated imports and type hints. Adjusted related definitions in `chunker.pyi` accordingly.
Moved `buzhash` implementation from `chunker` to a new `buzhash` module for better separation of concerns. Updated imports, adjusted `setup.py` and build configuration accordingly. Removed deprecated `Chunker` definitions from `chunker.pyi`.
Relocated `get_chunker` function from `chunker` module to `chunkers.__init__.py` for improved organization. Updated `Chunker` class signature to include a `sparse` parameter with a default value. Adjusted imports and type hints accordingly.
Extracted the `reader` logic from `chunker` into a dedicated `reader` module to improve modularity and maintainability. Updated imports, references, and build configurations accordingly.
ChunkerFixed can be configured to support files with a specific header size.
But we do not want to get an AssertionError if we encounter a 0-byte file
or a file that is shorter than the header size.
no options yet, just hardcoded macOS and Linux xattrs.
removed the --exclude-nodump option, it is also done automagically now.
also: create: call stat_ext_attrs early
this reads bsdflags, xattrs and ACLs from the
filesystem, except if the user chose to disable that.
notable:
- borg always reads these, even for unchanged files
- if we read them early, borg can now behave differently
based e.g. on a xattr value (and e.g. exclude the file)
we want to get rid of legacy stuff(*) one day and sha256 is as
good for this purpose (and might be even hw accelerated).
(*) considered legacy due to the way it gives the key to the
blake2b function (just padding and prepending it to the data,
instead of using the key parameter, see #8867 ).
Replaced inline file reading logic with `FileReader` to standardize handling across chunkers. Improved buffer updates and allocation handling for sparse files and optimized read operations.
Includes cases for simple reads, multiple reads, and mock chunk scenarios to verify behavior with mixed allocation types.
Also: change Chunk type for empty read result for better consistency.
Simplified and improved handling of mixed types of chunks during reading. The allocation type of resulting chunks is now determined based on contributing chunks.
The `header_size` parameter and related logic have been removed from file readers, simplifying their implementation. This change eliminates unnecessary complexity while maintaining all functional capabilities via `read_size` and `fmap`.
`FileFMAPReader` deals with sparse files (data vs holes) or fmap and yields blocks of some specific read_size using a generator.
`FileReader` uses the `FileFMAPReader` to fill an internal buffer and lets users use its `read` method to read arbitrary sized chunks from the buffer.
For both classes, instances now only deal with a single file.
Replaced `ChunkerFixed`'s block-reading functionality with a new `FileReader` class to streamline code and improve separation of concerns. Adjusted `ChunkerFixed` to delegate file reading to `FileReader` while focusing on chunk assembly.
`FileReader` is intended to be useful for other chunkers also, so they can easily implement sparse file reading / fmap support.
The `-Wno-unreachable-code-fallthrough` compiler flag suppresses warnings about fallthrough annotations in unreachable code.
In C switch statements, "fallthrough" occurs when execution continues from one case to the next without a break statement. This is often a source of bugs, so modern compilers warn about it. To indicate intentional fallthrough, developers use annotations like `__attribute__((fallthrough))`.
In Cython-generated C code, the `CYTHON_FALLTHROUGH` macro is defined to expand to the appropriate fallthrough annotation for the compiler being used. For example, in `compress.c`:
```c
#define CYTHON_FALLTHROUGH __attribute__((fallthrough))
```
The issue occurs because Cython generates code with conditional branches that may be unreachable on certain platforms or configurations. When these branches contain switch statements with fallthrough annotations, compilers like Clang issue warnings like:
```
warning: fallthrough annotation in unreachable code [-Wunreachable-code-fallthrough]
```
These warnings appear in the generated C code, not in the original Cython source. They're harmless but noisy, cluttering the build output with warnings about code we don't control.
By adding `-Wno-unreachable-code-fallthrough` to the compiler flags in `setup.py`, we specifically tell the compiler to ignore these particular warnings, resulting in a cleaner build output without affecting the actual functionality of the code.
This is a common practice when working with generated code - suppress specific warnings that are unavoidable due to the code generation process while keeping other useful warnings enabled.
Updated bash completions to include new commands such as `analyze`, `debug`, `repo-space`, `tag`, and `undelete`, along with their respective options. Fixed a typo in the `--upgrader` completions and improved completion handling for various commands.
thanks a lot to @sothix for helping with this!
removed pytest-forked, is not found anymore:
error: target not found: mingw-w64-ucrt-x86_64-python-pytest-forked
use a virtual env to avoid mixup of user with system packages.
remove old workaround for setuptools (SETUPTOOLS_USE_DISTUTILS: stdlib).
fix pip install
use --system-site-packages as a workaround for broken pip install python-cffi.
do not upgrade pip setuptools build wheel
use python -m pytest to use the one from the venv
Also: moved name length check to Archive.__init__, so it doesn't
read all other archives main metadata when creating a new archive.
In write-only mode, the files cache can't be built from the repo
from the latest archive of same series, we are not allowed to read that!
The posixfs borgstore backend implements permissions to make
testing with differently permissive stores easier.
The env var selects from pre-defined permission configurations
within borg and gives the chosen permissions config to borgstore.
Add incremental flag to `write_chunkindex_to_repo_cache`.
borg create uses incremental cache indexes to save progress.
But other OPs need to write a full index and delete all other cached indexes.
Added debug logging for missing object IDs.
Introduce tests to verify the functionality of the `repo-space` command, including space reservation, freeing, display, and edge cases. These tests ensure proper handling of various scenarios and validation of the respective outputs.
- borg repo-create and borg transfer not only support --repo / --other-repo options,
but also already supported related BORG_REPO and BORG_OTHER_REPO env vars.
- similar to that, the passphrases now come from BORG_[OTHER_]PASSPHRASE, BORG_[OTHER_]PASSCOMMAND or BORG_[OTHER_]PASSPHRASE_FD.
- borg repo-create --repo B --other-repo A does not silently copy the passphrase of key A
to key B anymore, but either asks for the passphrase or reads it from env vars.
Some features like append-only repositories rely on a server-side component
that enforces them (because that shall only be controllable server-side,
not client-side).
So, that can only work, if such a server-side component exists, which is the
case for borg 1.x ssh: repositories (but not for borg 1.x non-ssh: repositories).
For borg2, we currently have:
- fs repos
- sftp: repos
- rclone: repos (enabling many different cloud providers)
- s3/b3: repos
- ssh: repos using client/server rpc code similar as in borg 1.x
So, only for the last method we have a borg server-side process that could enforce some features, but not for any of the other repo types.
For append-only the current idea is that this should not be done within borg,
but solved by a missing repo object delete permission enforced by the storage.
borg create could then use credentials that miss permission to delete,
while borg compact would use credentials that include permission to delete.
Some features like repository quotas rely on a server-side component
that enforces them (because that shall only be controllable server-side,
not client-side).
So, that can only work, if such a server-side component exists, which is the
case for borg 1.x ssh: repositories (but not for borg 1.x non-ssh: repositories).
For borg2, we currently have:
- fs repos
- sftp: repos
- rclone: repos (enabling many different cloud providers)
- s3/b3: repos
- ssh: repos using client/server rpc code similar as in borg 1.x
So, only for the last method we have a borg server-side process that could enforce some features, but not for any of the other repo types.
For quotas the current idea is that this should not be done within borg,
but enforced by a storage specific quota implementation (like fs quota,
or quota of the cloud storage provider). borg could offer information
about overall repo space used, but would not enforce quotas within borg.
before this fix, borg also obfuscated other chunks it creates,
e.g. the archive metadata stream chunks, which is not necessary
and only added a lot of overhead.
as we have meta["type"] in borg2, this is easy to fix here.
It's easy enough to verify exhaustively for any plausible chunker params
that Padmé always produces at most a 12% overhead. Checking that again
at runtime is pointless.
This only happened when:
- using borg extract --numeric-ids
- processing NFS4 ACLs
It didn't affect POSIX ACL processing.
This is rather old code, so it looks like nobody used that
code or the bug was not reported.
The bug was discovered by PyCharm's "Junie" AI. \o/
Sometimes, usually for file content chunks, it makes sense to
generate all-zero replacement chunks on-the-fly.
But for e.g. an archive items metadata stream, this does not
make sense (because it wants to msgpack.unpack the data), so
we rather want None. In that case, we do not have the size
information anyway.
preloading: always use raise_missing=False, because
the behaviour is defined at preloading time.
fetch_many: use get_many with raise_missing=False.
if get_many yields None instead of the expected chunk
cdata bytes, on-the-fly create an all-zero replacement
chunk of the correct size (if the size is known) and
emit an error msg about the missing chunk id / size.
note: for borg recreate with re-chunking this is a bit
unpretty, because it will transform a missing chunk into
a zero bytes range in the target file in the recreated
archive. it will emit an error message at recreate time,
but afterwards the recreated archive will not "know"
about the problem any more and will just have that
zero-patched file.
so guess borg recreate with re-chunking should better
only be used on repos that do not miss chunks.
Well, it's not totally removed, some code in Item, Archive and
borg transfer --from-borg1 needs to stay in place, so that we
can pick the CORRECT chunks list that is in .chunks_healthy
for all-zero-replacement-chunk-patched items when transferring
archives from borg1 to borg2 repos.
transfer: do not transfer replacement chunks, deal with missing chunks in other_repo
FUSE fs read: IOError or all-zero result
fixes#8641
In the example, setting SYSTEMD_WANTS instead of appending may prevent
other autostart services attached by earlier udev rules from launching.
This commit changes = to += to fix this behavior.
fixes#8639
The priority of 40 for the udev rules as stated in to documentation
applies the rule too early on some systems, which prevents the rule from
matching. This commit changes the priority to 80.
Improve handling when defining a passphrase or debugging passphrase issues, fixes#8496
Setting `BORG_DEBUG_PASSPHRASE=YES` enables passphrase debug logging to stderr, showing passphrase, hex utf-8 byte sequence and related env vars if a wrong passphrase was encountered.
Setting `BORG_DISPLAY_PASSHRASE=YES` now always shows passphrase and its hex utf-8 byte sequence.
/Users/tw/w/borg/docs/internals/data-structures.rst:971:
WARNING: Lexing literal_block
'
[cache]
version = 1
repository = 3c4...e59
manifest = 10e...21c
timestamp = 2017-06-01T21:31:39.699514
key_type = 2
previous_location = /path/to/repo
[integrity]
manifest = 10e...21c
files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
'
as "ini" resulted in an error at token: '}'.
Retrying in relaxed mode. [misc.highlighting_failure]
Note: this part of the docs didn't change for a long time, so I guess
the sudden warning comes from a change in sphinx' lexers.
Main problem is that rc != 0 will abort our CI pipeline.
see #8318
so long as it can be assumed that the user has configured a POSIX
compliant login shell, using a simple command [1] looks cleaner, as
no ``export`` or ``;`` are used.
[1] Section "2.9.1 Simple Commands" in volume "Shell & Utilities" of POSIX.1-2024
the python package pkgconfig does not need to be "preinstalled"
anymore, because our pyproject.toml cares for that. otoh, the cli tool
pkg-config must be preinstalled so that libs and headers can be found
automagically.
Also be a bit more clear about the FUSE stuff.
if retry is True, it will just retry to get a valid answer.
if retry is False, it will return the default.
the code can be tested by entering "error" (without the quotes).
It needs to be possible to iterate over all items in an archive,
do some output (e.g. if an item is included / excluded) and then
only preload content data chunks for the included items.
it looks like in brew they removed pkg-config formula and added
an alias to the pkgconf formula (which also provides a pkg-config
cli command).
the transition was not seamless:
on github actions CI:
Installing pkg-config
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/manifests/2.3.0_1
==> Fetching pkgconf
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/blobs/sha256:5f83615f295e78e593c767d84f3eddf61bfb0b849a1e6a5ea343506b30b2c620
==> Pouring pkgconf--2.3.0_1.arm64_sonoma.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /opt/homebrew
Could not symlink bin/pkg-config
Target /opt/homebrew/bin/pkg-config
is a symlink belonging to pkg-config@0.29.2. You can unlink it:
brew unlink pkg-config@0.29.2
To force the link and overwrite all conflicting files:
brew link --overwrite pkgconf
To list all files that would be deleted:
brew link --overwrite pkgconf --dry-run
Possible conflicting files are:
/opt/homebrew/bin/pkg-config -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/bin/pkg-config
/opt/homebrew/share/aclocal/pkg.m4 -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/share/aclocal/pkg.m4
/opt/homebrew/share/man/man1/pkg-config.1 -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/share/man/man1/pkg-config.1
==> Summary
🍺 /opt/homebrew/Cellar/pkgconf/2.3.0_1: 27 files, 474KB
Installing pkg-config has failed!
`setup.py` hardcoded crypto library paths for OpenBSD, causing build
issue when OpenBSD drops specific OpenSSL version. Solution is to make
paths configurable.
Addresses #8553.
We do not want that urllib spoils test output with LibreSSL related
warnings on OpenBSD.
`NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently
the 'ssl' module is compiled with 'LibreSSL 3.8.2'`.
This should address #8506. Unfortunately I'm unable to test.
#8506 is likely caused by the Vagrant box having a mirror in its
`etc/installurl`, which does not offer 7.4 packages. There are other
mirrors out there who do, e.g., https://ftp.eu.openbsd.org/pub/OpenBSD/.
Proposed 'fix' is to replace the mirror in `/etc/installurl`.
Worst (but frequent) case here is that all or most of the chunks
in the repo need to get recompressed, thus storing all chunk ids
in a python list would need significant amounts of memory for
large repositories.
We already have all chunk ids stored in cache.chunks, so we now just
flag the ones needing re-compression by setting the F_COMPRESS flag
(that does not need any additional memory).
- ChunkIndex: implement system flags
- ChunkIndex: F_NEW flag as 1st system flag for newly added chunks
- incrementally write only NEW chunks to repo/cache/chunks.*
- merge all chunks.* when loading the ChunkIndex from the repo
Also: the cached ChunkIndex only has the chunk IDs. All values are just dummies.
The ChunkIndexEntry value can be used to set flags and track size, but we
intentionally do not persist flags and size to the cache.
The size information gets set when borg loads the files cache and "compresses"
the chunks lists in the files cache entries. After that, all chunks referenced
by the files cache will have a valid size as long as the ChunkIndex is in memory.
This is needed so that "uncompress" can work.
- doesn't need a separate file for the hash
- we can later write multiple partial chunkindexes to the cache
also:
add upgrade code that renames the cache from previous borg versions.
Consider soft-deleted archives/ directory entries, but only create a new
archives/ directory entry if:
- there is no entry for that archive ID
- there is no soft-deleted entry for that archive ID either
Support running with or without --repair.
Without --repair, it can be used to detect such inconsistencies and return with rc != 0.
--repository-only contradicts --find-lost-archives.
We are only interested in archive metadata objects here, thus for most repo objects
it is enough to read the repoobj's metadata and determine the object's type.
Only if it is the right type of object, we need to read the full object (metadata
and data).
This reverts commit d3f3082bf4.
Comment by jdchristensen:
I agree that "wipe clean" is correct grammar, but it doesn't match the situation in "unmount cleanly".
The change in this patch is definitely wrong.
Putting it another way, one would never say that we "clean unmount a filesystem".
We say that we "cleanly unmount a filesystem", or in other words, that it "unmounts cleanly".
But the original text is slightly awkward, so I would propose: "When running in the foreground,
^C/SIGINT cleanly unmounts the filesystem, but other signals or crashes do not."
(Not that this guarantees anything, but I'm a native speaker.)
We gave up refcounting quite a while ago and are only interested
in whether a chunk is used (referenced) or not (orphan).
So, let's keep that uint32_t value, but use it for bit flags, so
we could use it to efficiently remember other chunk-related stuff also.
If we have an entry for a chunk id in the ChunkIndex,
it means that this chunk exists in the repository.
The code was a bit over-complicated and used entry.refcount
only to detect whether .get(id, default) actually got something
from the ChunkIndex or used the provided default value.
The code does the same now, but in a simpler way.
Additionally, it checks for size consistency if a size is
provided by the caller and a size is already present in
the entry.
- refactor packing/unpacking of fc entries into separate functions
- instead of a chunks list entry being a tuple of 256bit id [bytes] and 32bit size [int],
only store a stable 32bit index into kv array of ChunkIndex (where we also have id and
size [and refcount]).
- only done in memory, the on-disk format has (id, size) tuples.
memory consumption (N = entry.chunks list element count, X = overhead for rest of entry):
- previously:
- packed = packb(dict(..., chunks=[(id1, size1), (id2, size2), ...]))
- packed size ~= X + N * (1 + (34 + 5)) Bytes
- now:
- packed = packb(dict(..., chunks=[ix1, ix2, ...]))
- packed size ~= X + N * 5 Bytes
on macOS, installing older Pythons seems to uninstall OpenSSL 3 and only 1.1 is left.
also, building all these pythons and misc. openssl versions takes forever and we
only need 3.12 for the binary build. testing on misc. python versions is regularly
done one github actions CI.
- remove more hashindex tests
- remove IndexBase, _hashindex.c remainders
- remove early borg2 NSIndex
- remove hashindex_variant (we only support borg 1.x repo index
aka NSIndex1, everything else uses the borghash based stuff)
- adapt code / tests so they use NSIndex1 (not NSIndex)
- minor fixes
- NSIndex1 can read the old borg 1.x on-disk format, but not write it.
- NSIndex1 can read/write the new borghash on-disk format.
- adapt legacyrepository code to work with NSIndex1 (segment, offset)
values instead of NSIndex (segment, offset, size).
- Mention zstd as the best general choice when not using lz4
(as often acknowledged by public benchmarks)
- Mention 'auto' more prominently as a good heuristic to improve
speed while retaining good compression
- Link to compression options
Also:
- remove most hashindex tests, borghash has such tests
- have a small wrapper class ChunkIndex around HashTableNT to
adapt API difference and add some special methods.
Note: I needed to manually copy the .pxd files from borghash
into cwd, because they were not found:
- ./borghash.pxd
- borghash/_borghash.pxd
There were still some relicts from pre-borgstore / borg 1.x in there:
- patterns about "::", used to be separator between repository and archive.
- patterns for //server/share (not supported by borgstore)
Also: unified ssh+sftp and file+socket processing.
special tags start with @ and have clobber protection,
so users can't accidentally remove them using borg tag --set.
it is possible though to still use --set, but one must also
give all special tags that the archive(s) already have.
there is only a known set of allowed special tags:
@PROT - protects archives against archive pruning or archive deletion
setting unknown tags beginning with @ is disallowed.
as borg now uses repository.store_load and .store_save to load
and save the chunks cache, we need a rather high limit here.
this is a quick fix, the real fix might be using chunks of the
data (preferably <= MAX_OBJECT_SIZE), so there is less to unpack
at once.
Read or modify this set, only add validated str to it:
Archive.tags: Optional[set[str]]
borg info [--json] <archive> displays a list of comma-separated archive tags (currently always empty).
borg 1.x encouraged users to put everything into the archive name:
- name of the dataset
- timestamp (usually used to make the archive name unique)
- maybe also hostname (when backing up to same repo from multiple hosts)
- maybe also username (when backing up to same repo from multiple users)
borg2 now discourages users from putting the timestamp into the name,
because we rather want same name within a series of archives - thus,
the field width for the name can be narrower.
the ID of the archive is now the only unique identifier, thus it is
moved to the leftmost place.
256bits (64 hex digits) was a bit much and as borg can also deal with
abbreviated IDs, we only show 32bits (8 hex digits) by default.
the ID is followed by the timestamp (also quite "interesting", because
it usually differs for different archives).
then following are: archive name, user name, host name - these might be
always the same if there is only one series of archives in a repo.
use 2 blanks separating the fields for better readability.
Needed to change this because listing just the
archive names is pretty useless if names are not
unique.
The short list is likely mostly used by scripts to
iterate over all archives, so outputting IDs is
better.
Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.
We can also detect that we are finished, if .list()
returns less than the limit we gave to it.
Also: reduce code duplication by using repo_lister func.
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.
When the chunks index is needed, it is first tried to get it from cache/chunks.
If that fails, fall back to building the chunks index via repository.list(),
which can be rather slow and immediately cache the resulting ChunkIndex in the
repo.
borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.
cache.close now saves the chunks index to cache/chunks in repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.
cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.
so, even in case we would back up an inconsistent file
that has been changed while we backed it up, we would
not have a files cache entry for it and would fully
read/chunk/hash it again in next backup.
if we detect the conditions for this (rare) race,
abort reading the file and retry.
The caller (_process_any) will do up to MAX_RETRIES
before giving up. If it gives up, a warning is logged
and the file is not written to the archive and won't
be memorized in the files cache either.
Thus, the file will be read/chunked/hashed again at
the next borg create run.
- on explicit request, update .last_refresh_dt inside _create_lock / _delete_lock
- reset .last_refresh_dt if we kill our own lock
- be more precise, have exactly the datetime of the lock in .last_refresh_dt
- cosmetic: do refresh/stale time comparisons always in the same way
- changes to locally stored files cache:
- store as files.<H(archive_name)>
- user can manually control suffix via env var
- if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
archive's metadata from the repo (better than starting with
empty files cache and needing to read/chunk/hash all files).
previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
repo from same machine/user).
Cache entries now have ctime AND mtime.
Note: TTL and age still needed for discarding removed files.
But due to the separate files caches per series, the TTL
was lowered to 2 (from 20).
repository.list is slow, so rather use the chunkindex,
which might be cached in future. currently, it also uses
repository.list, but at least we can solve the problem
at one place then.
under all circumstances, we must avoid that the lock
gets stale due to not being refreshed in time.
there is some internal rate limiting in _lock_refresh,
so calling it often should be no problem.
in borg 1.x, we used to put a timestamp into the archive name to make
it unique, because borg1 required that.
borg2 does not require unique archive names, but it encourages you
to even use an identical archive name within the same SERIES of archives.
that makes matching (e.g. for prune, but also at other places) much
simpler and borg KNOWS which archives belong to the same series.
for the archives directory, we only need to know the archive IDs,
everything else can be fetched from the ArchiveItem in the repo.
so we store empty files into archives/* with the archiv ID as name.
this makes some "by-id" operations much easier and we don't have to
deal with a useless "store_key" anymore.
removed .delete method - we can't delete by name anymore as we
allow duplicate names for the series feature. everything uses
delete_by_id() now.
also: simplify, clean up, refactor
- we should always output name and id when talking about an archive
- no problem anymore if names in archives directory are "duplicate"
- use "by-id" archives directory entry delete function
- rewrite/simplify test for borg check --undelete-archives
so if one works with backup series, one can just do:
borg prune --keep-daily 30 seriesname
seriesname will then do a precise match on the archive names
and select that series.
aid:<archive-id-prefix> can be used for -a / --match-archives
to match on the archive id (prefix) instead of the name.
NAME positional argument now also supports matching (and aid:),
but requires that there is exactly ONE result.
macOS and Linux give EISDIR, while Windows gives EPERM when trying to
open a file for writing, if the filename is already taken by an existing
directory.
now all OSes should give the same RC in this case.
borg delete and borg prune do a quick and dirty archive deletion,
just removing the archives directory entry for them.
--undelete-archives can still find the archive metadata objects
by completely scanning the repository and re-create missing
archives directory entries.
but only until borg compact would remove all unused data.
if only the manifest is missing or corrupted, do not run that
scan, it is not required for the manifest anymore.
if the manifest file is missing, check generated *.1 *.2 ... archives although an entry for the correct name and id was already
present. BUG!
this is because if the manifest is lost, that does not imply
anymore that the complete archives directory is also lost, as it
did in borg 1.x.
Also improved log messages a bit.
not for check and compact, these need an exclusive lock.
to try parallel repo access on same machine, same user,
one needs to use a non-locking cache implementation:
export BORG_CACHE_IMPL=adhoc
this is slow due the missing files cache in that implementation,
but unproblematic because no caches/indexes are persisted.
old borg just didn't commit the transaction and
thus caused a transaction rollback if not in
repair mode.
we can't do that anymore, thus we must avoid
modifying the repo if not in repair mode.
previously, borg always read all archives entries, modified the
list in memory, wrote back to the repository (similar as borg 1.x
did).
now borg works directly with archives/* in the borgstore.
otherwise the lock might become stale and could get
killed by any other borg process.
note: ThreadRunner class written by PyCharm AI and
only needed small enhancements. nice.
reuse_chunk is the complement of add_chunk for already existing chunks.
It doesn't do refcounting anymore.
.seen_chunk does not return the refcount anymore, but just whether the chunk exists.
If we add a new chunk, it immediately sets its refcount to MAX_VALUE, so
there is no difference anymore between previously existing chunks and new
chunks added. This makes the stats even more useless, but we have less complexity.
.init_chunks has just built self.chunks using repository.list(), so don't
call that again, but just iterate over self.chunks.
also some other changes, making the code much simpler.
When the AdhocCache(WithFiles) queries chunk IDs from the repo to build the chunks
index, it won't know their refcount and thus all chunks in the index have their
refcount at the MAX_VALUE (representing "infinite") and that would never decrease
nor could that ever reach zero and get the chunk deleted from the repo.
Only completely new chunks first written in the current borg run have a valid
refcount.
In some exception handlers, borg tried to clean up chunks that won't be used
by an item by decref'ing them. That is either:
- pointless due to refcount being at MAX_VALUE
- inefficient, because the user might retry the backup and would need to
transmit these chunks to the repo again.
We'll just rely on borg compact ONLY to clean up any unused/orphan chunks.
borg1 needed this due to its transactional / rollback behaviour:
if there was uncommitted stuff in the repo, next repo opening automatically
rolled back to last commit. thus we needed checkpoint archives to reference
chunks and commit the repo.
borg2 does not do that anymore, unused chunks are only removed when the
user invokes borg compact.
thus, if a borg create gets interrupted, the user can just run borg create
again and it will find some chunks are already in the repo, making progress
even if borg create gets frequently interrupted.
This was an implementation specific "in on-disk order" list method that made sense
with borg 1.x log-like segment files only.
But we now store objects separately, so there is no "in on-disk order" anymore.
This was used for an implementation detail of the borg 1.x
repository code, dumping uncommitted objects. Not needed any more.
Also remove local repository method scan_low_level, it was only used by --ghost.
Tests were a bit tricky as there is validation on 2 layers now:
- repository3 does an xxh64 check, finds most corruptions already
- on the archives level, borg also does an even stronger cryptographic check
Dummy returns all-zero stats from that call.
Problem was that these values can't be computed from the chunks cache
anymore. No correct refcounts, often no size information.
Also removed hashindex.ChunkIndex.summarize (previously used by the above mentioned
.stats() call) and .stats_against (unused) for same reason.
Lots of low-level code written back then to optimize runtime of some
functions.
We'll solve this differently by doing less stats, esp. if it is expensive to compute.
Note: this is the default cache implementation in borg 1.x,
it worked well, but there were some issues:
- if the local chunks cache got out of sync with the repository,
it needed an expensive rebuild from the infos in all archives.
- to optimize that, a local chunks.archive.d cache was used to
speed that up, but at the price of quite significant space needs.
AdhocCacheWithFiles replaced this with a non-persistent chunks cache,
requesting all chunkids from the repository to initialize a simplified
non-persistent chunks index, that does not do real refcounting and also
initially does not have size information for pre-existing chunks.
We want to move away from precise refcounting, LocalCache needs to die.
much faster and easier now, similar to what borg delete --force --force used to do.
considering that speed, no need for checkpointing anymore.
--stats does not work that way, thus it was removed. borg compact now shows some stats.
Features:
- exclusive and non-exclusive locks
- acquire timeout
- lock auto-expiry (after 30mins of inactivity), lock refresh
- use tz-aware datetimes (in utc timezone) in locks
Also:
- document lock acquisition rules in the src
- increased default BORG_LOCK_WAIT to 10s
- better document with-lock test
Stale locks are ignored and automatically deleted.
Default: stale == 30 Minutes old.
lock.refresh() can be called frequently to avoid that an acquired lock becomes stale.
It does not do much if the last real refresh was recently.
After stale/2 time it checks and refreshes the locks in the store.
Update the repository3 code to call refresh frequently:
- get/put/list/scan
- inside check loop
borg transfer is primarily a general purpose archive transfer function
from borg2 to related borg2 repos.
but for upgrades from borg 1.x, we also need to support:
- rcreate with a borg 1.x "other repo"
- transfer with a borg 1.x "other repo"
It uses xxh64 hashes of the meta and data parts to verify their validity.
On a server with borg, this can be done server-side without the borg key.
The new RepoObj header has meta_size, data_size, meta_hash and data_hash.
Simplify the repository a lot:
No repository transactions, no log-like appending, no append-only, no segments,
just using a key/value store for the individual chunks.
No locking yet.
Also:
mypy: ignore missing import
there are no library stubs for borgstore yet, so mypy errors without that option.
pyproject.toml: install borgstore directly from github
There is no pypi release yet.
use pip install -e . rather than python setup.py develop
The latter is deprecated and had issues installing the "borgstore from github" dependency.
test the healing more thoroughly:
- preservation of correct chunks list in .chunks_healthy
- check that .chunks_healthy is removed after healing
- check that doing another borg check --repair run does not find
something to heal, again.
also did a datatype consistency fix for item.chunks_healthy list
members: they are now post processed in the same way as item.chunks,
so they have type ChunkListEntry rather than simple tuple.
it needs to be like this to support a ~/.pypirc like this,
containing a separate upload token for the borgbackup project:
[distutils]
index-servers =
borgbackup
...
[borgbackup]
repository = https://upload.pypi.org/legacy/
username = __token__
password = pypi-...(token)...
Also: support a "cli" env var value, that does not determine
the implementation from the env var, but rather from cli options (similar to as it was before adding BORG_CACHE_IMPL).
- skip test_cache_chunks if there is no persistent chunks cache file
- init self.chunks for AdHocCache
- remove warning output from AdHocCache.__init__, it gets mixed with JSON output and fails the JSON decoder.
Add new borg create option '--prefer-adhoc-cache' to prefer the
AdHocCache over the NewCache implementation.
Adjust a test to match the previous default behaviour (== use the
AdHocCache) with --no-cache-sync.
removed some code borg had for backwards compatibility with
old borg versions (that had timestamp only in the cache).
now the manifest timestamp is only checked against the manifest-timestamp
file in the security dir, simplifying the code.
removed some code borg had for backwards compatibility with
old borg versions (that had key_type only in the cache).
now the repo key_type is only checked against the key-type
file in the security dir, simplifying the code.
removed some code borg had for backwards compatibility with
old borg versions (that had previous_location only in the
cache).
now the repo location is only checked against the location
file in the security dir, simplifying the code and also
fixing a related test failure with NewCache.
also improved test_repository_move to test for aborting in
case the repo location changed unexpectedly.
NewCache does not do precise refcounting, thus chunks won't be deleted
from the repo at "borg delete" time.
"borg check --repair" would remove such chunks IF they are orphans.
if we use AdHocCache or NewCache, we do not have precise refcounting.
thus, we do not delete repo objects as their refcount does not go to zero.
check --repair will just remove the orphans.
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.
decref: caller does not always have the chunk size (e.g. for
metadata chunks),
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (has size==0 for all not new chunks) and blocked using the files cache there.
Try to rebuild cache if an exception is raised, fixes#5213
For now, we catch FileNotFoundError and FileIntegrityError.
Write cache config without manifest to prevent override of manifest_id.
This is needed in order to have an empty manifest_id.
This empty id triggers the re-syncing of the chunks cache by calling sync() inside LocalCache.__init__()
Adapt and extend test_cache_chunks to new behaviour:
- a cache wipe is expected now.
- borg detects the corrupt cache and wipes/rebuilds the cache.
- check if the in-memory and on-disk cache is as expected (a rebuilt chunks cache).
That "failed to map segment from shared object" error msg is not
very helpful. Add a hint that the filesystem needs to be +exec
(== not noexec mounted, like it might be the case for /tmp on
some systems).
Looks like borg's setup.py has hidden the real cause of a cythonize ImportError.
There are basically 2 cases:
- either there is no Cython installed, then the import fails because the module can not be found, or
- there is some issue within Cython and the import fails due to that.
It's important not to hide the real cause, especially if we run into case 2.
case 1 is kind of expected and frequent, case 2 is rare.
Previously:
- acl_get just returned for lpathconf returning EINVAL
- acl_get silently ignored all other lpathconf errors and
implied it is not a NFS4 acl
Now:
- not sure why the EINVAL silent return was done, but it seems
wrong. guess it could be the system not implementing a check
for nfs4. but in that case guess we still would like to get
the default and access ACL!? Thus, I removed the silent return.
- raise OSError for all lpathconf errors
Cosmetic: add a nfs4_acl boolean, so the code reads better.
... to implement same semantics as on linux (only store ACL
if it defines permissions other than those defined by the
traditional file permissions).
Looks like there is no call working with an fd on FreeBSD.
This is NOT a bug fix, because the previous code contained a
check for symlinks before that line - because symlinks can not
have ACLs under Linux.
Now, this "is it a symlink" check is removed to simplify the
code and the "nofollow" variant of acl_extended_file* is used
to look at the symlink fs object (in the symlink case).
It then should tell us that this does NOT have an extended ACL
(because symlinks can't have ACLs) and so we return there.
Overall the code gets simpler and looks less suspect.
Previously, these conditions were handled the same (just return):
- no extended acl here
- some error happened (e.g. ACLs unsupported, bad file descriptor, file not found, permission error, ...)
Now there will be OSErrors for the error cases.
- ACLs are not working, if ENOTSUP ("Operation not supported") happens
- fix check for macOS
On macOS borg uses "acl_extended", not "acl_access" and
also the ACL text format is a bit different.
- macOS: run on macos-14 (on Apple Silicon!)
- macOS: use OpenSSL 3.0 from brew
- macOS: run with Python 3.11
- pip install -e .: add -v
- use up-to-date github actions
- remove libb2 references - since borg 1.2, we use blake2 indirectly via python stdlib
this was recently set to a relatively high minimum version when
locating it via pkgconfig was added. this broke the binary builds
on buster and bullseye.
i don't think borg requires a specific libacl version as long as
the api is compatible, so i now set this to 2.2.47 (from 2008).
borg init calls this. If there is a PermissionError, it is
usually fs permission issue at path or its parent directory.
Don't give a traceback, but rather an error msg and a specific exit code.
this is a fwd port from 1.4-maint. as we don't have nonce files
any more in master, only the generally useful stuff has been ported.
- add Error / ErrorWithTraceback exception classes to RPC layer.
- add hex_to_bin helper
if we do multiple calls to Archiver.do_something(),
we need to reset the ec / warnings after each call,
otherwise they will keep growing (in severity, in length).
stop directly accessing the variables from other modules.
prefix with underscore to indicate that these shall
only be used within this module and every other user
shall call the respective functions.
this is not needed and getting rid of it makes
the code / behaviour simpler to understand:
if a fatal error is detected, we throw an exception.
if we encounter something warning worthy, we emit and collect the warning.
in a few cases, we directly call set_ec to set the
exit code as needed, e.g. if passing it through
from a subprocess.
also:
- get rid of Archiver.exit_code
- assert that return value of archiver methods is None
- fix a print_warning call to use the correct formatting method
- implement updating exit code based on severity, including modern codes
- extend print_warning with kwargs wc (warning code) and wt (warning type)
- update a global warnings_list with warning_info elements
- create a class hierarchy below BorgWarning class similar to Error class
- diff: change harmless warnings about speed to rc == 0
- delete --force --force: change harmless warnings to rc == 0
Also:
- have BackupRaceConditionError as a more precise subclass of BackupError
previously, this was handled in RPCError handler and always resulted in rc 2.
now re-raise Lock Exceptions locally, so it gives rc 2 (legacy) or 7x (modern).
If not set, it will default to "legacy" (always return 2 for errors).
This commit only changes the Error exception class and its subclasses.
The more specific exit codes need to be defined via .exit_mcode in the subclasses.
Also: use ERROR loglevel for these (not WARNING).
A different amount of index entries was already logged as error
and led to "error_found = True" in repository.check.
Different values in the rebuilt index vs. the on-disk index were
only logged on warning level, but did not lead to error_found = True.
Guess there is no reason why these should not be errors and lead to
error_found = True, so this was fixed in this commit.
Minor related change: change report_error function args, so it can be
called like logger.error - including giving a format AND args.
the netbsd vagrant machine tends to segfault, guess due to some kernel or virtualbox issue.
thus, rather only do 1 tox run, so there is less output to review.
there are multiple issues with that box:
- debian 9 is out of support by debian, out of even lts support since 2022
- it has a OpenSSL 1.x natively (and our source based install also used 1.x) - that is also out of support and noone will care for it.
Also, borg2 will still take a while, so it would be
even more outdated at release time as it already
is now.
The intention of LockRoster.modify(key, REMOVE) is to remove self.id.
Using set.discard will just ignore it if self.id is not present there anymore.
Previously, using set.remove triggered a KeyError that has been frequently
seen in tracebacks of teardowns involving Repository.__del__ and Repository.__exit__.
I added a REMOVE2 op to serve one caller that needs to get the KeyError if
self.id was not present.
Thanks to @herrmanntom for the workaround!
When borg invokes a system command, it needs to prepare the environment
for that. This is especially important when using a pyinstaller-made
borg fat binary that works with a modified env var LD_LIBRARY_PATH -
system commands may crash with that.
borg already had calls to prepare_subprocess_env at some places (e.g.
when invoking ssh for the remote repo connection), but they were
missing for:
borg create --content-from-command ...
borg create --paths-from-command ...
before this fix, borg check --repair just created an
empty shadow index, which can lead to incomplete
entries if entries are added later.
and such incomplete (but present) entries can lead to
compact_segments() resurrecting old PUTs by accidentally
dropping related DELs.
get_args() exception handling before this fix only dealt with
subclasses of "Error", but we have to expect other exceptions
there, too.
In any case, if we have some fatal exception here, we must
terminate with rc 2.
ArgumentTypeError: emit a short error message - usually this is
a user error, invoking borg in a wrong way.
Other exceptions: full info and traceback.
for the other compression methods, this is done in
the base class, but the zlib legacy does not call
that method as it also removes the header bytes,
which zlib legacy does not have.
Hint for Cygwin users to make sure they use a virtual environment.
Not using a virtual environment will be likely troublesome if there is already a Python installed on Windows.
also: do a small optimisation in borg check:
if the type of the repo object is not ROBJ_ARCHIVE_META, we
can skip the object, it can not contain valid archive meta data.
if the type is correct, this is already a sufficient check, so
we can be quite sure that there will be valid archive metadata
in the object.
writing: put type into repoobj metadata
reading: check wanted type against type we got
repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.
a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).
also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
authentication/decryption would fail on mismatch.
- the type check would fail.
thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
For many use cases, the repo-wide "rcompress" is more efficient.
Also, recreate --recompress calls add_chunk with overwrite=True,
which is unsupported with the AdHocCache.
twine is only needed at release time, no need
for all developers or all test runs to install
this.
also, some requirement of twine needs a rust
compiler, so if there is no rust compiler,
automated runs will abort due to that.
remove a lot of complexity from the code that was just there to
support legacy borg versions < 1.0.9 which did not TAM authenticate
the manifest.
since then, borg writes TAM authentication to the manifest,
even if the repo is unencrypted.
if the repo is unencrypted, it did not check the somehow pointless
authentication that was generated without any secret, but
if we add that fake TAM, we can also verify the fake TAM.
if somebody explicitly switches off all crypto, they can not
expect authentication.
for everybody else, borg now always generates the TAM and also
verifies it.
rebuild_refcounts verifies and recreates the TAM.
Now it re-uses the salt, so that the archive ID does not change
just because of a new salt if the archive has still the same data.
list: shows either "verified" or "none", depending on
whether a TAM auth tag could be verified or was
missing (old archives from borg < 1.0.9).
when loading an archive, we now try to verify the archive
TAM, but we do not require it. people might still have
old archives in their repos and we want to be able to
list such repos without fatal exceptions.
This part of the archive checker recreates the Archive
items (always, just in case some missing chunks needed
repairing).
When loading the Archive item, we now verify the TAM.
When saving the (potentially modified) Archive item,
we now (re-)generate the TAM.
Archives without a valid TAM are dropped rather than TAM-authenticated
when saving them. There shouldn't be any archives without a valid TAM:
- borg writes an archive TAM since long (1.0.9)
- users are expected to TAM-authenticate archives created
by older borg when upgrading to borg 1.2.5.
Also:
Archive.set_meta: TAM-authenticate new archive
This is also used by Archive.rename and .recreate.
In these tests, we only compare paths, but we do not
need to create these paths for that. By not trying to
create them, we can avoid permission issues, e.g. under
fakeroot.
- master branch has different free space requirements from 1.2-maint,
so we now use a 700MB filesystem
- used pytest.mark.parametrize for the test passes, kind of a progress
display
- fix bug in rcreate call, encryption arg is needed
- fix bug in lock file cleanup
- added repo space cleanup
- updated docstring with current linux instructions (ubuntu)
- stopped using the "reserved" files, the "input" files are good enough
to get some space freed.
-
This is an emergency workaround for authenticated repos
if the user has lost the borg key.
We can't compute the TAM key without the borg key, so just
skip all the TAM stuff.
A borgbackup-2.0.0b6 test fails on OpenBSD with the message below.
```
=================================== FAILURES ===================================
_____________________________ test_get_runtime_dir _____________________________
path = '/run/user/55/borg', mode = 511, pretty_deadly = True
def ensure_dir(path, mode=stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO, pretty_deadly=True):
"""
Ensures that the dir exists with the right permissions.
1) Make sure the directory exists in a race-free operation
2) If mode is not None and the directory has been created, give the right
permissions to the leaf directory. The current umask value is masked out first.
3) If pretty_deadly is True, catch exceptions, reraise them with a pretty
message.
Returns if the directory has been created and has the right permissions,
An exception otherwise. If a deadly exception happened it is reraised.
"""
try:
> os.makedirs(path, mode=mode, exist_ok=True)
build/lib.openbsd-7.3-amd64-cpython-310/borg/helpers/fs.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
```
If `$XDG_RUNTIME_DIR` is not set `platformdirs.user_runtime_dir()`
returns one of 3 different paths
(https://github.com/platformdirs/platformdirs/pull/201). Proposed fix is
to check if `get_runtime_dir()` returns one of these paths.
last coala release (0.11.0) is now over 6y old.
when using pip install coala, a ton of stuff gets installed (expected)
and a part of that downgrades some stuff we use to outdated, incompatible
versions.
when trying to run coala with python 3.11, it just crashes because the
last release was made for py35/py36 (as seen in their setup.py).
a lot of PRs and tickets pile up at the coala project on github,
but noone is maintaining it.
macFUSE supports a volname mount option to give what
finder displays on desktop / in directory list.
if the user did not specify it, we make something up,
because otherwise it would be "macFUSE Volume 0 (Python)".
Move the explanation below the general explanation of the `--keep-*` option
behavior rephrase the last sentence to make it clear that it works like the
other options that were explained in the previous paragraph.
Resolves#7687
- pattern needs to start with + - !
- first match wins
- the default is to list everything, thus a 2nd pattern
is needed to exclude everything not matched by 1st pattern.
about 10-50% of the github windows CI runs fail due to
this - root cause unknown.
Example failure:
# we first check if we could create a sparse input file:
sparse_support = is_sparse(filename, total_size, hole_size)
if sparse_support:
# we could create a sparse input file, so creating a backup of it and
# extracting it again (as sparse) should also work:
self.cmd(f"--repo={self.repository_location}", "rcreate", RK_ENCRYPTION)
self.cmd(f"--repo={self.repository_location}", "create", "test", "input")
with changedir(self.output_path):
self.cmd(f"--repo={self.repository_location}", "extract", "test", "--sparse")
self.assert_dirs_equal("input", "output/input")
filename = os.path.join(self.output_path, "input", "sparse")
with open(filename, "rb") as fd:
# check if file contents are as expected
> self.assert_equal(fd.read(hole_size), b"\0" * hole_size)
E AssertionError: b'\x0[8388602 chars]x00\xf0Y\xb5\xe3\xee\xf3\x1f\xe3L\xcf\xae\x92\[159253621 chars]\x00' != b'\x0[8388602 chars]x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0[159383505 chars]\x00'
src/borg/testsuite/archiver/extract_cmd.py:212: AssertionError
Replacing the internals should make the implementation faster
and simpler since the order tracking is done by the `OrderedDict`.
Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its use more clear.
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:
$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'
Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.
This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
With paths being sanitized as they are being read, this "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repository will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to kept in
place to detect tempered archives.
I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:
a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.
This also (partially) fixes#7099.
shutting down logging is problematic as it is global
and we do multi-threaded execution, e.g. in tests.
thus, rather just flush the important loggers and keep
them alive.
server (listening) side:
borg serve --socket # default location
borg serve --socket=/path/to/socket
client side:
borg -r socket:///path/to/repo create ...
borg --socket=/path/to/socket -r socket:///path/to/repo ...
served connections:
- for ssh: proto: one connection
- for socket: proto: many connections (one after the other)
The socket has user and group permissions (770).
skip socket tests on win32, they hang infinitely, until
github CI terminates them after 60 minutes.
socket tests: use unique socket name
don't use the standard / default socket name, otherwise tests
running in parallel would interfere with each other by using
the same socket / the same borg serve process.
write a .pid file, clean up .pid and .sock file at exit
add stderr print for accepted/finished socket connection
- tears down logging (so no new log output is generated afterwards)
- sends all queued log output
- then returns
also: make stdin_fd / stdout_fd instance variables
for normal borg command invocation:
- logging is set up in Archiver.run
- the atexit handler calls logging.shutdown when process terminates
for tests:
- Archiver.run called by exec_cmd
- no atexit handler executed as process lives on
- borg.logger.teardown (calls shutdown and configured=False) now
called in exec_cmd
- simplify progress output (no \r, no terminal size related tweaks)
- emit progress output via the logging system (so it does not use stderr
of borg serve)
- progress code always logs a json string, the json has all needed
to either do json log output or plain text log output.
- use formatters to generate plain or json output from that.
- clean up setup_logging
- use a StderrHandler that always uses the **current** sys.stderr
- tweak TestPassphrase to not accidentally trigger just because of seeing 12 in output
Instead, install a handler that sends the LogRecord dicts to a queue.
That queue is then emptied in the borg serve main loop and
the LogRecords are sent msgpacked via stdout to the client,
similar to the RPC results.
On the client side, the LogRecords are recreated from the
received dicts and fed into the clientside logging system.
As we use msgpacked LogRecord dicts, we don't need JSON for
this purpose on the borg serve side any more.
On the client side, the LogRecords will then be either formatted
as normal text or as JSON log output (by the clientside log
formatter).
Compact moves data to new segments, and then removes the old segments.
When enough segments are moved, directories holding the now cleared segments
may thus become empty.
With this commit any empty directories are cleared after segments compacting.
Fixes#6823
+ os.scandir instead of os.listdir
Improved speed and added flexibility with attributes (name, path, is_dir(), is_file())
+ use is_dir / is_file to make sure we're reading only dirs / files respectively
+ Filtering to particular start, end index range built in
+ Move value bounds of segment (index) into constants module and use them instead
Resolves#7597
(forward patch from commits c9f35a16e9bf9e7073c486553177cef79ff1cb06^..edb5e749f512b7737b6933e13b7e61fefcd17bcb)
this used to call get_base_dir (and would have needed
legacy=True now to work like expected).
rather implemented the desired behaviour locally and
got rid of the legacy call (which was a bit strange
anyway as it also considered BORG_BASE_DIR, which is
unexpected when resolving ~).
in the sysinfo function, there is a way to suppress
all sysinfo output via an env var and just return an
empty string.
so we can expect it is always in unpacked, but it
might be the empty string.
log output:
always expect json, remove $LOG format support.
we keep limited support for unstructured format also,
just not to lose anything from remote stderr.
rpc format:
ancient borg used tuples in the rpc protocol,
but recent ones use easier-to-work-with dicts.
version info:
we expect dicts with server/client version now.
That means I won't make new 1.1.x releases.
In case there would be a major security or other issue,
I might still make a fix commit to the 1.1-maint branch,
where dist package maintainers or other interested
parties could find it.
this ports change 73ee704afa to master.
setUp enters the context manager, so let's .reopen() leave it.
then create a fresh Repository instance in self.repository and
enter the context manager again. tearDown then will leave that.
"if self.repository" did not work as expected:
- Repository has a __len__ method, so the boolean evaluation was calling that.
- self.repository is also not set to None anywhere.
while on macOS the new and old security dir location is the same path,
this is not the case on e.g. Linux, it could move from .config/borg/security to
.local/share/borg/security .
See #5760.
at some places, the docs were not updated yet.
for borg 1.x, -a (aka --glob-archives) expected
sh: style glob patterns ONLY (but one must not
give sh: explicitly).
for borg 2, -a (aka --match-archives) defaults
to id: style (identical match), so one must give
sh: if one wants shell-style globbing.
not needed for borg2 repos (we derive a new session key for each borg
invocation and start counting from 0).
also not needed for borg 1.x repos because we only read them (borg transfer)
and won't write new encrypted data to them.
use this to only list the kept (or pruned) archives.
--list-pruned and --list-kept also work in a additive way.
implied logging: support multiple prune options activating same logger
if any of --list / --list-kept / --list-pruned is used,
it should put the borg.output.list logger to INFO level,
otherwise to WARN level.
as a first step, i moved all the traceback formatting
to format_tb.
also, it **first** prints the error and then the traceback
as additional information for a bug report, as suggested
by @jimparis in that ticket.
saying "must be a writable directory" can distract
from the real root cause as seen in #7496.
so we better first check if the mountpoint is an
existing directory and if not, just tell that.
after that, we check permissions and if they are not
like required, tell that.
fix config dir compatibility issue, fixes#7445
- add tests
- make sure the result of get_cache_dir matches pre and post #7300 where desired
- harmonize implementation of config_dir_compat and cache_dir_compat tests
Co-authored-by: nain <126972030+F49FF806@users.noreply.github.com>
this needs to decompress and to hash the chunk data,
but better let's play safe.
at least we still can avoid the (re-)compression with
borg transfer (which is often much more expensive
than decompression).
The "Building a development environment" section links to the
"Using git" section. This can result in developers overseeing
the os dependencies necessity.
re #7356
diff: include changes in ctime and mtime, fixes#7248
also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes
Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
this is an incompatible change:
before:
borg debug put-obj path1 path2 ...
(and borg computed all IDs automatically) (*)
after:
borg debug put-obj id path
(id must be given)
(*) the code just using sha256(data) was outdated and incorrect anyway.
also: debug get-obj: improve error handling
Errors handled for backup src files:
- BackupOSError (converted from OSError), e.g. I/O Error
- BackupError (stats race, file changed while we backed it up)
Error Handling:
- retry the same file after some sleep time
- sleep time starts from 1ms, increases exponentially up to 10s
- 10 tries
If retrying does not help:
- BackupOSError: skip the file, log it with "E" status
- BackupError: last try will back it up, log it with "C" status
Works for:
- borg create's normal (builtin) fs recursion
- borg create --paths-from-command
- borg create --paths-from-stdin
Notes:
- update stats.files_stats late (so we don't get wrong
stats in case of e.g. IOErrors while reading the file).
- _process_any: no changes to the big block, just indented
for adding the retry loop and the try/except.
- test_create_erroneous_file succeeds because we retry the file.
we do book-keeping in item.chunks:
in case something goes wrong and we need to clean up,
we will have a list with chunks to decref in item.chunks.
also:
- make variable naming more consistent
- cosmetic changes
if a file can't be read (like here: there is a simulated
I/O error in the 2nd chunk of file2), it should be logged
with "E" status, skipped and backup shall proceed with
next file(s).
also, check that the repo has no orphan chunks (exception
handling code needs to deal with 1st chunk of file2 which
already has been written / incref'd in the repo).
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of 4096b fixed size (last chunk in a file can be less)
- read chunks 0, 1 and 2 successfully
- error at chunk 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks
Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.
Read chunks as well as failed chunks count up by 1.
also add a test: recreate without --chunker-params shall not rechunk
before the fix, it triggered rechunking if an archive
was created with non-default chunker params.
but it only should rechunk if borg recreate is invoked with explicitly giving --chunker-params=....
test hashtable expansion/rebuild.
hashindex_lookup:
- return -2 for a compact / completely full hashtable
- return -1 and (via start_idx pointer) the deleted/tombstone bucket index.
fix size assertion (we add 1 element to trigger rebuild)
fix upper_limit check - since we'll be adding 1 to num_entries below,
the condition should be >=:
hashindex_compact: set min_empty/upper_limit
Co-authored-by: Dan Christensen <jdc+github@uwo.ca>
hashindex_index returns the perfect hashtable index, but does not
check what's in the bucket there, so we had these loops afterwards
to search for an empty or deleted bucket.
problem: if the HT were completely filled with no empty and no deleted
buckets, that loop would never end. due to our HT resizing, it can
never happen, but still not pretty.
when using hashindex_lookup (as also used some lines above), the code
is easier to understand, because (after we resized the HT), we freshly
create the same situation as after the first call of that function:
- return value < 0, because we (still) can not find the key
- start_idx will point to an empty bucket
Thus, we do not need the problematic loops we had there.
Modified the checks to make sure we really have an empty or deleted
bucket before overwriting it with data.
Added some additional asserts to make sure the code behaves.
we don't want to suddenly/unexpectedly break stuff for borg users
just because platformdirs does a breaking release.
at platformdirs 2.0.0 macOS config dir changed.
at platformdirs 3.0.0 macOS config dir changed again.
at platformdirs 4.0.0 (future) - who knows?
if we run into some issue reading an input file, e.g. an I/O error,
the BackupOSError exception raised due to that will skip the current
file and no archive item will be created for this file.
But we maybe have already added some of its content chunks to the repo,
we have either written them as new chunks or incref'd some identical chunk
in the repo.
Added an exception handler that decrefs (and deletes if refcount reaches 0)
these chunks again before re-raising the exception, so the repo is in a
consistent state again and we do not have orphaned content chunks in the repo.
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.
people can recognize via the file name it is a partial file.
nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
no needs to maintain statistics about count and size of part
files.
checkpoint archives might have a single, incomplete part file as last item.
part files are always a prefix of the full file, growing in size from
checkpoint to checkpoint.
we now manage the archive items metadata stream in a special way:
- checkpoint archive A(n) might end with a partial item PI(n)
- checkpoint archive A(n+1) does not contain PI(n)
- checkpoint archive A(n+1) contains a new partial item PI(n+1)
- the final archive does not contain any partial items
not having this had created orphaned item_ptrs chunks for checkpoint archives.
also:
- borg check: show id of orphaned chunks
- borg check: archive list with explicit consider_checkpoints=True (this is the default, but better make sure).
check --archives: add --newer/--older/--newest/--oldest, fixes#7062
Options accept a timespan, like Nd for N days or Nm for N months.
Use these to do date-based matching on archives and only check some of them,
like: borg check --archives --newer=1m --newest=7d
Author: Michael Deyaso <mdeyaso@fusioniq.io>
Same change for .recreate_cmdline -> .recreate_command_line .
JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
if they are present, process them through json_text().
this replaces s-e by "?" for the key and puts the binary
representation into key_b64, if needed.
likely this is rarely needed.
item: path, source, user, group
for non-unicode stuff borg 1.2 had "bpath".
now we have:
path - unicode approximation (invalid stuff replaced by ?)
path_b64 - base64(path_bytes) # only if needed
source has the same issue as path and is now covered also.
user and group are usually unicode or even pure ASCII,
but we rather are cautious and cover them also.
binary bytes:
- json_key = <key>_b64
- json_value == base64(value)
text (potentially with surrogate escapes):
- json_key1 = <key>
- json_value1 = value_text (s-e replaced by ?)
- json_key2 = <key>_b64
- json_value2 = base64(value_binary)
json_key2/_value2 is only present if value_text required
replacement of surrogate escapes (and thus does not represent
the original value, but just an approximation).
value_binary then gives the original bytes value (e.g. a
non-utf8 bytes sequence).
using "differenthost" (== not the current hostname) makes
the process_alive check always return True (to play safe,
because in can not check for processes on other hosts).
python's io.BufferedWriter sizes its buffer based on st_blksize.
If the write fits in this buffer, then it's possible the data from
idx.write() has not been flushed through to ,the underlying filesystem,
and getsize(fileno()) sees a too-short (or even empty) file.
Also, getsize is only documented as accepting path-like objects;
passing a fileno seems to work only because the implementation
blindly forwards everything through to os.stat without checking.
Passing unopened_tempfile avoids all three problems
- on windows, it doesn't rely on re-opening NamedTemporaryFile
(the issue which led to cc0ad321dc)
- we're following the documented API of getsize(path-like)
- the file is closed (thus flushed) inside idx.write, before getsize()
One cannot "to not x", but one can "not to x".
Avoiding split infinitives gives the added bonus that machine
translation yields better results.
setup (n/adj) vs set(v) up. We don't "I setup it" but "I set it up".
Likewise for login(n/adj) and log(v) in, backup(n/adj) and back(v) up.
\n is automatically converted on write to the platform-dependent os.linesep.
Using os.linesep instead of \n means that on Windows, the line ending becomes "\r\r\n".
Also switches mentions of {LF} to {NL} in code and docs.
On Windows, the ":" character cannot be used in a filename.
Python does not error on this because the ":" character represents data streams.
See https://stackoverflow.com/a/54508979
strange: on macOS, the globally set PKG_CONFIG_PATH was overwritten,
thus the borg build did not find openssl any more. setting it here
locally again works around the issue.
this option did not change behaviour since longer,
we only had kept it for API compatibility.
as a borg2 repo server won't have old clients talking to it,
we can safely remove this everywhere now.
Without the status being set no output was generated in
dry-run mode, confusing users about whether borg would back
up directories (in non-dry-run mode).
- == item not backed up just because of dry-run mode
x == item excluded
we want to be able to use an archive name as a directory name,
e.g. for the FUSE fs built by borg mount.
thus we can not allow "/" in an archive name on linux.
on windows, the rules are more restrictive, disallowing
quite some more characters (':<>"|*?' plus some more).
we do not have FUSE fs / borg mount on windows yet, but
we better avoid any issues.
we can not avoid ":" though, as our {now} placeholder
generates ISO-8601 timestamps, including ":" chars.
also, we do not want to have leading/trailing blanks in
archive names, neither surrogate-escapes.
control chars are disallowed also, including chr(0).
we have python str here, thus chr(0) is not expected in there
(is not used to terminate a string, like it is in C).
the UNIX time used for timestamp is seconds since 1.1.1970,
in UTC. thus, the natural way to represent it is with a
tz-aware utc datetime object.
but previously (in borg 1.x), they used naive datetime
objects and localtime.
looks like that chmod should only get done IF we are root (and on linux?).
taking away write permissions on windows/cygwin (and when running as normal
user) makes create_regular_file fail when it tries to create dir2/file3.
argparse: the default action is "store" and that overwrote an already
existing list in args.paths (e.g. from --pattern="R someroot") when it
started to process the positional PATH args.
with "extend" it now extends the existing args.paths with the list of
positional PATH arguments (which can be 0..N elements long, nargs="*").
note: "extend" is new since python 3.8, thus this can only be backported
to 1.2-maint, but not to 1.1-maint.
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes
Note: the sleep() in the test is needed due to timestamp granularity on linux being much more coarse than expected (uses the system timer, 100Hz or 250Hz).
support reading new, improved hashindex header format, fixes#6960
Bit of a pain to work with that code:
- C code
- needs to still be able to read the old hashindex file format,
- while also supporting the new file format.
- the hash computed while reading the file causes additional problems because
it expects all places in the file get read exactly once and in sequential order.
I solved this by separately opening the file in the python part of the code and
checking for the magic.
BORG_IDX means the legacy file format and legacy layout of the hashtable,
BORG2IDX means the new file format and the new layout of the hashtable.
Done:
- added a version int32 directly after the magic and set it to 2 (like borg 2).
the old header had no version info, but could be denoted as version 1 in case
we ever need it (currently it decides based on the magic).
- added num_empty as indicated by a TODO in count_empty, so it does not need a
full hashtable scan to determine the amount of empty buckets.
- to keep it simpler, I just filled the HashHeader struct with a
`char reserved[1024 - 32];`
1024 being the desired overall header size and 32 being the currently used size.
this alignment might be useful in case we mmap() the hashindex file one day.
warning: src/borg/item.pyx:199:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:200:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:202:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
this turns all python level classes into extension type classes.
additionally it turns the indirect properties into direct descriptors.
test_propdict_attributes runs about 30% faster.
base memory usage as reported by sys.getsizeof(Item()):
before: 48 bytes, after this PR: 40 bytes
Author: @RonnyPfannschmidt in PR #5763
reads all chunks in on-disk order and recompresses them if they are not already using
the desired compression type and level (and obfuscation level).
supports SIGINT/ctrl-c and --checkpoint-interval (default: 1800s).
this is a borg command that compacts when committing (without this, it would have
a huge space usage). it commits/compacts every checkpoint interval or when
pressing ctrl-c / receiving SIGINT.
we should modify the meta dict given by the caller, so the caller can know
about e.g. the compression/obfuscation that was done (this is useful for rcompress).
some new stuff is not supported for NSIndex1,
but we can avoid crashing due to function signature mismatches or
missing methods and rather have more clear exceptions.
when using .scan(limit, marker), we used to use the last chunkid from
the previously returned scan result to remember how far we got and
from where we need to continue.
as this approach used the repo index to look up the respective segment/offset,
it was problematic if the code using scan was re-writing the chunk to
a new segment/offset, updating the repo index (e.g. when recompressing a chunk)
and basically destroying the memory about from where we need to continue
scanning.
thus, directly returning (segment, offset) as marker is easier and solves this issue.
otherwise, if we scan+get+put (e.g. if we read/modify/write chunks to
recompress them), it would scan past the last commit and run into the
newly written chunks (and potentially never terminate).
that would require setuptools_scm>=5.0.0 but some dists do not have that yet.
also, we do not use the version_tuple from _version.py, so it is not required anyway.
forward port of #7024.
the intention of this test is testing whether borg check
returns an error when checking a corrupted repository.
the removed assertions were rather testing the test logging
configuration, which seems flaky:
- when running all tests, assertions failed
- when running only this one test, assertions succeeded
- assertions also succeeded when running all the tests before
they were refactored to separate test modules, although the
test code was not changed, just moved.
looks like rhel7 and co is still supported and needs the old glibc.
debian stretch is not supported any more by debian, so the binaries
created on this are provided on a "use on your own risk" basis.
reverts fc67453bf3
legacy: add/remove ctype/clevel bytes prefix of compressed data
new: use a separate metadata dict
compressors: use an int as ID, not a len 1 bytestring
borg < 2:
obj = encrypted(compressed(data))
borg 2:
obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))
handle compr / decompr in repoobj
move the assert_id call from decrypt to RepoObj.parse
also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_locatime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information
then we know when a backup was made and even from
which timezone it was made. if we want to compute
utc, we can do that using these infos.
this makes a quite nice archives list, with timestamps
as expected (in local time with timezone info).
at some places we just enforce utc, like for the
repo manifest timestamp or for the transaction log,
these are usually not looked at by the user.
since python 3.7, .isoformat() is usable IF timespec != "auto"
is given ("auto" [default] would be as evil as before, sometimes
formatting with, sometimes without microseconds).
also since python 3.7, there is now .fromisoformat().
There are some other places with subprocesses:
- borg create --content-from-command
- borg create --paths-from-command
- (de)compression filter process of import-tar / export-tar
implemented by introducing one level of indirection, the limit is now
very high, so it is not practically relevant any more.
we always use the indirection (storing the metadata stream chunk ids list not
directly into the archive item, but into some repo objects referenced by the new
ArchiveItem.item_ptrs list).
thus, the code behaves the same for all archive sizes.
work around setuptools puking about:
############################
# Package would be ignored #
############################
Python recognizes 'borg.cache_sync' as an importable package,
but it is not listed in the `packages` configuration of setuptools.
'borg.cache_sync' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
Please make sure that 'borg.cache_sync' is included as a package by using
the `packages` configuration field or the proper discovery methods
(for example by using `find_namespace_packages(...)`/`find_namespace:`
instead of `find_packages(...)`/`find:`).
You can read more about "package discovery" and "data files" on setuptools
documentation page.
hopefully this is the final fix.
after first fixing of #6400 (by using os.umask after mkstemp), there
was a new problem that chmod was not supported on some fs.
even after fixing that, there were other issues, see the ACLs issue
documented in #6933.
the root cause of all this is tempfile.mkstemp internally using a
very secure, but hardcoded and for our use case problematic mode
of 0o600.
mkstemp_mode (mosty copy&paste from python stdlib tempfile module +
"black" formatting applied) supports giving the mode via the api,
that is the only change needed.
slightly dirty due to the _xxx imports from tempfile, but hopefully
this will be supported in some future python version.
Since compression type identification has been split into type and
level, the graphic needed a slight update.
Unfortunately, I don't have access to Visio, so I converted this to odg.
While writing my own out-of-band decoder, I had a hard time figuring out
how to unpack the manifest. From the description, I was only able to
read that the manifest is msgpack'd, but I had not been able to figure
out that it's also going through the same encryption+compression logic
as all other things do.
This should make it a little clearer and provide the necessary
information to understand how the compression works.
manifest, repo and cache are committed every checkpoint interval.
also, when ctrl-c is pressed, finish deleting the current archive, commit and then terminate.
the old code did just 1 attempt to detect the repo decryption key.
if the first chunkid we got from the chunks hashtable iterator was accidentally
the id of the chunk we intentionally corrupted in test_delete_double_force,
setup of the key failed and that made the test crash.
in practice, this could of course also happen if chunks are corrupted, thus
we now do many retries with other chunks before giving up.
error handling was improved: do not return None (instead of a key), it just
leads to weird crashes elsewhere, but fail early with IntegrityError and a
reasonable error msg.
rename method to make_key to avoid confusion with borg.crypto.key.identify_key.
also added some .pyi files needed to check the cython code (taken from #5703 and updated).
fixed "syntax error" in key.py.
all mypy complaints not fixed yet.
borg2's new repo format does not need computing crc32 over big amounts of
(content) data any more (we now use xxh64 for that).
thus, having a quick crc32 implementation via libdeflate is not important
enough any more to rectify having libdeflate as a requirement.
in the finished == true message, these are missing:
- message
- current / total
- info
This is to be somewhat consistent with #6683 by only providing a
minimal set of values for the finished case.
The finished messages is primarily intended for cleanup purposes,
e.g. clearing the progress display.
there was no way to tell the repository version for a remote repo.
borg 2 needs that to reject doing most operations with an old repo,
except the stuff needed for borg transfer.
These are legacy crypto modes based on AES-CTR mode:
(repokey|keyfile)[-blake2]
New crypto modes with session keys and AEAD ciphers:
(repokey|keyfile)[-blake2]-(aes-ocb|chacha20-poly1305)
Tests needed some changes:
- most used repokey/keyfile, changed to new modes
- some nonce tests removed, the new crypto code does not generate
the repo side nonces any more (were only used for AES-CTR)
v2 is the default repo version for borg 2.0.
v1 repos must only be used in a read-only way, e.g. for
--other-repo=V1_REPO with borg init and borg transfer!
This is to support general-purpose transfer of archives between related
borg2 repos.
To transfer (and convert) archives from borg 1.2 repos, users need to
give --upgrader=From12To20 .
this fixes a strange test failure that did not happen until now:
it could not read the MAGIC bytes from a (quite new) segment file,
it just returned the empty string.
maybe its appearance is related to the removed I/O calls.
This saves some segment file random IO that was previously necessary
just to determine the size of to be deleted data.
Keep old one as NSIndex1 for old borg compatibility.
Choose NSIndex or NSIndex1 based on repo index layout from HashHeader.
for an old repo index repo.get(key) returns segment, offset, None, None
if a hardlink copy of a repo was made and a new repo config
shall be saved, do NOT fill in random garbage before deleting
the previous repo config, because that would damage the hardlink
copy.
Item.xattrs is now always a StableDict mapping bytes keys -> bytes values.
The special casing of empty values (b'') getting replaced by None was removed.
see ticket and borg.helpers.msgpack docstring.
this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).
still needed compat to the past is done via want_bytes decoder in borg.item.
* make constants for files cache mode more clear
Traditionally, DEFAULT_FILES_CACHE_MODE_UI and DEFAULT_FILES_CACHE_MODE
were - as the naming scheme implies - the same setting, one being the UI
representation as given to the --files-cache command line option and the
other being the same default value in the internal representation.
It happended that the actual value used in borg create always comes from
DEFAULT_FILES_CACHE_MODE_UI (because that does have the --files-cache
option) whereas for all other commands (that do not use the files cache) it
comes from DEFAULT_FILES_CACHE_MODE.
PR #5777 then abused this fact to implement the optimisation to skip loading
of the files cache in those other commands by changing the value of
DEFAULT_FILES_CACHE_MODE to disabled.
This however also changes the meaning of that variable and thus redesignates
it to something not matching the original naming anymore.
Anyone not aware of this change and the intention behind it looking at the
code would have a hard time figuring this out and be easily mislead.
This does away with the confusion making the code more maintainable by
renaming DEFAULT_FILES_CACHE_MODE to FILES_CACHE_MODE_DISABLED, making the
new intention of that internal default clear.
* make constant for files cache mode UI default match naming scheme
This not only brings code style in line with the other helpers that do the
same thing this way, but also does away with an unnecessary absolute import
using the borg module name explicitly.
borg now has the chunks list in every item with content.
due to the symmetric way how borg now deals with hardlinks using
item.hlid, processing gets much simpler.
but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation at one place and avoids some
code duplication we had before).
item.hlid is computed via hardlink_id function.
support hardlinked symlinks, fixes#2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.
2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.
ItemFormatter: support {hlid} instead of {source} for hardlinks
Item.hlid: same id, same hardlink (xxh64 digest)
Item.hardlink_master: not used for new archives any more
Item.source: not used for hardlink slaves any more
this is somehow similar to borg recreate,
but with different focus and way simpler:
not changing compression algo
not changing chunking
not excluding files inside an archive by path match
only dealing with complete archives
but:
different src and dst repo
only reading each chunk once
keeping the compressed payload (no decompression/recompression effort)
--dry-run can be used before and afterwards to check
it does not make sense to request versions view if you only
look at 1 archive, but the code shall not crash in that case
as it did, but give a clear error msg.
the check only considered old key -> new key changes, but
new key to new key is of course also fine.
e.g. repokey-aes-ocb -> repokey-aes-ocb (both use hmac-sha256
as id hash)
the id must now always be given correctly because
the AEAD crypto modes authenticate the chunk id.
the special case when id == MANIFEST_ID is now handled
inside assert_id, so we never need to give a None id.
it potentially will ask for the passphrase for the key of OTHERREPO.
for the newly created repo, it will use the same passphrase.
it will copy: enc_key, enc_hmac_key, id_key, chunker_seed.
keeping the id_key (and id algorithm) and the chunker seed (and chunker
algorithm and parameters) is desirable for deduplication.
the id algorithm is usually either HMAC-SHA256 or BLAKE2b.
keeping the enc_key / enc_hmac_key must be implemented carefully:
A) AES-CTR -> AES-CTR is INSECURE due to nonce reuse, thus not allowed.
B) AES-CTR -> AEAD with session keys is secure.
C) AEAD with session keys -> AEAD with session keys is secure.
AEAD modes with session keys: AES-OCB and CHACHA20-POLY1305.
all-zero chunks are propagated as:
CH_ALLOC, data=None, size=len(zeros)
other chunks are:
CH_DATA, data=data, size=len(data)
also: remove the comment with the wrong assumption
this is similar to #4777.
borg check must not crash if an archive metadata block does not decrypt.
Instead, report the archive_id, remove the archive from the manifest and skip to the next archive.
selftest imports testsuite.crypto
I did not realise this and imported pytest from testsuite.crypto
This broke the selftest.
Solution: move the tests that depend on pytest to testsuite.key.
All three affected tests are tests for the Key classes, so
this is probably a better plase for them anyway.
when migrating from repokey to keyfile, we just store an empty key into the repo config,
because we do not have a "delete key" RPC api. thus, empty key means "there is no key".
here we fix load_key, so that it does not behave differently for no key and empty key:
in both cases, it just returns an empty value.
additionally, we strip the value we get from the config, so whitespace does not matter.
All callers now check for the repokey not being empty, otherwise RepoKeyNotFoundError
is raised.
for now, this code shall only work on v2 repos (created by this code).
the code to read v1 repos is still present though, so for experiments,
it is possible to change the repo version in the repo config from 1 to
2 manually.
having version 2 in the repo config also avoids that borg < 1.3 is
used on such a repo, which would cause damage:
old borg would not recognize the PUT2 tagged segment entries and
old borg check --repair would likely kill them all due to that.
also: keep repo version in Repository.version
note: this required a slight increase of MAX_OBJECT_SIZE so that MAX_DATA_SIZE
could stay the same as before.
For PUT2, compute the hash over the whole entry (header and content, excluding
hash and crc32 fields, because the crc32 computation includes the hash).
Also: refactor crc32 checks into function, use f-strings, structure _read in
a more logical sequential order.
write_put: avoid creating a large temporary bytes object
why use xxh64?
- fast even without hw acceleration
- borg depends on it already anyway
- stronger than crc32 and strong enough for this purpose
Argon2 the second part: implement encryption/decryption of argon2 keys
borg init --key-algorithm=argon2 (new default, older pbkdf2 also still available)
borg key change-passphrase: keep key algorithm the same
borg key change-location: keep key algorithm the same
use env var BORG_TESTONLY_WEAKEN_KDF=1 to resource limit (cpu, memory, ...) the kdf when running the automated tests.
OpenBSD does not have `lchmod()` causing `os.lchmod` to be unavailable
on this platform. As a result ArchiverTestCase::test_basic_functionality
fails when run manually (#2055).
OpenBSD does have `fchmodat()`, which has a flag that makes it behave
like `lchmod()`. In Python this can be used via `os.chmod(path, mode,
follow_symlinks=False)`.
As of Python 3.3 `os.lchmod(path, mode)` is equivalent to
`os.chmod(path, mode, follow_symlinks=False)`. As such, switching to the
latter is preferred as it enables more platforms to do the right thing.
although bug #6526 did not show with ssh style URLs, we should
not have different regexes for the host part for ssh and scp style.
thus i extracted the host_re from both and also cleaned up a bit.
added a negative lookahead/lookbehind to make sure an ipv6 addr
(enclosed in square brackets) does not get badly matched by the
regex part intended for hostnames and ipv4 addrs only.
the other part of that regex which is actually intended to match
ipv6 addrs only matches if they are enclosed in square brackets.
also added tests for ssh and scp style repo URLs with ipv6 addrs
in brackets.
also: made regex more readable, putting these 2 cases on separate lines.
The previous sample for creating a ~/.borg-passphrase file creates it first and then chmod's it to 400 permissions. That's probably fine in practice, but means there's a tiny window where the passphrase file is sitting with default permissions (likely world readable, depending on the system umask).
It seems safer to first change the umask to remove all group & world bits (0077) _before_ creating the file. To be polite and avoid messing with the user's previous umask, we do this in a subshell. (Note that umask 0077 leads to a mode of 600 rather than the previous 400, because removing the owner write bit doesn't seem to buy much since the owner can just chmod the file anyway.)
export-tar: just msgpack and b64encode all item metadata and
put that into a BORG specific PAX header.
this is *additional* to the standard tar metadata.
import-tar: when detecting the BORG specific PAX header, just get
all metadata from there (and ignore the standard tar
metadata).
--tar-format=GNU|PAX (default: GNU)
changed the tests which use GNU tar cli tool to use --tar-format=GNU
explicitly, so they don't break in case we change the default.
atime timestamp is only present in output if the archive item has it
(which is not the case by default, needs "borg create --atime ...").
if LZ4/ZSTD.decompress gets called with a memoryview idata, keep
it until after the super().decompress(idata) call, so we save one
copy operation just to remove the 2 bytes long compression type
header.
attic is borg's parent project, but it stalled in 2015 and was not updated since then.
guess we can assume that most attic users have meanwhile noticed this and already
converted their repos to borg.
if some did not yet, they are advised to use borg < 1.3 to do that ASAP.
note: borg can still DETECT an attic repo by recognizing its ATTIC_MAGIC value
and then gives exactly that advice.
Code gets simpler if we always only use the (shorter) header_fmt.
That format ALWAYS applies, to all tags borg writes.
If the tag unpacked from there indicates that there is also a chunkid
to read (like for PUT and DEL), we can decide that inside _read and
then read the chunkid from the fd.
olen is assigned by OpenSSL, but the compiler can't know that and generates these warnings:
warning: src/borg/crypto/low_level.pyx:271:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:274:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:314:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:317:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:514:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:517:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:566:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:572:22: local variable 'olen' referenced before assignment
added it for all classes there, so the caller just give it.
for the legacy AES-CTR based classes, the given aad is completely ignored.
this is to stay compatible with repo data of borg < 1.3.
for the new AEAD based classes:
encrypt: the aad is fed into the auth tag computation
decrypt: same. decrypt will fail on auth tag mismatch.
we already have .decrypt(id, data, ...).
i changed .encrypt(chunk) to .encrypt(id, data).
the old borg crypto won't really need or use the id,
but the new AEAD crypto will authenticate the id in future.
if we just have a pointer to a bytes object which might go out of scope, we can lose it.
also: cython can directly assign a bytes object into a same-size char array.
if we just have a pointer to a bytes object which might go out of scope, we can lose it.
also: cython can directly assign a bytes object into a same-size char array.
encrypt used to "patch" the IV into the header,
decrypt used to fetch it from there.
encrypt now takes the header just "as is" and
also decrypt expects that the IV is already set.
also:
cleanup class structure: less inheritance, more mixins.
define type bytes using the 4:4 split
upper 4 bits are ciphersuite:
0 == legacy AES-CTR based stuff
1+ == new AEAD stuff
lower 4 bits are keytype:
legacy: a bit mixed up, as it was...
new stuff: 0=keyfile 1=repokey, ...
`borg benchmark cpu` fails on OpenBSD with the error below, which is
caused by LibreSSL currently not supporting AES256_OCB and
CHACHA20_POLY1305.
Work around this by checking if borg is used with LibreSSL. Tested on
OpenBSD.
```
Chunkers =======================================================
buzhash,19,23,21,4095 1GB 14.294s
fixed,1048576 1GB 0.244s
Non-cryptographic checksums / hashes ===========================
crc32 (libdeflate, used) 1GB 0.724s
crc32 (zlib) 1GB 1.953s
xxh64 1GB 0.361s
Cryptographic hashes / MACs ====================================
hmac-sha256 1GB 7.039s
blake2b-256 1GB 9.845s
Encryption =====================================================
aes-256-ctr-hmac-sha256 1GB 18.312s
aes-256-ctr-blake2b 1GB 21.213s
Local Exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 5241, in main
exit_code = archiver.run(args)
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 5172, in run
return set_ec(func(args))
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 607, in do_benchmark_cpu
print(f"{spec:<24} {size:<10} {timeit(func, number=100):.3f}s")
File "/usr/local/lib/python3.9/timeit.py", line 233, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/usr/local/lib/python3.9/timeit.py", line 177, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 602, in <lambda>
("aes-256-ocb", lambda: AES256_OCB(
File "src/borg/crypto/low_level.pyx", line 636, in borg.crypto.low_level.AES256_OCB.__init__
File "src/borg/crypto/low_level.pyx", line 633, in borg.crypto.low_level.AES256_OCB.requirements_check
ValueError: AES OCB is not implemented by LibreSSL (yet?).
Platform: OpenBSD gateway.lan 7.1 GENERIC.MP#418 amd64
Borg: 1.2.1.dev98+gebaf0c32 Python: CPython 3.9.10 msgpack: 1.0.3 fuse: None [pyfuse3,llfuse]
PID: 38614 CWD: /storage/8899fc1454db04de.a/home/code/git/ports/sysutils/borg
sys.argv: ['/usr/local/bin/borg', 'benchmark', 'cpu']
SSH_ORIGINAL_COMMAND: None
```
we tried to be very private / secure here, but that created the issue
that a less secure umask (like e.g. 0o007) just did not work.
to make the umask work, we must start from 0o777 mode and let the
umask do its work, like e.g. 0o777 & ~0o007 --> 0o770.
with borg's default umask of 0o077, it usually ends up being 0o700,
so only permissions for the user (not group, not others).
"passphrase" encryption mode repos can not be created since borg 1.0.
back then, users were advised to switch existing repos of that type
to repokey mode using the "borg key migrate-to-repokey" command.
that command is still available in borg 1.0, 1.1 and 1.2, but not
any more in borg >= 1.3.
while we still might see the PassphraseKey.TYPE byte in old repos,
it is handled by the RepoKey code since borg 1.0.
in the finally-block, we wait for the filter process to die. but it only dies
voluntarily if all data was processed by the filter and it terminates due to EOF.
otoh, if borg has thrown an early exception, e.g. "archive already exists",
we need to kill the filter process to bring it to an early end. in that
case, we also do not need to check the filter rc, because we know we killed it.
looks like with a .tar file created by the tar tool,
tarinfo.mtime is a float [s]. So, after converting to
nanoseconds, we need to cast to int because that's what
Item.mtime wants.
also added a safe_ns() there to clip values to the safe range.
this was used to compare compatibility of our vendored
blake2b code (which we do not have any more) against the
python stdlib blake2b code (which we always use now anyway).
These instances of implicit switch case fallthrough appear to be
intentional. Add comments that the compiler understands to suppress
the false positive warning.
#6338 introduces regression when building with LibreSSL (3.5.0).
```
cc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -pipe -g -fPIC -O2 -pipe -g -O2 -pipe -g -O2 -pipe -fPIC -Isrc/borg/crypto -I/usr/local/include/python3.9 -c src/borg/crypto/low_level.c -o /tmp/ports/pobj/borgbackup-1.2.1/borg-eec359cf228caf00d9c72bde07bf939872e9d3fa/temp.openbsd-7.1-amd64-3.9/src/borg/crypto/low_level.o
src/borg/crypto/low_level.c:12439:48: error: use of undeclared identifier 'EVP_chacha20_poly1305'; did you mean 'EVP_aead_chacha20_poly1305'?
__pyx_v_self->__pyx_base.__pyx_base.cipher = EVP_chacha20_poly1305;
^~~~~~~~~~~~~~~~~~~~~
EVP_aead_chacha20_poly1305
/usr/include/openssl/evp.h:1161:17: note: 'EVP_aead_chacha20_poly1305' declared here
const EVP_AEAD *EVP_aead_chacha20_poly1305(void);
^
1 error generated.
```
Unfortunately `EVP_aead_chacha20_poly1305`, offered by LibreSSL, is not
a drop in replacement for `EVP_chacha20_poly1305`. More info on the
first can be found at https://man.openbsd.org/EVP_AEAD_CTX_init.3.
The key argument being sent to hashindex_get and hashindex_set by
multiple functions is a different signedness from what the functions
expect. This resolves the issue by changing the key type in the
unpack_user struct to unsigned char.
The value argument of hashindex_set is causing harmless pointer type
mismatches. This resolves the issue by changing the type to void*
which silences these types of warnings.
This resolves a compiler warning from the generated code that
resulted from a comparison of two local variables of different
signedness. The issue is resolved by changing the type of both
to int since this seems like the safest choice available.
All these are unsupported since long.
Newer versions of LibreSSL have gained chacha20-poly1305 support,
but still lack aes256-ocb support.
Also they have the HMAC_CTX_new/free api now.
docs: openssl >= 1.1.1 is required now
anything older is out of support anyway.
The generated source code was producing a compiler warning due to
the pointers differing in constness. The called function expects
a non-const pointer while the generated code produces a const pointer
via a cast. This changes the cast to drop 'const' to make the compiler
happy.
docs: clarify on-disk order and size of log entry fields
The order of the fields of a log entry on disk is CRC32 first, the docs had the size first.
I tried to make this list similar to the HashIndex struct description.
1. **Discuss Changes:** Before starting major work, please discuss your proposed changes on the [GitHub issue tracker](https://github.com/borgbackup/borg/issues). Smaller changes can also be discussed in the comments of the pull request.
2. **Branching Model:** Most Pull Requests should be made against the `master` branch. Maintenance branches (e.g., `1.4-maint`) are generally reserved for bug fixes and smaller changes.
3. **Pull Requests:**
- Create a feature branch for your changes.
- Keep changesets clean and focused on a single topic.
- Reference any related issues in your commit messages.
- Ensure your PR includes tests and documentation for new features.
- Proof read your PR yourself, fix typos and other obvious issues.
## Responsible AI Usage
You are welcome to use AI tools, but we require that a human is always "in the loop".
AI-generated content must not be submitted without active critical review, modification, and integration by the human contributor. We require that the final contribution is a product of human creative control and that AI is only used as a supportive tool to assist the human author.
As the contributor, you are responsible for the entire content of your pull request.
This includes:
- Verifying the correctness and security of any AI-generated code.
- Ensuring that new or modified code is covered by correct tests.
- Proofreading and refining any AI-generated documentation or comments.
- Being able to explain, debug, and maintain the code you submit.
Always be aware of the limitations and the ecological footprint of AI tools and act accordingly:
- Do not just believe what AI tells you, but verify it critically. AI is known to hallucinate, to be over-confident and to always tell you that you are right, even when you are not.
- Do not use AI tools for tasks that can be done more efficiently manually or by simpler tools.
- Learn how to use AI tools efficiently.
## Development Setup
Borg is written in Python with some Cython/C. To set up a development environment:
1. Create and activate a virtual environment.
2. Install development dependencies: `pip install -r requirements.d/development.lock.txt`
3. Install borg in editable mode: `pip install -e .`
4. Install pre-commit hooks: `pre-commit install`
## Code Style
We use [Black](https://black.readthedocs.io/) for automated code formatting.
For more advanced testing options (including Vagrant and Podman), see the full [Development documentation](https://borgbackup.readthedocs.io/en/master/development.html).
## Security
If you discover a security vulnerability, please follow our [Security Policy](SECURITY.md) for reporting it.
ln-s/usr/pkg/lib/python3.8/_sysconfigdata_netbsd9.py/usr/pkg/lib/python3.8/_sysconfigdata__netbsd9_.py# bug in netbsd 9.2, expected filename not there.
ln-s/usr/pkg/lib/python3.9/_sysconfigdata_netbsd9.py/usr/pkg/lib/python3.9/_sysconfigdata__netbsd9_.py# bug in netbsd 9.2, expected filename not there.
Reboot a few times to ensure that the hardware path does not change: on some motherboards
components of it can be random. In these cases you cannot use a more accurate rule,
or need to insert additional stars for matching the path.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_TABLE_UUID}=="<the PTUUID you just noted>", TAG+="systemd", ENV{SYSTEMD_WANTS}+="automatic-backup.service"
The "systemd" tag in conjunction with the SYSTEMD_WANTS environment variable has systemd
launch the "automatic-backup" service, which we will create next, as the
@ -60,8 +47,8 @@ launch the "automatic-backup" service, which we will create next, as the
Type=oneshot
ExecStart=/etc/backups/run.sh
Now, create the main backup script, ``/etc/backups/run.sh``. Below is a template,
modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
Now, create the main backup script, ``/etc/backups/run.sh``. Below is a template;
modify it to suit your needs (e.g., more backup sets, dumping databases, etc.).
..code-block:: bash
@ -107,10 +94,10 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
echo "Disk $uuid is a backup disk"
partition_path=/dev/disk/by-uuid/$uuid
# Mount filesystem if not already done. This assumes that if something is already
# mounted at $MOUNTPOINT, it is the backup drive. It won't find the drive if
# Mount filesystem if not already done. This assumes that if something is already
# mounted at $MOUNTPOINT, it is the backup drive. It will not find the drive if
# it was mounted somewhere else.
(mount | grep $MOUNTPOINT) || mount $partition_path $MOUNTPOINT
findmnt $MOUNTPOINT >/dev/null || mount $partition_path $MOUNTPOINT
drive=$(lsblk --inverse --noheadings --list --paths --output name $partition_path | head --lines 1)
echo "Drive path: $drive"
@ -119,13 +106,13 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
for the same reason. Therefore, partial checks may be useful only with very large
repositories where a full check would take too long.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as
opposed to checking just the xxh64) of data, which means reading the
data from the repository, decrypting and decompressing it. It is a complete
cryptographic verification and hence very time\-consuming, but will detect any
accidental and malicious corruption. Tamper\-resistance is only guaranteed for
encrypted repositories against attackers without access to the keys. You cannot
use \fB\-\-verify\-data\fP with \fB\-\-repository\-only\fP\&.
.sp
The \fB\-\-find\-lost\-archives\fP option will also scan the whole repository, but
tells Borg to search for lost archive metadata. If Borg encounters any archive
metadata that does not match an archive directory entry (including
soft\-deleted archives), it means that an entry was lost.
Unless \fBborg compact\fP is called, these archives can be fully restored with
\fB\-\-repair\fP\&. Please note that \fB\-\-find\-lost\-archives\fP must read a lot of
data from the repository and is thus very time\-consuming. You cannot use
\fB\-\-find\-lost\-archives\fP with \fB\-\-repository\-only\fP\&.
.SSAboutrepairmode
.sp
The check command is a read\-only task by default. If any corruption is found,
Borg will report the issue and proceed with checking. To actually repair the
issues found, pass \fB\-\-repair\fP\&.
.sp
\fBNote:\fP
.INDENT0.0
.INDENT3.5
\fB\-\-repair\fP is a \fBPOTENTIALLY DANGEROUS FEATURE\fP and might lead to data
loss! This does not just include data that was previously lost anyway, but
might include more data for kinds of corruption it is not capable of
dealing with. \fBBE VERY CAREFUL!\fP
.UNINDENT
.UNINDENT
.sp
Pursuant to the previous warning it is also highly recommended to test the
reliability of the hardware running this software with stress testing software
such as memory testers. Unreliable hardware can also lead to data loss especially
when this command is run in repair mode.
reliability of the hardware running Borg with stress testing software. This
especially includes storage and memory testers. Unreliable hardware might lead
to additional data loss.
.sp
First, the underlying repository data files are checked:
It is highly recommended to create a backup of your repository before running
in repair mode (i.e. running it with \fB\-\-repair\fP).
.sp
Repair mode will attempt to fix any corruptions found. Fixing corruptions does
not mean recovering lost data: Borg cannot magically restore data lost due to
e.g. a hardware failure. Repairing a repository means sacrificing some data
for the sake of the repository as a whole and the remaining data. Hence it is,
by definition, a potentially lossy task.
.sp
In practice, repair mode hooks into both the repository and archive checks:
.INDENT0.0
.IP\(bu2
For all segments, the segment magic header is checked.
.IP\(bu2
For all objects stored in the segments, all metadata (e.g. CRC and size) and
all data is read. The read data is checked by size and CRC. Bit rot and other
types of accidental damage can be detected this way.
.IP\(bu2
In repair mode, if an integrity error is detected in a segment, try to recover
as many objects from the segment as possible.
.IP\(bu2
In repair mode, make sure that the index is consistent with the data stored in
the segments.
.IP\(bu2
If checking a remote repo via \fBssh:\fP, the repo check is executed on the server
without causing significant network traffic.
.IP\(bu2
The repository check can be skipped using the \fB\-\-archives\-only\fP option.
.IP\(bu2
A repository check can be time consuming. Partial checks are possible with the
\fB\-\-max\-duration\fP option.
.IP1.3
When checking the repository\(aqs consistency, repair mode removes corrupted
objects from the repository after it did a 2nd try to read them correctly.
.IP2.3
When checking the consistency and correctness of archives, repair mode might
remove whole archives from the manifest if their archive metadata chunk is
corrupt or lost. Borg will also report files that reference missing chunks.
.UNINDENT
.sp
Second, the consistency and correctness of the archive metadata is verified:
.INDENT0.0
.IP\(bu2
Is the repo manifest present? If not, it is rebuilt from archive metadata
chunks (this requires reading and decrypting of all metadata and data).
.IP\(bu2
Check if archive metadata chunk is present; if not, remove archive from manifest.
.IP\(bu2
For all files (items) in the archive, for all chunks referenced by these
files, check if chunk is present. In repair mode, if a chunk is not present,
replace it with a same\-size replacement chunk of zeroes. If a previously lost
chunk reappears (e.g. via a later backup), in repair mode the all\-zero replacement
chunk will be replaced by the correct chunk. This requires reading of archive and
file metadata, but not data.
.IP\(bu2
In repair mode, when all the archives were checked, orphaned chunks are deleted
from the repo. One cause of orphaned chunks are input file related errors (like
read errors) in the archive creation process.
.IP\(bu2
In verify\-data mode, a complete cryptographic verification of the archive data
integrity is performed. This conflicts with \fB\-\-repository\-only\fP as this mode
only makes sense if the archive checks are enabled. The full details of this mode
are documented below.
.IP\(bu2
If checking a remote repo via \fBssh:\fP, the archive check is executed on the
client machine because it requires decryption, and this is always done client\-side
as key access is needed.
.IP\(bu2
The archive checks can be time consuming; they can be skipped using the
\fB\-\-repository\-only\fP option.
.UNINDENT
.sp
The \fB\-\-max\-duration\fP option can be used to split a long\-running repository check
into multiple partial checks. After the given number of seconds the check is
interrupted. The next partial check will continue where the previous one stopped,
until the complete repository has been checked. Example: Assuming a full check took 7
hours, then running a daily check with \-\-max\-duration=3600 (1 hour) resulted in one
full check per week.
.sp
Attention: Partial checks can only do way less checking than a full check (only the
CRC32 checks on segment file entries are done), and cannot be combined with the
\fB\-\-repair\fP option. Partial checks may therefore be useful only with very large
repositories where a full check took too long. Doing a full repository check aborts a
partial check; the next partial check will restart from the beginning.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as opposed to
checking the CRC32 of the segment) of data, which means reading the data from the
repository, decrypting and decompressing it. This is a cryptographic verification,
which will detect (accidental) corruption. For encrypted repositories it is
tamper\-resistant as well, unless the attacker has access to the keys. It is also very
slow.
If \fB\-\-repair \-\-find\-lost\-archives\fP is given, previously lost entries will
be recreated in the archive directory. This is only possible before
\fBborg compact\fP would remove the archives\(aq data completely.
.SHOPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SSarguments
.INDENT0.0
.TP
.BREPOSITORY_OR_ARCHIVE
repository or archive to check consistency of
.UNINDENT
.SSoptionalarguments
.SSoptions
.INDENT0.0
.TP
.B\-\-repository\-only
only perform repository checks
.TP
.B\-\-archives\-only
only perform archives checks
only perform archive checks
.TP
.B\-\-verify\-data
perform cryptographic archive data integrity verification (conflicts with \fB\-\-repository\-only\fP)
@ -144,34 +162,42 @@ perform cryptographic archive data integrity verification (conflicts with \fB\-\
.B\-\-repair
attempt to repair any inconsistencies found
.TP
.B\-\-save\-space
work slower, but using less space
.B\-\-find\-lost\-archives
attempt to find lost archives
.TP
.BI\-\-max\-duration\ SECONDS
do only a partial repo check for max. SECONDS seconds (Default: unlimited)
perform only a partial repository check for at most SECONDS seconds (default: unlimited)
.UNINDENT
.SSArchivefilters
.INDENT0.0
.TP
.BI\-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI\-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
read include/exclude patterns from PATTERNFILE, one per line
.TP
.B\-\-exclude\-caches
exclude directories that contain a CACHEDIR.TAG file (\fI\%http://www.bford.info/cachedir/spec.html\fP)
exclude directories that contain a CACHEDIR.TAG file (\%<https://\:www\:.bford\:.info/\:cachedir/\:spec\:.html>)
.TP
.BI\-\-exclude\-if\-present\ NAME
exclude directories that are tagged by containing a filesystem object with the given NAME
.TP
.B\-\-keep\-exclude\-tags
if tag objects are specified with \fB\-\-exclude\-if\-present\fP, don\(aqt omit the tag objects themselves from the backup archive
.TP
.B\-\-exclude\-nodump
exclude files flagged NODUMP
if tag objects are specified with \fB\-\-exclude\-if\-present\fP, do not omit the tag objects themselves from the backup archive
.UNINDENT
.SSFilesystemoptions
.INDENT0.0
.TP
.B\-x\fP,\fB\-\-one\-file\-system
stay in the same file system and do not store mount points of other file systems. This might behave different from your expectations, see the docs.
.TP
.B\-\-numeric\-owner
deprecated, use \fB\-\-numeric\-ids\fP instead
stay in the same file system and do not store mount points of other file systems \- this might behave different from your expectations, see the description below.
.TP
.B\-\-numeric\-ids
only store numeric user and group identifiers
.TP
.B\-\-noatime
do not store atime into archive
.TP
.B\-\-atime
do store atime into archive
.TP
@ -228,9 +242,6 @@ do not store ctime into archive
.B\-\-nobirthtime
do not store birthtime (creation date) into archive
.TP
.B\-\-nobsdflags
deprecated, use \fB\-\-noflags\fP instead
.TP
.B\-\-noflags
do not read and store flags (e.g. NODUMP, IMMUTABLE) into archive
.TP
@ -246,6 +257,9 @@ detect sparse holes in input (supported only by fixed chunker)
.BI\-\-files\-cache\ MODE
operate files cache in MODE. default: ctime,size,inode
.TP
.BI\-\-files\-changed\ MODE
specify how to detect if a file has changed during backup (ctime, mtime, disabled). default: ctime (on Windows: mtime, because ctime is file creation time there).
.TP
.B\-\-read\-special
open and read block and char device files as well as FIFOs as if they were regular files. Also follows symlinks pointing to these kinds of files.
.UNINDENT
@ -256,106 +270,125 @@ open and read block and char device files as well as FIFOs as if they were regul
add a comment text to the archive
.TP
.BI\-\-timestamp\ TIMESTAMP
manually specify the archive creation date/time (UTC, yyyy\-mm\-ddThh:mm:ss format). Alternatively, give a reference file/directory.
write checkpoint every SECONDS seconds (Default: 1800)
manually specify the archive creation date/time (yyyy\-mm\-ddThh:mm:ss[(+|\-)HH:MM] format, (+|\-)HH:MM is the UTC offset, default: local time zone). Alternatively, give a reference file/directory.
borg [common options] delete [options] [REPOSITORY_OR_ARCHIVE] [ARCHIVE...]
borg [common options] delete [options] [NAME]
.SHDESCRIPTION
.sp
This command deletes an archive from the repository or the complete repository.
This command soft\-deletes archives from the repository.
.sp
Important: When deleting archives, repository disk space is \fBnot\fP freed until
Important:
.INDENT0.0
.IP\(bu2
The delete command will only mark archives for deletion (\(dqsoft\-deletion\(dq),
repository disk space is \fBnot\fP freed until you run \fBborg compact\fP\&.
.IP\(bu2
You can use \fBborg undelete\fP to undelete archives, but only until
you run \fBborg compact\fP\&.
.sp
When you delete a complete repository, the security info and local cache for it
(if any) are also deleted. Alternatively, you can delete just the local cache
with the \fB\-\-cache\-only\fP option, or keep the security info with the
\fB\-\-keep\-security\-info\fP option.
.UNINDENT
.sp
When in doubt, use \fB\-\-dry\-run \-\-list\fP to see what would be deleted.
.sp
When using \fB\-\-stats\fP, you will get some statistics about how much data was
deleted \- the "Deleted data" deduplicated size there is most interesting as
that is how much your repository will shrink.
Please note that the "All archives" stats refer to the state after deletion.
.sp
You can delete multiple archives by specifying their common prefix, if they
have one, using the \fB\-\-prefix PREFIX\fP option. You can also specify a shell
pattern to match multiple archives using the \fB\-\-glob\-archives GLOB\fP option
(for more info on these patterns, see \fIborg_patterns\fP). Note that these
two options are mutually exclusive.
.sp
To avoid accidentally deleting archives, especially when using glob patterns,
it might be helpful to use the \fB\-\-dry\-run\fP to test out the command without
actually making any changes to the repository.
You can delete multiple archives by specifying a match pattern using
the \fB\-\-match\-archives PATTERN\fP option (for more information on these
patterns, see \fIborg_patterns\fP).
.SHOPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SSarguments
.INDENT0.0
.TP
.BREPOSITORY_OR_ARCHIVE
repository or archive to delete
.TP
.BARCHIVE
archives to delete
.BNAME
specify the archive name
.UNINDENT
.SSoptional arguments
.SSoptions
.INDENT0.0
.TP
.B\-n\fP,\fB\-\-dry\-run
do not change repository
do not change the repository
.TP
.B\-\-list
output verbose list of archives
.TP
.B\-s\fP,\fB\-\-stats
print statistics for the deleted archive
.TP
.B\-\-cache\-only
delete only the local cache for the given repository
.TP
.B\-\-force
force deletion of corrupted archives, use \fB\-\-force \-\-force\fP in case \fB\-\-force\fP does not work.
.TP
.B\-\-keep\-security\-info
keep the local security info when deleting a repository
.TP
.B\-\-save\-space
work slower, but using less space
output a verbose list of archives
.UNINDENT
.SSArchivefilters
.INDENT0.0
.TP
.BI\-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI\-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
# Use \-\-sort\-by with a comma\-separated list; sorts apply stably from last to first.
# Here: primary by net size change descending, tie\-breaker by path ascending
$ borg diff \-\-sort\-by=\(dq>size_diff,path\(dq archive1 archive2
+17 B \-5 B [\-rw\-r\-\-r\-\-\-> \-rwxr\-xr\-x] file1
removed 0 B file3
added 0 B file4
+135 B \-252 B file2
.EE
.UNINDENT
.UNINDENT
.SHNOTES
.SSTheFORMATspecifiersyntax
.sp
The \fB\-\-format\fP option uses Python\(aqs format string syntax \%<https://\:docs\:.python\:.org/\:3\:.10/\:library/\:string\:.html#\:formatstrings>\&.
.sp
Examples:
.INDENT0.0
.INDENT3.5
.sp
.EX
$ borg diff \-\-format \(aq{content:30} {path}{NL}\(aq ArchiveFoo ArchiveBar
modified: +4.1 kB \-1.0 kB file\-diff
\&...
# {VAR:<NUMBER} \- pad to NUMBER columns left\-aligned.
# {VAR:>NUMBER} \- pad to NUMBER columns right\-aligned.
$ borg diff \-\-format \(aq{content:>30} {path}{NL}\(aq ArchiveFoo ArchiveBar
modified: +4.1 kB \-1.0 kB file\-diff
\&...
.EE
.UNINDENT
.UNINDENT
.sp
The following keys are always available:
\- NEWLINE: OS dependent line separator
\- NL: alias of NEWLINE
\- NUL: NUL character for creating print0 / xargs \-0 like output
\- SPACE: space character
\- TAB: tab character
\- CR: carriage return character
\- LF: line feed character
.sp
Keys available only when showing differences between archives:
.INDENT0.0
.IP\(bu2
path: archived file path
.IP\(bu2
change: all available changes
.IP\(bu2
content: file content change
.IP\(bu2
mode: file mode change
.IP\(bu2
type: file type change
.IP\(bu2
owner: file owner (user/group) change
.IP\(bu2
group: file group change
.IP\(bu2
user: file user change
.IP\(bu2
link: file link change
.IP\(bu2
directory: file directory change
.IP\(bu2
blkdev: file block device change
.IP\(bu2
chrdev: file character device change
.IP\(bu2
fifo: file fifo change
.IP\(bu2
mtime: file modification time change
.IP\(bu2
ctime: file change time change
.IP\(bu2
isomtime: file modification time change (ISO 8601)
.IP\(bu2
isoctime: file creation time change (ISO 8601)
.UNINDENT
.SSWhatiscompared
.sp
For each matching item in both archives, Borg reports:
.INDENT0.0
.IP\(bu2
Content changes: total added/removed bytes within files. If chunker parameters are comparable,
Borg compares chunk IDs quickly; otherwise, it compares the content.
.IP\(bu2
Metadata changes: user, group, mode, and other metadata shown inline, like
\(dq[old_mode \-> new_mode]\(dq for mode changes. Use \fB\-\-content\-only\fP to suppress metadata changes.
.IP\(bu2
Added/removed items: printed as \(dqadded SIZE path\(dq or \(dqremoved SIZE path\(dq.
.UNINDENT
.SSOutputformats
.sp
The default (text) output shows one line per changed path, e.g.:
.INDENT0.0
.INDENT3.5
.sp
.EX
+135 B \-252 B [ \-rw\-r\-\-r\-\-\-> \-rwxr\-xr\-x ] path/to/file
.EE
.UNINDENT
.UNINDENT
.sp
JSON Lines output (\fB\-\-json\-lines\fP) prints one JSON object per changed path, e.g.:
write checkpoint every SECONDS seconds (Default: 1800)
manually specify the archive creation date/time (yyyy\-mm\-ddThh:mm:ss[(+|\-)HH:MM] format, (+|\-)HH:MM is the UTC offset, default: local time zone). Alternatively, give a reference file/directory.
borg-info \- Show archive details such as disk space used
.SHSYNOPSIS
.sp
borg [common options] info [options] [REPOSITORY_OR_ARCHIVE]
borg [common options] info [options] [NAME]
.SHDESCRIPTION
.sp
This command displays detailed information about the specified archive or repository.
This command displays detailed information about the specified archive.
.sp
Please note that the deduplicated sizes of the individual archives do not add
up to the deduplicated size of the repository ("all archives"), because the two
are meaning different things:
up to the deduplicated size of the repository (\(dqall archives\(dq), because the two
mean different things:
.sp
This archive / deduplicated size = amount of data stored ONLY for this archive
= unique chunks of this archive.
All archives / deduplicated size = amount of data stored in the repo
All archives / deduplicated size = amount of data stored in the repository
= all chunks in the repository.
.sp
Borg archives can only contain a limited amount of file metadata.
The size of an archive relative to this limit depends on a number of factors,
mainly the number of files, the lengths of paths and other metadata stored for files.
This is shown as \fIutilization of maximum supported archive size\fP\&.
.SHOPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SSarguments
.INDENT0.0
.TP
.BREPOSITORY_OR_ARCHIVE
repository or archive to display information about
.BNAME
specify the archive name
.UNINDENT
.SSoptional arguments
.SSoptions
.INDENT0.0
.TP
.B\-\-json
@ -68,87 +64,55 @@ format output as JSON
.SSArchivefilters
.INDENT0.0
.TP
.BI\-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI\-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
This command initializes an empty repository. A repository is a filesystem
directory containing the deduplicated data from zero or more archives.
.SSEncryptionmodeTLDR
.sp
The encryption mode can only be configured when creating a new repository \-
you can neither configure it on a per\-archive basis nor change the
encryption mode of an existing repository.
.sp
Use \fBrepokey\fP:
.INDENT0.0
.INDENT3.5
.sp
.nf
.ftC
borg init \-\-encryption repokey /path/to/repo
.ftP
.fi
.UNINDENT
.UNINDENT
.sp
Or \fBrepokey\-blake2\fP depending on which is faster on your client machines (see below):
.INDENT0.0
.INDENT3.5
.sp
.nf
.ftC
borg init \-\-encryption repokey\-blake2 /path/to/repo
.ftP
.fi
.UNINDENT
.UNINDENT
.sp
Borg will:
.INDENT0.0
.IP1.3
Ask you to come up with a passphrase.
.IP2.3
Create a borg key (which contains 3 random secrets. See \fIkey_files\fP).
.IP3.3
Encrypt the key with your passphrase.
.IP4.3
Store the encrypted borg key inside the repository directory (in the repo config).
This is why it is essential to use a secure passphrase.
.IP5.3
Encrypt and sign your backups to prevent anyone from reading or forging them unless they
have the key and know the passphrase. Make sure to keep a backup of
your key \fBoutside\fP the repository \- do not lock yourself out by
"leaving your keys inside your car" (see \fIborg_key_export\fP).
For remote backups the encryption is done locally \- the remote machine
never sees your passphrase, your unencrypted key or your unencrypted files.
Chunking and id generation are also based on your key to improve
your privacy.
.IP6.3
Use the key when extracting files to decrypt them and to verify that the contents of
the backups have not been accidentally or maliciously altered.
.UNINDENT
.SSPickingapassphrase
.sp
Make sure you use a good passphrase. Not too short, not too simple. The real
encryption / decryption key is encrypted with / locked by your passphrase.
If an attacker gets your key, he can\(aqt unlock and use it without knowing the
passphrase.
.sp
Be careful with special or non\-ascii characters in your passphrase:
.INDENT0.0
.IP\(bu2
Borg processes the passphrase as unicode (and encodes it as utf\-8),
so it does not have problems dealing with even the strangest characters.
.IP\(bu2
BUT: that does not necessarily apply to your OS / VM / keyboard configuration.
.UNINDENT
.sp
So better use a long passphrase made from simple ascii chars than one that
includes non\-ascii stuff or characters that are hard/impossible to enter on
a different keyboard layout.
.sp
You can change your passphrase for existing repos at any time, it won\(aqt affect
the encryption/decryption key or other secrets.
.SSMoreencryptionmodes
.sp
Only use \fB\-\-encryption none\fP if you are OK with anyone who has access to
your repository being able to read your backups and tamper with their
contents without you noticing.
.sp
If you want "passphrase and having\-the\-key" security, use \fB\-\-encryption keyfile\fP\&.
The key will be stored in your home directory (in \fB~/.config/borg/keys\fP).
.sp
If you do \fBnot\fP want to encrypt the contents of your backups, but still
want to detect malicious tampering use \fB\-\-encryption authenticated\fP\&.
.sp
If \fBBLAKE2b\fP is faster than \fBSHA\-256\fP on your hardware, use \fB\-\-encryption authenticated\-blake2\fP,
\fB\-\-encryption repokey\-blake2\fP or \fB\-\-encryption keyfile\-blake2\fP\&. Note: for remote backups
the hashing is done on your local machine.
.\" nanorst: inline-fill
.
.TS
center;
|l|l|l|l|.
_
T{
Hash/MAC
T} T{
Not encrypted
no auth
T} T{
Not encrypted,
but authenticated
T} T{
Encrypted (AEAD w/ AES)
and authenticated
T}
_
T{
SHA\-256
T} T{
none
T} T{
\fIauthenticated\fP
T} T{
repokey
keyfile
T}
_
T{
BLAKE2b
T} T{
n/a
T} T{
\fIauthenticated\-blake2\fP
T} T{
\fIrepokey\-blake2\fP
\fIkeyfile\-blake2\fP
T}
_
.TE
.\" nanorst: inline-replace
.
.sp
Modes \fImarked like this\fP in the above table are new in Borg 1.1 and are not
backwards\-compatible with Borg 1.0.x.
.sp
On modern Intel/AMD CPUs (except very cheap ones), AES is usually
hardware\-accelerated.
BLAKE2b is faster than SHA256 on Intel/AMD 64\-bit CPUs
(except AMD Ryzen and future CPUs with SHA extensions),
which makes \fIauthenticated\-blake2\fP faster than \fInone\fP and \fIauthenticated\fP\&.
.sp
On modern ARM CPUs, NEON provides hardware acceleration for SHA256 making it faster
than BLAKE2b\-256 there. NEON accelerates AES as well.
.sp
Hardware acceleration is always used automatically when available.
.sp
\fIrepokey\fP and \fIkeyfile\fP use AES\-CTR\-256 for encryption and HMAC\-SHA256 for
authentication in an encrypt\-then\-MAC (EtM) construction. The chunk ID hash
is HMAC\-SHA256 as well (with a separate key).
These modes are compatible with Borg 1.0.x.
.sp
\fIrepokey\-blake2\fP and \fIkeyfile\-blake2\fP are also authenticated encryption modes,
but use BLAKE2b\-256 instead of HMAC\-SHA256 for authentication. The chunk ID
hash is a keyed BLAKE2b\-256 hash.
These modes are new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fIauthenticated\fP mode uses no encryption, but authenticates repository contents
through the same HMAC\-SHA256 hash as the \fIrepokey\fP and \fIkeyfile\fP modes (it uses it
as the chunk ID hash). The key is stored like \fIrepokey\fP\&.
This mode is new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fIauthenticated\-blake2\fP is like \fIauthenticated\fP, but uses the keyed BLAKE2b\-256 hash
from the other blake2 modes.
This mode is new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fInone\fP mode uses no encryption and no authentication. It uses SHA256 as chunk
ID hash. This mode is not recommended, you should rather consider using an authenticated
or authenticated/encrypted mode. This mode has possible denial\-of\-service issues
when running \fBborg create\fP on contents controlled by an attacker.
Use it only for new repositories where no encryption is wanted \fBand\fP when compatibility
with 1.0.x is important. If compatibility with 1.0.x is not important, use
\fIauthenticated\-blake2\fP or \fIauthenticated\fP instead.
This mode is compatible with Borg 1.0.x.
.SHOPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SSarguments
.INDENT0.0
.TP
.BREPOSITORY
repository to create
.UNINDENT
.SSoptionalarguments
.INDENT0.0
.TP
.BI\-e\ MODE\fR,\fB\ \-\-encryption\ MODE
select encryption key mode \fB(required)\fP
.TP
.B\-\-append\-only
create an append\-only mode repository. Note that this only affects the low level structure of the repository, and running \fIdelete\fP or \fIprune\fP will still be allowed. See \fIappend_only_mode\fP in Additional Notes for more details.
.TP
.BI\-\-storage\-quota\ QUOTA
Set storage quota of the new repository (e.g. 5G, 1.5T). Default: no quota.
.TP
.B\-\-make\-parent\-dirs
create the parent directories of the repository directory, if they are missing.
.UNINDENT
.SHEXAMPLES
.INDENT0.0
.INDENT3.5
.sp
.nf
.ftC
# Local repository, repokey encryption, BLAKE2b (often faster, since Borg 1.1)
$ borg init \-\-encryption=repokey\-blake2 /path/to/repo
# Local repository (no encryption)
$ borg init \-\-encryption=none /path/to/repo
# Remote repository (accesses a remote borg via ssh)
# repokey: stores the (encrypted) key into <REPO_DIR>/config
$ borg init \-\-encryption=repokey\-blake2 user@hostname:backup
# Remote repository (accesses a remote borg via ssh)
# keyfile: stores the (encrypted) key into ~/.config/borg/keys/
$ borg init \-\-encryption=keyfile user@hostname:backup
borg [common options] list [options] [REPOSITORY_OR_ARCHIVE] [PATH...]
borg [common options] list [options] NAME [PATH...]
.SHDESCRIPTION
.sp
This command lists the contents of a repository or an archive.
This command lists the contents of an archive.
.sp
For more help on include/exclude patterns, see the \fIborg_patterns\fP command output.
For more help on include/exclude patterns, see the output of \fIborg_patterns\fP\&.
.SHOPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SSarguments
.INDENT0.0
.TP
.BREPOSITORY_OR_ARCHIVE
repository or archive to list contents of
.BNAME
specify the archive name
.TP
.BPATH
paths to list; patterns are supported
.UNINDENT
.SSoptional arguments
.SSoptions
.INDENT0.0
.TP
.B\-\-consider\-checkpoints
Show checkpoint archives in the repository contents list (default: hidden).
.TP
.B\-\-short
only print file/directory names, nothing else
.TP
.BI\-\-format\ FORMAT
specify format for file or archive listing (default for files: "{mode} {user:6} {group:6} {size:8} {mtime} {path}{extra}{NL}"; for archives: "{archive:<36} {time} [{id}]{NL}")
.TP
.B\-\-json
Only valid for listing repository contents. Format output as JSON. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text. A "barchive" key is therefore not available.
specify format for file listing (default: \(dq{mode} {user:6} {group:6} {size:8} {mtime} {path}{extra}{NL}\(dq)
.TP
.B\-\-json\-lines
Only valid for listing archive contents. Format output as JSON Lines. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text. A "bpath" key is therefore not available.
Format output as JSON Lines. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text.
.TP
.BI\-\-depth\ N
only list files up to the specified directory depth
.UNINDENT
.SSArchivefilters
.INDENT0.0
.TP
.BI\-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI\-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
.TP
.BI\-\-sort\-by\ KEYS
Comma\-separated list of sorting keys; valid keys are: timestamp, name, id; default is: timestamp
.TP
.BI\-\-first\ N
consider first N archives after other filters were applied
.TP
.BI\-\-last\ N
consider last N archives after other filters were applied
.UNINDENT
.SSExclusionoptions
.SSInclude/Excludeoptions
.INDENT0.0
.TP
.BI\-e\ PATTERN\fR,\fB\ \-\-exclude\ PATTERN
@ -105,16 +85,8 @@ read include/exclude patterns from PATTERNFILE, one per line
$ borg list /path/to/repo::archiveA \-\-format="{mode} {user:6} {group:6} {size:8d} {isomtime} {path}{extra}{NEWLINE}"
$ borg list archiveA \-\-format=\(dq{mode} {user:6} {group:6} {size:8d} {isomtime} {path}{extra}{NEWLINE}\(dq
drwxrwxr\-x user user 0 Sun, 2015\-02\-01 11:00:00 .
drwxrwxr\-x user user 0 Sun, 2015\-02\-01 11:00:00 code
drwxrwxr\-x user user 0 Sun, 2015\-02\-01 11:00:00 code/myproject
@ -137,147 +109,91 @@ drwxrwxr\-x user user 0 Sun, 2015\-02\-01 11:00:00 code/myproject
\-rw\-rw\-r\-\- user user 1416192 Sun, 2015\-02\-01 11:00:00 code/myproject/file.text
\&...
$ borg list /path/to/repo/::archiveA \-\-pattern \(aqre:\e.ext$\(aq
$ borg list archiveA \-\-pattern \(aq+ re:\e.ext$\(aq\-\-pattern \(aq\- re:^.*$\(aq
\-rw\-rw\-r\-\- user user 1416192 Sun, 2015\-02\-01 11:00:00 code/myproject/file.ext
\&...
$ borg list /path/to/repo/::archiveA \-\-pattern \(aqre:.ext$\(aq
$ borg list archiveA \-\-pattern \(aq+ re:.ext$\(aq\-\-pattern \(aq\- re:^.*$\(aq
\-rw\-rw\-r\-\- user user 1416192 Sun, 2015\-02\-01 11:00:00 code/myproject/file.ext
\-rw\-rw\-r\-\- user user 1416192 Sun, 2015\-02\-01 11:00:00 code/myproject/file.text
\&...
.ftP
.fi
.EE
.UNINDENT
.UNINDENT
.SHNOTES
.SSTheFORMATspecifiersyntax
.sp
The \fB\-\-format\fP option uses python\(aqs \fI\%format string syntax\fP\&.
The \fB\-\-format\fP option uses Python\(aqs format string syntax \%<https://\:docs\:.python\:.org/\:3\:.10/\:library/\:string\:.html#\:formatstrings>\&.
.sp
Examples:
.INDENT0.0
.INDENT3.5
.sp
.nf
.ftC
$ borg list \-\-format \(aq{archive}{NL}\(aq /path/to/repo
ArchiveFoo
ArchiveBar
\&...
# {VAR:NUMBER} \- pad to NUMBER columns.
# Strings are left\-aligned, numbers are right\-aligned.
# Note: time columns except \(ga\(gaisomtime\(ga\(ga, \(ga\(gaisoctime\(ga\(ga and \(ga\(gaisoatime\(ga\(ga cannot be padded.
$ borg list \-\-format \(aq{archive:36} {time} [{id}]{NL}\(aq /path/to/repo