Remove the handwritten bash and zsh shell completion scripts now that
auto-generated completions via borg completion bash/zsh (powered by
shtab, #9172) are tested and working. Fish completions are kept as
shtab does not yet support fish.
Replace string-matching tests with focused behavior tests: script size
sanity, shell syntax validation (bash -n / zsh -n), and tests that
invoke the custom preamble functions in bash (sortby key dedup,
filescachemode mutual exclusivity, archive name and aid: prefix
completion against a real repository).
Add --json-lines flag to 'borg benchmark crud' that outputs
each measurement as a JSON object (one per line) for easy
machine parsing. Also improve test coverage to validate both
human-readable and JSON-lines output formats.
Add --json flag to 'borg benchmark cpu' that outputs all benchmark
results as a single JSON object for easy machine parsing. Size values
use integers (bytes) in JSON and format_file_size() for human-readable
text output. Also add tests for both plain-text and JSON output formats.
- 2.0.x: mark as beta (not yet stable release)
- 1.2.x: no new releases, critical fixes may still be backported
- Keep 1.4.x as supported, 1.1.x and below as unsupported
If multiple environment variables for the same passphrase context are
provided (e.g., both BORG_PASSPHRASE and BORG_PASSCOMMAND), Borg now
terminates with an error instead of silently choosing one.
This prevents the issue where an old BORG_PASSPHRASE in the environment
could override a newly intended BORG_PASSCOMMAND or BORG_PASSPHRASE_FD.
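A minimal sketch of the new behaviour (env var names are from this change; the helper name and exact error text are made up here):
```python
import os

PASSPHRASE_VARS = ("BORG_PASSPHRASE", "BORG_PASSCOMMAND", "BORG_PASSPHRASE_FD")

def passphrase_source():
    # hypothetical helper: refuse to guess if more than one var is set
    present = [name for name in PASSPHRASE_VARS if name in os.environ]
    if len(present) > 1:
        raise SystemExit("conflicting passphrase env vars set: " + ", ".join(present))
    return present[0] if present else None
```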
When running as a PyInstaller-made binary, sys.executable points to the
borg binary itself. Invoking it with "-m borg" resulted in an incorrect
command line (e.g., "borg -m borg ..."), which confused the argument
parser in the subprocess.
This change checks sys.frozen to determine the correct invocation:
- If frozen: [sys.executable, ...args]
- If not frozen: [sys.executable, "-m", "borg", ...args]
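A sketch of the resulting invocation logic (function name invented for illustration):
```python
import sys

def borg_cmdline(args):
    # PyInstaller sets sys.frozen; sys.executable is then the borg binary itself
    if getattr(sys, "frozen", False):
        return [sys.executable, *args]
    return [sys.executable, "-m", "borg", *args]
```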
The check_python() function verified that the Python runtime supported
'follow_symlinks' for os.stat, os.utime, and os.chown. This check is no
longer necessary because:
1. Borg now requires Python >= 3.10.
2. On POSIX systems (Linux, macOS, *BSD, Haiku, OmniOS), support for these
operations relies on the *at syscalls (fstatat, etc.), which have been
implemented in standard libc for well over a decade (e.g., FreeBSD 8.0+,
NetBSD 6.0+, Solaris 11+).
3. On Windows (MSYS2/MinGW), Python has supported follow_symlinks for
os.stat since Python 3.2. The removed check specifically inspected only
os.stat on Windows, avoiding the problematic os.utime/os.chown checks.
Any platform capable of running Python 3.10 will inherently support these
standard file operations.
ci.yml already has timeout-minutes on every job, but these three
workflows had no timeout configured. Without an explicit timeout,
GitHub Actions defaults to 6 hours, wasting CI minutes if a job
gets stuck.
Added timeouts consistent with ci.yml:
- codeql-analysis.yml: 20 min (builds from source + analysis)
- backport.yml: 5 min (simple checkout + PR creation)
- black.yaml: 5 min (matches ci.yml ruff lint job)
Fixes #9298
if an already existing fs directory has the correct (as archived) mtime,
we have already extracted it in a previous borg extract run, so we do not
need to (and should not) call restore_attrs for it again.
if the directory exists, but does not have the correct mtime, restore_attrs
will be called and its attributes will be extracted (and mtime set to
correct value).
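A sketch of the mtime check described above (helper name and comparison details are assumptions):
```python
import os
import stat

def dir_already_extracted(path, archived_mtime_ns):
    # hypothetical helper: True if a previous extract run already restored
    # this directory (same mtime as archived), so restore_attrs can be skipped
    try:
        st = os.stat(path, follow_symlinks=False)
    except OSError:
        return False
    return stat.S_ISDIR(st.st_mode) and st.st_mtime_ns == archived_mtime_ns
```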
Enable JUnit XML generation for `native_tests` and `windows_tests` to allow Codecov to process test analytics.
Upload the generated `test-results.xml` using `codecov/codecov-action`.
The upload step uses `if: !cancelled()` to ensure results are uploaded even if tests fail (to analyze failures), but skipped if the workflow is explicitly cancelled.
Originally brought up in PR #8752 by @katia-sentry, but missed the !cancelled() check.
Also: upgrade to codecov-action@v5.
This allows users to compare file content efficiently without reading the
full file data, by exposing a hash of the chunk IDs and the relevant
conditions for valid comparisons, like chunker params, chunker seed/key,
id key, key type, etc.
This is based on PR #5167 by @hrehfeld, code + discussion, with some changes:
- the conditions hash now includes more relevant input params
- returning a single value that is composed of 2 parts
- tests (including new buzhash64)
Example output (different files in same archive):
1e88bfb02d0a5320-a539587200c33b857f9827d01fcb7dabacf30501c83929e7308668d43f4a6302 file1
1e88bfb02d0a5320-9ed78a4c14d0506d9ae75d914cca90db64655ddea22647dd1c89f19e2fc080ae file2
The fingerprint has 2 parts:
First part: same hash indicates the same chunking / chunk id generation params,
meaning that the second parts are valid to compare.
Second part: different hash because the file content is different;
the same hash here would mean the same content.
A pre-existing directory might be a btrfs subvolume that was created by
the user ahead of time when restoring several nested subvolumes from a
single archive.
If the archive item to be extracted is a directory and there is already
a directory at the destination path, do not remove (and recreate) it,
but just use it.
That way, btrfs subvolumes (which look like directories) are not deleted.
Fix originally contributed by @intelfx in #7866, but needed more work,
so I thought more about the implications and added a test.
Note:
In the past, we first removed (empty) directories, then created a fresh
one, then called restore_attrs for that. That produced correct metadata,
but only for the case of an EMPTY existing directory. If the existing
directory was not empty, the simple os.rmdir we tried did not work
anyway and did not remove the existing directory.
Usually we extract to an empty base directory, thus encountering this
edge case is mostly limited to continuing a previous extraction.
In that case, calling restore_attrs again on a directory that already has
existing attrs should be harmless, because they are identical.
This implementation should be good enough for our use case (paths) and has no external dependencies.
There is also a wcwidth library which might be even better, but would add another dependency.
This commit implements a comprehensive approach to Windows path compatibility
by standardizing on forward slashes (/) for all internal path representations
while maintaining cross-platform archive compatibility.
Core Strategy:
- All internal paths now use forward slashes as separators on all platforms
- Boundary normalization: backslashes converted to forward slashes at entry
points on Windows (filesystem paths only, not user patterns)
- Literal backslashes from POSIX archives replaced with % on Windows extraction
Key Changes:
Path Handling (helpers/fs.py):
- Added slashify(): converts backslashes to forward slashes on Windows
- Added percentify(): replaces backslashes with % for POSIX-to-Windows extraction
- Updated make_path_safe() to check for Windows-style .. patterns
- Changed get_strip_prefix() to use posixpath.normpath instead of os.path.normpath
- Updated remove_dotdot_prefixes() to use forward slashes consistently
Pattern Matching (patterns.py):
- Replaced os.path with posixpath throughout for consistent separator handling
- Updated PathFullPattern, PathPrefixPattern, FnmatchPattern, ShellPattern
- All pattern matching now uses / as separator regardless of platform
- Removed platform-specific os.sep usage
Archive Operations (archive.py, item.pyx):
- Applied slashify() to paths during archive creation on Windows
- Added percentify/slashify encoding/decoding for symlink targets
- Ensures archived paths always use forward slashes
Command Line (archiver/create_cmd.py, extract_cmd.py):
- Replaced os.path.join/normpath with posixpath equivalents
- Added slashify() for stdin-provided paths on Windows
- Updated strip_components to use / separator
- Changed PathSpec to FilesystemPathSpec for proper path handling
Repository (repository.py, legacyrepository.py):
- Replaced custom _local_abspath_to_file_url() with Path.as_uri()
Documentation (archiver/help_cmd.py):
- Clarified that all archived paths use forward slashes
- Added note about Windows absolute paths in archives (e.g., C/Windows/System32)
- Documented backslash-to-percent replacement for POSIX archives on Windows
Impact:
- Windows users can now create and extract archives with consistent path handling
- Cross-platform archives remain compatible
- Pattern matching works identically on all platforms
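Plausible one-liners for the two new helpers, matching the descriptions above (the real code in helpers/fs.py may differ in details):
```python
import sys

def slashify(path: str) -> str:
    # boundary normalization: on Windows, filesystem paths are converted
    # to the internal forward-slash form
    return path.replace("\\", "/") if sys.platform == "win32" else path

def percentify(path: str) -> str:
    # literal backslashes from POSIX archives are not representable in
    # Windows file names, so replace them with % on extraction
    return path.replace("\\", "%")
```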
Seen this on the macOS arm64 runner:
ImportError: dlopen(/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so, 0x0002): tried: '/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (no such file), '/Users/runner/work/borg/borg/.tox/py311-none/lib/python3.11/site-packages/_argon2_cffi_bindings/_ffi.abi3.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))
Consolidate key backup documentation into `borg key export` and reference
it from Quickstart and FAQ to avoid duplication and inconsistency.
Clarify that while `repokey` or `authenticated` mode stores the key in the
repo, a separate backup is still recommended to protect against repository
corruption or data loss.
The single-file borg.exe needs unpacking each time it is invoked.
borg-dir/borg.exe is already unpacked.
Also, macOS is slow when a "new" binary is first invoked, so
this should help there even more.
fuse2 was a bit misleading. it meant "our 2nd fuse implementation",
but could be misunderstood to refer to fuse v2.
hlfuse.py now means high-level fuse, as opposed to the low-level fuse in fuse.py.
Updated mount_cmds_test.py to work with both llfuse/pyfuse3 and mfusepy
by checking for either implementation in skip conditions.
mfusepy: 2 tests fail due to hardlink implementation differences
fixes #9182
- install OS fuse support packages as indicated by the tox env.
on the macOS runners, we do not have any fuse support.
on the linux runners, we may have fuse2 or fuse3.
on FreeBSD, we have fuse2.
- install fuse python library for binary build
- first build/upload binaries, then run tests (including binary tests).
early uploading makes inspection of a malfunctioning binary possible.
- for now, use llfuse, as there is an issue with pyinstaller and pyfuse3.
Also:
- remove || true - this just hides errors, not what we want.
emit only a warning, but let compaction complete.
after that, borg check --repair can fix the hints successfully.
likely this code won't be used in master branch as we only read from
legacy repos, but I ported this fix from 1.4-maint nevertheless.
This is the result of a longer discussion between Antigravity AI and me:
Detailed Explanation: Why Converting AssertionError to Warning is Correct
=========================================================================
PROBLEM OVERVIEW
----------------
The assertion `assert segments[segment] == 0` in compact_segments() was causing
borg compact to crash when segment reference counts in the hints file didn't
match the actual repository state. This typically occurred after index corruption
or repository recovery scenarios.
ROOT CAUSE ANALYSIS
-------------------
The crash happens due to a fundamental mismatch between two data structures:
1. self.segments (loaded from hints file)
- Contains reference counts for each segment
- Persisted to disk in the hints file
- Represents the "last known state"
2. self.index (loaded from index file)
- Contains mappings of object IDs to (segment, offset) pairs
- Can be corrupted or lost
- When corrupted, triggers auto-recovery
The Problem Scenario:
1. Repository has valid data with consistent hints.N and index.N
2. Index file gets corrupted (crash, disk error, etc.)
3. Borg detects corruption and auto-recovers:
- Loads hints.N (with old reference counts)
- Rebuilds index by replaying segments
- Commits the rebuilt index
4. State is now inconsistent IF segments were deleted/lost:
- self.segments[X] = 10 (from old hints, assumes segment X exists)
- Segment X was actually deleted/lost
- self.index has 0 entries for segment X (rebuilt from remaining segments)
5. During compact_segments():
- Tries to iterate objects in segment X
- Segment X doesn't exist (was deleted/lost)
- OR: segment X exists but objects aren't in index (superseded)
- segments[X] is never decremented
- segments[X] remains 10 instead of becoming 0
- Assertion fails!
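The fix itself is small; a sketch (the exact warning text differs):
```python
import logging

logger = logging.getLogger(__name__)

def check_segment_refcount(segments, segment):
    # before: assert segments[segment] == 0  <- crashed on stale hints
    # after: warn and continue; compaction rewrites correct hints anyway
    if segments[segment] != 0:
        logger.warning(
            "segment %d has unexpected refcount %d after compaction "
            "(stale hints?), continuing anyway", segment, segments[segment]
        )
```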
WHY THE FIX IS CORRECT
----------------------
1. Hints are Advisory, Not Authoritative
The hints file is an optimization to avoid scanning all segments. It's
explicitly designed to be rebuildable from scratch by scanning segments.
Therefore, incorrect hints should not cause a fatal error.
2. Self-Healing Behavior
By converting the assertion to a warning and allowing compaction to proceed:
- Compaction completes successfully
- New hints are written with correct reference counts
- Repository is automatically healed
- No manual intervention required
3. Data Safety is Preserved
The fix does NOT compromise data integrity because:
- Compaction first copies all live data from segments to new segments
- Only after all live data is safely copied are segments marked for deletion
- The index determines what's "live" (authoritative source of truth)
- Segments are deleted only when they contain no live data (per index)
- The refcount warning indicates stale hints, not actual data loss risk
- After compaction, new hints are written with correct counts
4. Consistent with Design Philosophy
Borg already handles many corruption scenarios gracefully:
- Missing hints → regenerated from segments
- Corrupted index → rebuilt from segments
- Missing segments → detected and handled
This fix extends that philosophy to hint/index mismatches.
5. Alternative Solutions are Worse
Other approaches considered:
a) Crash and require manual intervention
- Current behavior, user-hostile
- Requires expert knowledge to fix
b) Automatically run check --repair
- Too aggressive, may hide real problems
- User should decide when to repair
c) Refuse to compact
- Leaves repository in degraded state
- Prevents normal operations
VERIFICATION
------------
The fix has been verified with test cases that reproduce both scenarios:
1. test_missing_segment_in_hints
- Simulates missing segment files
- Verifies compact succeeds and updates hints correctly
2. test_index_corruption_with_old_hints
- Simulates the root cause: corrupted index with old hints
- Verifies compact succeeds despite reference count mismatch
3. test_subtly_corrupted_hints_without_integrity
- Existing test updated to expect warning instead of crash
- Verifies repository remains consistent after compaction
OPERATIONAL IMPACT
------------------
After this fix:
1. Users experiencing this crash can now run `borg compact` successfully
2. The warning message alerts them to the inconsistency
3. They can optionally run `borg check --repair` for peace of mind
4. Repository continues to function normally
The warning message provides enough information for debugging while not
blocking normal operations.
CONCLUSION
----------
Converting the assertion to a warning is the correct fix because:
- It aligns with Borg's design philosophy of graceful degradation
- It enables self-healing behavior
- It preserves data safety
- It improves user experience
- It's consistent with how other corruption scenarios are handled
The assertion was overly strict for a data structure (hints) that is
explicitly designed to be advisory and rebuildable.
we can't monkeypatch stuff in Cython/C code, so we
go through a Python module attribute lookup.
that way, we can more easily test some functions that
internally do id<->name lookups.
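For illustration, the pattern looks roughly like this (module and function names invented):
```python
# posix.py (plain Python, imported by the Cython code):
import pwd

def uid2user(uid, default=None):
    # module-level attribute: tests can monkeypatch this, which is not
    # possible for functions living inside compiled Cython/C code
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return default

# in a test:
#   monkeypatch.setattr(posix_module, "uid2user", lambda uid, default=None: "testuser")
```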
I could not find the root cause of this issue, but it is likely a minor
problem with ctime and doesn't affect borg usage much.
So I'd rather not have CI on netbsd fail because of this.
The test fails on these platforms.
I could not find the root cause of this issue, but it is likely a minor
problem with ctime and doesn't affect borg usage much.
So I'd rather not have CI on freebsd/netbsd fail because of this.
Also: add is_netbsd and is_openbsd to platformflags.
so that pytest options are centrally managed in tox configuration.
let tox build venv and install requirements.
tox does this anyway, so we save some time if we
do not need the venv for other purposes also
(like e.g. building binaries).
Also:
- default XDISTN to "auto". XDISTN is still used by Vagrantfile.
- some other optimisations, like fewer package manager calls.
- use XDISTN=1 for haiku
- fix freebsd binary build condition
use borg diff --sort-by=spec1,spec2,spec3 for enhanced sorting.
remove legacy --sort behaviour (sort by path), which had been deprecated
since 1.4.2.
Co-authored-by: Daniel Rudolf <github.com@daniel-rudolf.de>
This is a port of #9005 to master branch.
- grant id-token and attestations permissions to posix_tests job
- add actions/attest-build-provenance@v1 step for built artifacts
This publishes SLSA-style provenance for our tag builds (only when binaries
are produced) so users can verify the origin of downloaded borg binaries.
bad:
- no *BSD testing and FreeBSD binary building on gh
- binaries not signed by me, because they are built on gh
good:
- for linux intel/amd64 and arm64, built on ubuntu
- for macOS intel and arm64, built on a relatively recent macOS
- I can get rid of that ancient macOS VM I used for building.
- the source code distribution (sdist) is still made locally on
my machine and thus signed with my signature (*.asc).
preserve UF_COMPRESSED and SF_DATALESS when restoring flags,
get-modify-set in macOS set_flags, keeping system-managed read-only flags.
(cherry picked from commit 83571aa00d)
This flag needs to be set BEFORE writing to the file.
But "borg extract" sets the flags last (to support IMMUTABLE),
thus the compression flag would not work as expected.
(cherry picked from commit 56dda84162)
Linux platform only.
(cherry picked from commit 9214197a2c)
set_flags: if getting the flags fails, better give up than
corrupting them.
Thanks to Earnestly for the feedback on IRC.
(cherry picked from commit 9c600a9571)
when borg mount is used without -f/--foreground (so that the FUSE
borg process was started daemonized in the background), it did not
display the rc of the main process, even when --show-rc was used.
now it does display the rc of the main process.
note that this is rather a consistency fix than being super useful,
because the main "action" happens in the background daemon process,
not in the main process.
Previously when running borg in a systemd service (and similar when piping to
a file and co.), these problems occurred:
- The carriage return both made it so that journald interpreted the output as
binary, therefore not printing the text, while also not buffering
correctly, so that log output was only available every once in a while
in the form [40k blob data]. This can partially be worked around by
using `journalctl -a` to view the logs, which at least prints the text,
though only sporadically.
- The path was getting truncated to a short length, since the default
get_terminal_size returns a column width of 80, which isn't relevant
when printing to e.g. journald.
This commit fixes this by introducing a new code path for when stream is
not a tty, which always prints the full paths and ends lines with a linefeed.
This is based on unfinished PR #8939 by @infinisil, thanks for your suggestion!
Forward port of PR #9055 to master.
The VM was used for local macOS testing and
also for building a macOS intel fat binary.
We also do macOS CI testing on GitHub and I
recently added binary building on GitHub for
Apple Silicon and Intel.
The macOS 10 VM was very outdated, super slow
and a pain to use. I didn't succeed in building
a recent macOS vagrant VM, so we'll just use
GitHub from now on...
The original markup included a paragraph element wrapping a block-level pre element, which is invalid per HTML’s content model (a p can only contain phrasing content; pre is flow content).
The fix separated text and pre blocks into valid sibling elements, ensuring no pre is nested inside a p.
we only read from borg 1.x legacy repos, we must not
try to "fix" them (users can use borg1 check --repair).
had to remove some tests that relied on this "feature".
2 fixes:
- add code to update/verify the HashHeader integrity hash. this code was
missing and led to FileIntegrityError on the borg 1.x repo index.
- when reading a non-compact borg 1.x hash table from disk (like the borg
repo index), only add the "used" buckets to the in-memory hashtable,
but not the unused/tombstone buckets.
The corruption described in #9022 was happening like this:
- borg failed to read the repo index, because the integrity check failed
- due to open_index(..., auto_recover=True), it tried to "fix" it by
writing an empty hash table to disk. borg 1.x usually then rebuilt the
index, but somehow this wasn't happening for the user in #9022.
Borg2 documentation mentions support for the s3 backend; however,
borg was missing the parsing bits for an s3 repo.
This updates the Location parser to parse the s3 url using the same
logic as borgstore.
Note: borgstore should be installed with the s3 dependencies in order
for the s3 backend to work.
Signed-off-by: Mike Mason <github@mikemrm.com>
Implemented handling of POSIX access and default ACLs in tar files.
New keys, `SCHILY.acl.access` and `SCHILY.acl.default`, are used
to store these ACLs in the tar PAX headers.
control how borg detects whether a file has changed while it was backed up; valid modes are ctime, mtime or disabled.
ctime is the safest mode and the default.
mtime can be useful if ctime does not work correctly for some reason
(e.g. OneDrive files change their ctime without the user changing the file).
disabled (= disabling change detection) is not recommended as it could lead to
inconsistent backups. Only use if you know what you are doing.
the stuff in Python stdlib "random.Random" is not cryptographically strong
and the stuff in Python stdlib "secrets" can't be seeded and does not
offer shuffle.
the previous approach had cryptographic strength randomness, but a precise
50:50 0/1 bit distribution per bit position in the table was not assured.
now this is always the case due to the way the table is constructed.
That way we can feed lots of entropy into the table creation.
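One way to construct such a table with an exact 50:50 bit distribution per bit position (a sketch; borg's actual keyed RNG and construction details differ):
```python
import hashlib

def buzhash64_table(key: bytes) -> list[int]:
    counter = 0

    def rnd(n: int) -> int:
        # deterministic keyed randomness: SHA-256 in counter mode
        # (slight modulo bias ignored for brevity)
        nonlocal counter
        digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
        return int.from_bytes(digest[:8], "big") % n

    table = [0] * 256
    for bit in range(64):
        column = [1] * 128 + [0] * 128       # exactly 128 ones per bit position
        for i in range(255, 0, -1):          # keyed Fisher-Yates shuffle
            j = rnd(i + 1)
            column[i], column[j] = column[j], column[i]
        for i in range(256):
            table[i] |= column[i] << bit
    return table
```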
The bh64_key is derived from the id_key (NOT the crypt_key), thus
it will create the same key for related repositories (even if they
use different encryption/authentication keys). Due to that, it will
also create the same buzhash64 table, will cut chunks at the same
points and deduplication will work amongst the related repositories.
Only compare the main version number, e.g. 1.1.1 (first 3 elements
of the version tuple).
Without this change, it would not accept 1.1.1rc1 because that is
not "<= (1, 1, 1)" in that simplistic version comparison.
Separated `chunker_test` into two dedicated test modules: `fixed_test` (for `ChunkerFixed`) and `buzhash_test` (for `Chunker`). Updated imports and adjusted references accordingly.
Moved the `ChunkerFixed` implementation from `chunker` to a new `fixed` module for better modularity. Updated imports and type hints.
Removed now empty chunkers.chunker module.
Moved the `ChunkerFailing` implementation from `chunker` to a new `failing` module for better modularity. Updated imports and type hints. Adjusted related definitions in `chunker.pyi` accordingly.
Moved `buzhash` implementation from `chunker` to a new `buzhash` module for better separation of concerns. Updated imports, adjusted `setup.py` and build configuration accordingly. Removed deprecated `Chunker` definitions from `chunker.pyi`.
Relocated `get_chunker` function from `chunker` module to `chunkers.__init__.py` for improved organization. Updated `Chunker` class signature to include a `sparse` parameter with a default value. Adjusted imports and type hints accordingly.
Extracted the `reader` logic from `chunker` into a dedicated `reader` module to improve modularity and maintainability. Updated imports, references, and build configurations accordingly.
ChunkerFixed can be configured to support files with a specific header size.
But we do not want to get an AssertionError if we encounter a 0-byte file
or a file that is shorter than the header size.
no options yet, just hardcoded macOS and Linux xattrs.
removed the --exclude-nodump option, it is also done automagically now.
also: create: call stat_ext_attrs early
this reads bsdflags, xattrs and ACLs from the
filesystem, except if the user chose to disable that.
notable:
- borg always reads these, even for unchanged files
- if we read them early, borg can now behave differently
based e.g. on an xattr value (and e.g. exclude the file)
we want to get rid of legacy stuff(*) one day and sha256 is as
good for this purpose (and might be even hw accelerated).
(*) considered legacy due to the way it gives the key to the
blake2b function (just padding and prepending it to the data,
instead of using the key parameter, see #8867 ).
Replaced inline file reading logic with `FileReader` to standardize handling across chunkers. Improved buffer updates and allocation handling for sparse files and optimized read operations.
Includes cases for simple reads, multiple reads, and mock chunk scenarios to verify behavior with mixed allocation types.
Also: change Chunk type for empty read result for better consistency.
Simplified and improved handling of mixed types of chunks during reading. The allocation type of resulting chunks is now determined based on contributing chunks.
The `header_size` parameter and related logic have been removed from file readers, simplifying their implementation. This change eliminates unnecessary complexity while maintaining all functional capabilities via `read_size` and `fmap`.
`FileFMAPReader` deals with sparse files (data vs holes) or fmap and yields blocks of some specific read_size using a generator.
`FileReader` uses the `FileFMAPReader` to fill an internal buffer and lets users use its `read` method to read arbitrary sized chunks from the buffer.
For both classes, instances now only deal with a single file.
Replaced `ChunkerFixed`'s block-reading functionality with a new `FileReader` class to streamline code and improve separation of concerns. Adjusted `ChunkerFixed` to delegate file reading to `FileReader` while focusing on chunk assembly.
`FileReader` is intended to be useful for other chunkers also, so they can easily implement sparse file reading / fmap support.
The `-Wno-unreachable-code-fallthrough` compiler flag suppresses warnings about fallthrough annotations in unreachable code.
In C switch statements, "fallthrough" occurs when execution continues from one case to the next without a break statement. This is often a source of bugs, so modern compilers warn about it. To indicate intentional fallthrough, developers use annotations like `__attribute__((fallthrough))`.
In Cython-generated C code, the `CYTHON_FALLTHROUGH` macro is defined to expand to the appropriate fallthrough annotation for the compiler being used. For example, in `compress.c`:
```c
#define CYTHON_FALLTHROUGH __attribute__((fallthrough))
```
The issue occurs because Cython generates code with conditional branches that may be unreachable on certain platforms or configurations. When these branches contain switch statements with fallthrough annotations, compilers like Clang issue warnings like:
```
warning: fallthrough annotation in unreachable code [-Wunreachable-code-fallthrough]
```
These warnings appear in the generated C code, not in the original Cython source. They're harmless but noisy, cluttering the build output with warnings about code we don't control.
By adding `-Wno-unreachable-code-fallthrough` to the compiler flags in `setup.py`, we specifically tell the compiler to ignore these particular warnings, resulting in a cleaner build output without affecting the actual functionality of the code.
This is a common practice when working with generated code - suppress specific warnings that are unavoidable due to the code generation process while keeping other useful warnings enabled.
Updated bash completions to include new commands such as `analyze`, `debug`, `repo-space`, `tag`, and `undelete`, along with their respective options. Fixed a typo in the `--upgrader` completions and improved completion handling for various commands.
thanks a lot to @sothix for helping with this!
removed pytest-forked, as it is not found anymore:
error: target not found: mingw-w64-ucrt-x86_64-python-pytest-forked
use a virtual env to avoid mixup of user with system packages.
remove old workaround for setuptools (SETUPTOOLS_USE_DISTUTILS: stdlib).
fix pip install
use --system-site-packages as a workaround for broken pip install python-cffi.
do not upgrade pip setuptools build wheel
use python -m pytest to use the one from the venv
Also: moved name length check to Archive.__init__, so it doesn't
read all other archives' main metadata when creating a new archive.
In write-only mode, the files cache can't be built from the repo
from the latest archive of same series, we are not allowed to read that!
The posixfs borgstore backend implements permissions to make
testing with differently permissive stores easier.
The env var selects from pre-defined permission configurations
within borg and gives the chosen permissions config to borgstore.
Add incremental flag to `write_chunkindex_to_repo_cache`.
borg create uses incremental cache indexes to save progress.
But other ops need to write a full index and delete all other cached indexes.
Added debug logging for missing object IDs.
Introduce tests to verify the functionality of the `repo-space` command, including space reservation, freeing, display, and edge cases. These tests ensure proper handling of various scenarios and validation of the respective outputs.
- borg repo-create and borg transfer not only support --repo / --other-repo options,
but also already supported related BORG_REPO and BORG_OTHER_REPO env vars.
- similar to that, the passphrases now come from BORG_[OTHER_]PASSPHRASE, BORG_[OTHER_]PASSCOMMAND or BORG_[OTHER_]PASSPHRASE_FD.
- borg repo-create --repo B --other-repo A does not silently copy the passphrase of key A
to key B anymore, but either asks for the passphrase or reads it from env vars.
Some features like append-only repositories rely on a server-side component
that enforces them (because that shall only be controllable server-side,
not client-side).
So, that can only work, if such a server-side component exists, which is the
case for borg 1.x ssh: repositories (but not for borg 1.x non-ssh: repositories).
For borg2, we currently have:
- fs repos
- sftp: repos
- rclone: repos (enabling many different cloud providers)
- s3/b2: repos
- ssh: repos using client/server rpc code similar as in borg 1.x
So, only for the last method we have a borg server-side process that could enforce some features, but not for any of the other repo types.
For append-only the current idea is that this should not be done within borg,
but solved by a missing repo object delete permission enforced by the storage.
borg create could then use credentials that miss permission to delete,
while borg compact would use credentials that include permission to delete.
Some features like repository quotas rely on a server-side component
that enforces them (because that shall only be controllable server-side,
not client-side).
So, that can only work, if such a server-side component exists, which is the
case for borg 1.x ssh: repositories (but not for borg 1.x non-ssh: repositories).
For borg2, we currently have:
- fs repos
- sftp: repos
- rclone: repos (enabling many different cloud providers)
- s3/b2: repos
- ssh: repos using client/server rpc code similar as in borg 1.x
So, only for the last method we have a borg server-side process that could enforce some features, but not for any of the other repo types.
For quotas the current idea is that this should not be done within borg,
but enforced by a storage specific quota implementation (like fs quota,
or quota of the cloud storage provider). borg could offer information
about overall repo space used, but would not enforce quotas within borg.
before this fix, borg also obfuscated other chunks it creates,
e.g. the archive metadata stream chunks, which is not necessary
and only added a lot of overhead.
as we have meta["type"] in borg2, this is easy to fix here.
It's easy enough to verify exhaustively for any plausible chunker params
that Padmé always produces at most a 12% overhead. Checking that again
at runtime is pointless.
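For reference, a sketch of the Padmé rule (as in the PURBs paper) plus the kind of exhaustive check this commit relies on:
```python
def padme(L: int) -> int:
    # Padmé padded size, as defined in the PURBs paper
    if L < 2:
        return L
    E = L.bit_length() - 1      # floor(log2(L))
    S = E.bit_length()          # floor(log2(E)) + 1
    mask = (1 << (E - S)) - 1
    return (L + mask) & ~mask

# exhaustive check over a plausible chunk size range: overhead stays <= 12%
assert max((padme(L) - L) / L for L in range(2, 1 << 20)) <= 0.12
```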
This only happened when:
- using borg extract --numeric-ids
- processing NFS4 ACLs
It didn't affect POSIX ACL processing.
This is rather old code, so it looks like nobody used that
code or the bug was not reported.
The bug was discovered by PyCharm's "Junie" AI. \o/
Sometimes, usually for file content chunks, it makes sense to
generate all-zero replacement chunks on-the-fly.
But for e.g. an archive items metadata stream, this does not
make sense (because it wants to msgpack.unpack the data), so
we rather want None. In that case, we do not have the size
information anyway.
preloading: always use raise_missing=False, because
the behaviour is defined at preloading time.
fetch_many: use get_many with raise_missing=False.
if get_many yields None instead of the expected chunk
cdata bytes, on-the-fly create an all-zero replacement
chunk of the correct size (if the size is known) and
emit an error msg about the missing chunk id / size.
note: for borg recreate with re-chunking this is a bit
unpretty, because it will transform a missing chunk into
a zero bytes range in the target file in the recreated
archive. it will emit an error message at recreate time,
but afterwards the recreated archive will not "know"
about the problem any more and will just have that
zero-patched file.
so I guess borg recreate with re-chunking should better
only be used on repos that do not miss chunks.
Well, it's not totally removed, some code in Item, Archive and
borg transfer --from-borg1 needs to stay in place, so that we
can pick the CORRECT chunks list that is in .chunks_healthy
for all-zero-replacement-chunk-patched items when transferring
archives from borg1 to borg2 repos.
transfer: do not transfer replacement chunks, deal with missing chunks in other_repo
FUSE fs read: IOError or all-zero result
fixes #8641
In the example, setting SYSTEMD_WANTS instead of appending may prevent
other autostart services attached by earlier udev rules from launching.
This commit changes = to += to fix this behavior.
fixes #8639
The priority of 40 for the udev rules as stated in the documentation
applies the rule too early on some systems, which prevents the rule from
matching. This commit changes the priority to 80.
Improve handling when defining a passphrase or debugging passphrase issues, fixes #8496
Setting `BORG_DEBUG_PASSPHRASE=YES` enables passphrase debug logging to stderr, showing the passphrase, its hex UTF-8 byte sequence and related env vars if a wrong passphrase was encountered.
Setting `BORG_DISPLAY_PASSPHRASE=YES` now always shows the passphrase and its hex UTF-8 byte sequence.
/Users/tw/w/borg/docs/internals/data-structures.rst:971:
WARNING: Lexing literal_block
'
[cache]
version = 1
repository = 3c4...e59
manifest = 10e...21c
timestamp = 2017-06-01T21:31:39.699514
key_type = 2
previous_location = /path/to/repo
[integrity]
manifest = 10e...21c
files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
'
as "ini" resulted in an error at token: '}'.
Retrying in relaxed mode. [misc.highlighting_failure]
Note: this part of the docs didn't change for a long time, so I guess
the sudden warning comes from a change in sphinx' lexers.
Main problem is that rc != 0 will abort our CI pipeline.
see #8318
so long as it can be assumed that the user has configured a POSIX
compliant login shell, using a simple command [1] looks cleaner, as
no ``export`` or ``;`` are used.
[1] Section "2.9.1 Simple Commands" in volume "Shell & Utilities" of POSIX.1-2024
the python package pkgconfig does not need to be "preinstalled"
anymore, because our pyproject.toml cares for that. otoh, the cli tool
pkg-config must be preinstalled so that libs and headers can be found
automagically.
Also be a bit more clear about the FUSE stuff.
if retry is True, it will just retry to get a valid answer.
if retry is False, it will return the default.
the code can be tested by entering "error" (without the quotes).
It needs to be possible to iterate over all items in an archive,
do some output (e.g. if an item is included / excluded) and then
only preload content data chunks for the included items.
it looks like in brew they removed pkg-config formula and added
an alias to the pkgconf formula (which also provides a pkg-config
cli command).
the transition was not seamless:
on github actions CI:
Installing pkg-config
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/manifests/2.3.0_1
==> Fetching pkgconf
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/blobs/sha256:5f83615f295e78e593c767d84f3eddf61bfb0b849a1e6a5ea343506b30b2c620
==> Pouring pkgconf--2.3.0_1.arm64_sonoma.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /opt/homebrew
Could not symlink bin/pkg-config
Target /opt/homebrew/bin/pkg-config
is a symlink belonging to pkg-config@0.29.2. You can unlink it:
brew unlink pkg-config@0.29.2
To force the link and overwrite all conflicting files:
brew link --overwrite pkgconf
To list all files that would be deleted:
brew link --overwrite pkgconf --dry-run
Possible conflicting files are:
/opt/homebrew/bin/pkg-config -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/bin/pkg-config
/opt/homebrew/share/aclocal/pkg.m4 -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/share/aclocal/pkg.m4
/opt/homebrew/share/man/man1/pkg-config.1 -> /opt/homebrew/Cellar/pkg-config@0.29.2/0.29.2_3/share/man/man1/pkg-config.1
==> Summary
🍺 /opt/homebrew/Cellar/pkgconf/2.3.0_1: 27 files, 474KB
Installing pkg-config has failed!
`setup.py` hardcoded crypto library paths for OpenBSD, causing build
issues when OpenBSD drops a specific OpenSSL version. The solution is to make
paths configurable.
Addresses #8553.
We do not want that urllib spoils test output with LibreSSL related
warnings on OpenBSD.
`NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently
the 'ssl' module is compiled with 'LibreSSL 3.8.2'`.
This should address #8506. Unfortunately I'm unable to test.
#8506 is likely caused by the Vagrant box having a mirror in its
`etc/installurl`, which does not offer 7.4 packages. There are other
mirrors out there who do, e.g., https://ftp.eu.openbsd.org/pub/OpenBSD/.
Proposed 'fix' is to replace the mirror in `/etc/installurl`.
Worst (but frequent) case here is that all or most of the chunks
in the repo need to get recompressed, thus storing all chunk ids
in a python list would need significant amounts of memory for
large repositories.
We already have all chunk ids stored in cache.chunks, so we now just
flag the ones needing re-compression by setting the F_COMPRESS flag
(that does not need any additional memory).
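Sketch of the idea (F_COMPRESS comes from this commit, F_NEW from the next one below; the bit values and the F_USED name are assumptions):
```python
# the per-chunk uint32 that used to be a refcount now carries bit flags
F_USED     = 1 << 0   # chunk is referenced (not an orphan)       (assumed)
F_NEW      = 1 << 1   # chunk added since index was persisted
F_COMPRESS = 1 << 2   # chunk is queued for recompression

flags = F_USED
flags |= F_COMPRESS                    # mark for recompression, no id list
needs_work = bool(flags & F_COMPRESS)  # O(1) test, no extra memory per chunk
```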
- ChunkIndex: implement system flags
- ChunkIndex: F_NEW flag as 1st system flag for newly added chunks
- incrementally write only NEW chunks to repo/cache/chunks.*
- merge all chunks.* when loading the ChunkIndex from the repo
Also: the cached ChunkIndex only has the chunk IDs. All values are just dummies.
The ChunkIndexEntry value can be used to set flags and track size, but we
intentionally do not persist flags and size to the cache.
The size information gets set when borg loads the files cache and "compresses"
the chunks lists in the files cache entries. After that, all chunks referenced
by the files cache will have a valid size as long as the ChunkIndex is in memory.
This is needed so that "uncompress" can work.
- doesn't need a separate file for the hash
- we can later write multiple partial chunkindexes to the cache
also:
add upgrade code that renames the cache from previous borg versions.
Consider soft-deleted archives/ directory entries, but only create a new
archives/ directory entry if:
- there is no entry for that archive ID
- there is no soft-deleted entry for that archive ID either
Support running with or without --repair.
Without --repair, it can be used to detect such inconsistencies and return with rc != 0.
--repository-only contradicts --find-lost-archives.
We are only interested in archive metadata objects here, thus for most repo objects
it is enough to read the repoobj's metadata and determine the object's type.
Only if it is the right type of object, we need to read the full object (metadata
and data).
This reverts commit d3f3082bf4.
Comment by jdchristensen:
I agree that "wipe clean" is correct grammar, but it doesn't match the situation in "unmount cleanly".
The change in this patch is definitely wrong.
Putting it another way, one would never say that we "clean unmount a filesystem".
We say that we "cleanly unmount a filesystem", or in other words, that it "unmounts cleanly".
But the original text is slightly awkward, so I would propose: "When running in the foreground,
^C/SIGINT cleanly unmounts the filesystem, but other signals or crashes do not."
(Not that this guarantees anything, but I'm a native speaker.)
We gave up refcounting quite a while ago and are only interested
in whether a chunk is used (referenced) or not (orphan).
So, let's keep that uint32_t value, but use it for bit flags, so
we could use it to efficiently remember other chunk-related stuff also.
If we have an entry for a chunk id in the ChunkIndex,
it means that this chunk exists in the repository.
The code was a bit over-complicated and used entry.refcount
only to detect whether .get(id, default) actually got something
from the ChunkIndex or used the provided default value.
The code does the same now, but in a simpler way.
Additionally, it checks for size consistency if a size is
provided by the caller and a size is already present in
the entry.
- refactor packing/unpacking of fc entries into separate functions
- instead of a chunks list entry being a tuple of 256bit id [bytes] and 32bit size [int],
only store a stable 32bit index into kv array of ChunkIndex (where we also have id and
size [and refcount]).
- only done in memory, the on-disk format has (id, size) tuples.
memory consumption (N = entry.chunks list element count, X = overhead for rest of entry):
- previously:
- packed = packb(dict(..., chunks=[(id1, size1), (id2, size2), ...]))
- packed size ~= X + N * (1 + (34 + 5)) Bytes
- now:
- packed = packb(dict(..., chunks=[ix1, ix2, ...]))
- packed size ~= X + N * 5 Bytes
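Roughly where the savings come from, using msgpack directly (numbers are worst-case msgpack widths):
```python
import msgpack

ids = [bytes([i]) * 32 for i in range(3)]
old = msgpack.packb([(cid, 4096) for cid in ids])   # (id, size) tuples
new = msgpack.packb([0, 1, 2])                      # stable ChunkIndex positions
# per chunks list element: ~1 + 34 + 5 bytes before vs <= 5 bytes now
```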
on macOS, installing older Pythons seems to uninstall OpenSSL 3 and only 1.1 is left.
also, building all these pythons and misc. openssl versions takes forever and we
only need 3.12 for the binary build. testing on misc. python versions is regularly
done on GitHub Actions CI.
- remove more hashindex tests
- remove IndexBase, _hashindex.c remainders
- remove early borg2 NSIndex
- remove hashindex_variant (we only support borg 1.x repo index
aka NSIndex1, everything else uses the borghash based stuff)
- adapt code / tests so they use NSIndex1 (not NSIndex)
- minor fixes
- NSIndex1 can read the old borg 1.x on-disk format, but not write it.
- NSIndex1 can read/write the new borghash on-disk format.
- adapt legacyrepository code to work with NSIndex1 (segment, offset)
values instead of NSIndex (segment, offset, size).
- Mention zstd as the best general choice when not using lz4
(as often acknowledged by public benchmarks)
- Mention 'auto' more prominently as a good heuristic to improve
speed while retaining good compression
- Link to compression options
Also:
- remove most hashindex tests, borghash has such tests
- have a small wrapper class ChunkIndex around HashTableNT to
adapt API difference and add some special methods.
Note: I needed to manually copy the .pxd files from borghash
into cwd, because they were not found:
- ./borghash.pxd
- borghash/_borghash.pxd
There were still some relics from pre-borgstore / borg 1.x in there:
- patterns about "::", which used to be the separator between repository and archive.
- patterns for //server/share (not supported by borgstore)
Also: unified ssh+sftp and file+socket processing.
special tags start with @ and have clobber protection,
so users can't accidentally remove them using borg tag --set.
it is possible though to still use --set, but one must also
give all special tags that the archive(s) already have.
there is only a known set of allowed special tags:
@PROT - protects archives against archive pruning or archive deletion
setting unknown tags beginning with @ is disallowed.
as borg now uses repository.store_load and .store_save to load
and save the chunks cache, we need a rather high limit here.
this is a quick fix, the real fix might be using chunks of the
data (preferably <= MAX_OBJECT_SIZE), so there is less to unpack
at once.
Read or modify this set, only add validated str to it:
Archive.tags: Optional[set[str]]
borg info [--json] <archive> displays a list of comma-separated archive tags (currently always empty).
borg 1.x encouraged users to put everything into the archive name:
- name of the dataset
- timestamp (usually used to make the archive name unique)
- maybe also hostname (when backing up to same repo from multiple hosts)
- maybe also username (when backing up to same repo from multiple users)
borg2 now discourages users from putting the timestamp into the name,
because we rather want same name within a series of archives - thus,
the field width for the name can be narrower.
the ID of the archive is now the only unique identifier, thus it is
moved to the leftmost place.
256bits (64 hex digits) was a bit much and as borg can also deal with
abbreviated IDs, we only show 32bits (8 hex digits) by default.
the ID is followed by the timestamp (also quite "interesting", because
it usually differs for different archives).
then following are: archive name, user name, host name - these might be
always the same if there is only one series of archives in a repo.
use 2 blanks to separate the fields for better readability.
Needed to change this because listing just the
archive names is pretty useless if names are not
unique.
The short list is likely mostly used by scripts to
iterate over all archives, so outputting IDs is
better.
Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.
We can also detect that we are finished, if .list()
returns less than the limit we gave to it.
Also: reduce code duplication by using repo_lister func.
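A sketch of the repo_lister logic (the marker extraction from list results is an assumption):
```python
def repo_lister(repository, *, limit):
    # stop as soon as .list() returns fewer entries than the limit,
    # saving the extra call that waiting for an empty result would need
    marker = None
    finished = False
    while not finished:
        result = repository.list(limit=limit, marker=marker)
        finished = len(result) < limit
        if not finished:
            marker = result[-1][0]   # assumption: first element is the id
        yield from result
```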
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.
When the chunks index is needed, borg first tries to get it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list()
(which can be rather slow) and immediately caches the resulting ChunkIndex in
the repo.
borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.
cache.close now saves the chunks index to cache/chunks in repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.
cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.
so, even in case we would back up an inconsistent file
that has been changed while we backed it up, we would
not have a files cache entry for it and would fully
read/chunk/hash it again in next backup.
if we detect the conditions for this (rare) race,
abort reading the file and retry.
The caller (_process_any) will do up to MAX_RETRIES
before giving up. If it gives up, a warning is logged
and the file is not written to the archive and won't
be memorized in the files cache either.
Thus, the file will be read/chunked/hashed again at
the next borg create run.
- on explicit request, update .last_refresh_dt inside _create_lock / _delete_lock
- reset .last_refresh_dt if we kill our own lock
- be more precise, have exactly the datetime of the lock in .last_refresh_dt
- cosmetic: do refresh/stale time comparisons always in the same way
- changes to locally stored files cache:
- store as files.<H(archive_name)>
- user can manually control suffix via env var
- if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
archive's metadata from the repo (better than starting with
empty files cache and needing to read/chunk/hash all files).
previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
repo from same machine/user).
Cache entries now have ctime AND mtime.
Note: TTL and age still needed for discarding removed files.
But due to the separate files caches per series, the TTL
was lowered to 2 (from 20).
repository.list is slow, so rather use the chunkindex,
which might be cached in future. currently, it also uses
repository.list, but at least we can solve the problem
at one place then.
under all circumstances, we must avoid that the lock
gets stale due to not being refreshed in time.
there is some internal rate limiting in _lock_refresh,
so calling it often should be no problem.
in borg 1.x, we used to put a timestamp into the archive name to make
it unique, because borg1 required that.
borg2 does not require unique archive names, but it encourages you
to even use an identical archive name within the same SERIES of archives.
that makes matching (e.g. for prune, but also at other places) much
simpler and borg KNOWS which archives belong to the same series.
for the archives directory, we only need to know the archive IDs,
everything else can be fetched from the ArchiveItem in the repo.
so we store empty files into archives/* with the archive ID as name.
this makes some "by-id" operations much easier and we don't have to
deal with a useless "store_key" anymore.
removed .delete method - we can't delete by name anymore as we
allow duplicate names for the series feature. everything uses
delete_by_id() now.
also: simplify, clean up, refactor
- we should always output name and id when talking about an archive
- no problem anymore if names in archives directory are "duplicate"
- use "by-id" archives directory entry delete function
- rewrite/simplify test for borg check --undelete-archives
so if one works with backup series, one can just do:
borg prune --keep-daily 30 seriesname
seriesname will then do a precise match on the archive names
and select that series.
aid:<archive-id-prefix> can be used for -a / --match-archives
to match on the archive id (prefix) instead of the name.
NAME positional argument now also supports matching (and aid:),
but requires that there is exactly ONE result.
macOS and Linux give EISDIR, while Windows gives EPERM when trying to
open a file for writing, if the filename is already taken by an existing
directory.
now all OSes should give the same RC in this case.
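A sketch of the normalization (done at the extraction layer; names invented):
```python
import errno
import os

def create_regular_file(path):
    try:
        return open(path, "wb")
    except OSError as e:
        # Windows raises EPERM where macOS/Linux raise EISDIR; normalize it
        if e.errno == errno.EPERM and os.path.isdir(path):
            raise IsADirectoryError(errno.EISDIR, "Is a directory", path) from e
        raise
```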
borg delete and borg prune do a quick and dirty archive deletion,
just removing the archives directory entry for them.
--undelete-archives can still find the archive metadata objects
by completely scanning the repository and re-create missing
archives directory entries.
but only until borg compact would remove all unused data.
if only the manifest is missing or corrupted, do not run that
scan, it is not required for the manifest anymore.
if the manifest file was missing, borg check generated *.1 *.2 ... archives
even though an entry with the correct name and id was already present. BUG!
this is because if the manifest is lost, that does not imply
anymore that the complete archives directory is also lost, as it
did in borg 1.x.
Also improved log messages a bit.
not for check and compact, these need an exclusive lock.
to try parallel repo access on same machine, same user,
one needs to use a non-locking cache implementation:
export BORG_CACHE_IMPL=adhoc
this is slow due to the missing files cache in that implementation,
but unproblematic because no caches/indexes are persisted.
old borg just didn't commit the transaction and
thus caused a transaction rollback if not in
repair mode.
we can't do that anymore, thus we must avoid
modifying the repo if not in repair mode.
previously, borg always read all archives entries, modified the
list in memory, wrote back to the repository (similar as borg 1.x
did).
now borg works directly with archives/* in the borgstore.
otherwise the lock might become stale and could get
killed by any other borg process.
note: ThreadRunner class written by PyCharm AI and
only needed small enhancements. nice.
reuse_chunk is the complement of add_chunk for already existing chunks.
It doesn't do refcounting anymore.
.seen_chunk does not return the refcount anymore, but just whether the chunk exists.
If we add a new chunk, it immediately sets its refcount to MAX_VALUE, so
there is no difference anymore between previously existing chunks and new
chunks added. This makes the stats even more useless, but we have less complexity.
.init_chunks has just built self.chunks using repository.list(), so don't
call that again, but just iterate over self.chunks.
also some other changes, making the code much simpler.
When the AdhocCache(WithFiles) queries chunk IDs from the repo to build the chunks
index, it won't know their refcount and thus all chunks in the index have their
refcount at the MAX_VALUE (representing "infinite") and that would never decrease
nor could that ever reach zero and get the chunk deleted from the repo.
Only completely new chunks first written in the current borg run have a valid
refcount.
In some exception handlers, borg tried to clean up chunks that won't be used
by an item by decref'ing them. That is either:
- pointless due to refcount being at MAX_VALUE
- inefficient, because the user might retry the backup and would need to
transmit these chunks to the repo again.
We'll just rely on borg compact ONLY to clean up any unused/orphan chunks.
borg1 needed this due to its transactional / rollback behaviour:
if there was uncommitted stuff in the repo, next repo opening automatically
rolled back to last commit. thus we needed checkpoint archives to reference
chunks and commit the repo.
borg2 does not do that anymore, unused chunks are only removed when the
user invokes borg compact.
thus, if a borg create gets interrupted, the user can just run borg create
again and it will find some chunks are already in the repo, making progress
even if borg create gets frequently interrupted.
This was an implementation specific "in on-disk order" list method that made sense
with borg 1.x log-like segment files only.
But we now store objects separately, so there is no "in on-disk order" anymore.
This was used for an implementation detail of the borg 1.x
repository code, dumping uncommitted objects. Not needed any more.
Also remove local repository method scan_low_level, it was only used by --ghost.
Tests were a bit tricky as there is validation on 2 layers now:
- repository3 does an xxh64 check, finds most corruptions already
- on the archives level, borg also does an even stronger cryptographic check
Dummy returns all-zero stats from that call.
Problem was that these values can't be computed from the chunks cache
anymore. No correct refcounts, often no size information.
Also removed hashindex.ChunkIndex.summarize (previously used by the above mentioned
.stats() call) and .stats_against (unused) for same reason.
Lots of low-level code written back then to optimize runtime of some
functions.
We'll solve this differently by doing less stats, esp. if it is expensive to compute.
Note: this is the default cache implementation in borg 1.x,
it worked well, but there were some issues:
- if the local chunks cache got out of sync with the repository,
it needed an expensive rebuild from the infos in all archives.
- to optimize that, a local chunks.archive.d cache was used to
speed that up, but at the price of quite significant space needs.
AdhocCacheWithFiles replaced this with a non-persistent chunks cache,
requesting all chunkids from the repository to initialize a simplified
non-persistent chunks index, that does not do real refcounting and also
initially does not have size information for pre-existing chunks.
We want to move away from precise refcounting, LocalCache needs to die.
much faster and easier now, similar to what borg delete --force --force used to do.
considering that speed, no need for checkpointing anymore.
--stats does not work that way, thus it was removed. borg compact now shows some stats.
Features:
- exclusive and non-exclusive locks
- acquire timeout
- lock auto-expiry (after 30 mins of inactivity), lock refresh
- use tz-aware datetimes (in utc timezone) in locks
Also:
- document lock acquisition rules in the src
- increased default BORG_LOCK_WAIT to 10s
- better document with-lock test
Stale locks are ignored and automatically deleted.
Default: stale == 30 minutes old.
lock.refresh() can be called frequently to prevent an acquired lock from becoming stale.
It does not do much if the last real refresh was done recently.
After stale/2 time has passed, it checks and refreshes the locks in the store (see the sketch below).
Update the repository3 code to call refresh frequently:
- get/put/list/scan
- inside check loop
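a minimal sketch of that refresh logic (all names illustrative, not borg's actual code):
```
import time

STALE = 30 * 60  # locks older than 30 minutes count as stale

class Lock:
    def __init__(self):
        self.last_refresh = time.monotonic()

    def refresh(self):
        # cheap to call frequently: only touches the store after stale/2 time
        now = time.monotonic()
        if now - self.last_refresh > STALE / 2:
            self._rewrite_lock_in_store()  # hypothetical: re-write lock entries with a fresh timestamp
            self.last_refresh = now

    def _rewrite_lock_in_store(self):
        pass
```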
borg transfer is primarily a general purpose archive transfer function
from borg2 to related borg2 repos.
but for upgrades from borg 1.x, we also need to support:
- rcreate with a borg 1.x "other repo"
- transfer with a borg 1.x "other repo"
It uses xxh64 hashes of the meta and data parts to verify their validity.
On a server with borg, this can be done server-side without the borg key.
The new RepoObj header has meta_size, data_size, meta_hash and data_hash.
Simplify the repository a lot:
No repository transactions, no log-like appending, no append-only, no segments,
just using a key/value store for the individual chunks.
No locking yet.
Also:
mypy: ignore missing import
there are no library stubs for borgstore yet, so mypy errors without that option.
pyproject.toml: install borgstore directly from github
There is no pypi release yet.
use pip install -e . rather than python setup.py develop
The latter is deprecated and had issues installing the "borgstore from github" dependency.
test the healing more thoroughly:
- preservation of correct chunks list in .chunks_healthy
- check that .chunks_healthy is removed after healing
- check that doing another borg check --repair run does not find
something to heal, again.
also did a datatype consistency fix for item.chunks_healthy list
members: they are now post-processed in the same way as item.chunks,
so they have type ChunkListEntry rather than a simple tuple.
it needs to be like this to support a ~/.pypirc containing
a separate upload token for the borgbackup project:
  [distutils]
  index-servers =
      borgbackup
      ...
  [borgbackup]
  repository = https://upload.pypi.org/legacy/
  username = __token__
  password = pypi-...(token)...
Also: support a "cli" env var value that does not determine
the implementation from the env var, but rather from cli options (as it was before adding BORG_CACHE_IMPL).
- skip test_cache_chunks if there is no persistent chunks cache file
- init self.chunks for AdHocCache
- remove warning output from AdHocCache.__init__, it gets mixed with JSON output and fails the JSON decoder.
Add new borg create option '--prefer-adhoc-cache' to prefer the
AdHocCache over the NewCache implementation.
Adjust a test to match the previous default behaviour (== use the
AdHocCache) with --no-cache-sync.
removed some code borg had for backwards compatibility with
old borg versions (that had timestamp only in the cache).
now the manifest timestamp is only checked against the manifest-timestamp
file in the security dir, simplifying the code.
removed some code borg had for backwards compatibility with
old borg versions (that had key_type only in the cache).
now the repo key_type is only checked against the key-type
file in the security dir, simplifying the code.
removed some code borg had for backwards compatibility with
old borg versions (that had previous_location only in the
cache).
now the repo location is only checked against the location
file in the security dir, simplifying the code and also
fixing a related test failure with NewCache.
also improved test_repository_move to test for aborting in
case the repo location changed unexpectedly.
NewCache does not do precise refcounting, thus chunks won't be deleted
from the repo at "borg delete" time.
"borg check --repair" would remove such chunks IF they are orphans.
if we use AdHocCache or NewCache, we do not have precise refcounting.
thus, we do not delete repo objects as their refcount does not go to zero.
check --repair will just remove the orphans.
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.
decref: the caller does not always have the chunk size (e.g. for
metadata chunks);
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (size==0 for all chunks that are not new) and blocked using the files cache there.
Try to rebuild cache if an exception is raised, fixes#5213
For now, we catch FileNotFoundError and FileIntegrityError.
Write cache config without manifest to prevent override of manifest_id.
This is needed in order to have an empty manifest_id.
This empty id triggers the re-syncing of the chunks cache by calling sync() inside LocalCache.__init__()
Adapt and extend test_cache_chunks to new behaviour:
- a cache wipe is expected now.
- borg detects the corrupt cache and wipes/rebuilds the cache.
- check if the in-memory and on-disk cache is as expected (a rebuilt chunks cache).
That "failed to map segment from shared object" error msg is not
very helpful. Add a hint that the filesystem needs to be +exec
(== not noexec mounted, like it might be the case for /tmp on
some systems).
Looks like borg's setup.py has hidden the real cause of a cythonize ImportError.
There are basically 2 cases:
- either there is no Cython installed, then the import fails because the module can not be found, or
- there is some issue within Cython and the import fails due to that.
It's important not to hide the real cause, especially if we run into case 2.
case 1 is kind of expected and frequent, case 2 is rare.
Previously:
- acl_get just returned when lpathconf returned EINVAL
- acl_get silently ignored all other lpathconf errors and
assumed it is not an NFS4 ACL
Now:
- not sure why the silent return on EINVAL was done, but it seems
wrong. guess it could be the system not implementing a check
for nfs4. but in that case we presumably still would like to get
the default and access ACL!? Thus, I removed the silent return.
- raise OSError for all lpathconf errors
Cosmetic: add an nfs4_acl boolean, so the code reads better.
... to implement the same semantics as on linux (only store the ACL
if it defines permissions other than those defined by the
traditional file permissions).
Looks like there is no call working with an fd on FreeBSD.
This is NOT a bug fix, because the previous code contained a
check for symlinks before that line - because symlinks can not
have ACLs under Linux.
Now, this "is it a symlink" check is removed to simplify the
code and the "nofollow" variant of acl_extended_file* is used
to look at the symlink fs object (in the symlink case).
It then should tell us that this does NOT have an extended ACL
(because symlinks can't have ACLs) and so we return there.
Overall the code gets simpler and looks less suspicious.
Previously, these conditions were handled the same (just return):
- no extended acl here
- some error happened (e.g. ACLs unsupported, bad file descriptor, file not found, permission error, ...)
Now there will be OSErrors for the error cases.
- if ENOTSUP ("Operation not supported") happens, ACLs are not working
- fix check for macOS
On macOS borg uses "acl_extended", not "acl_access" and
also the ACL text format is a bit different.
- macOS: run on macos-14 (on Apple Silicon!)
- macOS: use OpenSSL 3.0 from brew
- macOS: run with Python 3.11
- pip install -e .: add -v
- use up-to-date github actions
- remove libb2 references - since borg 1.2, we use blake2 indirectly via python stdlib
this was recently set to a relatively high minimum version when
locating it via pkgconfig was added. this broke the binary builds
on buster and bullseye.
i don't think borg requires a specific libacl version as long as
the api is compatible, so i now set this to 2.2.47 (from 2008).
borg init calls this. If there is a PermissionError, it is
usually an fs permission issue at the path or its parent directory.
Don't give a traceback, but rather an error msg and a specific exit code.
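a minimal sketch of the intended behaviour (message text and exit code are illustrative):
```
import os, sys

def create_repo_directory(path):
    try:
        os.makedirs(path)
    except PermissionError as err:
        print(f"permission denied at {path} or its parent directory: {err}", file=sys.stderr)
        sys.exit(2)  # specific exit code, no traceback
```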
this is a fwd port from 1.4-maint. as we don't have nonce files
any more in master, only the generally useful stuff has been ported.
- add Error / ErrorWithTraceback exception classes to RPC layer.
- add hex_to_bin helper
if we do multiple calls to Archiver.do_something(),
we need to reset the ec / warnings after each call,
otherwise they will keep growing (in severity, in length).
stop directly accessing the variables from other modules.
prefix with underscore to indicate that these shall
only be used within this module and every other user
shall call the respective functions.
this is not needed and getting rid of it makes
the code / behaviour simpler to understand:
if a fatal error is detected, we throw an exception.
if we encounter something warning worthy, we emit and collect the warning.
in a few cases, we directly call set_ec to set the
exit code as needed, e.g. if passing it through
from a subprocess.
also:
- get rid of Archiver.exit_code
- assert that return value of archiver methods is None
- fix a print_warning call to use the correct formatting method
- implement updating exit code based on severity, including modern codes
- extend print_warning with kwargs wc (warning code) and wt (warning type)
- update a global warnings_list with warning_info elements
- create a class hierarchy below BorgWarning class similar to Error class
- diff: change harmless warnings about speed to rc == 0
- delete --force --force: change harmless warnings to rc == 0
Also:
- have BackupRaceConditionError as a more precise subclass of BackupError
previously, this was handled in the RPCError handler and always resulted in rc 2.
now re-raise Lock Exceptions locally, so it gives rc 2 (legacy) or 7x (modern).
If not set, it will default to "legacy" (always return 2 for errors).
This commit only changes the Error exception class and its subclasses.
The more specific exit codes need to be defined via .exit_mcode in the subclasses.
Also: use ERROR loglevel for these (not WARNING).
A differing number of index entries was already logged as an error
and led to "error_found = True" in repository.check.
Different values in the rebuilt index vs. the on-disk index were
only logged on warning level, but did not lead to error_found = True.
Guess there is no reason why these should not be errors and lead to
error_found = True, so this was fixed in this commit.
Minor related change: change report_error function args, so it can be
called like logger.error - including giving a format AND args.
the netbsd vagrant machine tends to segfault, guess due to some kernel or virtualbox issue.
thus, rather only do 1 tox run, so there is less output to review.
there are multiple issues with that box:
- debian 9 is out of support by debian, out of even lts support since 2022
- it has an OpenSSL 1.x natively (and our source based install also used 1.x) - that is also out of support and no one will care for it.
Also, borg2 will still take a while, so it would be
even more outdated at release time than it already
is now.
The intention of LockRoster.modify(key, REMOVE) is to remove self.id.
Using set.discard will just ignore it if self.id is not present there anymore.
Previously, using set.remove triggered a KeyError that has been frequently
seen in tracebacks of teardowns involving Repository.__del__ and Repository.__exit__.
I added a REMOVE2 op to serve one caller that needs to get the KeyError if
self.id was not present.
Thanks to @herrmanntom for the workaround!
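the discard/remove distinction described above, sketched (op names illustrative):
```
ADD, REMOVE, REMOVE2 = "add", "remove", "remove2"

def modify(roster: set, op: str, key):
    if op == ADD:
        roster.add(key)
    elif op == REMOVE:
        roster.discard(key)  # silently ignores an already-removed key
    elif op == REMOVE2:
        roster.remove(key)   # raises KeyError if key is not present
```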
When borg invokes a system command, it needs to prepare the environment
for that. This is especially important when using a pyinstaller-made
borg fat binary that works with a modified env var LD_LIBRARY_PATH -
system commands may crash with that.
borg already had calls to prepare_subprocess_env at some places (e.g.
when invoking ssh for the remote repo connection), but they were
missing for:
borg create --content-from-command ...
borg create --paths-from-command ...
before this fix, borg check --repair just created an
empty shadow index, which can lead to incomplete
entries if entries are added later.
and such incomplete (but present) entries can lead to
compact_segments() resurrecting old PUTs by accidentally
dropping related DELs.
get_args() exception handling before this fix only dealt with
subclasses of "Error", but we have to expect other exceptions
there, too.
In any case, if we have some fatal exception here, we must
terminate with rc 2.
ArgumentTypeError: emit a short error message - usually this is
a user error, invoking borg in a wrong way.
Other exceptions: full info and traceback.
for the other compression methods, this is done in
the base class, but zlib legacy does not call that
method, as it also strips the header bytes, which
zlib legacy does not have.
Hint for Cygwin users to make sure they use a virtual environment.
Not using a virtual environment will likely be troublesome if there is already a Python installed on Windows.
also: do a small optimisation in borg check:
if the type of the repo object is not ROBJ_ARCHIVE_META, we
can skip the object, it can not contain valid archive meta data.
if the type is correct, this is already a sufficient check, so
we can be quite sure that there will be valid archive metadata
in the object.
writing: put type into repoobj metadata
reading: check wanted type against type we got
repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.
a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).
also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
authentication/decryption would fail on mismatch.
- the type check would fail.
thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
For many use cases, the repo-wide "rcompress" is more efficient.
Also, recreate --recompress calls add_chunk with overwrite=True,
which is unsupported with the AdHocCache.
twine is only needed at release time, no need
for all developers or all test runs to install
this.
also, some requirement of twine needs a rust
compiler, so if there is no rust compiler,
automated runs will abort due to that.
remove a lot of complexity from the code that was just there to
support legacy borg versions < 1.0.9 which did not TAM authenticate
the manifest.
since then, borg writes TAM authentication to the manifest,
even if the repo is unencrypted.
if the repo is unencrypted, borg did not check the somewhat pointless
authentication that was generated without any secret, but
if we add that fake TAM, we can also verify the fake TAM.
if somebody explicitly switches off all crypto, they can not
expect authentication.
for everybody else, borg now always generates the TAM and also
verifies it.
rebuild_refcounts verifies and recreates the TAM.
Now it re-uses the salt, so that the archive ID does not change
just because of a new salt if the archive still has the same data.
list: shows either "verified" or "none", depending on
whether a TAM auth tag could be verified or was
missing (old archives from borg < 1.0.9).
when loading an archive, we now try to verify the archive
TAM, but we do not require it. people might still have
old archives in their repos and we want to be able to
list such repos without fatal exceptions.
This part of the archive checker recreates the Archive
items (always, just in case some missing chunks needed
repairing).
When loading the Archive item, we now verify the TAM.
When saving the (potentially modified) Archive item,
we now (re-)generate the TAM.
Archives without a valid TAM are dropped rather than TAM-authenticated
when saving them. There shouldn't be any archives without a valid TAM:
- borg has written an archive TAM for a long time (since 1.0.9)
- users are expected to TAM-authenticate archives created
by older borg when upgrading to borg 1.2.5.
Also:
Archive.set_meta: TAM-authenticate new archive
This is also used by Archive.rename and .recreate.
In these tests, we only compare paths, but we do not
need to create these paths for that. By not trying to
create them, we can avoid permission issues, e.g. under
fakeroot.
- master branch has different free space requirements from 1.2-maint,
so we now use a 700MB filesystem
- used pytest.mark.parametrize for the test passes, kind of a progress
display
- fix bug in rcreate call, encryption arg is needed
- fix bug in lock file cleanup
- added repo space cleanup
- updated docstring with current linux instructions (ubuntu)
- stopped using the "reserved" files, the "input" files are good enough
to get some space freed.
This is an emergency workaround for authenticated repos
if the user has lost the borg key.
We can't compute the TAM key without the borg key, so just
skip all the TAM stuff.
A borgbackup-2.0.0b6 test fails on OpenBSD with the message below.
```
=================================== FAILURES ===================================
_____________________________ test_get_runtime_dir _____________________________
path = '/run/user/55/borg', mode = 511, pretty_deadly = True
def ensure_dir(path, mode=stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO, pretty_deadly=True):
"""
Ensures that the dir exists with the right permissions.
1) Make sure the directory exists in a race-free operation
2) If mode is not None and the directory has been created, give the right
permissions to the leaf directory. The current umask value is masked out first.
3) If pretty_deadly is True, catch exceptions, reraise them with a pretty
message.
Returns if the directory has been created and has the right permissions,
An exception otherwise. If a deadly exception happened it is reraised.
"""
try:
> os.makedirs(path, mode=mode, exist_ok=True)
build/lib.openbsd-7.3-amd64-cpython-310/borg/helpers/fs.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
```
If `$XDG_RUNTIME_DIR` is not set `platformdirs.user_runtime_dir()`
returns one of 3 different paths
(https://github.com/platformdirs/platformdirs/pull/201). Proposed fix is
to check if `get_runtime_dir()` returns one of these paths.
the last coala release (0.11.0) is now over 6 years old.
when using pip install coala, a ton of stuff gets installed (expected)
and a part of that downgrades some stuff we use to outdated, incompatible
versions.
when trying to run coala with python 3.11, it just crashes because the
last release was made for py35/py36 (as seen in their setup.py).
a lot of PRs and tickets pile up at the coala project on github,
but no one is maintaining it.
macFUSE supports a volname mount option to set what
Finder displays on the desktop / in the directory list.
if the user did not specify it, we make something up,
because otherwise it would be "macFUSE Volume 0 (Python)".
Move the explanation below the general explanation of the `--keep-*` option
behavior. Rephrase the last sentence to make it clear that it works like the
other options that were explained in the previous paragraph.
Resolves#7687
- pattern needs to start with + - !
- first match wins
- the default is to list everything, thus a 2nd pattern
is needed to exclude everything not matched by the 1st pattern.
about 10-50% of the github windows CI runs fail due to
this - root cause unknown.
Example failure:
# we first check if we could create a sparse input file:
sparse_support = is_sparse(filename, total_size, hole_size)
if sparse_support:
# we could create a sparse input file, so creating a backup of it and
# extracting it again (as sparse) should also work:
self.cmd(f"--repo={self.repository_location}", "rcreate", RK_ENCRYPTION)
self.cmd(f"--repo={self.repository_location}", "create", "test", "input")
with changedir(self.output_path):
self.cmd(f"--repo={self.repository_location}", "extract", "test", "--sparse")
self.assert_dirs_equal("input", "output/input")
filename = os.path.join(self.output_path, "input", "sparse")
with open(filename, "rb") as fd:
# check if file contents are as expected
> self.assert_equal(fd.read(hole_size), b"\0" * hole_size)
E AssertionError: b'\x0[8388602 chars]x00\xf0Y\xb5\xe3\xee\xf3\x1f\xe3L\xcf\xae\x92\[159253621 chars]\x00' != b'\x0[8388602 chars]x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0[159383505 chars]\x00'
src/borg/testsuite/archiver/extract_cmd.py:212: AssertionError
Replacing the internals should make the implementation faster
and simpler since the order tracking is done by the `OrderedDict`.
Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its use clearer.
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:
$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'
Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.
This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
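an illustrative sketch of these sanitation rules (not borg's exact code):
```
def sanitize_path(path: str) -> str:
    # make relative, drop '.' elements and superfluous slashes, reject '..'
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path {path!r} contains '..' - possibly a tampered archive")
    return "/".join(parts)

assert sanitize_path("./x//y/") == "x/y"
# sanitize_path("x/../../etc/shadow") raises ValueError
```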
With paths being sanitized as they are being read, these "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation when reading the archive
can be removed once support for reading v1 repositories is dropped.
V2 repositories will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to be kept in
place to detect tampered archives.
I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:
a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after an OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.
This also (partially) fixes#7099.
shutting down logging is problematic as it is global
and we do multi-threaded execution, e.g. in tests.
thus, rather just flush the important loggers and keep
them alive.
server (listening) side:
borg serve --socket # default location
borg serve --socket=/path/to/socket
client side:
borg -r socket:///path/to/repo create ...
borg --socket=/path/to/socket -r socket:///path/to/repo ...
served connections:
- for ssh: proto: one connection
- for socket: proto: many connections (one after the other)
The socket has user and group permissions (770).
skip socket tests on win32, they hang indefinitely until
github CI terminates them after 60 minutes.
socket tests: use unique socket name
don't use the standard / default socket name, otherwise tests
running in parallel would interfere with each other by using
the same socket / the same borg serve process.
write a .pid file, clean up .pid and .sock file at exit
add stderr print for accepted/finished socket connection
- tears down logging (so no new log output is generated afterwards)
- sends all queued log output
- then returns
also: make stdin_fd / stdout_fd instance variables
for normal borg command invocation:
- logging is set up in Archiver.run
- the atexit handler calls logging.shutdown when process terminates
for tests:
- Archiver.run called by exec_cmd
- no atexit handler executed as process lives on
- borg.logger.teardown (calls shutdown and configured=False) now
called in exec_cmd
- simplify progress output (no \r, no terminal size related tweaks)
- emit progress output via the logging system (so it does not use stderr
of borg serve)
- progress code always logs a json string, the json has all needed
to either do json log output or plain text log output.
- use formatters to generate plain or json output from that.
- clean up setup_logging
- use a StderrHandler that always uses the **current** sys.stderr
- tweak TestPassphrase so it does not accidentally trigger just because of seeing 12 in output
Instead, install a handler that sends the LogRecord dicts to a queue.
That queue is then emptied in the borg serve main loop and
the LogRecords are sent msgpacked via stdout to the client,
similar to the RPC results.
On the client side, the LogRecords are recreated from the
received dicts and fed into the clientside logging system.
As we use msgpacked LogRecord dicts, we don't need JSON for
this purpose on the borg serve side any more.
On the client side, the LogRecords will then be either formatted
as normal text or as JSON log output (by the clientside log
formatter).
Compact moves data to new segments and then removes the old segments.
When enough segments are moved, directories holding the now-cleared
segments may become empty.
With this commit, any empty directories are removed after compacting segments.
Fixes#6823
+ os.scandir instead of os.listdir
Improved speed and added flexibility with attributes (name, path, is_dir(), is_file())
+ use is_dir / is_file to make sure we're reading only dirs / files respectively
+ Filtering to particular start, end index range built in
+ Move value bounds of segment (index) into constants module and use them instead
Resolves#7597
(forward patch from commits c9f35a16e9bf9e7073c486553177cef79ff1cb06^..edb5e749f512b7737b6933e13b7e61fefcd17bcb)
this used to call get_base_dir (and would have needed
legacy=True now to work like expected).
rather implemented the desired behaviour locally and
got rid of the legacy call (which was a bit strange
anyway as it also considered BORG_BASE_DIR, which is
unexpected when resolving ~).
in the sysinfo function, there is a way to suppress
all sysinfo output via an env var and just return an
empty string.
so we can expect it to always be present in unpacked, but it
might be the empty string.
log output:
always expect json, remove $LOG format support.
we keep limited support for unstructured format also,
just not to lose anything from remote stderr.
rpc format:
ancient borg used tuples in the rpc protocol,
but recent ones use easier-to-work-with dicts.
version info:
we expect dicts with server/client version now.
That means I won't make new 1.1.x releases.
In case there would be a major security or other issue,
I might still make a fix commit to the 1.1-maint branch,
where dist package maintainers or other interested
parties could find it.
this ports change 73ee704afa to master.
setUp enters the context manager, so let's .reopen() leave it.
then create a fresh Repository instance in self.repository and
enter the context manager again. tearDown then will leave that.
"if self.repository" did not work as expected:
- Repository has a __len__ method, so the boolean evaluation was calling that.
- self.repository is also not set to None anywhere.
while on macOS the new and old security dir location is the same path,
this is not the case on e.g. Linux, it could move from .config/borg/security to
.local/share/borg/security .
See #5760.
at some places, the docs were not updated yet.
for borg 1.x, -a (aka --glob-archives) expected
sh: style glob patterns ONLY (but one must not
give sh: explicitly).
for borg 2, -a (aka --match-archives) defaults
to id: style (identical match), so one must give
sh: if one wants shell-style globbing.
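for example (archive names made up):
-a home-2023-01-01 matches exactly that one archive (id: is the default)
-a 'sh:home-*' matches all archives whose names start with "home-"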
not needed for borg2 repos (we derive a new session key for each borg
invocation and start counting from 0).
also not needed for borg 1.x repos because we only read them (borg transfer)
and won't write new encrypted data to them.
use this to list only the kept (or pruned) archives.
--list-pruned and --list-kept also work in an additive way.
implied logging: support multiple prune options activating the same logger
if any of --list / --list-kept / --list-pruned is used,
it should put the borg.output.list logger to INFO level,
otherwise to WARN level.
as a first step, i moved all the traceback formatting
to format_tb.
also, it **first** prints the error and then the traceback
as additional information for a bug report, as suggested
by @jimparis in that ticket.
saying "must be a writable directory" can distract
from the real root cause as seen in #7496.
so we better first check if the mountpoint is an
existing directory and if not, just tell that.
after that, we check permissions and if they are not
as required, tell that.
fix config dir compatibility issue, fixes#7445
- add tests
- make sure the result of get_cache_dir matches pre and post #7300 where desired
- harmonize implementation of config_dir_compat and cache_dir_compat tests
Co-authored-by: nain <126972030+F49FF806@users.noreply.github.com>
this needs to decompress and hash the chunk data,
but better to play it safe.
at least we still can avoid the (re-)compression with
borg transfer (which is often much more expensive
than decompression).
The "Building a development environment" section links to the
"Using git" section. This can result in developers overseeing
the os dependencies necessity.
re #7356
diff: include changes in ctime and mtime, fixes#7248
also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes
Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
this is an incompatible change:
before:
borg debug put-obj path1 path2 ...
(and borg computed all IDs automatically) (*)
after:
borg debug put-obj id path
(id must be given)
(*) the code just using sha256(data) was outdated and incorrect anyway.
also: debug get-obj: improve error handling
Errors handled for backup src files:
- BackupOSError (converted from OSError), e.g. I/O Error
- BackupError (stats race, file changed while we backed it up)
Error Handling (see the sketch below):
- retry the same file after some sleep time
- sleep time starts from 1ms, increases exponentially up to 10s
- 10 tries
If retrying does not help:
- BackupOSError: skip the file, log it with "E" status
- BackupError: last try will back it up, log it with "C" status
Works for:
- borg create's normal (builtin) fs recursion
- borg create --paths-from-command
- borg create --paths-from-stdin
Notes:
- update stats.files_stats late (so we don't get wrong
stats in case of e.g. IOErrors while reading the file).
- _process_any: no changes to the big block, just indented
for adding the retry loop and the try/except.
- test_create_erroneous_file succeeds because we retry the file.
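a rough sketch of that retry policy (names illustrative, not borg's actual code):
```
import time

def with_retries(read_file, path, tries=10, sleep=0.001, max_sleep=10.0):
    for attempt in range(tries):
        try:
            return read_file(path)
        except OSError:
            if attempt == tries - 1:
                raise  # retries exhausted: caller skips the file, logs "E" status
            time.sleep(sleep)
            sleep = min(sleep * 2, max_sleep)  # 1ms, 2ms, ... capped at 10s
```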
we do book-keeping in item.chunks:
in case something goes wrong and we need to clean up,
we will have a list with chunks to decref in item.chunks.
also:
- make variable naming more consistent
- cosmetic changes
if a file can't be read (like here: there is a simulated
I/O error in the 2nd chunk of file2), it should be logged
with "E" status, skipped and backup shall proceed with
next file(s).
also, check that the repo has no orphan chunks (exception
handling code needs to deal with 1st chunk of file2 which
already has been written / incref'd in the repo).
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of 4096b fixed size (last chunk in a file can be less)
- read chunks 0, 1 and 2 successfully
- error at chunks 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks
Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.
Read chunks as well as failed chunks count up by 1.
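an illustrative reading of the r/E map (what happens past the end of the map is an assumption here):
```
import errno

def read_chunk(chunk_no: int, spec: str = "rrrEErrrr", size: int = 4096) -> bytes:
    if spec[chunk_no % len(spec)] == "E":
        raise OSError(errno.EIO, f"simulated I/O error at chunk {chunk_no}")
    return b"x" * size  # a successfully "read" fixed-size chunk
```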
also add a test: recreate without --chunker-params shall not rechunk.
before the fix, rechunking was triggered if an archive
was created with non-default chunker params.
but it should only rechunk if borg recreate is invoked with --chunker-params=... given explicitly.
test hashtable expansion/rebuild.
hashindex_lookup:
- return -2 for a compact / completely full hashtable
- return -1 and (via start_idx pointer) the deleted/tombstone bucket index.
fix size assertion (we add 1 element to trigger rebuild)
fix upper_limit check - since we'll be adding 1 to num_entries below,
the condition should be >=:
hashindex_compact: set min_empty/upper_limit
Co-authored-by: Dan Christensen <jdc+github@uwo.ca>
hashindex_index returns the perfect hashtable index, but does not
check what's in the bucket there, so we had these loops afterwards
to search for an empty or deleted bucket.
problem: if the HT were completely filled with no empty and no deleted
buckets, that loop would never end. due to our HT resizing, it can
never happen, but still not pretty.
when using hashindex_lookup (as also used some lines above), the code
is easier to understand, because (after we resized the HT), we freshly
create the same situation as after the first call of that function:
- return value < 0, because we (still) can not find the key
- start_idx will point to an empty bucket
Thus, we do not need the problematic loops we had there.
Modified the checks to make sure we really have an empty or deleted
bucket before overwriting it with data.
Added some additional asserts to make sure the code behaves.
we don't want to suddenly/unexpectedly break stuff for borg users
just because platformdirs does a breaking release.
at platformdirs 2.0.0 macOS config dir changed.
at platformdirs 3.0.0 macOS config dir changed again.
at platformdirs 4.0.0 (future) - who knows?
if we run into some issue reading an input file, e.g. an I/O error,
the BackupOSError exception raised due to that will skip the current
file and no archive item will be created for this file.
But we maybe have already added some of its content chunks to the repo,
we have either written them as new chunks or incref'd some identical chunk
in the repo.
Added an exception handler that decrefs (and deletes if refcount reaches 0)
these chunks again before re-raising the exception, so the repo is in a
consistent state again and we do not have orphaned content chunks in the repo.
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.
people can recognize via the file name it is a partial file.
nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
there is no need to maintain statistics about count and size of part
files.
checkpoint archives might have a single, incomplete part file as last item.
part files are always a prefix of the full file, growing in size from
checkpoint to checkpoint.
we now manage the archive items metadata stream in a special way:
- checkpoint archive A(n) might end with a partial item PI(n)
- checkpoint archive A(n+1) does not contain PI(n)
- checkpoint archive A(n+1) contains a new partial item PI(n+1)
- the final archive does not contain any partial items
not having this had created orphaned item_ptrs chunks for checkpoint archives.
also:
- borg check: show id of orphaned chunks
- borg check: archive list with explicit consider_checkpoints=True (this is the default, but better make sure).
check --archives: add --newer/--older/--newest/--oldest, fixes#7062
Options accept a timespan, like Nd for N days or Nm for N months.
Use these to do date-based matching on archives and only check some of them,
like: borg check --archives --newer=1m --newest=7d
Author: Michael Deyaso <mdeyaso@fusioniq.io>
Same change for .recreate_cmdline -> .recreate_command_line .
JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
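for example:
```
import shlex, sys

# borg 1.x stored sys.argv (a list of str); borg 2 stores one shell-quoted string:
command_line = shlex.join(sys.argv)  # e.g. "borg create --repo /repo 'name with spaces'"
```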
if they are present, process them through json_text().
this replaces s-e by "?" for the key and puts the binary
representation into key_b64, if needed.
likely this is rarely needed.
item: path, source, user, group
for non-unicode stuff borg 1.2 had "bpath".
now we have:
path - unicode approximation (invalid stuff replaced by ?)
path_b64 - base64(path_bytes) # only if needed
source has the same issue as path and is now covered also.
user and group are usually unicode or even pure ASCII,
but we rather are cautious and cover them also.
binary bytes:
- json_key = <key>_b64
- json_value == base64(value)
text (potentially with surrogate escapes):
- json_key1 = <key>
- json_value1 = value_text (s-e replaced by ?)
- json_key2 = <key>_b64
- json_value2 = base64(value_binary)
json_key2/_value2 is only present if value_text required
replacement of surrogate escapes (and thus does not represent
the original value, but just an approximation).
value_binary then gives the original bytes value (e.g. a
non-utf8 bytes sequence).
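a sketch of that convention for text values (the helper name is made up):
```
import base64

def text_to_json(key: str, value: str) -> dict:
    approx = value.encode("utf-8", errors="replace").decode("utf-8")  # s-e -> "?"
    result = {key: approx}
    if approx != value:  # only add _b64 if the approximation lost information
        raw = value.encode("utf-8", errors="surrogateescape")  # original bytes
        result[key + "_b64"] = base64.b64encode(raw).decode("ascii")
    return result
```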
using "differenthost" (== not the current hostname) makes
the process_alive check always return True (to play safe,
because in can not check for processes on other hosts).
python's io.BufferedWriter sizes its buffer based on st_blksize.
If the write fits in this buffer, then it's possible the data from
idx.write() has not been flushed through to the underlying filesystem,
and getsize(fileno()) sees a too-short (or even empty) file.
Also, getsize is only documented as accepting path-like objects;
passing a fileno seems to work only because the implementation
blindly forwards everything through to os.stat without checking.
Passing unopened_tempfile avoids all three problems
- on windows, it doesn't rely on re-opening NamedTemporaryFile
(the issue which led to cc0ad321dc)
- we're following the documented API of getsize(path-like)
- the file is closed (thus flushed) inside idx.write, before getsize()
One cannot "to not x", but one can "not to x".
Avoiding split infinitives gives the added bonus that machine
translation yields better results.
setup (n/adj) vs. set (v) up. We don't say "I setup it" but "I set it up".
Likewise for login(n/adj) and log(v) in, backup(n/adj) and back(v) up.
\n is automatically converted on write to the platform-dependent os.linesep.
Using os.linesep instead of \n means that on Windows, the line ending becomes "\r\r\n".
Also switches mentions of {LF} to {NL} in code and docs.
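a demonstration of the double-conversion problem:
```
import os

with open("out.txt", "w") as f:  # text mode translates "\n" on write
    f.write("one\n")             # on windows this is written as "one\r\n" - correct
    f.write("two" + os.linesep)  # on windows this becomes "two\r\r\n" - wrong
```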
On Windows, the ":" character cannot be used in a filename.
Python does not error on this because the ":" character represents data streams.
See https://stackoverflow.com/a/54508979
strange: on macOS, the globally set PKG_CONFIG_PATH was overwritten,
thus the borg build did not find openssl any more. setting it here
locally again works around the issue.
this option has not changed behaviour for a long time;
we had only kept it for API compatibility.
as a borg2 repo server won't have old clients talking to it,
we can safely remove this everywhere now.
Without the status being set no output was generated in
dry-run mode, confusing users about whether borg would back
up directories (in non-dry-run mode).
- == item not backed up just because of dry-run mode
x == item excluded
we want to be able to use an archive name as a directory name,
e.g. for the FUSE fs built by borg mount.
thus we can not allow "/" in an archive name on linux.
on windows, the rules are more restrictive, disallowing
quite some more characters (':<>"|*?' plus some more).
we do not have FUSE fs / borg mount on windows yet, but
we better avoid any issues.
we can not avoid ":" though, as our {now} placeholder
generates ISO-8601 timestamps, including ":" chars.
also, we do not want to have leading/trailing blanks in
archive names, neither surrogate-escapes.
control chars are disallowed also, including chr(0).
we have python str here, thus chr(0) is not expected in there
(is not used to terminate a string, like it is in C).
the UNIX time used for timestamp is seconds since 1.1.1970,
in UTC. thus, the natural way to represent it is with a
tz-aware utc datetime object.
but previously (in borg 1.x), they used naive datetime
objects and localtime.
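for example:
```
from datetime import datetime, timezone

ts = datetime.now(timezone.utc)                  # tz-aware, e.g. 2022-10-23 17:14:27.123456+00:00
epoch = datetime.fromtimestamp(0, timezone.utc)  # 1970-01-01 00:00:00+00:00
```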
looks like that chmod should only be done IF we are root (and on linux?).
taking away write permissions on windows/cygwin (and when running as normal
user) makes create_regular_file fail when it tries to create dir2/file3.
argparse: the default action is "store" and that overwrote an already
existing list in args.paths (e.g. from --pattern="R someroot") when it
started to process the positional PATH args.
with "extend" it now extends the existing args.paths with the list of
positional PATH arguments (which can be 0..N elements long, nargs="*").
note: "extend" is new since python 3.8, thus this can only be backported
to 1.2-maint, but not to 1.1-maint.
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes
Note: the sleep() in the test is needed due to timestamp granularity on linux being much more coarse than expected (uses the system timer, 100Hz or 250Hz).
support reading new, improved hashindex header format, fixes#6960
Bit of a pain to work with that code:
- C code
- needs to still be able to read the old hashindex file format,
- while also supporting the new file format.
- the hash computed while reading the file causes additional problems because
it expects all places in the file to be read exactly once and in sequential order.
I solved this by separately opening the file in the python part of the code and
checking for the magic.
BORG_IDX means the legacy file format and legacy layout of the hashtable,
BORG2IDX means the new file format and the new layout of the hashtable.
Done:
- added a version int32 directly after the magic and set it to 2 (like borg 2).
the old header had no version info, but could be denoted as version 1 in case
we ever need it (currently it decides based on the magic).
- added num_empty as indicated by a TODO in count_empty, so it does not need a
full hashtable scan to determine the amount of empty buckets.
- to keep it simpler, I just filled the HashHeader struct with a
`char reserved[1024 - 32];`
1024 being the desired overall header size and 32 being the currently used size.
this alignment might be useful in case we mmap() the hashindex file one day.
warning: src/borg/item.pyx:199:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:200:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:202:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
this turns all python level classes into extension type classes.
additionally it turns the indirect properties into direct descriptors.
test_propdict_attributes runs about 30% faster.
base memory usage as reported by sys.getsizeof(Item()):
before: 48 bytes, after this PR: 40 bytes
Author: @RonnyPfannschmidt in PR #5763
reads all chunks in on-disk order and recompresses them if they are not already using
the desired compression type and level (and obfuscation level).
supports SIGINT/ctrl-c and --checkpoint-interval (default: 1800s).
this is a borg command that compacts when committing (without this, it would have
a huge space usage). it commits/compacts every checkpoint interval or when
pressing ctrl-c / receiving SIGINT.
we should modify the meta dict given by the caller, so the caller can know
about e.g. the compression/obfuscation that was done (this is useful for rcompress).
some new stuff is not supported for NSIndex1,
but we can avoid crashing due to function signature mismatches or
missing methods and rather have more clear exceptions.
when using .scan(limit, marker), we used to use the last chunkid from
the previously returned scan result to remember how far we got and
from where we need to continue.
as this approach used the repo index to look up the respective segment/offset,
it was problematic if the code using scan was re-writing the chunk to
a new segment/offset, updating the repo index (e.g. when recompressing a chunk)
and basically destroying the memory about from where we need to continue
scanning.
thus, directly returning (segment, offset) as marker is easier and solves this issue.
otherwise, if we scan+get+put (e.g. if we read/modify/write chunks to
recompress them), it would scan past the last commit and run into the
newly written chunks (and potentially never terminate).
that would require setuptools_scm>=5.0.0 but some dists do not have that yet.
also, we do not use the version_tuple from _version.py, so it is not required anyway.
forward port of #7024.
the intention of this test is testing whether borg check
returns an error when checking a corrupted repository.
the removed assertions were rather testing the test logging
configuration, which seems flaky:
- when running all tests, assertions failed
- when running only this one test, assertions succeeded
- assertions also succeeded when running all the tests before
they were refactored to separate test modules, although the
test code was not changed, just moved.
looks like rhel7 and co are still supported and need the old glibc.
debian stretch is not supported any more by debian, so the binaries
created on it are provided on a "use at your own risk" basis.
reverts fc67453bf3
legacy: add/remove ctype/clevel bytes prefix of compressed data
new: use a separate metadata dict
compressors: use an int as ID, not a len 1 bytestring
borg < 2:
obj = encrypted(compressed(data))
borg 2:
obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))
handle compr / decompr in repoobj
move the assert_id call from decrypt to RepoObj.parse
also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_localtime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information
then we know when a backup was made and even from
which timezone it was made. if we want to compute
utc, we can do that using these infos.
this makes a quite nice archives list, with timestamps
as expected (in local time with timezone info).
at some places we just enforce utc, like for the
repo manifest timestamp or for the transaction log,
these are usually not looked at by the user.
since python 3.7, .isoformat() is usable IF timespec != "auto"
is given ("auto" [default] would be as evil as before, sometimes
formatting with, sometimes without microseconds).
also since python 3.7, there is now .fromisoformat().
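for example:
```
from datetime import datetime, timezone

ts = datetime.now(timezone.utc)
s = ts.isoformat(timespec="microseconds")  # stable format, microseconds always included
assert datetime.fromisoformat(s) == ts     # round-trips
```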
There are some other places with subprocesses:
- borg create --content-from-command
- borg create --paths-from-command
- (de)compression filter process of import-tar / export-tar
implemented by introducing one level of indirection, the limit is now
very high, so it is not practically relevant any more.
we always use the indirection (storing the metadata stream chunk ids list not
directly into the archive item, but into some repo objects referenced by the new
ArchiveItem.item_ptrs list).
thus, the code behaves the same for all archive sizes.
work around setuptools puking about:
############################
# Package would be ignored #
############################
Python recognizes 'borg.cache_sync' as an importable package,
but it is not listed in the `packages` configuration of setuptools.
'borg.cache_sync' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
Please make sure that 'borg.cache_sync' is included as a package by using
the `packages` configuration field or the proper discovery methods
(for example by using `find_namespace_packages(...)`/`find_namespace:`
instead of `find_packages(...)`/`find:`).
You can read more about "package discovery" and "data files" on setuptools
documentation page.
hopefully this is the final fix.
after first fixing of #6400 (by using os.umask after mkstemp), there
was a new problem that chmod was not supported on some fs.
even after fixing that, there were other issues, see the ACLs issue
documented in #6933.
the root cause of all this is tempfile.mkstemp internally using a
very secure, but hardcoded and for our use case problematic mode
of 0o600.
mkstemp_mode (mostly copy&paste from python stdlib tempfile module +
"black" formatting applied) supports giving the mode via the api,
that is the only change needed.
slightly dirty due to the _xxx imports from tempfile, but hopefully
this will be supported in some future python version.
Since compression type identification has been split into type and
level, the graphic needed a slight update.
Unfortunately, I don't have access to Visio, so I converted this to odg.
While writing my own out-of-band decoder, I had a hard time figuring out
how to unpack the manifest. From the description, I was only able to
read that the manifest is msgpack'd, but I had not been able to figure
out that it's also going through the same encryption+compression logic
as all other things do.
This should make it a little clearer and provide the necessary
information to understand how the compression works.
manifest, repo and cache are committed every checkpoint interval.
also, when ctrl-c is pressed, finish deleting the current archive, commit and then terminate.
the old code did just 1 attempt to detect the repo decryption key.
if the first chunkid we got from the chunks hashtable iterator was accidentally
the id of the chunk we intentionally corrupted in test_delete_double_force,
setup of the key failed and that made the test crash.
in practice, this could of course also happen if chunks are corrupted, thus
we now do many retries with other chunks before giving up.
error handling was improved: do not return None (instead of a key), it just
leads to weird crashes elsewhere, but fail early with IntegrityError and a
reasonable error msg.
rename method to make_key to avoid confusion with borg.crypto.key.identify_key.
also added some .pyi files needed to check the cython code (taken from #5703 and updated).
fixed "syntax error" in key.py.
all mypy complaints not fixed yet.
borg2's new repo format does not need computing crc32 over big amounts of
(content) data any more (we now use xxh64 for that).
thus, having a quick crc32 implementation via libdeflate is not important
enough any more to justify having libdeflate as a requirement.
in the finished == true message, these are missing:
- message
- current / total
- info
This is to be somewhat consistent with #6683 by only providing a
minimal set of values for the finished case.
The finished message is primarily intended for cleanup purposes,
e.g. clearing the progress display.
there was no way to tell the repository version for a remote repo.
borg 2 needs that to reject doing most operations with an old repo,
except the stuff needed for borg transfer.
These are legacy crypto modes based on AES-CTR mode:
(repokey|keyfile)[-blake2]
New crypto modes with session keys and AEAD ciphers:
(repokey|keyfile)[-blake2]-(aes-ocb|chacha20-poly1305)
Tests needed some changes:
- most used repokey/keyfile, changed to new modes
- some nonce tests removed, the new crypto code does not generate
the repo side nonces any more (were only used for AES-CTR)
v2 is the default repo version for borg 2.0.
v1 repos must only be used in a read-only way, e.g. for
--other-repo=V1_REPO with borg init and borg transfer!
This is to support general-purpose transfer of archives between related
borg2 repos.
To transfer (and convert) archives from borg 1.2 repos, users need to
give --upgrader=From12To20 .
this fixes a strange test failure that did not happen until now:
it could not read the MAGIC bytes from a (quite new) segment file,
it just returned the empty string.
maybe its appearance is related to the removed I/O calls.
This saves some segment file random IO that was previously necessary
just to determine the size of the to-be-deleted data.
Keep old one as NSIndex1 for old borg compatibility.
Choose NSIndex or NSIndex1 based on repo index layout from HashHeader.
for an old repo index repo.get(key) returns segment, offset, None, None
if a hardlink copy of a repo was made and a new repo config
shall be saved, do NOT fill in random garbage before deleting
the previous repo config, because that would damage the hardlink
copy.
Item.xattrs is now always a StableDict mapping bytes keys -> bytes values.
The special casing of empty values (b'') getting replaced by None was removed.
see ticket and borg.helpers.msgpack docstring.
this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).
still needed compat to the past is done via want_bytes decoder in borg.item.
* make constants for files cache mode more clear
Traditionally, DEFAULT_FILES_CACHE_MODE_UI and DEFAULT_FILES_CACHE_MODE
were - as the naming scheme implies - the same setting, one being the UI
representation as given to the --files-cache command line option and the
other being the same default value in the internal representation.
It happened that the actual value used in borg create always comes from
DEFAULT_FILES_CACHE_MODE_UI (because that does have the --files-cache
option) whereas for all other commands (that do not use the files cache) it
comes from DEFAULT_FILES_CACHE_MODE.
PR #5777 then abused this fact to implement the optimisation to skip loading
of the files cache in those other commands by changing the value of
DEFAULT_FILES_CACHE_MODE to disabled.
This however also changes the meaning of that variable, turning it
into something that no longer matches the original naming.
Anyone not aware of this change and the intention behind it, looking at the
code, would have a hard time figuring this out and could easily be misled.
This does away with the confusion and makes the code more maintainable by
renaming DEFAULT_FILES_CACHE_MODE to FILES_CACHE_MODE_DISABLED, making the
new intention of that internal default clear.
* make constant for files cache mode UI default match naming scheme
This not only brings code style in line with the other helpers that do the
same thing this way, but also does away with an unnecessary absolute import
using the borg module name explicitly.
borg now has the chunks list in every item with content.
due to the symmetric way how borg now deals with hardlinks using
item.hlid, processing gets much simpler.
but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation at one place and avoids some
code duplication we had before).
item.hlid is computed via hardlink_id function.
support hardlinked symlinks, fixes #2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.
2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.
ItemFormatter: support {hlid} instead of {source} for hardlinks
Item.hlid: same id, same hardlink (xxh64 digest)
Item.hardlink_master: not used for new archives any more
Item.source: not used for hardlink slaves any more
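As a rough illustration of the grouping (not borg's actual code; the exact hash input is an assumption, only the use of an xxh64 digest comes from the text):

```
import os
import xxhash  # third-party xxh64 binding, used here just for the sketch

def hardlink_id(path):
    # hash a stable identifier of the hardlink group; all paths pointing
    # to the same inode get the same hlid
    st = os.stat(path, follow_symlinks=False)
    return xxhash.xxh64(f"{st.st_dev}:{st.st_ino}".encode()).digest()
```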
this is somewhat similar to borg recreate,
but with a different focus and much simpler:
not changing compression algo
not changing chunking
not excluding files inside an archive by path match
only dealing with complete archives
but:
different src and dst repo
only reading each chunk once
keeping the compressed payload (no decompression/recompression effort)
--dry-run can be used before and afterwards to check
it does not make sense to request the versions view if you only
look at one archive, but the code shall not crash in that case
as it did; instead, it gives a clear error message.
the check only considered old key -> new key changes, but
new key to new key is of course also fine.
e.g. repokey-aes-ocb -> repokey-aes-ocb (both use hmac-sha256
as id hash)
the id must now always be given correctly because
the AEAD crypto modes authenticate the chunk id.
the special case when id == MANIFEST_ID is now handled
inside assert_id, so we never need to give a None id.
it potentially will ask for the passphrase for the key of OTHERREPO.
for the newly created repo, it will use the same passphrase.
it will copy: enc_key, enc_hmac_key, id_key, chunker_seed.
keeping the id_key (and id algorithm) and the chunker seed (and chunker
algorithm and parameters) is desirable for deduplication.
the id algorithm is usually either HMAC-SHA256 or BLAKE2b.
keeping the enc_key / enc_hmac_key must be implemented carefully:
A) AES-CTR -> AES-CTR is INSECURE due to nonce reuse, thus not allowed.
B) AES-CTR -> AEAD with session keys is secure.
C) AEAD with session keys -> AEAD with session keys is secure.
AEAD modes with session keys: AES-OCB and CHACHA20-POLY1305.
all-zero chunks are propagated as:
CH_ALLOC, data=None, size=len(zeros)
other chunks are:
CH_DATA, data=data, size=len(data)
also: remove the comment with the wrong assumption
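A minimal sketch of that classification (the constant values are placeholders; only the CH_DATA/CH_ALLOC semantics come from the text):

```
CH_DATA, CH_ALLOC = 0, 1  # placeholder values

def classify_chunk(chunk):
    if chunk.count(0) == len(chunk):       # all-zero chunk
        return CH_ALLOC, None, len(chunk)  # data=None, size=len(zeros)
    return CH_DATA, chunk, len(chunk)
```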
this is similar to #4777.
borg check must not crash if an archive metadata block does not decrypt.
Instead, report the archive_id, remove the archive from the manifest and skip to the next archive.
selftest imports testsuite.crypto
I did not realise this and imported pytest from testsuite.crypto
This broke the selftest.
Solution: move the tests that depend on pytest to testsuite.key.
All three affected tests are tests for the Key classes, so
this is probably a better place for them anyway.
when migrating from repokey to keyfile, we just store an empty key into the repo config,
because we do not have a "delete key" RPC api. thus, empty key means "there is no key".
here we fix load_key, so that it does not behave differently for no key and empty key:
in both cases, it just returns an empty value.
additionally, we strip the value we get from the config, so whitespace does not matter.
All callers now check for the repokey not being empty, otherwise RepoKeyNotFoundError
is raised.
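A minimal sketch of that behavior (RepoKeyNotFoundError and load_key are named in the text; the config access is simplified to a dict here):

```
class RepoKeyNotFoundError(Exception):
    pass

def load_key(repo_config):
    # missing key and empty key now behave the same: empty value;
    # stripping makes surrounding whitespace irrelevant
    value = repo_config.get("key") or ""
    return value.strip()

def get_repokey(repo_config):
    key = load_key(repo_config)
    if not key:
        raise RepoKeyNotFoundError("no key entry found in repo config")
    return key
```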
for now, this code shall only work on v2 repos (created by this code).
the code to read v1 repos is still present though, so for experiments,
it is possible to change the repo version in the repo config from 1 to
2 manually.
having version 2 in the repo config also avoids that borg < 1.3 is
used on such a repo, which would cause damage:
old borg would not recognize the PUT2 tagged segment entries and
old borg check --repair would likely kill them all due to that.
also: keep repo version in Repository.version
note: this required a slight increase of MAX_OBJECT_SIZE so that MAX_DATA_SIZE
could stay the same as before.
For PUT2, compute the hash over the whole entry (header and content, excluding
hash and crc32 fields, because the crc32 computation includes the hash).
Also: refactor crc32 checks into function, use f-strings, structure _read in
a more logical sequential order.
write_put: avoid creating a large temporary bytes object
why use xxh64?
- fast even without hw acceleration
- borg depends on it already anyway
- stronger than crc32 and strong enough for this purpose
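To make the hashing rule concrete, a hedged sketch (the field order and format string are illustrative, not the exact PUT2 layout):

```
import struct
import xxhash  # stand-in for borg's own xxh64 wrapper

def put2_hash(size, tag, chunk_id, data):
    # the hash covers header and content, but excludes the crc32 and the
    # hash field itself (the crc32 computation then covers the hash)
    header_part = struct.pack("<IB", size, tag)   # illustrative layout
    return xxhash.xxh64(header_part + chunk_id + data).digest()
```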
Argon2 the second part: implement encryption/decryption of argon2 keys
borg init --key-algorithm=argon2 (new default, older pbkdf2 also still available)
borg key change-passphrase: keep key algorithm the same
borg key change-location: keep key algorithm the same
use the env var BORG_TESTONLY_WEAKEN_KDF=1 to limit the resources (CPU, memory, ...) used by the KDF when running the automated tests.
OpenBSD does not have `lchmod()`, causing `os.lchmod` to be unavailable
on this platform. As a result, ArchiverTestCase::test_basic_functionality
fails when run manually (#2055).
OpenBSD does have `fchmodat()`, which has a flag that makes it behave
like `lchmod()`. In Python this can be used via `os.chmod(path, mode,
follow_symlinks=False)`.
As of Python 3.3 `os.lchmod(path, mode)` is equivalent to
`os.chmod(path, mode, follow_symlinks=False)`. As such, switching to the
latter is preferred as it enables more platforms to do the right thing.
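For illustration, the portable form in Python (os.supports_follow_symlinks is the stdlib way to probe this; the helper name is made up):

```
import os

def lchmod_compat(path, mode):
    # behaves like os.lchmod(), but also works where only fchmodat()
    # with AT_SYMLINK_NOFOLLOW is available (e.g. OpenBSD)
    if os.chmod in os.supports_follow_symlinks:
        os.chmod(path, mode, follow_symlinks=False)
    else:
        raise NotImplementedError("no way to chmod a symlink here")
```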
although bug #6526 did not show with ssh style URLs, we should
not have different regexes for the host part for ssh and scp style.
thus i extracted the host_re from both and also cleaned up a bit.
added a negative lookahead/lookbehind to make sure an ipv6 addr
(enclosed in square brackets) does not get badly matched by the
regex part intended for hostnames and ipv4 addrs only.
the other part of that regex which is actually intended to match
ipv6 addrs only matches if they are enclosed in square brackets.
also added tests for ssh and scp style repo URLs with ipv6 addrs
in brackets.
also: made regex more readable, putting these 2 cases on separate lines.
The previous sample for creating a ~/.borg-passphrase file creates it first and then chmod's it to 400 permissions. That's probably fine in practice, but means there's a tiny window where the passphrase file is sitting with default permissions (likely world readable, depending on the system umask).
It seems safer to first change the umask to remove all group & world bits (0077) _before_ creating the file. To be polite and avoid messing with the user's previous umask, we do this in a subshell. (Note that umask 0077 leads to a mode of 600 rather than the previous 400, because removing the owner write bit doesn't seem to buy much since the owner can just chmod the file anyway.)
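The same idea sketched in Python (the docs sample itself is a shell snippet; this just mirrors the umask-first ordering):

```
import os

old_umask = os.umask(0o077)  # drop group/world bits *before* creating the file
try:
    with open("borg-passphrase", "w") as f:  # path shortened for the sketch
        f.write("use-a-strong-passphrase\n")
finally:
    os.umask(old_umask)      # be polite: restore the previous umask
```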
export-tar: just msgpack and b64encode all item metadata and
put that into a BORG specific PAX header.
this is *additional* to the standard tar metadata.
import-tar: when detecting the BORG specific PAX header, just get
all metadata from there (and ignore the standard tar
metadata).
--tar-format=GNU|PAX (default: GNU)
changed the tests which use GNU tar cli tool to use --tar-format=GNU
explicitly, so they don't break in case we change the default.
atime timestamp is only present in output if the archive item has it
(which is not the case by default, needs "borg create --atime ...").
if LZ4/ZSTD.decompress gets called with a memoryview idata, keep
it until after the super().decompress(idata) call, so we save one
copy operation just to remove the 2 bytes long compression type
header.
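A minimal sketch of the pattern (`decompressor` stands in for the real LZ4/ZSTD call):

```
def decompress(idata, decompressor):
    # keep the original buffer referenced until it has been consumed
    view = idata if isinstance(idata, memoryview) else memoryview(idata)
    payload = view[2:]            # zero-copy slice: skip the 2-byte type header
    return decompressor(payload)  # `view` is still alive during this call
```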
attic is borg's parent project, but it stalled in 2015 and has not been updated since then.
guess we can assume that most attic users have meanwhile noticed this and already
converted their repos to borg.
if some did not yet, they are advised to use borg < 1.3 to do that ASAP.
note: borg can still DETECT an attic repo by recognizing its ATTIC_MAGIC value
and then gives exactly that advice.
Code gets simpler if we always only use the (shorter) header_fmt.
That format ALWAYS applies, to all tags borg writes.
If the tag unpacked from there indicates that there is also a chunkid
to read (like for PUT and DEL), we can decide that inside _read and
then read the chunkid from the fd.
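Sketched in Python (the format string and tag values are assumptions; only the "short header first, chunkid only for some tags" logic is from the text):

```
import struct

HEADER_FMT = "<IIB"      # crc32, size, tag -- assumed layout
TAGS_WITH_ID = {0, 1}    # e.g. PUT, DEL -- assumed values

def read_entry(fd):
    header = fd.read(struct.calcsize(HEADER_FMT))
    crc32, size, tag = struct.unpack(HEADER_FMT, header)
    chunk_id = fd.read(32) if tag in TAGS_WITH_ID else None
    return crc32, size, tag, chunk_id
```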
olen is assigned by OpenSSL, but the compiler can't know that and generates these warnings:
warning: src/borg/crypto/low_level.pyx:271:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:274:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:314:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:317:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:514:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:517:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:566:22: local variable 'olen' referenced before assignment
warning: src/borg/crypto/low_level.pyx:572:22: local variable 'olen' referenced before assignment
added it for all classes there, so the caller can just give it.
for the legacy AES-CTR based classes, the given aad is completely ignored.
this is to stay compatible with repo data of borg < 1.3.
for the new AEAD based classes:
encrypt: the aad is fed into the auth tag computation
decrypt: same. decrypt will fail on auth tag mismatch.
we already have .decrypt(id, data, ...).
i changed .encrypt(chunk) to .encrypt(id, data).
the old borg crypto won't really need or use the id,
but the new AEAD crypto will authenticate the id in future.
if we just have a pointer to a bytes object which might go out of scope, we can lose it.
also: cython can directly assign a bytes object into a same-size char array.
encrypt used to "patch" the IV into the header,
decrypt used to fetch it from there.
encrypt now takes the header just "as is" and
also decrypt expects that the IV is already set.
also:
cleanup class structure: less inheritance, more mixins.
define type bytes using the 4:4 split
upper 4 bits are ciphersuite:
0 == legacy AES-CTR based stuff
1+ == new AEAD stuff
lower 4 bits are keytype:
legacy: a bit mixed up, as it was...
new stuff: 0=keyfile 1=repokey, ...
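The split, written out as bit operations:

```
def split_type_byte(type_byte):
    ciphersuite = type_byte >> 4   # upper 4 bits: 0 = legacy AES-CTR, 1+ = AEAD
    keytype = type_byte & 0x0F     # lower 4 bits: e.g. 0 = keyfile, 1 = repokey
    return ciphersuite, keytype
```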
`borg benchmark cpu` fails on OpenBSD with the error below, which is
caused by LibreSSL currently not supporting AES256_OCB and
CHACHA20_POLY1305.
Work around this by checking if borg is used with LibreSSL. Tested on
OpenBSD.
```
Chunkers =======================================================
buzhash,19,23,21,4095 1GB 14.294s
fixed,1048576 1GB 0.244s
Non-cryptographic checksums / hashes ===========================
crc32 (libdeflate, used) 1GB 0.724s
crc32 (zlib) 1GB 1.953s
xxh64 1GB 0.361s
Cryptographic hashes / MACs ====================================
hmac-sha256 1GB 7.039s
blake2b-256 1GB 9.845s
Encryption =====================================================
aes-256-ctr-hmac-sha256 1GB 18.312s
aes-256-ctr-blake2b 1GB 21.213s
Local Exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 5241, in main
exit_code = archiver.run(args)
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 5172, in run
return set_ec(func(args))
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 607, in do_benchmark_cpu
print(f"{spec:<24} {size:<10} {timeit(func, number=100):.3f}s")
File "/usr/local/lib/python3.9/timeit.py", line 233, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/usr/local/lib/python3.9/timeit.py", line 177, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/usr/local/lib/python3.9/site-packages/borg/archiver.py", line 602, in <lambda>
("aes-256-ocb", lambda: AES256_OCB(
File "src/borg/crypto/low_level.pyx", line 636, in borg.crypto.low_level.AES256_OCB.__init__
File "src/borg/crypto/low_level.pyx", line 633, in borg.crypto.low_level.AES256_OCB.requirements_check
ValueError: AES OCB is not implemented by LibreSSL (yet?).
Platform: OpenBSD gateway.lan 7.1 GENERIC.MP#418 amd64
Borg: 1.2.1.dev98+gebaf0c32 Python: CPython 3.9.10 msgpack: 1.0.3 fuse: None [pyfuse3,llfuse]
PID: 38614 CWD: /storage/8899fc1454db04de.a/home/code/git/ports/sysutils/borg
sys.argv: ['/usr/local/bin/borg', 'benchmark', 'cpu']
SSH_ORIGINAL_COMMAND: None
```
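One simple way to detect a LibreSSL-based build from Python (the exact check in borg's Cython code may differ):

```
import ssl

# e.g. "LibreSSL 3.5.0" vs. "OpenSSL 1.1.1q  5 Jul 2022"
is_libressl = ssl.OPENSSL_VERSION.startswith("LibreSSL")
```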
we tried to be very private / secure here, but that created the issue
that a less secure umask (like e.g. 0o007) just did not work.
to make the umask work, we must start from 0o777 mode and let the
umask do its work, like e.g. 0o777 & ~0o007 --> 0o770.
with borg's default umask of 0o077, it usually ends up being 0o700,
so only permissions for the user (not group, not others).
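The mode arithmetic, spelled out:

```
assert 0o777 & ~0o007 == 0o770  # permissive umask now works as expected
assert 0o777 & ~0o077 == 0o700  # borg's default umask: user-only access
```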
"passphrase" encryption mode repos can not be created since borg 1.0.
back then, users were advised to switch existing repos of that type
to repokey mode using the "borg key migrate-to-repokey" command.
that command is still available in borg 1.0, 1.1 and 1.2, but not
any more in borg >= 1.3.
while we still might see the PassphraseKey.TYPE byte in old repos,
it is handled by the RepoKey code since borg 1.0.
in the finally-block, we wait for the filter process to die. but it only dies
voluntarily if all data was processed by the filter and it terminates due to EOF.
otoh, if borg has thrown an early exception, e.g. "archive already exists",
we need to kill the filter process to bring it to an early end. in that
case, we also do not need to check the filter rc, because we know we killed it.
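A simplified sketch of that finally-block logic (using cat as a stand-in filter):

```
import subprocess

proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.DEVNULL)
borg_failed = True  # pretend borg raised an early exception
try:
    pass            # normally: stream archive data into proc.stdin here
finally:
    if borg_failed:
        proc.kill()   # filter never sees EOF, so end it now
        proc.wait()   # no rc check needed: we killed it ourselves
    else:
        proc.stdin.close()
        rc = proc.wait()  # filter exited at EOF; its rc is meaningful
```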
looks like with a .tar file created by the tar tool,
tarinfo.mtime is a float [s]. So, after converting to
nanoseconds, we need to cast to int because that's what
Item.mtime wants.
also added a safe_ns() there to clip values to the safe range.
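The conversion, with a sketched safe_ns clamp (borg's helper may differ in its exact bounds):

```
def safe_ns(ns, max_ns=2**63 - 1):
    return max(0, min(int(ns), max_ns))

mtime = 1652345678.25              # tarinfo.mtime: float seconds
mtime_ns = safe_ns(mtime * 10**9)  # Item.mtime wants an int in ns
```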
this was used to compare compatibility of our vendored
blake2b code (which we do not have any more) against the
python stdlib blake2b code (which we always use now anyway).
These instances of implicit switch case fallthrough appear to be
intentional. Add comments that the compiler understands to suppress
the false positive warning.
#6338 introduces regression when building with LibreSSL (3.5.0).
```
cc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -pipe -g -fPIC -O2 -pipe -g -O2 -pipe -g -O2 -pipe -fPIC -Isrc/borg/crypto -I/usr/local/include/python3.9 -c src/borg/crypto/low_level.c -o /tmp/ports/pobj/borgbackup-1.2.1/borg-eec359cf228caf00d9c72bde07bf939872e9d3fa/temp.openbsd-7.1-amd64-3.9/src/borg/crypto/low_level.o
src/borg/crypto/low_level.c:12439:48: error: use of undeclared identifier 'EVP_chacha20_poly1305'; did you mean 'EVP_aead_chacha20_poly1305'?
__pyx_v_self->__pyx_base.__pyx_base.cipher = EVP_chacha20_poly1305;
^~~~~~~~~~~~~~~~~~~~~
EVP_aead_chacha20_poly1305
/usr/include/openssl/evp.h:1161:17: note: 'EVP_aead_chacha20_poly1305' declared here
const EVP_AEAD *EVP_aead_chacha20_poly1305(void);
^
1 error generated.
```
Unfortunately `EVP_aead_chacha20_poly1305`, offered by LibreSSL, is not
a drop-in replacement for `EVP_chacha20_poly1305`. More info on the
former can be found at https://man.openbsd.org/EVP_AEAD_CTX_init.3.
The key argument being sent to hashindex_get and hashindex_set by
multiple functions is a different signedness from what the functions
expect. This resolves the issue by changing the key type in the
unpack_user struct to unsigned char.
The value argument of hashindex_set is causing harmless pointer type
mismatches. This resolves the issue by changing the type to void*
which silences these types of warnings.
This resolves a compiler warning from the generated code that
resulted from a comparison of two local variables of different
signedness. The issue is resolved by changing the type of both
to int since this seems like the safest choice available.
All of these have been unsupported for a long time.
Newer versions of LibreSSL have gained chacha20-poly1305 support,
but still lack aes256-ocb support.
Also they have the HMAC_CTX_new/free api now.
docs: openssl >= 1.1.1 is required now
anything older is out of support anyway.
The generated source code was producing a compiler warning due to
the pointers differing in constness. The called function expects
a non-const pointer while the generated code produces a const pointer
via a cast. This changes the cast to drop 'const' to make the compiler
happy.
docs: clarify on-disk order and size of log entry fields
The order of the fields of a log entry on disk is CRC32 first, the docs had the size first.
I tried to make this list similar to the HashIndex struct description.
ln -s /usr/pkg/lib/python3.8/_sysconfigdata_netbsd9.py /usr/pkg/lib/python3.8/_sysconfigdata__netbsd9_.py  # bug in netbsd 9.2, expected filename not there.
ln -s /usr/pkg/lib/python3.9/_sysconfigdata_netbsd9.py /usr/pkg/lib/python3.9/_sysconfigdata__netbsd9_.py  # bug in netbsd 9.2, expected filename not there.
Reboot a few times to ensure that the hardware path does not change: on some motherboards,
components of it can be random. In such cases you cannot use a more precise rule,
or you need to insert additional stars into the rule to match the changing parts of the path.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_TABLE_UUID}=="<the PTUUID you just noted>", TAG+="systemd", ENV{SYSTEMD_WANTS}+="automatic-backup.service"
The "systemd" tag in conjunction with the SYSTEMD_WANTS environment variable has systemd
launch the "automatic-backup" service, which we will create next, as the
@@ -60,8 +47,8 @@ launch the "automatic-backup" service, which we will create next, as the
Type=oneshot
ExecStart=/etc/backups/run.sh
Now, create the main backup script, ``/etc/backups/run.sh``. Below is a template,
modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
Now, create the main backup script, ``/etc/backups/run.sh``. Below is a template;
modify it to suit your needs (e.g., more backup sets, dumping databases, etc.).
.. code-block:: bash
@@ -107,10 +94,10 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
echo "Disk $uuid is a backup disk"
partition_path=/dev/disk/by-uuid/$uuid
# Mount filesystem if not already done. This assumes that if something is already
# mounted at $MOUNTPOINT, it is the backup drive. It won't find the drive if
# Mount filesystem if not already done. This assumes that if something is already
# mounted at $MOUNTPOINT, it is the backup drive. It will not find the drive if
# it was mounted somewhere else.
(mount | grep $MOUNTPOINT) || mount $partition_path $MOUNTPOINT
findmnt $MOUNTPOINT >/dev/null || mount $partition_path $MOUNTPOINT
drive=$(lsblk --inverse --noheadings --list --paths --output name $partition_path | head --lines 1)
echo "Drive path: $drive"
@@ -119,13 +106,13 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
for the same reason. Therefore, partial checks may be useful only with very large
repositories where a full check would take too long.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as
opposed to checking just the xxh64) of data, which means reading the
data from the repository, decrypting and decompressing it. It is a complete
cryptographic verification and hence very time\-consuming, but will detect any
accidental and malicious corruption. Tamper\-resistance is only guaranteed for
encrypted repositories against attackers without access to the keys. You cannot
use \fB\-\-verify\-data\fP with \fB\-\-repository\-only\fP\&.
.sp
The \fB\-\-find\-lost\-archives\fP option will also scan the whole repository, but
tells Borg to search for lost archive metadata. If Borg encounters any archive
metadata that does not match an archive directory entry (including
soft\-deleted archives), it means that an entry was lost.
Unless \fBborg compact\fP is called, these archives can be fully restored with
\fB\-\-repair\fP\&. Please note that \fB\-\-find\-lost\-archives\fP must read a lot of
data from the repository and is thus very time\-consuming. You cannot use
\fB\-\-find\-lost\-archives\fP with \fB\-\-repository\-only\fP\&.
.SS About repair mode
.sp
The check command is a read\-only task by default. If any corruption is found,
Borg will report the issue and proceed with checking. To actually repair the
issues found, pass \fB\-\-repair\fP\&.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
\fB\-\-repair\fP is a \fBPOTENTIALLY DANGEROUS FEATURE\fP and might lead to data
loss! This does not just include data that was previously lost anyway, but
might include more data for kinds of corruption it is not capable of
dealing with. \fBBE VERY CAREFUL!\fP
.UNINDENT
.UNINDENT
.sp
Pursuant to the previous warning it is also highly recommended to test the
reliability of the hardware running this software with stress testing software
such as memory testers. Unreliable hardware can also lead to data loss especially
when this command is run in repair mode.
reliability of the hardware running Borg with stress testing software. This
especially includes storage and memory testers. Unreliable hardware might lead
to additional data loss.
.sp
First, the underlying repository data files are checked:
It is highly recommended to create a backup of your repository before running
in repair mode (i.e. running it with \fB\-\-repair\fP).
.sp
Repair mode will attempt to fix any corruptions found. Fixing corruptions does
not mean recovering lost data: Borg cannot magically restore data lost due to
e.g. a hardware failure. Repairing a repository means sacrificing some data
for the sake of the repository as a whole and the remaining data. Hence it is,
by definition, a potentially lossy task.
.sp
In practice, repair mode hooks into both the repository and archive checks:
.INDENT 0.0
.IP \(bu 2
For all segments, the segment magic header is checked.
.IP \(bu 2
For all objects stored in the segments, all metadata (e.g. CRC and size) and
all data is read. The read data is checked by size and CRC. Bit rot and other
types of accidental damage can be detected this way.
.IP \(bu 2
In repair mode, if an integrity error is detected in a segment, try to recover
as many objects from the segment as possible.
.IP \(bu 2
In repair mode, make sure that the index is consistent with the data stored in
the segments.
.IP \(bu 2
If checking a remote repo via \fBssh:\fP, the repo check is executed on the server
without causing significant network traffic.
.IP \(bu 2
The repository check can be skipped using the \fB\-\-archives\-only\fP option.
.IP \(bu 2
A repository check can be time consuming. Partial checks are possible with the
\fB\-\-max\-duration\fP option.
.IP 1. 3
When checking the repository\(aqs consistency, repair mode removes corrupted
objects from the repository after it did a 2nd try to read them correctly.
.IP 2. 3
When checking the consistency and correctness of archives, repair mode might
remove whole archives from the manifest if their archive metadata chunk is
corrupt or lost. Borg will also report files that reference missing chunks.
.UNINDENT
.sp
Second, the consistency and correctness of the archive metadata is verified:
.INDENT 0.0
.IP \(bu 2
Is the repo manifest present? If not, it is rebuilt from archive metadata
chunks (this requires reading and decrypting of all metadata and data).
.IP \(bu 2
Check if archive metadata chunk is present; if not, remove archive from manifest.
.IP \(bu 2
For all files (items) in the archive, for all chunks referenced by these
files, check if chunk is present. In repair mode, if a chunk is not present,
replace it with a same\-size replacement chunk of zeroes. If a previously lost
chunk reappears (e.g. via a later backup), in repair mode the all\-zero replacement
chunk will be replaced by the correct chunk. This requires reading of archive and
file metadata, but not data.
.IP \(bu 2
In repair mode, when all the archives were checked, orphaned chunks are deleted
from the repo. One cause of orphaned chunks are input file related errors (like
read errors) in the archive creation process.
.IP \(bu 2
In verify\-data mode, a complete cryptographic verification of the archive data
integrity is performed. This conflicts with \fB\-\-repository\-only\fP as this mode
only makes sense if the archive checks are enabled. The full details of this mode
are documented below.
.IP \(bu 2
If checking a remote repo via \fBssh:\fP, the archive check is executed on the
client machine because it requires decryption, and this is always done client\-side
as key access is needed.
.IP \(bu 2
The archive checks can be time consuming; they can be skipped using the
\fB\-\-repository\-only\fP option.
.UNINDENT
.sp
The \fB\-\-max\-duration\fP option can be used to split a long\-running repository check
into multiple partial checks. After the given number of seconds the check is
interrupted. The next partial check will continue where the previous one stopped,
until the complete repository has been checked. Example: if a full check takes 7
hours, running a daily check with \-\-max\-duration=3600 (1 hour) results in one
full check per week.
.sp
Attention: Partial checks do much less checking than a full check (only the
CRC32 checks on segment file entries are done), and cannot be combined with the
\fB\-\-repair\fP option. Partial checks may therefore be useful only with very large
repositories where a full check takes too long. Doing a full repository check aborts a
partial check; the next partial check will restart from the beginning.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as opposed to
checking the CRC32 of the segment) of data, which means reading the data from the
repository, decrypting and decompressing it. This is a cryptographic verification,
which will detect (accidental) corruption. For encrypted repositories it is
tamper\-resistant as well, unless the attacker has access to the keys. It is also very
slow.
If \fB\-\-repair \-\-find\-lost\-archives\fP is given, previously lost entries will
be recreated in the archive directory. This is only possible before
\fBborg compact\fP would remove the archives\(aq data completely.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY_OR_ARCHIVE
repository or archive to check consistency of
.UNINDENT
.SS optional arguments
.SS options
.INDENT 0.0
.TP
.B \-\-repository\-only
only perform repository checks
.TP
.B \-\-archives\-only
only perform archives checks
only perform archive checks
.TP
.B \-\-verify\-data
perform cryptographic archive data integrity verification (conflicts with \fB\-\-repository\-only\fP)
@@ -144,29 +161,38 @@ perform cryptographic archive data integrity verification (conflicts with \fB\-\
.B \-\-repair
attempt to repair any inconsistencies found
.TP
.B \-\-save\-space
work slower, but using less space
.B \-\-find\-lost\-archives
attempt to find lost archives
.TP
.BI \-\-max\-duration\ SECONDS
do only a partial repo check for max. SECONDS seconds (Default: unlimited)
perform only a partial repository check for at most SECONDS seconds (default: unlimited)
.UNINDENT
.SS Archive filters
.INDENT 0.0
.TP
.BI \-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
read include/exclude patterns from PATTERNFILE, one per line
.TP
.B \-\-exclude\-caches
exclude directories that contain a CACHEDIR.TAG file (\fI\%http://www.bford.info/cachedir/spec.html\fP)
exclude directories that contain a CACHEDIR.TAG file ( <http://www.bford.info/cachedir/spec.html> )
.TP
.BI \-\-exclude\-if\-present\ NAME
exclude directories that are tagged by containing a filesystem object with the given NAME
.TP
.B \-\-keep\-exclude\-tags
if tag objects are specified with \fB\-\-exclude\-if\-present\fP, don\(aqt omit the tag objects themselves from the backup archive
.TP
.B \-\-exclude\-nodump
exclude files flagged NODUMP
if tag objects are specified with \fB\-\-exclude\-if\-present\fP, do not omit the tag objects themselves from the backup archive
.UNINDENT
.SS Filesystem options
.INDENT 0.0
.TP
.B \-x\fP,\fB \-\-one\-file\-system
stay in the same file system and do not store mount points of other file systems. This might behave different from your expectations, see the docs.
.TP
.B \-\-numeric\-owner
deprecated, use \fB\-\-numeric\-ids\fP instead
stay in the same file system and do not store mount points of other file systems \- this might behave different from your expectations, see the description below.
.TP
.B \-\-numeric\-ids
only store numeric user and group identifiers
.TP
.B \-\-noatime
do not store atime into archive
.TP
.B \-\-atime
do store atime into archive
.TP
@@ -228,9 +235,6 @@ do not store ctime into archive
.B \-\-nobirthtime
do not store birthtime (creation date) into archive
.TP
.B \-\-nobsdflags
deprecated, use \fB\-\-noflags\fP instead
.TP
.B \-\-noflags
do not read and store flags (e.g. NODUMP, IMMUTABLE) into archive
.TP
@@ -246,6 +250,9 @@ detect sparse holes in input (supported only by fixed chunker)
.BI \-\-files\-cache\ MODE
operate files cache in MODE. default: ctime,size,inode
.TP
.BI \-\-files\-changed\ MODE
specify how to detect if a file has changed during backup (ctime, mtime, disabled). default: ctime
.TP
.B \-\-read\-special
open and read block and char device files as well as FIFOs as if they were regular files. Also follows symlinks pointing to these kinds of files.
.UNINDENT
@@ -256,106 +263,113 @@ open and read block and char device files as well as FIFOs as if they were regul
add a comment text to the archive
.TP
.BI \-\-timestamp\ TIMESTAMP
manually specify the archive creation date/time (UTC, yyyy\-mm\-ddThh:mm:ss format). Alternatively, give a reference file/directory.
write checkpoint every SECONDS seconds (Default: 1800)
manually specify the archive creation date/time (yyyy\-mm\-ddThh:mm:ss[(+|\-)HH:MM] format, (+|\-)HH:MM is the UTC offset, default: local time zone). Alternatively, give a reference file/directory.
borg-delete \- Delete an existing repository or archives
borg-delete \- Deletes archives.
.SH SYNOPSIS
.sp
borg [common options] delete [options] [REPOSITORY_OR_ARCHIVE] [ARCHIVE...]
borg [common options] delete [options] [NAME]
.SH DESCRIPTION
.sp
This command deletes an archive from the repository or the complete repository.
This command soft\-deletes archives from the repository.
.sp
Important: When deleting archives, repository disk space is \fBnot\fP freed until
Important:
.INDENT 0.0
.IP \(bu 2
The delete command will only mark archives for deletion (\(dqsoft\-deletion\(dq),
repository disk space is \fBnot\fP freed until you run \fBborg compact\fP\&.
.IP \(bu 2
You can use \fBborg undelete\fP to undelete archives, but only until
you run \fBborg compact\fP\&.
.sp
When you delete a complete repository, the security info and local cache for it
(if any) are also deleted. Alternatively, you can delete just the local cache
with the \fB\-\-cache\-only\fP option, or keep the security info with the
\fB\-\-keep\-security\-info\fP option.
.UNINDENT
.sp
When in doubt, use \fB\-\-dry\-run \-\-list\fP to see what would be deleted.
.sp
When using \fB\-\-stats\fP, you will get some statistics about how much data was
deleted \- the "Deleted data" deduplicated size there is most interesting as
that is how much your repository will shrink.
Please note that the "All archives" stats refer to the state after deletion.
.sp
You can delete multiple archives by specifying their common prefix, if they
have one, using the \fB\-\-prefix PREFIX\fP option. You can also specify a shell
pattern to match multiple archives using the \fB\-\-glob\-archives GLOB\fP option
(for more info on these patterns, see \fIborg_patterns\fP). Note that these
two options are mutually exclusive.
.sp
To avoid accidentally deleting archives, especially when using glob patterns,
it might be helpful to use the \fB\-\-dry\-run\fP to test out the command without
actually making any changes to the repository.
You can delete multiple archives by specifying a match pattern using
the \fB\-\-match\-archives PATTERN\fP option (for more information on these
patterns, see \fIborg_patterns\fP).
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY_OR_ARCHIVE
repository or archive to delete
.TP
.B ARCHIVE
archives to delete
.B NAME
specify the archive name
.UNINDENT
.SS optional arguments
.SS options
.INDENT 0.0
.TP
.B \-n\fP,\fB \-\-dry\-run
do not change repository
do not change the repository
.TP
.B \-\-list
output verbose list of archives
.TP
.B \-s\fP,\fB \-\-stats
print statistics for the deleted archive
.TP
.B \-\-cache\-only
delete only the local cache for the given repository
.TP
.B \-\-force
force deletion of corrupted archives, use \fB\-\-force \-\-force\fP in case \fB\-\-force\fP does not work.
.TP
.B \-\-keep\-security\-info
keep the local security info when deleting a repository
.TP
.B \-\-save\-space
work slower, but using less space
output a verbose list of archives
.UNINDENT
.SS Archive filters
.INDENT 0.0
.TP
.BI \-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
write checkpoint every SECONDS seconds (Default: 1800)
manually specify the archive creation date/time (yyyy\-mm\-ddThh:mm:ss[(+|\-)HH:MM] format, (+|\-)HH:MM is the UTC offset, default: local time zone). Alternatively, give a reference file/directory.
borg-info \- Show archive details such as disk space used
.SH SYNOPSIS
.sp
borg [common options] info [options] [REPOSITORY_OR_ARCHIVE]
borg [common options] info [options] [NAME]
.SH DESCRIPTION
.sp
This command displays detailed information about the specified archive or repository.
This command displays detailed information about the specified archive.
.sp
Please note that the deduplicated sizes of the individual archives do not add
up to the deduplicated size of the repository ("all archives"), because the two
are meaning different things:
up to the deduplicated size of the repository (\(dqall archives\(dq), because the two
mean different things:
.sp
This archive / deduplicated size = amount of data stored ONLY for this archive
= unique chunks of this archive.
All archives / deduplicated size = amount of data stored in the repo
All archives / deduplicated size = amount of data stored in the repository
= all chunks in the repository.
.sp
Borg archives can only contain a limited amount of file metadata.
The size of an archive relative to this limit depends on a number of factors,
mainly the number of files, the lengths of paths and other metadata stored for files.
This is shown as \fIutilization of maximum supported archive size\fP\&.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY_OR_ARCHIVE
repository or archive to display information about
.B NAME
specify the archive name
.UNINDENT
.SS optional arguments
.SS options
.INDENT 0.0
.TP
.B \-\-json
@@ -68,86 +63,55 @@ format output as JSON
.SS Archive filters
.INDENT 0.0
.TP
.BI \-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
This command initializes an empty repository. A repository is a filesystem
directory containing the deduplicated data from zero or more archives.
.SS Encryption mode TLDR
.sp
The encryption mode can only be configured when creating a new repository \-
you can neither configure it on a per\-archive basis nor change the
encryption mode of an existing repository.
.sp
Use \fBrepokey\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
borg init \-\-encryption repokey /path/to/repo
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Or \fBrepokey\-blake2\fP depending on which is faster on your client machines (see below):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
borg init \-\-encryption repokey\-blake2 /path/to/repo
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Borg will:
.INDENT 0.0
.IP 1. 3
Ask you to come up with a passphrase.
.IP 2. 3
Create a borg key (which contains 3 random secrets. See \fIkey_files\fP).
.IP 3. 3
Encrypt the key with your passphrase.
.IP 4. 3
Store the encrypted borg key inside the repository directory (in the repo config).
This is why it is essential to use a secure passphrase.
.IP 5. 3
Encrypt and sign your backups to prevent anyone from reading or forging them unless they
have the key and know the passphrase. Make sure to keep a backup of
your key \fBoutside\fP the repository \- do not lock yourself out by
"leaving your keys inside your car" (see \fIborg_key_export\fP).
For remote backups the encryption is done locally \- the remote machine
never sees your passphrase, your unencrypted key or your unencrypted files.
Chunking and id generation are also based on your key to improve
your privacy.
.IP 6. 3
Use the key when extracting files to decrypt them and to verify that the contents of
the backups have not been accidentally or maliciously altered.
.UNINDENT
.SS Picking a passphrase
.sp
Make sure you use a good passphrase. Not too short, not too simple. The real
encryption / decryption key is encrypted with / locked by your passphrase.
If an attacker gets your key, he can\(aqt unlock and use it without knowing the
passphrase.
.sp
Be careful with special or non\-ascii characters in your passphrase:
.INDENT 0.0
.IP \(bu 2
Borg processes the passphrase as unicode (and encodes it as utf\-8),
so it does not have problems dealing with even the strangest characters.
.IP \(bu 2
BUT: that does not necessarily apply to your OS / VM / keyboard configuration.
.UNINDENT
.sp
So better use a long passphrase made from simple ascii chars than one that
includes non\-ascii stuff or characters that are hard/impossible to enter on
a different keyboard layout.
.sp
You can change your passphrase for existing repos at any time, it won\(aqt affect
the encryption/decryption key or other secrets.
.SS More encryption modes
.sp
Only use \fB\-\-encryption none\fP if you are OK with anyone who has access to
your repository being able to read your backups and tamper with their
contents without you noticing.
.sp
If you want "passphrase and having\-the\-key" security, use \fB\-\-encryption keyfile\fP\&.
The key will be stored in your home directory (in \fB~/.config/borg/keys\fP).
.sp
If you do \fBnot\fP want to encrypt the contents of your backups, but still
want to detect malicious tampering use \fB\-\-encryption authenticated\fP\&.
.sp
If \fBBLAKE2b\fP is faster than \fBSHA\-256\fP on your hardware, use \fB\-\-encryption authenticated\-blake2\fP,
\fB\-\-encryption repokey\-blake2\fP or \fB\-\-encryption keyfile\-blake2\fP\&. Note: for remote backups
the hashing is done on your local machine.
.\" nanorst: inline-fill
.
.TS
center;
|l|l|l|l|.
_
T{
Hash/MAC
T} T{
Not encrypted
no auth
T} T{
Not encrypted,
but authenticated
T} T{
Encrypted (AEAD w/ AES)
and authenticated
T}
_
T{
SHA\-256
T} T{
none
T} T{
\fIauthenticated\fP
T} T{
repokey
keyfile
T}
_
T{
BLAKE2b
T} T{
n/a
T} T{
\fIauthenticated\-blake2\fP
T} T{
\fIrepokey\-blake2\fP
\fIkeyfile\-blake2\fP
T}
_
.TE
.\" nanorst: inline-replace
.
.sp
Modes \fImarked like this\fP in the above table are new in Borg 1.1 and are not
backwards\-compatible with Borg 1.0.x.
.sp
On modern Intel/AMD CPUs (except very cheap ones), AES is usually
hardware\-accelerated.
BLAKE2b is faster than SHA256 on Intel/AMD 64\-bit CPUs
(except AMD Ryzen and future CPUs with SHA extensions),
which makes \fIauthenticated\-blake2\fP faster than \fInone\fP and \fIauthenticated\fP\&.
.sp
On modern ARM CPUs, NEON provides hardware acceleration for SHA256 making it faster
than BLAKE2b\-256 there. NEON accelerates AES as well.
.sp
Hardware acceleration is always used automatically when available.
.sp
\fIrepokey\fP and \fIkeyfile\fP use AES\-CTR\-256 for encryption and HMAC\-SHA256 for
authentication in an encrypt\-then\-MAC (EtM) construction. The chunk ID hash
is HMAC\-SHA256 as well (with a separate key).
These modes are compatible with Borg 1.0.x.
.sp
\fIrepokey\-blake2\fP and \fIkeyfile\-blake2\fP are also authenticated encryption modes,
but use BLAKE2b\-256 instead of HMAC\-SHA256 for authentication. The chunk ID
hash is a keyed BLAKE2b\-256 hash.
These modes are new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fIauthenticated\fP mode uses no encryption, but authenticates repository contents
through the same HMAC\-SHA256 hash as the \fIrepokey\fP and \fIkeyfile\fP modes (it uses it
as the chunk ID hash). The key is stored like \fIrepokey\fP\&.
This mode is new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fIauthenticated\-blake2\fP is like \fIauthenticated\fP, but uses the keyed BLAKE2b\-256 hash
from the other blake2 modes.
This mode is new and \fInot\fP compatible with Borg 1.0.x.
.sp
\fInone\fP mode uses no encryption and no authentication. It uses SHA256 as chunk
ID hash. This mode is not recommended, you should rather consider using an authenticated
or authenticated/encrypted mode. This mode has possible denial\-of\-service issues
when running \fBborg create\fP on contents controlled by an attacker.
Use it only for new repositories where no encryption is wanted \fBand\fP when compatibility
with 1.0.x is important. If compatibility with 1.0.x is not important, use
\fIauthenticated\-blake2\fP or \fIauthenticated\fP instead.
This mode is compatible with Borg 1.0.x.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY
repository to create
.UNINDENT
.SS optional arguments
.INDENT 0.0
.TP
.BI \-e\ MODE\fR,\fB\ \-\-encryption\ MODE
select encryption key mode \fB(required)\fP
.TP
.B \-\-append\-only
create an append\-only mode repository. Note that this only affects the low level structure of the repository, and running \fIdelete\fP or \fIprune\fP will still be allowed. See \fIappend_only_mode\fP in Additional Notes for more details.
.TP
.BI \-\-storage\-quota\ QUOTA
Set storage quota of the new repository (e.g. 5G, 1.5T). Default: no quota.
.TP
.B \-\-make\-parent\-dirs
create the parent directories of the repository directory, if they are missing.
.UNINDENT
.SH EXAMPLES
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# Local repository, repokey encryption, BLAKE2b (often faster, since Borg 1.1)
$ borg init \-\-encryption=repokey\-blake2 /path/to/repo
# Local repository (no encryption)
$ borg init \-\-encryption=none /path/to/repo
# Remote repository (accesses a remote borg via ssh)
# repokey: stores the (encrypted) key into <REPO_DIR>/config
$ borg init \-\-encryption=repokey\-blake2 user@hostname:backup
# Remote repository (accesses a remote borg via ssh)
# keyfile: stores the (encrypted) key into ~/.config/borg/keys/
$ borg init \-\-encryption=keyfile user@hostname:backup
borg [common options] list [options] [REPOSITORY_OR_ARCHIVE] [PATH...]
borg [common options] list [options] NAME [PATH...]
.SH DESCRIPTION
.sp
This command lists the contents of a repository or an archive.
This command lists the contents of an archive.
.sp
For more help on include/exclude patterns, see the \fIborg_patterns\fP command output.
For more help on include/exclude patterns, see the output of \fIborg_patterns\fP\&.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY_OR_ARCHIVE
repository or archive to list contents of
.B NAME
specify the archive name
.TP
.B PATH
paths to list; patterns are supported
.UNINDENT
.SS optional arguments
.SS options
.INDENT 0.0
.TP
.B \-\-consider\-checkpoints
Show checkpoint archives in the repository contents list (default: hidden).
.TP
.B \-\-short
only print file/directory names, nothing else
.TP
.BI \-\-format\ FORMAT
specify format for file or archive listing (default for files: "{mode} {user:6} {group:6} {size:8} {mtime} {path}{extra}{NL}"; for archives: "{archive:<36} {time} [{id}]{NL}")
.TP
.B \-\-json
Only valid for listing repository contents. Format output as JSON. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text. A "barchive" key is therefore not available.
specify format for file listing (default: \(dq{mode} {user:6} {group:6} {size:8} {mtime} {path}{extra}{NL}\(dq)
.TP
.B \-\-json\-lines
Only valid for listing archive contents. Format output as JSON Lines. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text. A "bpath" key is therefore not available.
Format output as JSON Lines. The form of \fB\-\-format\fP is ignored, but keys used in it are added to the JSON output. Some keys are always present. Note: JSON can only represent text.
.TP
.BI \-\-depth\ N
only list files up to the specified directory depth
.UNINDENT
.SS Archive filters
.INDENT 0.0
.TP
.BI \-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
.TP
.BI \-\-sort\-by\ KEYS
Comma\-separated list of sorting keys; valid keys are: timestamp, name, id; default is: timestamp
.TP
.BI \-\-first\ N
consider first N archives after other filters were applied
.TP
.BI \-\-last\ N
consider last N archives after other filters were applied
.UNINDENT
.SS Exclusion options
.SS Include/Exclude options
.INDENT 0.0
.TP
.BI \-e\ PATTERN\fR,\fB\ \-\-exclude\ PATTERN
@@ -105,16 +84,8 @@ read include/exclude patterns from PATTERNFILE, one per line
borg-mount \- Mount archive or an entire repository as a FUSE filesystem
borg-mount \- Mounts an archive or an entire repository as a FUSE filesystem.
.SH SYNOPSIS
.sp
borg [common options] mount [options] REPOSITORY_OR_ARCHIVE MOUNTPOINT [PATH...]
borg [common options] mount [options] MOUNTPOINT [PATH...]
.SH DESCRIPTION
.sp
This command mounts an archive as a FUSE filesystem. This can be useful for
browsing an archive or restoring individual files. Unless the \fB\-\-foreground\fP
option is given the command will run in the background until the filesystem
is \fBumounted\fP\&.
This command mounts a repository or an archive as a FUSE filesystem.
This can be useful for browsing or restoring individual files.
.sp
When restoring, take into account that the current FUSE implementation does
not support special fs flags and ACLs.
.sp
When mounting a repository, the top directories will be named like the
archives and the directory structure below these will be loaded on\-demand from
the repository when entering these directories, so expect some delay.
.sp
Unless the \fB\-\-foreground\fP option is given, the command will run in the
background until the filesystem is \fBunmounted\fP\&.
.sp
Performance tips:
.INDENT 0.0
.IP \(bu 2
When doing a \(dqwhole repository\(dq mount:
do not enter archive directories if not needed; this avoids on\-demand loading.
.IP \(bu 2
Only mount a specific archive, not the whole repository.
.IP \(bu 2
Only mount specific paths in a specific archive, not the complete archive.
.UNINDENT
.sp
The command \fBborgfs\fP provides a wrapper for \fBborg mount\fP\&. This can also be
used in fstab entries:
@@ -50,100 +69,107 @@ To allow a regular user to use fstab entries, add the \fBuser\fP option:
For FUSE configuration and mount options, see the mount.fuse(8) manual page.
.sp
Borg\(aqs default behavior is to use the archived user and group names of each
file and map them to the system\(aqs respective user and group ids.
file and map them to the system\(aqs respective user and group IDs.
Alternatively, using \fBnumeric\-ids\fP will instead use the archived user and
group ids without any mapping.
group IDs without any mapping.
.sp
The \fBuid\fP and \fBgid\fP mount options (implemented by Borg) can be used to
override the user and group ids of all files (i.e., \fBborg mount \-o
override the user and group IDs of all files (i.e., \fBborg mount \-o
uid=1000,gid=1000\fP).
.sp
The man page references \fBuser_id\fP and \fBgroup_id\fP mount options
(implemented by fuse) which specify the user and group id of the mount owner
(aka, the user who does the mounting). It is set automatically by libfuse (or
(implemented by FUSE) which specify the user and group ID of the mount owner
(also known as the user who does the mounting). It is set automatically by libfuse (or
the filesystem if libfuse is not used). However, you should not specify these
manually. Unlike the \fBuid\fP and \fBgid\fP mount options which affect all files,
\fBuser_id\fP and \fBgroup_id\fP affect the user and group id of the mounted
manually. Unlike the \fBuid\fP and \fBgid\fP mount options, which affect all files,
\fBuser_id\fP and \fBgroup_id\fP affect the user and group ID of the mounted
(base) directory.
.sp
Additional mount options supported by borg:
Additional mount options supported by Borg:
.INDENT 0.0
.IP \(bu 2
versions: when used with a repository mount, this gives a merged, versioned
view of the files in the archives. EXPERIMENTAL, layout may change in future.
\fBversions\fP: when used with a repository mount, this gives a merged, versioned
view of the files in the archives. EXPERIMENTAL; layout may change in the future.
.IP \(bu 2
allow_damaged_files: by default damaged files (where missing chunks were
replaced with runs of zeros by borg check \fB\-\-repair\fP) are not readable and
return EIO (I/O error). Set this option to read such files.
\fBallow_damaged_files\fP: by default, damaged files (where chunks are missing)
will return EIO (I/O error) when trying to read the related parts of the file.
Set this option to replace the missing parts with all\-zero bytes.
.IP \(bu 2
ignore_permissions: for security reasons the "default_permissions" mount
option is internally enforced by borg. "ignore_permissions" can be given to
not enforce "default_permissions".
\fBignore_permissions\fP: for security reasons the \fBdefault_permissions\fP mount
option is internally enforced by Borg. \fBignore_permissions\fP can be given to
not enforce \fBdefault_permissions\fP\&.
.UNINDENT
.sp
The BORG_MOUNT_DATA_CACHE_ENTRIES environment variable is meant for advanced users
to tweak the performance. It sets the number of cached data chunks; additional
The BORG_MOUNT_DATA_CACHE_ENTRIES environment variable is intended for advanced users
to tweak performance. It sets the number of cached data chunks; additional
memory usage can be up to ~8 MiB times this number. The default is the number
of CPU cores.
.sp
When the daemonized process receives a signal or crashes, it does not unmount.
Unmounting in these cases could cause an active rsync or similar process
to unintentionally delete data.
to delete data unintentionally.
.sp
When running in the foreground ^C/SIGINT unmounts cleanly, but other
signals or crashes do not.
When running in the foreground, ^C/SIGINT cleanly unmounts the filesystem,
but other signals or crashes do not.
.sp
Debugging:
.sp
\fBborg mount\fP usually daemonizes and the daemon process sends stdout/stderr
to /dev/null. Thus, you need to either use \fB\-f / \-\-foreground\fP to make it stay
in the foreground and not daemonize, or use \fBBORG_LOGGING_CONF\fP to reconfigure
the logger to output to a file.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS arguments
.INDENT 0.0
.TP
.B REPOSITORY_OR_ARCHIVE
repository or archive to mount
.TP
.B MOUNTPOINT
where to mount filesystem
where to mount the filesystem
.TP
.B PATH
paths to extract; patterns are supported
.UNINDENT
.SS optional arguments
.SS options
.INDENT 0.0
.TP
.B \-\-consider\-checkpoints
Show checkpoint archives in the repository contents list (default: hidden).
.TP
.B \-f\fP,\fB \-\-foreground
stay in foreground, do not daemonize
.TP
.B \-o
Extra mount options
.TP
.B \-\-numeric\-owner
deprecated, use \fB\-\-numeric\-ids\fP instead
extra mount options
.TP
.B \-\-numeric\-ids
use numeric user and group identifiers from archive(s)
use numeric user and group identifiers from archives
.UNINDENT
.SS Archive filters
.INDENT 0.0
.TP
.BI \-P\ PREFIX\fR,\fB\ \-\-prefix\ PREFIX
only consider archive names starting with this prefix.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
only consider archives matching all patterns. See \(dqborg help match\-archives\(dq.
.TP
.BI \-a\ GLOB\fR,\fB\ \-\-glob\-archives\ GLOB
only consider archive names matching the glob. sh: rules apply, see "borg help patterns". \fB\-\-prefix\fP and \fB\-\-glob\-archives\fP are mutually exclusive.
.BI \-\-oldest\ TIMESPAN
consider archives between the oldest archive\(aqs timestamp and (oldest + TIMESPAN), e.g., 7d or 12m.
.TP
.BI \-\-newest\ TIMESPAN
consider archives between the newest archive\(aqs timestamp and (newest \- TIMESPAN), e.g., 7d or 12m.
.TP
.BI \-\-older\ TIMESPAN
consider archives older than (now \- TIMESPAN), e.g., 7d or 12m.
.TP
.BI \-\-newer\ TIMESPAN
consider archives newer than (now \- TIMESPAN), e.g., 7d or 12m.
.UNINDENT
.SH EXAMPLES
.sp
Be careful, prune is a potentially dangerous command, it will remove backup
Be careful: prune is a potentially dangerous command that removes backup
archives.
.sp
The default of prune is to apply to \fBall archives in the repository\fP unless
you restrict its operation to a subset of the archives using \fB\-\-prefix\fP\&.
When using \fB\-\-prefix\fP, be careful to choose a good prefix \- e.g. do not use a
prefix "foo" if you do not also want to match "foobar".
By default, prune applies to \fBall archives in the repository\fP unless you
restrict its operation to a subset of the archives.
.sp
The recommended way to name archives (with \fBborg create\fP) is to use the
identical archive name within a series of archives. Then you can simply give
that name to prune as well, so it operates only on that series of archives.
.sp
Alternatively, you can use \fB\-a\fP/\fB\-\-match\-archives\fP to match archive names
and select a subset of them.
When using \fB\-a\fP, be careful to choose a good pattern — for example, do not use a
prefix \(dqfoo\(dq if you do not also want to match \(dqfoobar\(dq.
.sp
It is strongly recommended to always run \fBprune \-v \-\-list \-\-dry\-run ...\fP
first so you will see what it would do without it actually doing anything.
first, so you will see what it would do without it actually doing anything.
.sp
Do not forget to run \fBborg compact \-v\fP after prune to actually free disk space.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
.EX
# Keep 7 end of day and 4 additional end of week archives.
# Do a dry\-run without actually deleting anything.
$ borg prune \-v \-\-list \-\-dry\-run \-\-keep\-daily=7 \-\-keep\-weekly=4 /path/to/repo
$ borg prune \-v \-\-list \-\-dry\-run \-\-keep\-daily=7 \-\-keep\-weekly=4
# Same as above but only apply to archive names starting with the hostname
# of the machine followed by a "\-" character:
$ borg prune \-v \-\-list \-\-keep\-daily=7 \-\-keep\-weekly=4 \-\-prefix=\(aq{hostname}\-\(aq /path/to/repo
# actually free disk space:
$ borg compact /path/to/repo
# Similar to the above, but only apply to the archive series named \(aq{hostname}\(aq:
$ borg prune \-v \-\-list \-\-keep\-daily=7 \-\-keep\-weekly=4 \(aq{hostname}\(aq
# Similar to the above, but apply to archive names starting with the hostname
# of the machine followed by a \(aq\-\(aq character:
$ borg prune \-v \-\-list \-\-keep\-daily=7 \-\-keep\-weekly=4 \-a \(aqsh:{hostname}\-*\(aq
# Keep 7 end of day, 4 additional end of week archives,
# and an end of month archive for every month:
$ borg prune \-v \-\-list \-\-keep\-daily=7 \-\-keep\-weekly=4 \-\-keep\-monthly=\-1 /path/to/repo
$ borg prune \-v \-\-list \-\-keep\-daily=7 \-\-keep\-weekly=4 \-\-keep\-monthly=\-1
# Keep all backups in the last 10 days, 4 additional end of week archives,
# and an end of month archive for every month:
$ borg prune \-v \-\-list \-\-keep\-within=10d \-\-keep\-weekly=4 \-\-keep\-monthly=\-1 /path/to/repo
.ft P
.fi
$ borg prune \-v \-\-list \-\-keep\-within=10d \-\-keep\-weekly=4 \-\-keep\-monthly=\-1
.EE
.UNINDENT
.UNINDENT
.sp