Commit graph

187 commits

Author SHA1 Message Date
Thomas Waldmann
f1100f3c86
create: fix repo lock getting stale when processing lots of unchanged files, fixes #8442
as a side effect, this might also keep the ssh / tcp connection alive better,
as there is a bit of traffic every 60s.
2024-10-02 12:49:39 +02:00
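A minimal sketch of the idea behind this fix, not borg's actual code: while looping over many unchanged files, something is called at most once per interval so the lock does not go stale. The class name `PeriodicRefresher`, its `refresh` callback and the injectable `clock` are hypothetical; only the 60s interval comes from the message above.

```python
import time

REFRESH_INTERVAL = 60.0  # seconds; the "every 60s" from the commit message


class PeriodicRefresher:
    """Call `refresh` at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, refresh, interval=REFRESH_INTERVAL, clock=time.monotonic):
        self.refresh = refresh
        self.interval = interval
        self.clock = clock
        self.last = clock()  # time of the last refresh (or of creation)

    def maybe_refresh(self):
        """Refresh if the interval has elapsed; return True if a refresh happened."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.refresh()
            self.last = now
            return True
        return False
```

A file-processing loop would call `maybe_refresh()` once per file, so the cost per unchanged file is a single clock read.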
Thomas Waldmann
f082df7f33
allow -a / --match-archives multiple times, ANDed
e.g.: borg delete -a home -a user:kenny -a host:kenny-pc
2024-09-27 00:19:15 +02:00
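To illustrate what "ANDed" means here, a small sketch under simplifying assumptions: `ArchiveInfo`, `matches` and `match_all` are hypothetical names, and the pattern handling only covers the plain-name, `user:` and `host:` forms from the example above with exact comparison. borg's real pattern matching is richer than this.

```python
from dataclasses import dataclass


@dataclass
class ArchiveInfo:
    name: str
    user: str
    host: str


def matches(archive, pattern):
    """Check one pattern against one archive (exact match only, for illustration)."""
    if pattern.startswith("user:"):
        return archive.user == pattern[len("user:"):]
    if pattern.startswith("host:"):
        return archive.host == pattern[len("host:"):]
    return archive.name == pattern


def match_all(archives, patterns):
    """Keep only archives that satisfy every given pattern (logical AND)."""
    return [a for a in archives if all(matches(a, p) for p in patterns)]
```

With the patterns `["home", "user:kenny", "host:kenny-pc"]`, an archive named `home` made by another user or on another host would not be selected.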
Thomas Waldmann
1436bbba1a
bugfix: remove superfluous repository.list() call
Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.

We can also detect that we are finished if .list()
returns fewer entries than the limit we gave it.

Also: reduce code duplication by using repo_lister func.
2024-09-24 23:43:08 +02:00
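The loop termination described above can be sketched as follows. This is an illustration, not the repo_lister code: `list_all`, the `limit`/`marker` keyword arguments and `FakeRepo` are assumptions made for the example.

```python
def list_all(list_func, limit):
    """Collect all ids by calling list_func(limit=..., marker=...) repeatedly."""
    ids = []
    marker = None
    while True:
        batch = list_func(limit=limit, marker=marker)
        ids.extend(batch)
        if len(batch) < limit:
            # a short (or empty) batch means there is nothing left to fetch
            break
        marker = batch[-1]  # continue after the last id we received
    return ids


class FakeRepo:
    """In-memory stand-in that counts how often list() gets called."""

    def __init__(self, ids):
        self.ids = sorted(ids)
        self.calls = 0

    def list(self, limit, marker=None):
        self.calls += 1
        start = 0 if marker is None else self.ids.index(marker) + 1
        return self.ids[start:start + limit]
```

With 5 ids and a limit of 2, this needs 3 calls; waiting for an empty result would need 4. An extra call is still needed when the number of ids is an exact multiple of the limit.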
Thomas Waldmann
36e3d63474
chunks index caching, fixes #8397
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.

When the chunks index is needed, borg first tries to get it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list(),
which can be rather slow, and immediately caches the resulting ChunkIndex in the
repo.

borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.

cache.close now saves the chunks index to cache/chunks in repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.

cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
2024-09-24 22:25:00 +02:00
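A rough sketch of the "try the cache, else rebuild and cache" flow described above. The key names cache/chunks and cache/chunks_hash come from the message; everything else is an assumption for illustration: the store is a plain dict, the index is a dict serialized as JSON, and SHA-256 stands in for whatever hash borg actually uses.

```python
import hashlib
import json


def save_chunk_index(store, index):
    """Write the index and a hash of its serialized form into the store."""
    data = json.dumps(index, sort_keys=True).encode()
    store["cache/chunks"] = data
    store["cache/chunks_hash"] = hashlib.sha256(data).hexdigest().encode()


def load_chunk_index(store, list_chunks):
    """Return the cached index if it validates, else rebuild it and cache it.

    list_chunks is a callable returning [(chunk_id, stored_size), ...];
    it stands in for the slow repository.list() based rebuild.
    """
    data = store.get("cache/chunks")
    digest = store.get("cache/chunks_hash")
    if data is not None and digest == hashlib.sha256(data).hexdigest().encode():
        return json.loads(data)
    index = dict(list_chunks())  # slow path: build from the repository listing
    save_chunk_index(store, index)
    return index
```

The stored hash lets a reader detect a missing or damaged cache/chunks object and fall back to the slow path instead of trusting it.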
Thomas Waldmann
e5e685fd1f
cache: fix crash in _build_files_cache
2024-09-22 00:36:30 +02:00
Thomas Waldmann
ec9d412756
fix race condition with data loss potential, fixes #3536
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.

so, even if we back up an inconsistent file
that was changed while we were backing it up, we will
not have a files cache entry for it and will fully
read/chunk/hash it again in the next backup.
2024-09-21 11:34:34 +02:00
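The discard rule above can be sketched like this. The entry layout (a dict with `ctime_ns` and `mtime_ns`) and the function name are assumptions for the example; the real files cache entries look different.

```python
def discard_entries_after_start(files_cache, backup_start_ns):
    """Drop entries whose ctime or mtime is after the start of the backup.

    files_cache maps path -> {"ctime_ns": int, "mtime_ns": int, ...}.
    A dropped file is simply read/chunked/hashed again by the next backup.
    """
    return {
        path: entry
        for path, entry in files_cache.items()
        if max(entry["ctime_ns"], entry["mtime_ns"]) <= backup_start_ns
    }
```

Dropping an entry is always safe: the only cost is re-reading that file next time, whereas keeping a stale entry could hide a change.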
Thomas Waldmann
c100e7b1f5
files cache: update ctime, mtime of known and "unchanged" files, fixes #4915
2024-09-20 00:44:55 +02:00
Thomas Waldmann
a891559578
files cache improvements, fixes #8385, fixes #5658
- changes to locally stored files cache:

  - store as files.<H(archive_name)>
  - user can manually control suffix via env var
  - if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
  archive's metadata from the repo (better than starting with
  empty files cache and needing to read/chunk/hash all files).
  previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
  repo from same machine/user).

Cache entries now have ctime AND mtime.

Note: TTL and age still needed for discarding removed files.
      But due to the separate files caches per series, the TTL
      was lowered to 2 (from 20).
2024-09-20 00:40:49 +02:00
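For illustration only, the files.<H(archive_name)> naming could look like the sketch below. The message does not say which hash H is, so SHA-256 is an assumption here, and the `suffix` parameter stands in for the user-controlled suffix without claiming how borg reads it.

```python
import hashlib


def files_cache_name(archive_name, suffix=None):
    """Return a per-series files cache name of the form files.<suffix>.

    If no suffix is given, derive one from the archive name so that each
    archive series gets its own files cache.
    """
    if suffix is None:
        suffix = hashlib.sha256(archive_name.encode("utf-8")).hexdigest()
    return "files." + suffix
```

Because the name depends only on the archive name, repeated backups of the same series find the same cache file, while different series do not share one.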
Thomas Waldmann
e2aa9d56d0
build_chunkindex_from_repo: reduce code duplication
2024-09-07 22:04:53 +02:00
Thomas Waldmann
ccc84c7a4e
cache: renamed .chunk_incref -> .reuse_chunk, boolean .seen_chunk
reuse_chunk is the complement of add_chunk for already existing chunks.

It doesn't do refcounting anymore.

.seen_chunk does not return the refcount anymore, but just whether the chunk exists.

If we add a new chunk, it immediately sets its refcount to MAX_VALUE, so
there is no difference anymore between previously existing chunks and new
chunks added. This makes the stats even more useless, but we have less complexity.
2024-09-07 22:04:47 +02:00
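A simplified sketch of the semantics described above, not the actual cache code: `ChunksSketch`, the dict-based index, the `store` argument and the concrete MAX_VALUE are all assumptions. It only shows that seen_chunk is boolean and that new and pre-existing chunks end up indistinguishable.

```python
MAX_VALUE = 2**32 - 1  # placeholder for the "infinite" refcount; real value may differ


class ChunksSketch:
    def __init__(self, existing_ids=()):
        # chunks already in the repo: refcount unknown, so treat it as infinite
        self.refcounts = {chunk_id: MAX_VALUE for chunk_id in existing_ids}

    def seen_chunk(self, chunk_id):
        """Return only whether the chunk exists, not a refcount."""
        return chunk_id in self.refcounts

    def reuse_chunk(self, chunk_id):
        """Reference an already existing chunk; no refcounting happens."""
        if not self.seen_chunk(chunk_id):
            raise KeyError(chunk_id)
        return chunk_id

    def add_chunk(self, chunk_id, data, store):
        """Store a new chunk, or reuse it if it is already known."""
        if self.seen_chunk(chunk_id):
            return self.reuse_chunk(chunk_id)
        store[chunk_id] = data
        self.refcounts[chunk_id] = MAX_VALUE  # same as pre-existing chunks
        return chunk_id
```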
Thomas Waldmann
ef47666627
cache/hashindex: remove decref method, don't try to remove chunks on exceptions
When the AdhocCache(WithFiles) queries chunk IDs from the repo to build the chunks
index, it won't know their refcount and thus all chunks in the index have their
refcount at the MAX_VALUE (representing "infinite") and that would never decrease
nor could that ever reach zero and get the chunk deleted from the repo.

Only completely new chunks first written in the current borg run have a valid
refcount.

In some exception handlers, borg tried to clean up chunks that won't be used
by an item by decref'ing them. That is either:
- pointless due to refcount being at MAX_VALUE
- inefficient, because the user might retry the backup and would need to
  transmit these chunks to the repo again.

We'll just rely on borg compact ONLY to clean up any unused/orphan chunks.
2024-09-07 22:04:40 +02:00
Thomas Waldmann
d27b7a7981
cache: remove transactions, load files/chunks cache on demand
2024-09-07 22:04:38 +02:00
Thomas Waldmann
c67cf07522
Repository.list: return [(id, stored_size), ...]
Note: LegacyRepository still returns [id, ...] and so does RemoteRepository.list,
if the remote repo is a LegacyRepository.

also: use LIST_SCAN_LIMIT
2024-09-07 22:03:56 +02:00
Thomas Waldmann
05739aaa65
refactor: rename repository/locking classes/modules
Repository -> LegacyRepository
RemoteRepository -> LegacyRemoteRepository
borg.repository -> borg.legacyrepository
borg.remote -> borg.legacyremote

Repository3 -> Repository
RemoteRepository3 -> RemoteRepository
borg.repository3 -> borg.repository
borg.remote3 -> borg.remote

borg.locking -> borg.fslocking
borg.locking3 -> borg.storelocking
2024-09-07 22:01:11 +02:00
Thomas Waldmann
5e3f2c04d5
remove archive checkpointing
borg1 needed this due to its transactional / rollback behaviour:
if there was uncommitted stuff in the repo, next repo opening automatically
rolled back to last commit. thus we needed checkpoint archives to reference
chunks and commit the repo.

borg2 does not do that anymore, unused chunks are only removed when the
user invokes borg compact.

thus, if a borg create gets interrupted, the user can just run borg create
again and it will find some chunks are already in the repo, making progress
even if borg create gets frequently interrupted.
2024-09-07 22:00:54 +02:00
Thomas Waldmann
68e64adb9f
cache: add log msg to _load_chunks_from_repo
For big repos, this might take a while, so at least have messages on debug level.
2024-09-07 22:00:49 +02:00
Thomas Waldmann
1231c961fb
blacken the code
2024-09-07 22:00:39 +02:00
Thomas Waldmann
dcde48490e
remove CacheStatsMixin
2024-09-07 22:00:36 +02:00
Thomas Waldmann
fc6d459875
cache: replace .stats() by a dummy
Dummy returns all-zero stats from that call.

Problem was that these values can't be computed from the chunks cache
anymore. No correct refcounts, often no size information.

Also removed hashindex.ChunkIndex.summarize (previously used by the above mentioned
.stats() call) and .stats_against (unused) for same reason.
2024-09-07 22:00:35 +02:00
Thomas Waldmann
d6a70f48f2
remove LocalCache
Note: this is the default cache implementation in borg 1.x,
it worked well, but there were some issues:

- if the local chunks cache got out of sync with the repository,
  it needed an expensive rebuild from the infos in all archives.
- to optimize that, a local chunks.archive.d cache was used to
  speed that up, but at the price of quite significant space needs.

AdhocCacheWithFiles replaced this with a non-persistent chunks cache,
requesting all chunkids from the repository to initialize a simplified
non-persistent chunks index, that does not do real refcounting and also
initially does not have size information for pre-existing chunks.

We want to move away from precise refcounting, LocalCache needs to die.
2024-09-07 22:00:31 +02:00
Thomas Waldmann
8b9c052acc
manifest: store archives separately one-by-one into archives/*
repository:
- api/rpc support for get/put manifest
- api/rpc support to access the store
2024-09-07 22:00:21 +02:00
Thomas Waldmann
d30d5f4aec
Repository3 / RemoteRepository3: implement a borgstore based repository
Simplify the repository a lot:

No repository transactions, no log-like appending, no append-only, no segments,
just using a key/value store for the individual chunks.

No locking yet.

Also:

mypy: ignore missing import
there are no library stubs for borgstore yet, so mypy errors without that option.

pyproject.toml: install borgstore directly from github
There is no pypi release yet.

use pip install -e . rather than python setup.py develop
The latter is deprecated and had issues installing the "borgstore from github" dependency.
2024-08-23 23:55:09 +02:00
Thomas Waldmann
619a06a5ba
BORG_CACHE_IMPL defaults to "adhocwithfiles" now
Also: support a "cli" env var value, that does not determine
the implementation from the env var, but rather from cli options (similar to how it was before adding BORG_CACHE_IMPL).
2024-07-18 22:51:17 +02:00
Thomas Waldmann
5a500cddf8
rename NewCache -> AdHocWithFilesCache
2024-07-18 22:14:00 +02:00
Thomas Waldmann
616af8daa8
BORG_CACHE_IMPL environment variable added
BORG_CACHE_IMPL allows users to choose the client-side cache implementation from 'local', 'newcache' and 'adhoc'.
2024-07-15 12:45:16 +02:00
Thomas Waldmann
c7249583e7
fix test_cache_chunks
- skip test_cache_chunks if there is no persistent chunks cache file
- init self.chunks for AdHocCache
- remove warning output from AdHocCache.__init__, it gets mixed with JSON output and fails the JSON decoder.
2024-07-15 12:45:13 +02:00
Thomas Waldmann
561dcc8abf
Refactor cache sync options and introduce new cache preference
Add new borg create option '--prefer-adhoc-cache' to prefer the
AdHocCache over the NewCache implementation.

Adjust a test to match the previous default behaviour (== use the
AdHocCache) with --no-cache-sync.
2024-07-15 12:45:12 +02:00
Thomas Waldmann
85688e7543
keep timestamp only in security dir
removed some code borg had for backwards compatibility with
old borg versions (that had timestamp only in the cache).

now the manifest timestamp is only checked against the manifest-timestamp
file in the security dir, simplifying the code.
2024-07-15 12:45:09 +02:00
Thomas Waldmann
89d867ea30
keep key_type only in security dir
removed some code borg had for backwards compatibility with
old borg versions (that had key_type only in the cache).

now the repo key_type is only checked against the key-type
file in the security dir, simplifying the code.
2024-07-15 12:45:08 +02:00
Thomas Waldmann
cf8c3a3ae7
keep previous repo location only in security dir
removed some code borg had for backwards compatibility with
old borg versions (that had previous_location only in the
cache).

now the repo location is only checked against the location
file in the security dir, simplifying the code and also
fixing a related test failure with NewCache.

also improved test_repository_move to test for aborting in
case the repo location changed unexpectedly.
2024-07-15 12:45:06 +02:00
Thomas Waldmann
e2a1999c59
implement NewCache
Also:
- move common code to ChunksMixin
- always use ._txn_active (not .txn_active)

Some tests are still failing.
2024-07-15 12:44:52 +02:00
Thomas Waldmann
d466005682
refactor files cache code into FilesCacheMixin class
2024-07-15 12:44:47 +02:00
Thomas Waldmann
98162fbb42
create --no-cache-sync-forced option
when given, force using the AdHocCache.
2024-07-15 12:44:44 +02:00
Thomas Waldmann
de342581d6
fix AdHocCache.add_chunk signature (ctype, clevel kwargs)
2024-07-15 12:44:43 +02:00
Thomas Waldmann
17fce18b44
always give id and size to chunk_incref/chunk_decref
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.

decref: caller does not always have the chunk size (e.g. for
metadata chunks).
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
2024-07-15 12:44:41 +02:00
Thomas Waldmann
4488c077a7
files cache: add chunk size information
the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (has size==0 for all not new chunks) and blocked using the files cache there.
2024-07-15 12:44:34 +02:00
William Bonnaventure
fb7a8f2d85
Add BORG_USE_CHUNKS_ARCHIVE option
2024-07-13 21:26:13 +02:00
William Bonnaventure
c3fb27f463
Automatic rebuild cache on exception, fixes #5213 (#8257)
Try to rebuild cache if an exception is raised, fixes #5213

For now, we catch FileNotFoundError and FileIntegrityError.

Write cache config without manifest to prevent override of manifest_id.
This is needed in order to have an empty manifest_id.
This empty id triggers the re-syncing of the chunks cache by calling sync() inside LocalCache.__init__()

Adapt and extend test_cache_chunks to new behaviour:

- a cache wipe is expected now.
- borg detects the corrupt cache and wipes/rebuilds the cache.
- check if the in-memory and on-disk cache is as expected (a rebuilt chunks cache).
2024-07-06 18:05:01 +02:00
Thomas Waldmann
334fbab897
refactor: use less binascii
our own hex_to_bin / bin_to_hex is more comfortable to use.

also: optimize remaining binascii usage / imports.
2024-02-19 02:16:19 +01:00
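A sketch of what such a helper pair might look like. This is not copied from borg: the implementation via `bytes.hex()` / `bytes.fromhex()` and the optional `length` check are assumptions made for illustration.

```python
def bin_to_hex(binary: bytes) -> str:
    """Return the lowercase hex string for the given bytes."""
    return binary.hex()


def hex_to_bin(hex_str: str, length=None) -> bytes:
    """Convert a hex string to bytes, optionally checking the byte length."""
    try:
        binary = bytes.fromhex(hex_str)
    except ValueError as e:
        raise ValueError(f"Invalid hex value: {hex_str!r}") from e
    if length is not None and len(binary) != length:
        raise ValueError(f"Expected {length} bytes, got {len(binary)}")
    return binary
```

Compared to calling binascii directly, the caller gets a str back without a separate decode step and one place for input validation.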
Thomas Waldmann
9de07ebd46
update "modern" error RCs (docs and code)
2024-02-13 22:58:02 +01:00
Thomas Waldmann
6a68ad5cd6
remove archive TAMs
2023-09-24 20:10:51 +02:00
Thomas Waldmann
1b6f928917
ro_type: typed repo objects, see #7670
writing: put type into repoobj metadata
reading: check wanted type against type we got

repoobj metadata is encrypted and authenticated.
repoobj data is also encrypted and authenticated (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.

a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).

also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
  authentication/decryption would fail on mismatch.
- the type check would fail.

thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
2023-09-24 20:10:50 +02:00
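The two checks described above (binding to the ID, then the type check) can be sketched with the standard library. Note the substitution: this uses an HMAC over ID, metadata and data as a stand-in for AEAD with the ID as AAD, and it does not encrypt anything. `pack`, `unpack`, `IntegrityError` and the JSON metadata are hypothetical names and formats, not borg's repoobj code.

```python
import hashlib
import hmac
import json


class IntegrityError(Exception):
    pass


def _auth_input(chunk_id, meta, data):
    # length-prefix the variable-length parts so the byte string is unambiguous
    return (len(chunk_id).to_bytes(4, "big") + chunk_id
            + len(meta).to_bytes(4, "big") + meta + data)


def pack(key, chunk_id, ro_type, data):
    """Writing: put the type into the metadata and bind everything to the ID."""
    meta = json.dumps({"type": ro_type}).encode()
    tag = hmac.new(key, _auth_input(chunk_id, meta, data), hashlib.sha256).digest()
    return tag, meta, data


def unpack(key, chunk_id, wanted_type, obj):
    """Reading: authenticate against the requested ID, then check the type."""
    tag, meta, data = obj
    expected = hmac.new(key, _auth_input(chunk_id, meta, data), hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise IntegrityError("authentication failed (object not bound to this ID)")
    got_type = json.loads(meta)["type"]
    if got_type != wanted_type:
        raise IntegrityError(f"type mismatch: wanted {wanted_type!r}, got {got_type!r}")
    return data
```

In this sketch, serving object A under the ID of object B fails authentication, and requesting an authentic object as a different type fails the type check.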
Thomas Waldmann
0fcd3e9479
add_chunk: remove overwrite parameter
2023-09-23 00:10:35 +02:00
Thomas Waldmann
2d78fa89a5
always implicitly require archive TAMs
they must be there since the upgrade to borg 1.2.6 (or other
borg versions that also have a fix for CVE-2023-36811).
2023-09-03 22:02:35 +02:00
Thomas Waldmann
5cd2060345
rebuild_refcounts: keep archive ID, if possible
rebuild_refcounts verifies and recreates the TAM.
Now it re-uses the salt, so that the archive ID does not change
just because of a new salt if the archive still has the same data.
2023-08-30 01:13:52 +02:00
Thomas Waldmann
277b0b81a8
cache sync: check archive TAM
2023-08-30 00:58:00 +02:00
Thomas Waldmann
5013121bd8
fix E501
2023-07-26 01:24:20 +02:00
Thomas Waldmann
3017701958
simplify flake8 configuration
we have been using black for a while, so some stuff does not need to be ignored any more.
2023-07-25 23:56:31 +02:00
Thomas Waldmann
ec1f2dfbf1
--files-cache=size: fix crash, fixes #7658
2023-06-29 23:09:24 +02:00
Thomas Waldmann
989b0a2847
use correct path for security dir when accessing legacy repos (v1)
while on macOS the new and old security dir locations are the same path,
this is not the case on e.g. Linux, where it could move from .config/borg/security to
.local/share/borg/security.

See #5760.
2023-05-19 21:12:59 +02:00