docs: add FAQ entry for bad backups and deduplication, fixes #4744

TW 2026-05-13 09:58:58 +02:00 committed by GitHub
parent a5bfda12e6
commit 3f745f38d4
GPG key ID: B5690EEEBB952194


@@ -302,6 +302,34 @@ Yes, if you want to detect accidental data damage (like bit rot), use the
If you want to be able to detect malicious tampering also, use an encrypted
repo. It will then be able to check using CRCs and HMACs.

Can a previous bad backup spoil future backups?
-----------------------------------------------

In general, no. If a backup was interrupted or failed for some reason, Borg's
transactional nature and journaling system ensure that the repository remains
consistent. Data that was successfully stored in a partial backup
(checkpoints) will even be reused to speed up the next attempt.

However, there is one specific case where a past "bad" backup can affect
future ones due to how deduplication works:

Suppose the chunk content gets corrupted (e.g. due to a RAM issue) after Borg
has computed the MAC (chunk id) from the correct data, so that the corrupted
content is stored under the id of the correct content. This will go unnoticed
until the data is checked against the MAC again (e.g. when reading the data
from the repo or doing a repo check with ``--verify-data``).
Because the chunk id is already present in the repo, deduplication in a later
backup "thinks" it already has the correct data and does not store it again,
while in fact the repo only has the corrupted version of that data. In that
case, the past bad backup affects the current or future backups due to
deduplication.
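
The scenario above can be sketched with a toy content-addressed store. This is
an illustration only, not Borg's actual code: ``ToyRepo``, ``chunk_id``,
``flip_first_bit`` and the key are hypothetical names, and HMAC-SHA256 merely
stands in for whatever id function the repository really uses.

```python
import hashlib
import hmac

KEY = b"hypothetical-repo-key"  # assumption: stand-in for the repo's MAC key


def chunk_id(data: bytes) -> bytes:
    """Compute a keyed MAC over the chunk content and use it as the chunk id."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


class ToyRepo:
    """Toy content-addressed store mapping chunk id -> stored bytes."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}

    def store(self, cid: bytes, data: bytes) -> bool:
        """Store data under cid unless that id is already known.

        Returns True if the data was written, False if it was deduplicated.
        """
        if cid in self.chunks:
            return False  # deduplicated: the incoming data is never looked at
        self.chunks[cid] = data
        return True

    def verify(self) -> list[bytes]:
        """Return the ids whose stored bytes no longer match their id."""
        return [
            cid
            for cid, data in self.chunks.items()
            if not hmac.compare_digest(chunk_id(data), cid)
        ]


def flip_first_bit(data: bytes) -> bytes:
    """Simulate a single bit flip (e.g. a RAM fault) in the first byte."""
    return bytes([data[0] ^ 0x01]) + data[1:]


repo = ToyRepo()
good = b"important file content"

# Backup 1: the id is computed from the good data, then the buffer is
# corrupted before it is written to the repo.
cid = chunk_id(good)
written_first = repo.store(cid, flip_first_bit(good))

# Backup 2: the data is intact this time, but its id is already known,
# so the repo keeps the corrupted copy from backup 1.
written_second = repo.store(chunk_id(good), good)

# Only an explicit data verification notices the mismatch.
bad_ids = repo.verify()
```

In this sketch the second backup is deduplicated against the corrupted chunk,
and only the explicit ``verify()`` pass reports the affected id.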

This is not a Borg-specific issue, but a general property of deduplicating
storage systems. To avoid or detect such issues, you should:

- Use reliable hardware (ECC RAM is recommended).
- Periodically run ``borg check --verify-data REPO`` to verify that the
  stored data still matches its MAC (chunk id). Note that this cannot detect
  if the data was already "garbage" when it was first processed and stored.
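
The limit mentioned in the last point can be shown with a minimal sketch.
Again, this is not Borg's implementation: ``chunk_id`` and ``verify_chunk``
are hypothetical helpers, and HMAC-SHA256 with a made-up key is only assumed
as a stand-in for the real id function.

```python
import hashlib
import hmac

KEY = b"hypothetical-repo-key"  # assumption: stand-in for the repo's MAC key


def chunk_id(data: bytes) -> bytes:
    """Compute a keyed MAC over the chunk content and use it as the chunk id."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


def verify_chunk(cid: bytes, stored: bytes) -> bool:
    """Recompute the MAC of the stored bytes and compare it to the chunk id."""
    return hmac.compare_digest(chunk_id(stored), cid)


original = b"important file content"
garbage = b"imp0rtant file content"  # simulated corruption of the same data

# Case A: corruption happened AFTER the id was computed.
# The id belongs to the original data, so verification fails (detected).
cid_a = chunk_id(original)
case_a_ok = verify_chunk(cid_a, garbage)

# Case B: corruption happened BEFORE the id was computed.
# The id was derived from the garbage itself, so verification passes
# even though the stored content is wrong (not detectable this way).
cid_b = chunk_id(garbage)
case_b_ok = verify_chunk(cid_b, garbage)
```

Case A is the kind of damage a data verification pass can find. Case B shows
why reliable hardware still matters: the stored data is consistent with its
id, so no check based on the id can flag it.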
Can I use Borg on SMR hard drives?
----------------------------------