buzhash64: docs

Thomas Waldmann 2025-06-06 11:56:49 +02:00
parent b9646f236e
commit d23704e112
GPG key ID: 243ACFA951F78E01
3 changed files with 23 additions and 6 deletions


@@ -19,8 +19,8 @@ specified when the backup was performed.
 Deduplication is performed globally across all data in the repository
 (multiple backups and even multiple hosts), both on data and file
 metadata, using :ref:`chunks` created by the chunker using the
-Buzhash_ algorithm ("buzhash" chunker) or a simpler fixed blocksize
-algorithm ("fixed" chunker).
+Buzhash_ algorithm ("buzhash" and "buzhash64" chunker) or a simpler
+fixed blocksize algorithm ("fixed" chunker).
 To perform the repository-wide deduplication, a hash of each
 chunk is checked against the :ref:`chunks cache <cache>`, which is a


@@ -399,6 +399,7 @@ Borg has these chunkers:
   supporting a header block of different size.
 - "buzhash": variable, content-defined blocksize, uses a rolling hash
   computed by the Buzhash_ algorithm.
+- "buzhash64": similar to "buzhash", but improved 64bit implementation
 For some more general usage hints see also ``--chunker-params``.
@@ -469,6 +470,16 @@ for the repository, and stored encrypted in the keyfile. This is to prevent
 chunk size based fingerprinting attacks on your encrypted repo contents (to
 guess what files you have based on a specific set of chunk sizes).
+"buzhash64" chunker
++++++++++++++++++++
+Similar to "buzhash", but using 64bit wide hash values.
+The buzhash table is cryptographically derived from secret key material.
+These changes should improve resistance against attacks and also solve
+some of the issues of the original (32bit / XORed table) implementation.
 .. _cache:
 The cache


@@ -361,13 +361,19 @@ The chunks stored in the repo are the (compressed, encrypted and authenticated)
 output of the chunker. The sizes of these stored chunks are influenced by the
 compression, encryption and authentication.
-buzhash chunker
-~~~~~~~~~~~~~~~
+buzhash and buzhash64 chunker
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The buzhash chunker chunks according to the input data, the chunker's
-parameters and the secret chunker seed (which all influence the chunk boundary
+The buzhash chunkers chunk according to the input data, the chunker's
+parameters and secret key material (which all influence the chunk boundary
 positions).
+Secret key material:
+- "buzhash": chunker seed (32bits), used for XORing the hardcoded buzhash table
+- "buzhash64": bh64_key (256bits) is derived from ID key, used to cryptographically
+  generate the table.
 Small files below some specific threshold (default: 512 KiB) result in only one
 chunk (identical content / size as the original file), bigger files result in
 multiple chunks.
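
The two kinds of secret key material listed in the last hunk (a 32-bit seed XORed into a fixed table, versus a 256-bit key used to generate the table) can be illustrated with a small Python sketch. This is not Borg's code: the function names, the stand-in base table, and the use of HMAC-SHA256 as the derivation function are all assumptions made for illustration, and the docs above do not say which construction Borg actually uses.

```python
import hashlib
import hmac

TABLE_SIZE = 256  # one table entry per possible byte value


def demo_base_table32():
    """Hypothetical stand-in for a hardcoded 32-bit buzhash table."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "little")
        for i in range(TABLE_SIZE)
    ]


def seeded_table32(base, seed):
    """'buzhash' style: XOR every entry of a fixed table with a 32-bit seed."""
    seed &= 0xFFFFFFFF
    return [entry ^ seed for entry in base]


def derived_table64(key):
    """'buzhash64' style, under an ASSUMED construction: derive each 64-bit
    entry from a 256-bit key with HMAC-SHA256 over the entry index."""
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) key")
    return [
        int.from_bytes(
            hmac.new(key, i.to_bytes(2, "little"), hashlib.sha256).digest()[:8],
            "little",
        )
        for i in range(TABLE_SIZE)
    ]


if __name__ == "__main__":
    base = demo_base_table32()
    t32 = seeded_table32(base, 0x12345678)
    # The seed cancels out when two XOR-seeded entries are combined.
    print(t32[1] ^ t32[2] == base[1] ^ base[2])
    t64 = derived_table64(bytes(32))
    print(len(t64), all(e < 2**64 for e in t64))
```

One property visible in the sketch is that XORing any two entries of the seeded table cancels the seed, so the relation between entries is the same for every seed. With a table generated from a key, every entry depends on the key. Whether this is among the issues the commit refers to is not stated in the diff.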