From d23704e112a2e47621a9c648b73c8d15977790dd Mon Sep 17 00:00:00 2001
From: Thomas Waldmann
Date: Fri, 6 Jun 2025 11:56:49 +0200
Subject: [PATCH] buzhash64: docs

---
 docs/internals.rst                 |  4 ++--
 docs/internals/data-structures.rst | 11 +++++++++++
 docs/internals/security.rst        | 14 ++++++++++----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/docs/internals.rst b/docs/internals.rst
index e587803cb..3c6645c19 100644
--- a/docs/internals.rst
+++ b/docs/internals.rst
@@ -19,8 +19,8 @@ specified when the backup was performed.
 Deduplication is performed globally across all data in the repository
 (multiple backups and even multiple hosts), both on data and file
 metadata, using :ref:`chunks` created by the chunker using the
-Buzhash_ algorithm ("buzhash" chunker) or a simpler fixed blocksize
-algorithm ("fixed" chunker).
+Buzhash_ algorithm ("buzhash" and "buzhash64" chunker) or a simpler
+fixed blocksize algorithm ("fixed" chunker).
 
 To perform the repository-wide deduplication, a hash of each chunk is
 checked against the :ref:`chunks cache `, which is a
diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index ff1136a60..b7ffccc36 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -399,6 +399,7 @@ Borg has these chunkers:
   supporting a header block of different size.
 - "buzhash": variable, content-defined blocksize, uses a rolling hash
   computed by the Buzhash_ algorithm.
+- "buzhash64": similar to "buzhash", but improved 64bit implementation
 
 For some more general usage hints see also ``--chunker-params``.
 
@@ -469,6 +470,16 @@ for the repository, and stored encrypted in the keyfile. This is to prevent
 chunk size based fingerprinting attacks on your encrypted repo contents (to
 guess what files you have based on a specific set of chunk sizes).
 
+"buzhash64" chunker
++++++++++++++++++++
+
+Similar to "buzhash", but using 64bit wide hash values.
+
+The buzhash table is cryptographically derived from secret key material.
+
+These changes should improve resistance against attacks and also solve
+some of the issues of the original (32bit / XORed table) implementation.
+
 .. _cache:
 
 The cache
diff --git a/docs/internals/security.rst b/docs/internals/security.rst
index 40b27d797..bcddbb2e8 100644
--- a/docs/internals/security.rst
+++ b/docs/internals/security.rst
@@ -361,13 +361,19 @@ The chunks stored in the repo are the (compressed, encrypted and authenticated)
 output of the chunker. The sizes of these stored chunks are influenced by the
 compression, encryption and authentication.
 
-buzhash chunker
-~~~~~~~~~~~~~~~
+buzhash and buzhash64 chunker
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The buzhash chunker chunks according to the input data, the chunker's
-parameters and the secret chunker seed (which all influence the chunk boundary
+The buzhash chunkers chunk according to the input data, the chunker's
+parameters and secret key material (which all influence the chunk boundary
 positions).
 
+Secret key material:
+
+- "buzhash": chunker seed (32bits), used for XORing the hardcoded buzhash table
+- "buzhash64": bh64_key (256bits) is derived from ID key, used to cryptographically
+  generate the table.
+
 Small files below some specific threshold (default: 512 KiB) result in only one
 chunk (identical content / size as the original file), bigger files result in
 multiple chunks.