Merge pull request #9750 from ThomasWaldmann/remove-xxh64

remove xxhash / xxh64 requirement, mentions
TW 2026-06-10 14:00:23 +02:00 committed by GitHub
commit d2cdeaff57
GPG key ID: B5690EEEBB952194
5 changed files with 8 additions and 30 deletions

@ -846,8 +846,7 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat
- Transmits to repo. If the repo is remote, this usually involves an SSH connection
(does its own encryption / authentication).
- Stores the chunk into a key/value store (the key is the chunk id, the value
-is the data). While doing that, it computes XXH64 of the data (repo low-level
-checksum, used by borg check --repository).
+is the data).
Subsequent backups are usually very fast if most files are unchanged and only
a few are new or modified. The high performance on unchanged files primarily depends

@ -14,7 +14,6 @@
.. _ACL: https://en.wikipedia.org/wiki/Access_control_list
.. _libacl: https://savannah.nongnu.org/projects/acl/
.. _libattr: https://savannah.nongnu.org/projects/attr/
-.. _libxxhash: https://github.com/Cyan4973/xxHash
.. _liblz4: https://github.com/Cyan4973/lz4
.. _libzstd: https://github.com/facebook/zstd
.. _OpenSSL: https://www.openssl.org/
@ -28,4 +27,3 @@
.. _userspace filesystems: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
.. _Cython: https://cython.org/
.. _virtualenv: https://pypi.org/project/virtualenv/
-.. _python-xxhash: https://github.com/ifduyue/python-xxhash/

@ -81,14 +81,9 @@ A repo object has a structure like this:
* 32-bit meta size
* 32-bit data size
-* 64-bit xxh64(meta)
-* 64-bit xxh64(data)
* meta
* data
-The size and xxh64 hashes can be used for server-side corruption checks without
-needing to decrypt anything (which would require the borg key).
The overall size of repository objects varies from very small (a small source
file will be stored as a single repository object) to medium (big source files will
be cut into medium-sized chunks of some MB).
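As an illustration only, the layout that remains after this change (two 32-bit sizes followed by meta and data) could be packed and parsed roughly as sketched below. The little-endian byte order, the ``pack_object`` / ``unpack_object`` names, and the length check are assumptions made for this example; none of them is taken from the Borg source.

```python
import struct

# Assumption: two unsigned 32-bit integers, little-endian (byte order is not
# stated in the diff and is chosen here purely for illustration).
HEADER = struct.Struct("<II")


def pack_object(meta: bytes, data: bytes) -> bytes:
    """Serialize a hypothetical repo object: meta size, data size, meta, data."""
    return HEADER.pack(len(meta), len(data)) + meta + data


def unpack_object(blob: bytes) -> tuple[bytes, bytes]:
    """Parse the hypothetical layout and check that the sizes match the blob length."""
    if len(blob) < HEADER.size:
        raise ValueError("object too short for header")
    meta_size, data_size = HEADER.unpack_from(blob, 0)
    start = HEADER.size
    if len(blob) != start + meta_size + data_size:
        raise ValueError("size fields do not match object length")
    meta = blob[start:start + meta_size]
    data = blob[start + meta_size:]
    return meta, data
```

In this sketch, ``unpack_object(pack_object(b"meta", b"data"))`` returns the original pair, and a truncated object raises ``ValueError`` because the size fields no longer add up to the object length.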
@ -897,8 +892,7 @@ Data corruption in the files cache could create incorrect archives, e.g. due
to wrong object IDs or sizes in the files cache.
Therefore, Borg calculates checksums when writing these files and tests checksums
-when reading them. Checksums are generally 64-bit XXH64 hashes.
-The canonical xxHash representation is used, i.e. big-endian.
+when reading them. Checksums are generally 256-bit sha256 hashes.
Checksums are stored as hexadecimal ASCII strings.
For compatibility, checksums are not required and absent checksums do not trigger errors.
@ -909,19 +903,7 @@ Checksums are a data safety mechanism. They are not a security mechanism.
.. rubric:: Choice of algorithm
-XXH64 has been chosen for its high speed on all platforms, which avoids performance
-degradation in CPU-limited parts (e.g. cache synchronization).
-Unlike CRC32, it neither requires hardware support (crc32c or CLMUL)
-nor vectorized code nor large, cache-unfriendly lookup tables to achieve good performance.
-This simplifies deployment of it considerably (cf. src/borg/algorithms/crc32...).
-Further, XXH64 is a non-linear hash function and thus has a "more or less" good
-chance to detect larger burst errors, unlike linear CRCs where the probability
-of detection decreases with error size.
-The 64-bit checksum length is considered sufficient for the file sizes typically
-checksummed (individual files up to a few GB, usually less).
-xxHash was expressly designed for data blocks of these sizes.
+sha256 has been chosen for its wide availability on all platforms and hw acceleration on some.
Lower layer — file_integrity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -959,10 +941,10 @@ All checksums are compiled into a simple JSON structure called *integrity data*:
.. code-block:: json
{
-"algorithm": "XXH64",
+"algorithm": "SHA256",
"digests": {
-"HashHeader": "eab6802590ba39e3",
-"final": "e2a7f132fc2e8b24"
+"HashHeader": "eab6802590ba39e3...",
+"final": "e2a7f132fc2e8b24..."
}
}
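A minimal sketch of how integrity data of this shape could be produced and checked with Python's ``hashlib``. The function names and the choice of what each digest covers are hypothetical. Only the JSON layout, the ``SHA256`` algorithm name, the hexadecimal encoding, and the rule that absent checksums do not trigger errors come from the diff.

```python
import hashlib
import json
from typing import Optional


def make_integrity_data(header: bytes, body: bytes) -> str:
    """Build integrity data as a JSON string.

    Assumption: "HashHeader" covers only the header and "final" covers
    header plus body; the real coverage of each digest may differ.
    """
    digests = {
        "HashHeader": hashlib.sha256(header).hexdigest(),
        "final": hashlib.sha256(header + body).hexdigest(),
    }
    return json.dumps({"algorithm": "SHA256", "digests": digests})


def verify(integrity_json: Optional[str], header: bytes, body: bytes) -> bool:
    """Return True if the stored digests match, or if no integrity data exists."""
    if integrity_json is None:
        # Absent checksums are accepted for compatibility.
        return True
    stored = json.loads(integrity_json)
    if stored.get("algorithm") != "SHA256":
        raise ValueError("unsupported checksum algorithm")
    expected = json.loads(make_integrity_data(header, body))
    return stored["digests"] == expected["digests"]
```

Each digest is a 64-character hexadecimal string, which is why the updated example in the diff truncates the values with ``...``.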
@ -996,7 +978,7 @@ The ``[integrity]`` section is used:
[integrity]
manifest = 10e...21c
-files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
+files = {"algorithm": "SHA256", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
The manifest ID is duplicated in the integrity section due to the way all Borg
versions handle the config file. Instead of creating a "new" config file from

@ -39,7 +39,6 @@ dependencies = [
"argon2-cffi",
"shtab>=1.8.0",
"backports-zstd; python_version < '3.14'", # for python < 3.14.
-"xxhash>=2.0.0",
"jsonargparse>=4.47.0",
"PyYAML>=6.0.2", # we need to register our types with yaml, jsonargparse uses yaml for config files
"blake3>=1.0.0",

@ -1,6 +1,6 @@
#!/bin/bash
-pacman -S --needed --noconfirm git mingw-w64-ucrt-x86_64-{toolchain,pkgconf,lz4,xxhash,openssl,rclone,python-msgpack,python-argon2_cffi,python-platformdirs,python,cython,python-setuptools,python-wheel,python-build,python-pkgconfig,python-packaging,python-pip,python-paramiko,rust,python-maturin}
+pacman -S --needed --noconfirm git mingw-w64-ucrt-x86_64-{toolchain,pkgconf,lz4,openssl,rclone,python-msgpack,python-argon2_cffi,python-platformdirs,python,cython,python-setuptools,python-wheel,python-build,python-pkgconfig,python-packaging,python-pip,python-paramiko,rust,python-maturin}
if [ "$1" = "development" ]; then
pacman -S --needed --noconfirm mingw-w64-ucrt-x86_64-python-{pytest,pytest-benchmark,pytest-cov,pytest-xdist}