diff --git a/docs/faq.rst b/docs/faq.rst
index dc4aface8..9df5d73e6 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -846,8 +846,7 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat
 - Transmits to repo. If the repo is remote, this usually involves an SSH
   connection (does its own encryption / authentication).
 - Stores the chunk into a key/value store (the key is the chunk id, the value
-  is the data). While doing that, it computes XXH64 of the data (repo low-level
-  checksum, used by borg check --repository).
+  is the data).
 
 Subsequent backups are usually very fast if most files are unchanged and only
 a few are new or modified. The high performance on unchanged files primarily depends
diff --git a/docs/global.rst.inc b/docs/global.rst.inc
index 0a1fe9f5c..a3c8df1cc 100644
--- a/docs/global.rst.inc
+++ b/docs/global.rst.inc
@@ -14,7 +14,6 @@
 .. _ACL: https://en.wikipedia.org/wiki/Access_control_list
 .. _libacl: https://savannah.nongnu.org/projects/acl/
 .. _libattr: https://savannah.nongnu.org/projects/attr/
-.. _libxxhash: https://github.com/Cyan4973/xxHash
 .. _liblz4: https://github.com/Cyan4973/lz4
 .. _libzstd: https://github.com/facebook/zstd
 .. _OpenSSL: https://www.openssl.org/
@@ -28,4 +27,3 @@
 .. _userspace filesystems: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
 .. _Cython: https://cython.org/
 .. _virtualenv: https://pypi.org/project/virtualenv/
-.. _python-xxhash: https://github.com/ifduyue/python-xxhash/
diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index 2d7319a08..f801a433d 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -81,14 +81,9 @@ A repo object has a structure like this:
 
 * 32-bit meta size
 * 32-bit data size
-* 64-bit xxh64(meta)
-* 64-bit xxh64(data)
 * meta
 * data
 
-The size and xxh64 hashes can be used for server-side corruption checks without
-needing to decrypt anything (which would require the borg key).
-
 The overall size of repository objects varies from very small (a small source
 file will be stored as a single repository object) to medium (big source files
 will be cut into medium-sized chunks of some MB).
@@ -897,8 +892,7 @@ Data corruption in the files cache could create incorrect archives, e.g. due to
 wrong object IDs or sizes in the files cache.
 
 Therefore, Borg calculates checksums when writing these files and tests checksums
-when reading them. Checksums are generally 64-bit XXH64 hashes.
-The canonical xxHash representation is used, i.e. big-endian.
+when reading them. Checksums are generally 256-bit SHA-256 hashes.
 Checksums are stored as hexadecimal ASCII strings.
 
 For compatibility, checksums are not required and absent checksums do not trigger errors.
@@ -909,19 +903,7 @@ Checksums are a data safety mechanism. They are not a security mechanism.
 
 .. rubric:: Choice of algorithm
 
-XXH64 has been chosen for its high speed on all platforms, which avoids performance
-degradation in CPU-limited parts (e.g. cache synchronization).
-Unlike CRC32, it neither requires hardware support (crc32c or CLMUL)
-nor vectorized code nor large, cache-unfriendly lookup tables to achieve good performance.
-This simplifies deployment of it considerably (cf. src/borg/algorithms/crc32...).
-
-Further, XXH64 is a non-linear hash function and thus has a "more or less" good
-chance to detect larger burst errors, unlike linear CRCs where the probability
-of detection decreases with error size.
-
-The 64-bit checksum length is considered sufficient for the file sizes typically
-checksummed (individual files up to a few GB, usually less).
-xxHash was expressly designed for data blocks of these sizes.
+SHA-256 has been chosen because it is available on all platforms without an extra dependency and is hardware-accelerated on some CPUs.
 
 Lower layer — file_integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -959,10 +941,10 @@ All checksums are compiled into a simple JSON structure called *integrity data*:
 
 .. code-block:: json
 
     {
-        "algorithm": "XXH64",
+        "algorithm": "SHA256",
         "digests": {
-            "HashHeader": "eab6802590ba39e3",
-            "final": "e2a7f132fc2e8b24"
+            "HashHeader": "eab6802590ba39e3...",
+            "final": "e2a7f132fc2e8b24..."
         }
     }
@@ -996,7 +978,7 @@ The ``[integrity]`` section is used:
 
     [integrity]
     manifest = 10e...21c
-    files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
+    files = {"algorithm": "SHA256", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
 
 The manifest ID is duplicated in the integrity section due to the way all Borg
 versions handle the config file. Instead of creating a "new" config file from
diff --git a/pyproject.toml b/pyproject.toml
index 55fb54cab..683a46372 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -39,7 +39,6 @@ dependencies = [
   "argon2-cffi",
   "shtab>=1.8.0",
   "backports-zstd; python_version < '3.14'", # for python < 3.14.
-  "xxhash>=2.0.0",
   "jsonargparse>=4.47.0",
   "PyYAML>=6.0.2", # we need to register our types with yaml, jsonargparse uses yaml for config files
   "blake3>=1.0.0",
diff --git a/scripts/msys2-install-deps b/scripts/msys2-install-deps
index e831dc679..ae4dc4bd8 100644
--- a/scripts/msys2-install-deps
+++ b/scripts/msys2-install-deps
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-pacman -S --needed --noconfirm git mingw-w64-ucrt-x86_64-{toolchain,pkgconf,lz4,xxhash,openssl,rclone,python-msgpack,python-argon2_cffi,python-platformdirs,python,cython,python-setuptools,python-wheel,python-build,python-pkgconfig,python-packaging,python-pip,python-paramiko,rust,python-maturin}
+pacman -S --needed --noconfirm git mingw-w64-ucrt-x86_64-{toolchain,pkgconf,lz4,openssl,rclone,python-msgpack,python-argon2_cffi,python-platformdirs,python,cython,python-setuptools,python-wheel,python-build,python-pkgconfig,python-packaging,python-pip,python-paramiko,rust,python-maturin}
 
 if [ "$1" = "development" ]; then
     pacman -S --needed --noconfirm mingw-w64-ucrt-x86_64-python-{pytest,pytest-benchmark,pytest-cov,pytest-xdist}
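The SHA-256 integrity data described in the updated ``data-structures.rst`` can be illustrated with a minimal Python sketch based on the standard-library ``hashlib`` module. The helper names ``integrity_data`` and ``verify`` and the hashing scheme (one running SHA-256 over the file, with the hex digest recorded after each named part) are hypothetical and are not taken from Borg's ``file_integrity`` implementation. Only the JSON layout (``algorithm``, ``digests``, the ``HashHeader`` and ``final`` part names) and the rule that absent checksums do not trigger errors come from the documentation in this diff.

```python
import hashlib
import json


def integrity_data(parts):
    """Build a hypothetical integrity-data dict from ordered (name, bytes) parts.

    Assumed scheme (not Borg's actual one): a single running SHA-256 over
    all parts; after each part is fed in, the hex digest of everything
    hashed so far is stored under that part's name.
    """
    hasher = hashlib.sha256()
    digests = {}
    for name, data in parts:
        hasher.update(data)
        # hexdigest() does not finalize the hash object, so hashing can continue.
        digests[name] = hasher.hexdigest()
    return {"algorithm": "SHA256", "digests": digests}


def verify(parts, stored_json):
    """Check parts against stored integrity data (a JSON string or None).

    Absent integrity data is accepted, mirroring the documented rule that
    absent checksums do not trigger errors.
    """
    if stored_json is None:
        return True
    stored = json.loads(stored_json)
    if stored.get("algorithm") != "SHA256":
        raise ValueError("unsupported algorithm: %r" % stored.get("algorithm"))
    computed = integrity_data(parts)["digests"]
    # Every stored digest must match the freshly computed one.
    return all(computed.get(name) == digest
               for name, digest in stored["digests"].items())


if __name__ == "__main__":
    parts = [("HashHeader", b"header bytes"), ("final", b"payload bytes")]
    text = json.dumps(integrity_data(parts))
    print(text)
    print(verify(parts, text))  # True
    print(verify([("HashHeader", b"header bytes"), ("final", b"tampered")], text))  # False
```

Each SHA-256 digest is 64 hexadecimal characters, versus 16 for an XXH64 digest, so the ``files`` entry in the ``[integrity]`` section of the cache config grows accordingly.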