docs: revise pack file internals based on PR #9669 review, refs #8572

remove forward-looking N>1 references, hardcoded offsets, and stale "currently";
use borgstore vocabulary, medium-sized index files, and simplified recovery prose
This commit is contained in:
Mrityunjay Raj 2026-05-29 00:45:12 +05:30
parent 5d6cafe4b8
commit 6bba477ae2
5 changed files with 87 additions and 201 deletions

View file

@ -178,11 +178,6 @@ cite {
#common-options .option {
white-space: nowrap;
}
/* Extra vertical breathing room for figures that need it. */
.figure-padded {
margin-top: 1.5em;
margin-bottom: 1.5em;
}
/* Remove the right-column max-width cap so content fills the full available width. */
#right-column {

Binary file not shown.

Before

Width:  |  Height:  |  Size: 139 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 120 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 97 KiB

View file

@ -6,15 +6,15 @@
Pack files
==========
Borg currently stores each repository object (chunk) as a separate object in the
borgstore. For large repositories this means millions of individual objects, each
requiring its own I/O round trip to read or write. On high-latency backends (SFTP,
cloud object storage) this overhead dominates backup and restore times.
Without pack files, each repository chunk is stored as a separate borgstore object.
For large repositories this means millions of individual objects, each requiring its
own I/O round trip to read or write. On high-latency backends (SFTP, cloud object
storage) this overhead dominates backup and restore times.
Pack files fix this by grouping multiple repo objects into a single store
object. A reader that needs one chunk does a partial read (range request) at
a known offset instead of fetching a separate file. Store object count drops
from one-per-chunk to one-per-pack.
Pack files address this by grouping multiple chunks into a single store object. A
reader that needs one chunk does a partial read (range request) at a known offset
instead of fetching a separate file. Store object count drops from one-per-chunk to
one-per-pack.
.. _pack-format:
@ -22,189 +22,110 @@ from one-per-chunk to one-per-pack.
Pack File Format
----------------
Every pack file begins with a fixed 9-byte file header, followed by one or more
length-prefixed repo object blobs. The format is designed for forward scanning:
given only the file bytes and no external index, a reader can locate every blob
boundary by reading the 4-byte length prefix before each blob.
File header
~~~~~~~~~~~
::
Offset Size Type Field
------ ---- ------ -----
0 8 bytes Magic: ASCII b"BORGPACK"
8 1 uint8 Format version: 0x01
Any reader can check the magic bytes to confirm a file in the ``packs/``
namespace is a valid Borg pack file and reject misplaced or truncated files
before parsing further. The version byte lets future incompatible layout changes
be detected without bumping the repository version.
There is no separate file header. Each blob starts with an 8-byte ``BORGPACK``
magic, so a forward scanner can locate blob boundaries and identify each chunk
using only the pack file bytes with no external index.
Per-blob layout
~~~~~~~~~~~~~~~
Each blob is stored as a 4-byte length prefix followed by the full repo object::
Each blob is a self-contained unit::
Offset (relative to blob start) Size Type Field
-------------------------------- ---------- ------- -----
0 4 uint32le blob_len
4 24 bytes ObjHeader
4 + 24 meta_size bytes encrypted_meta
4 + 24 + meta_size data_size bytes encrypted_data
0 8 bytes Magic: ASCII b"BORGPACK"
8 1 uint8 Format version: 0x01
9 32 bytes chunk_id
41 4 uint32le meta_size
45 4 uint32le data_size
49 meta_size bytes encrypted_meta
49 + meta_size data_size bytes encrypted_data
``blob_len`` is the total byte count of the repo object that follows: ObjHeader
(always 24 bytes) plus ``encrypted_meta`` plus ``encrypted_data``. It satisfies::
``chunk_id`` is the keyed MAC of the plaintext data (``id_hash(plaintext_data)``).
Storing it in the unencrypted header lets a scanner rebuild the
``chunk_id → location`` index without decrypting any blob.
blob_len == 24 + meta_size + data_size
A reader locates the next blob by advancing::
The ObjHeader is the existing 24-byte structure shared by all Borg repo objects::
next_blob_offset = current_blob_offset + 49 + meta_size + data_size
Offset Size Type Field
------ ---- ------- -----
0 4 uint32le meta_size
4 4 uint32le data_size
8 8 bytes xxh64(encrypted_meta)
16 8 bytes xxh64(encrypted_data)
The per-blob magic limits corruption blast radius: a damaged blob causes the
scanner to lose at most one chunk. Once it finds the next ``BORGPACK`` sequence
it resumes.
The ObjHeader is stored unmodified inside the pack blob; the pack layer treats blobs
as opaque bytes and does not rewrite the header. The SHA256 content-addressed
``pack_id`` handles pack-level integrity; the xxh64 fields come from the existing
RepoObj wire format and are left as-is. They remain useful for the keyless recovery
scan (see :ref:`pack-recovery`) where AEAD decryption is not available.
Blobs follow one another contiguously with no padding::
.. figure:: pack-objheader.png
:figwidth: 100%
:width: 100%
:figclass: figure-padded
ObjHeader structure: 24 bytes encoding the sizes and xxh64 integrity hashes for
the encrypted meta and data sections of each blob.
Blobs follow one another contiguously with no padding between them::
[file header: 9 B]
[blob_len_0: 4 B][ObjHeader_0: 24 B][encrypted_meta_0][encrypted_data_0]
[blob_len_1: 4 B][ObjHeader_1: 24 B][encrypted_meta_1][encrypted_data_1]
[BORGPACK_0: 8B][0x01: 1B][chunk_id_0: 32B][meta_size_0: 4B][data_size_0: 4B][encrypted_meta_0][encrypted_data_0]
[BORGPACK_1: 8B][0x01: 1B][chunk_id_1: 32B][meta_size_1: 4B][data_size_1: 4B][encrypted_meta_1][encrypted_data_1]
...
[blob_len_N-1: 4 B][ObjHeader_N-1: 24 B][encrypted_meta_N-1][encrypted_data_N-1]
.. figure:: pack-layout.png
:figwidth: 100%
:width: 100%
:figclass: figure-padded
Pack file binary layout: 9-byte file header followed by contiguous
length-prefixed blobs, each containing an ObjHeader and the encrypted payload.
There is no trailing table of contents. The ``index/`` namespace (see
:ref:`pack-index-namespace`) is the sole authoritative source of chunk-to-location
mappings for normal operation.
Pack ID
~~~~~~~
For packs containing more than one blob, the pack ID is the SHA-256 digest of the
entire pack file content (file header plus all blobs)::
The pack ID equals the ``chunk_id`` of the blob it contains::
pack_id = SHA256(pack_file_bytes)
pack_id = chunk_id
This makes pack files content-addressed: the stored filename is a commitment to the
content. ``borg check`` can detect silent corruption by recomputing the digest and
comparing it to the filename without decrypting any blob.
Since ``chunk_id`` is a keyed MAC of the plaintext, the filename commits to the
content. ``borg check`` can detect silent corruption without decrypting any blob.
Namespace
~~~~~~~~~
Pack files are stored under the ``packs/`` namespace in borgstore, using a two-level
directory nesting on the first two bytes of the pack ID (hex-encoded)::
Pack files are stored under the ``packs/`` namespace in borgstore, using a
single directory level keyed on the first byte of the pack ID (hex-encoded)::
packs/
00/ .. ff/
00/ .. ff/
<pack_id_hex>
<pack_id_hex>
The nesting depth is controlled by the ``packs/`` entry in the repository's
``levels_config``, the same mechanism used by the ``data/`` namespace.
.. _pack-phase1:
.. _pack-index-entry:
Phase 1 Implementation (N=1)
-----------------------------
Pack Index Entry
----------------
The initial implementation puts one blob per pack file. Assembly is simpler: no
multi-chunk buffering, and the PackIndex lookup follows directly from the chunk ID.
Under this phase the pack ID is set equal to the chunk ID of the single blob it
contains::
pack_id = chunk_id # Phase 1 only
Computing SHA-256 over the pack content is therefore unnecessary. The pack for a
given chunk is always at::
Each pack contains one blob. The pack for a given chunk is always at::
packs/<hex(chunk_id)>
where the chunk ID is the same keyed MAC (``id_hash(plaintext_data)``) used today
for objects in ``data/``.
A PackIndex entry records the pack and byte range needed to read a blob::
The ``BORGPACK`` header and the 4-byte ``blob_len`` prefix are written regardless.
Phase 1 packs are structurally identical to multi-blob packs; readers require no
special case for N=1.
A PackIndex entry for Phase 1 packs is::
chunk_id → (pack_id = chunk_id,
offset = 13, # 9-byte file header + 4-byte blob_len
length = blob_len) # value read from bytes 9-12 of the pack file
.. note::
The N value is configurable at ``borg repo-create`` time. Expanding to N>1 in
a follow-on change requires no modification to the pack file format: the file
header and per-blob layout are identical. Only the pack assembly, PackIndex
update, and pack ID computation logic changes.
chunk_id → (pack_id = chunk_id, offset, length)
``offset`` is the position of the blob's size fields within the pack file.
``length`` covers those size fields plus the encrypted meta and data payloads.
.. _pack-write-order:
Write Order and Crash Safety
-----------------------------
Pack data must reach stable storage before any index or manifest entry references
it. The required write order is:
Pack data must be stored before any archive pointer references it.
The required write order is:
1. Write the pack file to ``packs/<pack_id>``.
2. ``fsync`` the pack file and its containing directory.
3. Write the index piece file to ``index/<index_id>`` (see :ref:`pack-index-namespace`).
4. ``fsync`` the index piece file.
5. Update and write the manifest. This is the sole commit point.
1. Store the pack file to ``packs/<pack_id>`` via borgstore.
2. Store the partial index file to ``index/<index_id>`` (see :ref:`pack-index-namespace`).
3. Write the archive and archive pointer. This is the sole commit point.
.. figure:: pack-write-order.png
:figwidth: 40%
:width: 100%
:align: center
Write order and crash safety: the manifest write at step 5 is the sole commit
point; partial failures at earlier steps leave only harmless orphan data.
A crash between steps 1 and 3 leaves orphan pack files in ``packs/``. No archive
A crash between steps 1 and 2 leaves orphan pack files in ``packs/``. No archive
references these chunks; ``borg compact`` removes them on the next run.
A crash between steps 3 and 5 leaves an index piece covering packs not yet committed
to any archive. The extra index entries point to valid, fully-written pack data; they
are harmless and will be cleaned up by the next ``borg compact``.
A crash between steps 2 and 3 leaves a partial index file covering packs not yet
committed to any archive. The extra index entries point to valid, fully-written pack
data; they are harmless and will be cleaned up by the next ``borg compact``.
A crash after step 5 cannot leave the repository in an inconsistent state. The
manifest is the commit point: data that the manifest does not reference is unreachable
and treated as garbage by ``borg compact``.
A crash after step 3 cannot leave the repository in an inconsistent state. The
archive pointer write is the commit point: data not referenced by any archive pointer
is unreachable and treated as garbage by ``borg compact``.
Deletion is soft: ``repository.delete()`` does not remove the pack file. The pack
stays on disk until ``borg compact`` confirms via mark-and-sweep that none of its
blobs appear in any archive, then removes the whole file. The same approach carries to N>1: there is no way to remove one blob from a pack
without rewriting the whole file, so soft-delete is the only option.
Only ``borg compact`` and ``borg check --repair`` delete pack files. When compact
determines via mark-and-sweep that none of a pack's blobs are referenced by any
archive, it removes the whole file. Individual blobs cannot be removed without
rewriting the entire pack, so deletion always operates at pack granularity.
.. _pack-index-namespace:
@ -212,28 +133,31 @@ without rewriting the whole file, so soft-delete is the only option.
Index Namespace
---------------
Borg does not embed a table of contents inside each pack file. Chunk-to-location
mappings are stored as a separate set of encrypted piece files under the ``index/``
namespace.
Chunk-to-location mappings are stored as a separate set of encrypted partial index
files under the ``index/`` namespace.
Each piece file covers the packs written in one backup session. Its name is the
SHA-256 digest of its own content::
Each partial index file covers the packs written in one backup session. Its name is
the SHA-256 digest of its own content::
index/
<sha256_of_content_hex>
Content-addressed naming makes each piece file self-verifying and idempotent: writing
the same index data twice produces the same filename, so a repeated write is a no-op.
Content-addressed naming makes each partial index file self-verifying and idempotent:
writing the same index data twice produces the same filename, so a repeated write is
a no-op.
Piece files are write-once. A session appends new piece files; existing files are
never modified. On repository open, the client downloads all files under ``index/``,
decrypts them, and merges the results into the in-memory PackIndex (a ``borghash``
``HashTableNT`` keyed on ``chunk_id``). The merge is commutative and idempotent;
piece file order does not matter.
Partial index files are write-once. A session stores new partial index files via
borgstore; existing files are never modified. On repository open all files under
``index/`` are loaded via borgstore, decrypted, and merged into the in-memory PackIndex
(a ``borghash`` ``HashTableNT`` keyed on ``chunk_id``). The merge is commutative and
idempotent; order does not matter.
``borg compact`` consolidates all existing piece files into a single replacement file
that covers only live chunks, writes it to ``index/``, and removes the files it
supersedes. This keeps the namespace small and open-time merge cost bounded.
``borg compact`` rewrites the ``index/`` namespace: it identifies live chunks via
mark-and-sweep, consolidates the surviving mappings into medium-sized replacement
files (targeting roughly 10100 packs per file), and removes the files it supersedes.
Medium-sized files keep the open-time merge cost bounded while avoiding the
cache-invalidation traffic on other clients that a single all-in-one index would
cause.
If the entire ``index/`` namespace is lost or corrupt, the PackIndex can be rebuilt
by scanning pack files directly; see :ref:`pack-recovery`.
@ -247,43 +171,10 @@ Recovery Path
When ``borg check --repair`` detects a missing or incomplete PackIndex it rebuilds
it by forward-scanning all pack files in ``packs/``.
The 4-byte ``blob_len`` prefix before each blob makes the scan self-contained: no
prior knowledge of blob sizes or count is required. The algorithm for one pack file::
verify magic = first 8 bytes are b"BORGPACK"
verify version = byte 8 is 0x01
pos = 9
while pos < file_size:
if pos + 4 > file_size:
raise CorruptPackError(pack_id, pos)
blob_len = uint32le(file[pos : pos + 4])
if blob_len == 0 or pos + 4 + blob_len > file_size:
raise CorruptPackError(pack_id, pos)
obj_header = file[pos + 4 : pos + 4 + 24]
meta_size, data_size = uint32le(obj_header[0:4]), uint32le(obj_header[4:8])
# Verify ObjHeader integrity without the key.
assert xxh64(file[pos+28 : pos+28+meta_size]) == obj_header[8:16]
assert xxh64(file[pos+28+meta_size : pos+4+blob_len]) == obj_header[16:24]
# Reconstruct the chunk_id for this blob (requires the repository key).
chunk_id = derive_chunk_id(pack_id, pos + 4, blob_len)
record_index_entry(chunk_id,
pack_id = pack_id,
offset = pos + 4,
length = blob_len)
pos += 4 + blob_len
The ``offset`` recorded in the rebuilt index points past the ``blob_len`` prefix,
directly at the ObjHeader, consistent with normal PackIndex entries.
Reconstructing ``chunk_id`` values requires the repository key because the chunk ID
is a keyed MAC of the plaintext data (``id_hash(plaintext_data)``). Without the key,
a structural scan can still verify magic bytes, version, blob boundaries, and
ObjHeader xxh64 hashes, but cannot produce a usable ``chunk_id → location`` mapping.
Each blob's unencrypted header supplies the ``BORGPACK`` magic (for re-sync after
corruption), the ``chunk_id``, and the size fields needed to locate the next blob.
The scan produces a complete ``chunk_id → (pack_id, offset, length)`` mapping
without decrypting any blob and without the repository key.
.. _pack-repo-version:
@ -291,11 +182,11 @@ ObjHeader xxh64 hashes, but cannot produce a usable ``chunk_id → location`` ma
Repository Version and Feature Flags
--------------------------------------
Repositories using pack files require repository version **4**. Clients that only
Repositories using pack files require repository version **4**. Clients that only
accept version 3 refuse to open a version 4 repository with an unsupported-version
error before any data is read.
In addition, the manifest's ``config.feature_flags`` must include ``pack_files`` in
In addition, the repository ``config.feature_flags`` must include ``pack_files`` in
the mandatory set for all access modes:
.. code-block:: python
@ -310,9 +201,9 @@ the mandatory set for all access modes:
A client that does not recognise the ``pack_files`` feature flag will refuse to open
the repository with a ``MandatoryFeatureUnsupported`` error regardless of the version
number. The two guards cover different failure modes: the version bump stops clients
number. The two guards cover different failure modes: the version bump stops clients
that predate feature-flag support entirely; the feature flag gives a clearer error
message to clients that understand feature flags but don't know about packs yet.
There is no migration path from version 3 repositories to version 4. Users of the
There is no migration path from version 3 repositories to version 4. Users of the
version 3 beta format must create a new repository with ``borg repo-create``.