mirror of
https://github.com/borgbackup/borg.git
synced 2026-06-09 00:32:37 -04:00
document more internals, based on mailing list discussion
This should address #27, #28 and #29, at least at a basic level. It is mostly based on the mailing list discussion mentioned in #27, with some reformatting and merging of different posts.
This commit is contained in:
parent
9f0ed2a8c0
commit
3f27c367fe
2 changed files with 107 additions and 0 deletions
@@ -12,6 +12,7 @@
.. _github: https://github.com/jborg/attic
.. _OpenSSL: https://www.openssl.org/
.. _Python: http://www.python.org/
.. _Buzhash: https://en.wikipedia.org/wiki/Buzhash
.. _PBKDF2: https://en.wikipedia.org/wiki/PBKDF2
.. _SHA256: https://en.wikipedia.org/wiki/SHA-256
.. _HMAC: https://en.wikipedia.org/wiki/HMAC
@@ -28,3 +29,4 @@
.. _Arch Linux: https://aur.archlinux.org/packages/attic/
.. _Slackware: http://slackbuilds.org/result/?search=Attic
.. _Cython: http://cython.org/
.. _mailing list discussion about internals: http://librelist.com/browser/attic/2014/5/6/questions-and-suggestions-about-inner-working-of-attic
@@ -4,6 +4,111 @@
Internals
=========

This page documents the internal data structures and storage
mechanisms of |project_name|. It is partly based on the `mailing list
discussion about internals`_ and partly on static code analysis. It may
not be exactly up to date with the current source code.

Indexes and memory usage
------------------------

Repository index
    40 bytes x N ~ 200MB (If a remote repository is
    used this will be allocated on the remote side)

Chunk lookup index
    44 bytes x N ~ 220MB

File chunk cache
    probably 80-100 bytes x N ~ 400MB
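
As a rough cross-check of the estimates above, the following sketch
multiplies the per-entry sizes by N. Treating N as the number of index
entries is an assumption (the text does not define it), and
``estimate_index_memory`` is a hypothetical helper, not part of
|project_name|.

```python
# Per-entry sizes in bytes, taken from the estimates above.
REPOSITORY_INDEX_ENTRY = 40
CHUNK_INDEX_ENTRY = 44
FILE_CACHE_ENTRY = (80, 100)  # "probably 80-100 bytes"


def estimate_index_memory(n):
    """Return estimated sizes in bytes for n entries.

    Assumption: n is the number of entries in each structure.
    """
    return {
        "repository index": REPOSITORY_INDEX_ENTRY * n,
        "chunk lookup index": CHUNK_INDEX_ENTRY * n,
        "file chunk cache": tuple(size * n for size in FILE_CACHE_ENTRY),
    }


if __name__ == "__main__":
    for name, size in estimate_index_memory(5_000_000).items():
        print(name, size)
```

With N = 5,000,000 this gives 200MB and 220MB for the two indexes and
400MB to 500MB for the file chunk cache, which is consistent with the
figures quoted above.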

The chunk lookup index (chunk hash -> reference count, size, ciphered
size; in file cache/chunk) and the repository index (chunk hash ->
segment, offset; in file repo/index.%d) are stored in a kind of hash
table, mapped directly into memory from the file content. The table has
only one slot per bucket and resolves collisions by spilling into the
following buckets (open addressing with linear probing). As a
consequence the hash is just a start position for a linear search, and
if the element is not in the table the index is scanned linearly until
an empty bucket is found. When the table is 90% full its size is
doubled; when it is only 25% full its size is halved. So operations on
it have a variable complexity between constant and linear with a low
factor, and memory overhead varies between 10% (table 90% full) and
300% (table 25% full).
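
The probing and resizing rules above can be sketched as follows. This
is a minimal illustration, not the actual implementation: it uses a
Python list instead of a memory-mapped file, and the ``ProbingIndex``
name, the minimum table size and the tombstone handling of deletions
are assumptions that the text above does not describe.

```python
EMPTY = object()    # bucket that was never used
DELETED = object()  # tombstone (assumption: deletion is not described above)


class ProbingIndex:
    """Hash table with one slot per bucket and linear probing."""

    MAX_LOAD = 0.90   # double the table above this fill ratio
    MIN_LOAD = 0.25   # halve the table below this fill ratio
    MIN_BUCKETS = 8   # hypothetical lower bound on the table size

    def __init__(self):
        self.slots = [EMPTY] * self.MIN_BUCKETS
        self.used = 0        # live entries
        self.tombstones = 0  # deleted entries still occupying a bucket

    def _probe(self, key):
        # The hash only gives the start position of a linear search.
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def _insert(self, key, value):
        reusable = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break
            if slot is DELETED:
                if reusable is None:
                    reusable = i
            elif slot[0] == key:
                self.slots[i] = (key, value)  # overwrite existing entry
                return
        if reusable is not None:
            i = reusable
            self.tombstones -= 1
        self.slots[i] = (key, value)
        self.used += 1

    def _resize(self, buckets):
        entries = [s for s in self.slots if s is not EMPTY and s is not DELETED]
        self.slots = [EMPTY] * buckets
        self.used = self.tombstones = 0
        for key, value in entries:
            self._insert(key, value)

    def _rebalance(self):
        n = len(self.slots)
        if self.used + self.tombstones > self.MAX_LOAD * n:
            self._resize(n * 2)
        elif n > self.MIN_BUCKETS and self.used < self.MIN_LOAD * n:
            self._resize(n // 2)

    def put(self, key, value):
        self._insert(key, value)
        self._rebalance()

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:  # a miss ends at the first empty bucket
                return default
            if slot is not DELETED and slot[0] == key:
                return slot[1]

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                raise KeyError(key)
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED
                self.used -= 1
                self.tombstones += 1
                self._rebalance()
                return
```

Because the table is resized before it fills up, every probe sequence
is guaranteed to reach an empty bucket, which is what terminates a
lookup for a missing key.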

The file chunk cache (file path hash -> age, inode number, size,
mtime_ns, chunk hashes; in file cache/files) is stored as a Python
associative array holding Python objects, which generates a lot of
overhead. This takes around 240 bytes per file without the chunk
list, compared to at most 64 bytes of real data (depending on data
alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
of ~250 bytes even if there is only one chunk hash. The inode number is
stored to make sure we distinguish between different files, as a single
path may not be unique across different archives in different setups.

Repository structure
--------------------

|project_name| is a "filesystem based transactional key value store".

Objects referenced by a key (256-bit id/hash) are stored inline in
files (segments) of approximately 5MB in repo/data. They contain:
header size, crc, size, tag, key, data. The tag is either ``PUT``,
``DELETE``, or ``COMMIT``. Segments are built locally, and then
uploaded.

A segment file is basically a transaction log where each repository
operation is appended to the file. When an object is written to the
repository, a ``PUT`` tag is written to the file followed by the object
id and data. When an object is deleted, a ``DELETE`` tag is appended
followed by the object id. A ``COMMIT`` tag is written when a
repository transaction is committed. When a repository is opened, any
``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
discarded since they are part of a partial/uncommitted transaction.
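
The commit rule described above can be sketched as a replay over an
in-memory list of log entries. The tuple layout and the ``replay``
function are assumptions for illustration; the on-disk entry format
(header size, crc, size) is not modelled here.

```python
def replay(entries):
    """Rebuild the committed key -> data mapping from a log.

    entries is an iterable of (tag, key, data) tuples, where tag is
    "PUT", "DELETE" or "COMMIT" (key and data are ignored for COMMIT).
    """
    committed = {}
    pending = []  # operations seen since the last COMMIT
    for tag, key, data in entries:
        if tag == "COMMIT":
            for op, op_key, op_data in pending:
                if op == "PUT":
                    committed[op_key] = op_data
                else:
                    committed.pop(op_key, None)
            pending.clear()
        elif tag in ("PUT", "DELETE"):
            pending.append((tag, key, data))
        else:
            raise ValueError("unknown tag: %r" % (tag,))
    # Anything still pending belongs to an uncommitted transaction
    # and is discarded.
    return committed


if __name__ == "__main__":
    log = [
        ("PUT", "a", b"1"),
        ("PUT", "b", b"2"),
        ("COMMIT", None, None),
        ("DELETE", "a", None),
        ("PUT", "c", b"3"),  # no COMMIT follows: discarded on replay
    ]
    print(replay(log))
```

In this example the trailing ``DELETE`` and ``PUT`` are ignored because
no ``COMMIT`` follows them, so the replayed state still contains both
``a`` and ``b``.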

The manifest is an object with an id of only zeros (32 bytes) that
references all the archives. It contains: version, list of archives,
timestamp, config. Each archive entry contains: name, id, time. The
manifest is the last object stored, in the last segment, and is
replaced each time.

The archive metadata does not contain the file items directly, only
references to other objects that contain that data. An archive is an
object that contains metadata: version, name, items list, cmdline,
hostname, username, time. Each item represents a file, directory or
symlink and is stored as an ``item`` dictionary that contains: path,
list of chunks, user, group, uid, gid, mode (item type + permissions),
source (for links), rdev (for devices), mtime, xattrs, acl,
bsdfiles. ``ctime`` (change time) is not stored because there is no
API to set it and it is reset every time an inode's metadata is changed.

All items are serialized using msgpack and the resulting byte stream
is fed into the same chunker used for regular file data and turned
into deduplicated chunks. The references to these chunks are then added
to the archive metadata. This allows the archive to store many files,
beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.

A chunk is an object as well, of course, and its id is the hash of its
(unencrypted and uncompressed) content.
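
Deriving the id from the content is what makes deduplication possible:
identical chunks map to the same key. The sketch below illustrates the
idea with a hypothetical ``ChunkStore`` class. Using plain SHA-256 as
the id function is an assumption; the text above only says "hash".

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store with reference counting."""

    def __init__(self):
        self.objects = {}   # chunk id -> chunk data
        self.refcount = {}  # chunk id -> number of references

    def add(self, data):
        # Assumption: the id is the SHA-256 of the plain content.
        chunk_id = hashlib.sha256(data).digest()
        if chunk_id not in self.objects:
            self.objects[chunk_id] = data
            self.refcount[chunk_id] = 0
        self.refcount[chunk_id] += 1
        return chunk_id


if __name__ == "__main__":
    store = ChunkStore()
    first = store.add(b"same content")
    second = store.add(b"same content")
    print(first == second, len(store.objects), store.refcount[first])
```

Adding the same content twice stores the data once and only increases
the reference count.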

Hints are stored in a file (repo/hints) and contain: version, list of
segments, compact.

Chunks
------

|project_name| uses a rolling checksum based on the Buzhash_
algorithm, with a window size of 4095 bytes and a minimum chunk size of
1024 bytes. A chunk boundary is triggered when the last 16 bits of the
checksum are zero, producing chunks of 64kB on average. All these
parameters are fixed. The buzhash table is altered by XORing it with a
seed randomly generated once for the archive, and stored encrypted in
the keyfile.
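
The following is a simplified sketch of content-defined chunking with
a Buzhash-style rolling checksum, using the parameters given above. It
is not the actual chunker: the base table comes from Python's
``random`` module as a stand-in, the checksum is restarted at every
chunk boundary, and the names ``make_table`` and ``split`` are
hypothetical.

```python
import random

WINDOW_SIZE = 4095   # bytes covered by the rolling checksum
CHUNK_MIN = 1024     # minimum chunk size in bytes
CHUNK_MASK = 0xFFFF  # cut when the last 16 bits are zero


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def make_table(seed, base_seed=0):
    """Build a 256-entry table and XOR every entry with the seed.

    Assumption: the base table is a stand-in generated by Python's
    PRNG, not the table used by the real chunker.
    """
    rng = random.Random(base_seed)
    return [(rng.getrandbits(32) ^ seed) & 0xFFFFFFFF for _ in range(256)]


def split(data, table, window=WINDOW_SIZE, min_size=CHUNK_MIN,
          mask=CHUNK_MASK):
    """Split bytes into chunks at content-defined boundaries."""
    chunks, start, checksum = [], 0, 0
    for pos, byte in enumerate(data):
        # Add the incoming byte to the rolling checksum.
        checksum = rol32(checksum, 1) ^ table[byte]
        length = pos - start + 1
        if length > window:
            # Remove the byte that just left the window.
            checksum ^= rol32(table[data[pos - window]], window)
        if length >= min_size and (checksum & mask) == 0:
            chunks.append(data[start:pos + 1])
            start, checksum = pos + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries depend only on the bytes inside the window, an
insertion early in a file shifts the data but tends to leave later
chunk boundaries, and therefore later chunk ids, unchanged.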

Encryption
----------

AES_ is used in CTR mode (so no padding is needed). A 64-bit
initialization vector is used, a SHA256_ based HMAC_ is computed
on the encrypted chunk with a random 64-bit nonce, and both the HMAC
and the nonce are stored in the chunk. The header of each chunk is:
TYPE(1) + HMAC(32) + NONCE(8). Encryption and HMAC use two different
keys.
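
The header layout can be illustrated with the standard library alone.
In this sketch the ciphertext is treated as opaque bytes (AES itself is
not performed), and computing the HMAC over the nonce followed by the
ciphertext is an assumption about the exact input. The ``seal`` and
``open_sealed`` names and the type value in the test data are
hypothetical.

```python
import hashlib
import hmac
import os
import struct

# TYPE(1) + HMAC(32) + NONCE(8), as described above.
HEADER = struct.Struct(">B32s8s")


def seal(type_byte, ciphertext, mac_key, nonce=None):
    """Prefix already-encrypted data with the 41-byte header."""
    if nonce is None:
        nonce = os.urandom(8)  # random 64-bit nonce
    # Assumption: the HMAC covers nonce + ciphertext.
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return HEADER.pack(type_byte, tag, nonce) + ciphertext


def open_sealed(blob, mac_key):
    """Verify the HMAC and return (type, nonce, ciphertext)."""
    type_byte, tag, nonce = HEADER.unpack_from(blob)
    ciphertext = blob[HEADER.size:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch")
    return type_byte, nonce, ciphertext
```

Checking the HMAC before decrypting means corrupted or tampered data is
rejected without ever being fed to the cipher, and using a separate key
for the HMAC keeps the two primitives independent.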

Key files
---------