From 3f27c367fe644d2df8691e9cf532957ac6beedad Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?=
Date: Tue, 16 Dec 2014 10:04:35 -0500
Subject: [PATCH] document more internals, based on mailing list discussion

this should address #27, #28 and #29 at least at a basic level

it is mostly based on the mailing list discussion mentioned in #27,
with some reformatting and merging of different posts.
---
 docs/global.rst.inc |   2 +
 docs/internals.rst  | 105 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/docs/global.rst.inc b/docs/global.rst.inc
index a6236f60d..72d15126f 100644
--- a/docs/global.rst.inc
+++ b/docs/global.rst.inc
@@ -12,6 +12,7 @@
 .. _github: https://github.com/jborg/attic
 .. _OpenSSL: https://www.openssl.org/
 .. _Python: http://www.python.org/
+.. _Buzhash: https://en.wikipedia.org/wiki/Buzhash
 .. _PBKDF2: https://en.wikipedia.org/wiki/PBKDF2
 .. _SHA256: https://en.wikipedia.org/wiki/SHA-256
 .. _HMAC: https://en.wikipedia.org/wiki/HMAC
@@ -28,3 +29,4 @@
 .. _Arch Linux: https://aur.archlinux.org/packages/attic/
 .. _Slackware: http://slackbuilds.org/result/?search=Attic
 .. _Cython: http://cython.org/
+.. _mailing list discussion about internals: http://librelist.com/browser/attic/2014/5/6/questions-and-suggestions-about-inner-working-of-attic
\ No newline at end of file
diff --git a/docs/internals.rst b/docs/internals.rst
index bdcf6aa09..94eef02fa 100644
--- a/docs/internals.rst
+++ b/docs/internals.rst
@@ -4,6 +4,111 @@
 
 Internals
 =========
+This page documents the internal data structures and storage
+mechanisms of |project_name|. It is partly based on `mailing list
+discussion about internals`_ and also on static code analysis. It may
+not be exactly up to date with the current source code.
+
+Indexes and memory usage
+------------------------
+
+Repository index
+    40 bytes x N ~ 200MB (If a remote repository is
+    used this will be allocated on the remote side)
+
+Chunk lookup index
+    44 bytes x N ~ 220MB
+
+File chunk cache
+    probably 80-100 bytes x N ~ 400MB
+
+The chunk lookup index (chunk hash -> reference count, size, encrypted
+size; in file cache/chunk) and the repository index (chunk hash ->
+segment, offset; in file repo/index.%d) are stored in a kind of hash
+table, mapped directly into memory from the file content, with only one
+slot per bucket; collisions spill over into the following buckets. As a
+consequence the hash is just the start position for a linear search,
+and if the element is not in the table the index is scanned linearly
+until an empty bucket is found. When the table is 90% full its size is
+doubled, and when it is only 25% full its size is halved. So operations
+on it have a complexity between constant and linear with a low
+factor, and memory overhead varies between 10% and 300%.
+
+The file chunk cache (file path hash -> age, inode number, size,
+mtime_ns, chunk hashes; in file cache/files) is stored as a Python
+associative array holding Python objects, which generates a lot of
+overhead. This takes around 240 bytes per file without the chunk
+list, compared to at most 64 bytes of real data (depending on data
+alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
+of ~250 bytes even with only one chunk hash. The inode number is stored
+to make sure we distinguish between different files, as a single path
+may not be unique across different archives in different setups.
+
+Repository structure
+--------------------
+
+|project_name| is a "filesystem based transactional key value store".
+
+Objects referenced by a key (256-bit id/hash) are stored inline in
+files (segments) of approximately 5MB in repo/data. They contain:
+header size, crc, size, tag, key, data.
+Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. Segments are built
+locally, and then uploaded.
+
+A segment file is basically a transaction log where each repository
+operation is appended to the file. When an object is written to the
+repository, a ``PUT`` tag is written to the file followed by the object
+id and data. When an object is deleted, a ``DELETE`` tag is appended
+followed by the object id. A ``COMMIT`` tag is written when a
+repository transaction is committed. When a repository is opened, any
+``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
+discarded since they are part of a partial/uncommitted transaction.
+
+The manifest is an object with an id of only zeros (32 bytes) that
+references all the archives. It contains: version, list of archives,
+timestamp, config. Each archive entry contains: name, id, time. The manifest
+is the last object stored, in the last segment, and is replaced each time.
+
+The archive metadata does not contain the file items directly, only
+references to other objects that contain that data. An archive is an
+object that contains metadata: version, name, items list, cmdline,
+hostname, username, time. Each item represents a file, directory or
+symlink and is stored as an ``item`` dictionary that contains: path, list
+of chunks, user, group, uid, gid, mode (item type + permissions),
+source (for links), rdev (for devices), mtime, xattrs, acl,
+bsdfiles. ``ctime`` (change time) is not stored because there is no
+API to set it and it is reset every time an inode's metadata is changed.
+
+All items are serialized using msgpack and the resulting byte stream
+is fed into the same chunker used for regular file data and turned
+into deduplicated chunks. The references to these chunks are then added
+to the archive metadata. This allows the archive to store many files,
+beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.
+
+A chunk is an object as well, of course, and its id is the hash of its
+(unencrypted and uncompressed) content.
+
+Hints are stored in a file (repo/hints) and contain: version, list of
+segments, compact.
+
+Chunks
+------
+
+|project_name| uses a rolling checksum based on the Buzhash_ algorithm,
+with a window size of 4095 bytes and a minimum of 1024, and triggers when
+the last 16 bits of the checksum are zero, producing chunks of 64kB on
+average. All these parameters are fixed. The buzhash table is altered
+by XORing it with a seed randomly generated once for the archive, and
+stored encrypted in the keyfile.
+
+Encryption
+----------
+
+AES_ is used in CTR mode (so no padding is needed). A 64-bit
+initialization vector is used, a SHA256_ based HMAC_ is computed
+on the encrypted chunk with a random 64-bit nonce, and both are stored
+in the chunk. The header of each chunk is actually: TYPE(1) +
+HMAC(32) + NONCE(8). Encryption and HMAC use two different keys.
 
 Key files
 ---------