mirror of
https://github.com/borgbackup/borg.git
synced 2026-06-09 00:32:37 -04:00
Merge branch 'master' into multithreading
commit
4736f5b9d0
24 changed files with 915 additions and 340 deletions
3
AUTHORS
|
|
@ -1,6 +1,7 @@
|
|||
Borg Developers / Contributors ("The Borg Collective")
|
||||
``````````````````````````````````````````````````````
|
||||
- Thomas Waldmann
|
||||
- Thomas Waldmann <tw@waldmann-edv.de>
|
||||
- Antoine Beaupré
|
||||
|
||||
|
||||
Borg is a fork of Attic. Attic is written and maintained
|
||||
|
|
|
|||
127
CHANGES
|
|
@ -1,56 +1,108 @@
|
|||
Borg Changelog
|
||||
==============
|
||||
|
||||
Version <TBD>
|
||||
-------------
|
||||
|
||||
Version 0.24.0
|
||||
--------------
|
||||
|
||||
New features:
|
||||
|
||||
- borg create --chunker-params ... to configure the chunker.
|
||||
See docs/misc/create_chunker-params.txt for more information.
|
||||
- borg info now reports chunk counts in the chunk index.
|
||||
|
||||
Bug fixes:
|
||||
|
||||
- reduce memory usage, see --chunker-params, fixes #16.
|
||||
This can be used to reduce chunk management overhead, so borg does not create
|
||||
a huge chunks index/repo index and eats all your RAM if you back up lots of
|
||||
data in huge files (like VM disk images).
|
||||
- better Exception msg if there is no Borg installed on the remote repo server.
|
||||
|
||||
Other changes:
|
||||
|
||||
- Fedora/Fedora-based install instructions added to docs.
|
||||
- added docs/misc directory for misc. writeups that won't be included "as is"
|
||||
into the html docs.
|
||||
|
||||
|
||||
I forgot to list some stuff already implemented in 0.23.0, here they are:
|
||||
|
||||
New features:
|
||||
|
||||
- efficient archive list from manifest, meaning a big speedup for slow
|
||||
repo connections and "list <repo>", "delete <repo>", "prune"
|
||||
- big speedup for chunks cache sync (esp. for slow repo connections), fixes #18
|
||||
- hashindex: improve error messages
|
||||
|
||||
Other changes:
|
||||
|
||||
- explicitly specify binary mode to open binary files
|
||||
- some easy micro optimizations
|
||||
|
||||
|
||||
Version 0.23.0
|
||||
--------------
|
||||
|
||||
Incompatible changes (compared to attic, fork related):
|
||||
|
||||
- changed sw name and cli command to "borg", updated docs
|
||||
- package name and name in urls uses "borgbackup" to have less collisions
|
||||
- package name (and name in urls) uses "borgbackup" to have less collisions
|
||||
- changed repo / cache internal magic strings from ATTIC* to BORG*,
|
||||
changed cache location to .cache/borg/
|
||||
- give specific path to xattr.is_enabled(), disable symlink setattr call that
|
||||
always fails
|
||||
- fix misleading hint the fuse ImportError handler gave, fixes attic #237
|
||||
- source: misc. cleanups, pep8, style
|
||||
- implement check --last N
|
||||
- check: sort archives in reverse time order
|
||||
changed cache location to .cache/borg/ - this means that it currently won't
|
||||
accept attic repos (see issue #21 about improving that)
|
||||
|
||||
Bug fixes:
|
||||
|
||||
- avoid defect python-msgpack releases, fixes attic #171, fixes attic #185
|
||||
- check unpacked data from RPC for tuple type and correct length, fixes attic #127
|
||||
- less memory usage: add global option --no-cache-files
|
||||
- fix traceback when trying to do unsupported passphrase change, fixes attic #189
|
||||
- datetime does not like the year 10.000, fixes attic #139
|
||||
- docs and faq improvements, fixes, updates
|
||||
- cleanup crypto.pyx, make it easier to adapt to other modes
|
||||
- extract: if --stdout is given, write all extracted binary data to stdout
|
||||
- fix "info" all archives stats, fixes attic #183
|
||||
- fix parsing with missing microseconds, fixes attic #282
|
||||
- fix misleading hint the fuse ImportError handler gave, fixes attic #237
|
||||
- check unpacked data from RPC for tuple type and correct length, fixes attic #127
|
||||
- fix Repository._active_txn state when lock upgrade fails
|
||||
- give specific path to xattr.is_enabled(), disable symlink setattr call that
|
||||
always fails
|
||||
- fix test setup for 32bit platforms, partial fix for attic #196
|
||||
- upgraded versioneer, PEP440 compliance, fixes attic #257
|
||||
|
||||
New features:
|
||||
|
||||
- less memory usage: add global option --no-cache-files
|
||||
- check --last N (only check the last N archives)
|
||||
- check: sort archives in reverse time order
|
||||
- rename repo::oldname newname (rename repository)
|
||||
- create -v output more informative
|
||||
- create --progress (backup progress indicator)
|
||||
- create --timestamp (utc string or reference file/dir)
|
||||
- create: if "-" is given as path, read binary from stdin
|
||||
- do os.fsync like recommended in the python docs
|
||||
- extract: if --stdout is given, write all extracted binary data to stdout
|
||||
- extract --sparse (simple sparse file support)
|
||||
- extra debug information for 'fread failed'
|
||||
- delete <repo> (deletes whole repo + local cache)
|
||||
- FUSE: reflect deduplication in allocated blocks
|
||||
- only allow whitelisted RPC calls in server mode
|
||||
- normalize source/exclude paths before matching
|
||||
- fix "info" all archives stats, fixes attic #183
|
||||
- implement create --timestamp, utc string or reference file/dir
|
||||
- simple sparse file support (extract --sparse)
|
||||
- fix parsing with missing microseconds, fixes attic #282
|
||||
- use posix_fadvise to not spoil the OS cache, fixes attic #252
|
||||
- source: Let chunker optionally work with os-level file descriptor.
|
||||
- source: Linux: remove duplicate os.fsencode calls
|
||||
- fix test setup for 32bit platforms, partial fix for attic #196
|
||||
- source: refactor _open_rb code a bit, so it is more consistent / regular
|
||||
- implement rename repo::oldname newname
|
||||
- implement create --progress
|
||||
- source: refactor indicator (status) and item processing
|
||||
- implement delete repo (also deletes local cache)
|
||||
- better create -v output
|
||||
- upgraded versioneer, PEP440 compliance, fixes attic #257
|
||||
- source: use py.test for better testing, flake8 for code style checks
|
||||
- source: fix tox >=2.0 compatibility
|
||||
- toplevel error handler: show tracebacks for better error analysis
|
||||
- sigusr1 / sigint handler to print current file infos - attic PR #286
|
||||
- pypi package: add python version classifiers, add FreeBSD to platforms
|
||||
- fix Repository._active_txn state when lock upgrade fails
|
||||
- RPCError: include the exception args we get from remote
|
||||
|
||||
Other changes:
|
||||
|
||||
- source: misc. cleanups, pep8, style
|
||||
- docs and faq improvements, fixes, updates
|
||||
- cleanup crypto.pyx, make it easier to adapt to other AES modes
|
||||
- do os.fsync like recommended in the python docs
|
||||
- source: Let chunker optionally work with os-level file descriptor.
|
||||
- source: Linux: remove duplicate os.fsencode calls
|
||||
- source: refactor _open_rb code a bit, so it is more consistent / regular
|
||||
- source: refactor indicator (status) and item processing
|
||||
- source: use py.test for better testing, flake8 for code style checks
|
||||
- source: fix tox >=2.0 compatibility (test runner)
|
||||
- pypi package: add python version classifiers, add FreeBSD to platforms
|
||||
|
||||
|
||||
Attic Changelog
|
||||
===============
|
||||
|
|
@ -58,6 +110,13 @@ Attic Changelog
|
|||
Here you can see the full list of changes between each Attic release until Borg
|
||||
forked from Attic:
|
||||
|
||||
Version 0.17
|
||||
------------
|
||||
|
||||
(bugfix release, released on X)
|
||||
- Fix hashindex ARM memory alignment issue (#309)
|
||||
- Improve hashindex error messages (#298)
|
||||
|
||||
Version 0.16
|
||||
------------
|
||||
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
include README.rst LICENSE CHANGES MANIFEST.in versioneer.py
|
||||
include README.rst AUTHORS LICENSE CHANGES MANIFEST.in versioneer.py
|
||||
recursive-include borg *.pyx
|
||||
recursive-include docs *
|
||||
recursive-exclude docs *.pyc
|
||||
|
|
|
|||
10
README.rst
|
|
@ -10,8 +10,12 @@ are stored.
|
|||
Borg is a fork of Attic and maintained by "The Borg Collective" (see AUTHORS file).
|
||||
|
||||
BORG IS NOT COMPATIBLE WITH ORIGINAL ATTIC.
|
||||
UNTIL FURTHER NOTICE, EXPECT THAT WE WILL BREAK COMPATIBILITY REPEATEDLY.
|
||||
THIS IS SOFTWARE IN DEVELOPMENT, DECIDE YOURSELF IF IT FITS YOUR NEEDS.
|
||||
EXPECT THAT WE WILL BREAK COMPATIBILITY REPEATEDLY WHEN MAJOR RELEASE NUMBER
|
||||
CHANGES (like when going from 0.x.y to 1.0.0). Please read CHANGES document.
|
||||
|
||||
NOT RELEASED DEVELOPMENT VERSIONS HAVE UNKNOWN COMPATIBILITY PROPERTIES.
|
||||
|
||||
THIS IS SOFTWARE IN DEVELOPMENT, DECIDE YOURSELF WHETHER IT FITS YOUR NEEDS.
|
||||
|
||||
Read issue #1 on the issue tracker, goals are being defined there.
|
||||
|
||||
|
|
@ -66,7 +70,7 @@ Where are the tests?
|
|||
The tests are in the borg/testsuite package. To run the test suite use the
|
||||
following command::
|
||||
|
||||
$ fakeroot -u tox # you need to have tox installed
|
||||
$ fakeroot -u tox # you need to have tox and pytest installed
|
||||
|
||||
.. |build| image:: https://travis-ci.org/borgbackup/borg.svg
|
||||
:alt: Build Status
|
||||
|
|
|
|||
|
|
@ -79,7 +79,7 @@ typedef struct {
|
|||
int window_size, chunk_mask, min_size;
|
||||
size_t buf_size;
|
||||
uint32_t *table;
|
||||
uint8_t *data, *read_buf;
|
||||
uint8_t *data;
|
||||
PyObject *fd;
|
||||
int fh;
|
||||
int done, eof;
|
||||
|
|
@ -96,7 +96,6 @@ chunker_init(int window_size, int chunk_mask, int min_size, int max_size, uint32
|
|||
c->table = buzhash_init_table(seed);
|
||||
c->buf_size = max_size;
|
||||
c->data = malloc(c->buf_size);
|
||||
c->read_buf = malloc(c->buf_size);
|
||||
return c;
|
||||
}
|
||||
|
||||
|
|
@ -122,7 +121,6 @@ chunker_free(Chunker *c)
|
|||
Py_XDECREF(c->fd);
|
||||
free(c->table);
|
||||
free(c->data);
|
||||
free(c->read_buf);
|
||||
free(c);
|
||||
}
|
||||
|
||||
|
|
@ -140,9 +138,8 @@ chunker_fill(Chunker *c)
|
|||
}
|
||||
if(c->fh >= 0) {
|
||||
// if we have a os-level file descriptor, use os-level API
|
||||
n = read(c->fh, c->read_buf, n);
|
||||
n = read(c->fh, c->data + c->position + c->remaining, n);
|
||||
if(n > 0) {
|
||||
memcpy(c->data + c->position + c->remaining, c->read_buf, n);
|
||||
c->remaining += n;
|
||||
c->bytes_read += n;
|
||||
}
|
||||
|
|
|
|||
|
|
@ -18,8 +18,11 @@
|
|||
#error Unknown byte order
|
||||
#endif
|
||||
|
||||
#define MAGIC "BORG_IDX"
|
||||
#define MAGIC_LEN 8
|
||||
|
||||
typedef struct {
|
||||
char magic[8];
|
||||
char magic[MAGIC_LEN];
|
||||
int32_t num_entries;
|
||||
int32_t num_buckets;
|
||||
int8_t key_size;
|
||||
|
|
@ -27,7 +30,6 @@ typedef struct {
|
|||
} __attribute__((__packed__)) HashHeader;
|
||||
|
||||
typedef struct {
|
||||
void *data;
|
||||
void *buckets;
|
||||
int num_entries;
|
||||
int num_buckets;
|
||||
|
|
@ -36,10 +38,8 @@ typedef struct {
|
|||
off_t bucket_size;
|
||||
int lower_limit;
|
||||
int upper_limit;
|
||||
off_t data_len;
|
||||
} HashIndex;
|
||||
|
||||
#define MAGIC "BORG_IDX"
|
||||
#define EMPTY _htole32(0xffffffff)
|
||||
#define DELETED _htole32(0xfffffffe)
|
||||
#define MAX_BUCKET_SIZE 512
|
||||
|
|
@ -57,8 +57,10 @@ typedef struct {
|
|||
#define BUCKET_MARK_DELETED(index, idx) (*((uint32_t *)(BUCKET_ADDR(index, idx) + index->key_size)) = DELETED)
|
||||
#define BUCKET_MARK_EMPTY(index, idx) (*((uint32_t *)(BUCKET_ADDR(index, idx) + index->key_size)) = EMPTY)
|
||||
|
||||
#define EPRINTF(msg, ...) fprintf(stderr, "hashindex: " msg "\n", ##__VA_ARGS__)
|
||||
#define EPRINTF_PATH(path, msg, ...) fprintf(stderr, "hashindex: %s: " msg "\n", path, ##__VA_ARGS__)
|
||||
#define EPRINTF_MSG(msg, ...) fprintf(stderr, "hashindex: " msg "\n", ##__VA_ARGS__)
|
||||
#define EPRINTF_MSG_PATH(path, msg, ...) fprintf(stderr, "hashindex: %s: " msg "\n", path, ##__VA_ARGS__)
|
||||
#define EPRINTF(msg, ...) fprintf(stderr, "hashindex: " msg "(%s)\n", ##__VA_ARGS__, strerror(errno))
|
||||
#define EPRINTF_PATH(path, msg, ...) fprintf(stderr, "hashindex: %s: " msg " (%s)\n", path, ##__VA_ARGS__, strerror(errno))
|
||||
|
||||
static HashIndex *hashindex_read(const char *path);
|
||||
static int hashindex_write(HashIndex *index, const char *path);
|
||||
|
|
@ -118,13 +120,11 @@ hashindex_resize(HashIndex *index, int capacity)
|
|||
while((key = hashindex_next_key(index, key))) {
|
||||
hashindex_set(new, key, hashindex_get(index, key));
|
||||
}
|
||||
free(index->data);
|
||||
index->data = new->data;
|
||||
index->data_len = new->data_len;
|
||||
free(index->buckets);
|
||||
index->buckets = new->buckets;
|
||||
index->num_buckets = new->num_buckets;
|
||||
index->lower_limit = new->lower_limit;
|
||||
index->upper_limit = new->upper_limit;
|
||||
index->buckets = new->buckets;
|
||||
free(new);
|
||||
return 1;
|
||||
}
|
||||
|
|
@ -134,18 +134,22 @@ static HashIndex *
|
|||
hashindex_read(const char *path)
|
||||
{
|
||||
FILE *fd;
|
||||
off_t length;
|
||||
off_t bytes_read;
|
||||
off_t length, buckets_length, bytes_read;
|
||||
HashHeader header;
|
||||
HashIndex *index = NULL;
|
||||
|
||||
if((fd = fopen(path, "r")) == NULL) {
|
||||
EPRINTF_PATH(path, "fopen failed");
|
||||
if((fd = fopen(path, "rb")) == NULL) {
|
||||
EPRINTF_PATH(path, "fopen for reading failed");
|
||||
return NULL;
|
||||
}
|
||||
bytes_read = fread(&header, 1, sizeof(HashHeader), fd);
|
||||
if(bytes_read != sizeof(HashHeader)) {
|
||||
EPRINTF_PATH(path, "fread header failed (expected %ld, got %ld)", sizeof(HashHeader), bytes_read);
|
||||
if(ferror(fd)) {
|
||||
EPRINTF_PATH(path, "fread header failed (expected %ld, got %ld)", sizeof(HashHeader), bytes_read);
|
||||
}
|
||||
else {
|
||||
EPRINTF_MSG_PATH(path, "fread header failed (expected %ld, got %ld)", sizeof(HashHeader), bytes_read);
|
||||
}
|
||||
goto fail;
|
||||
}
|
||||
if(fseek(fd, 0, SEEK_END) < 0) {
|
||||
|
|
@ -156,43 +160,47 @@ hashindex_read(const char *path)
|
|||
EPRINTF_PATH(path, "ftell failed");
|
||||
goto fail;
|
||||
}
|
||||
if(fseek(fd, 0, SEEK_SET) < 0) {
|
||||
if(fseek(fd, sizeof(HashHeader), SEEK_SET) < 0) {
|
||||
EPRINTF_PATH(path, "fseek failed");
|
||||
goto fail;
|
||||
}
|
||||
if(memcmp(header.magic, MAGIC, 8)) {
|
||||
EPRINTF_PATH(path, "Unknown file header");
|
||||
if(memcmp(header.magic, MAGIC, MAGIC_LEN)) {
|
||||
EPRINTF_MSG_PATH(path, "Unknown MAGIC in header");
|
||||
goto fail;
|
||||
}
|
||||
if(length != sizeof(HashHeader) + (off_t)_le32toh(header.num_buckets) * (header.key_size + header.value_size)) {
|
||||
EPRINTF_PATH(path, "Incorrect file length");
|
||||
buckets_length = (off_t)_le32toh(header.num_buckets) * (header.key_size + header.value_size);
|
||||
if(length != sizeof(HashHeader) + buckets_length) {
|
||||
EPRINTF_MSG_PATH(path, "Incorrect file length (expected %ld, got %ld)", sizeof(HashHeader) + buckets_length, length);
|
||||
goto fail;
|
||||
}
|
||||
if(!(index = malloc(sizeof(HashIndex)))) {
|
||||
EPRINTF_PATH(path, "malloc failed");
|
||||
EPRINTF_PATH(path, "malloc header failed");
|
||||
goto fail;
|
||||
}
|
||||
if(!(index->data = malloc(length))) {
|
||||
EPRINTF_PATH(path, "malloc failed");
|
||||
if(!(index->buckets = malloc(buckets_length))) {
|
||||
EPRINTF_PATH(path, "malloc buckets failed");
|
||||
free(index);
|
||||
index = NULL;
|
||||
goto fail;
|
||||
}
|
||||
bytes_read = fread(index->data, 1, length, fd);
|
||||
if(bytes_read != length) {
|
||||
EPRINTF_PATH(path, "fread hashindex failed (expected %ld, got %ld)", length, bytes_read);
|
||||
free(index->data);
|
||||
bytes_read = fread(index->buckets, 1, buckets_length, fd);
|
||||
if(bytes_read != buckets_length) {
|
||||
if(ferror(fd)) {
|
||||
EPRINTF_PATH(path, "fread buckets failed (expected %ld, got %ld)", buckets_length, bytes_read);
|
||||
}
|
||||
else {
|
||||
EPRINTF_MSG_PATH(path, "fread buckets failed (expected %ld, got %ld)", buckets_length, bytes_read);
|
||||
}
|
||||
free(index->buckets);
|
||||
free(index);
|
||||
index = NULL;
|
||||
goto fail;
|
||||
}
|
||||
index->data_len = length;
|
||||
index->num_entries = _le32toh(header.num_entries);
|
||||
index->num_buckets = _le32toh(header.num_buckets);
|
||||
index->key_size = header.key_size;
|
||||
index->value_size = header.value_size;
|
||||
index->bucket_size = index->key_size + index->value_size;
|
||||
index->buckets = index->data + sizeof(HashHeader);
|
||||
index->lower_limit = index->num_buckets > MIN_BUCKETS ? ((int)(index->num_buckets * BUCKET_LOWER_LIMIT)) : 0;
|
||||
index->upper_limit = (int)(index->num_buckets * BUCKET_UPPER_LIMIT);
|
||||
fail:
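Review note: the reworked length check in `hashindex_read` compares the file size against `sizeof(HashHeader) + num_buckets * (key_size + value_size)`. A minimal Python sketch of that arithmetic follows. It assumes the packed header holds only the 8-byte magic, two little-endian 32-bit counters, and one-byte `key_size` and `value_size` fields; the full struct is not visible in this hunk, so the layout string and the helper name `expected_file_length` are illustrative, not part of borg.

```python
import struct

# Assumed layout of the packed HashHeader (not fully shown in the diff):
# 8-byte magic, int32 num_entries, int32 num_buckets, int8 key_size, int8 value_size.
HEADER = struct.Struct('<8siibb')
MAGIC = b'BORG_IDX'


def expected_file_length(header_bytes):
    """Hypothetical helper: return the file length implied by a header."""
    magic, num_entries, num_buckets, key_size, value_size = HEADER.unpack(header_bytes)
    if magic != MAGIC:
        raise ValueError('Unknown MAGIC in header')
    # Same formula as buckets_length in the C code.
    buckets_length = num_buckets * (key_size + value_size)
    return HEADER.size + buckets_length
```

Under these assumptions, a file whose size differs from the returned value would be rejected with the "Incorrect file length" message.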
|
||||
|
|
@ -205,20 +213,18 @@ fail:
|
|||
static HashIndex *
|
||||
hashindex_init(int capacity, int key_size, int value_size)
|
||||
{
|
||||
off_t buckets_length;
|
||||
HashIndex *index;
|
||||
HashHeader header = {
|
||||
.magic = MAGIC, .num_entries = 0, .key_size = key_size, .value_size = value_size
|
||||
};
|
||||
int i;
|
||||
capacity = MAX(MIN_BUCKETS, capacity);
|
||||
|
||||
if(!(index = malloc(sizeof(HashIndex)))) {
|
||||
EPRINTF("malloc failed");
|
||||
EPRINTF("malloc header failed");
|
||||
return NULL;
|
||||
}
|
||||
index->data_len = sizeof(HashHeader) + (off_t)capacity * (key_size + value_size);
|
||||
if(!(index->data = calloc(index->data_len, 1))) {
|
||||
EPRINTF("malloc failed");
|
||||
buckets_length = (off_t)capacity * (key_size + value_size);
|
||||
if(!(index->buckets = calloc(buckets_length, 1))) {
|
||||
EPRINTF("malloc buckets failed");
|
||||
free(index);
|
||||
return NULL;
|
||||
}
|
||||
|
|
@ -229,8 +235,6 @@ hashindex_init(int capacity, int key_size, int value_size)
|
|||
index->bucket_size = index->key_size + index->value_size;
|
||||
index->lower_limit = index->num_buckets > MIN_BUCKETS ? ((int)(index->num_buckets * BUCKET_LOWER_LIMIT)) : 0;
|
||||
index->upper_limit = (int)(index->num_buckets * BUCKET_UPPER_LIMIT);
|
||||
index->buckets = index->data + sizeof(HashHeader);
|
||||
memcpy(index->data, &header, sizeof(HashHeader));
|
||||
for(i = 0; i < capacity; i++) {
|
||||
BUCKET_MARK_EMPTY(index, i);
|
||||
}
|
||||
|
|
@ -240,25 +244,34 @@ hashindex_init(int capacity, int key_size, int value_size)
|
|||
static void
|
||||
hashindex_free(HashIndex *index)
|
||||
{
|
||||
free(index->data);
|
||||
free(index->buckets);
|
||||
free(index);
|
||||
}
|
||||
|
||||
static int
|
||||
hashindex_write(HashIndex *index, const char *path)
|
||||
{
|
||||
off_t buckets_length = (off_t)index->num_buckets * index->bucket_size;
|
||||
FILE *fd;
|
||||
HashHeader header = {
|
||||
.magic = MAGIC,
|
||||
.num_entries = _htole32(index->num_entries),
|
||||
.num_buckets = _htole32(index->num_buckets),
|
||||
.key_size = index->key_size,
|
||||
.value_size = index->value_size
|
||||
};
|
||||
int ret = 1;
|
||||
|
||||
if((fd = fopen(path, "w")) == NULL) {
|
||||
EPRINTF_PATH(path, "open failed");
|
||||
fprintf(stderr, "Failed to open %s for writing\n", path);
|
||||
if((fd = fopen(path, "wb")) == NULL) {
|
||||
EPRINTF_PATH(path, "fopen for writing failed");
|
||||
return 0;
|
||||
}
|
||||
*((uint32_t *)(index->data + 8)) = _htole32(index->num_entries);
|
||||
*((uint32_t *)(index->data + 12)) = _htole32(index->num_buckets);
|
||||
if(fwrite(index->data, 1, index->data_len, fd) != index->data_len) {
|
||||
EPRINTF_PATH(path, "fwrite failed");
|
||||
if(fwrite(&header, 1, sizeof(header), fd) != sizeof(header)) {
|
||||
EPRINTF_PATH(path, "fwrite header failed");
|
||||
ret = 0;
|
||||
}
|
||||
if(fwrite(index->buckets, 1, buckets_length, fd) != buckets_length) {
|
||||
EPRINTF_PATH(path, "fwrite buckets failed");
|
||||
ret = 0;
|
||||
}
|
||||
if(fclose(fd) < 0) {
|
||||
|
|
@ -348,14 +361,18 @@ hashindex_get_size(HashIndex *index)
|
|||
}
|
||||
|
||||
static void
|
||||
hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize, long long *total_unique_size, long long *total_unique_csize)
|
||||
hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize,
|
||||
long long *total_unique_size, long long *total_unique_csize,
|
||||
long long *total_unique_chunks, long long *total_chunks)
|
||||
{
|
||||
int64_t size = 0, csize = 0, unique_size = 0, unique_csize = 0;
|
||||
int64_t size = 0, csize = 0, unique_size = 0, unique_csize = 0, chunks = 0, unique_chunks = 0;
|
||||
const int32_t *values;
|
||||
void *key = NULL;
|
||||
|
||||
while((key = hashindex_next_key(index, key))) {
|
||||
values = key + 32;
|
||||
values = key + index->key_size;
|
||||
unique_chunks++;
|
||||
chunks += values[0];
|
||||
unique_size += values[1];
|
||||
unique_csize += values[2];
|
||||
size += values[0] * values[1];
|
||||
|
|
@ -365,5 +382,6 @@ hashindex_summarize(HashIndex *index, long long *total_size, long long *total_cs
|
|||
*total_csize = csize;
|
||||
*total_unique_size = unique_size;
|
||||
*total_unique_csize = unique_csize;
|
||||
*total_unique_chunks = unique_chunks;
|
||||
*total_chunks = chunks;
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -24,12 +24,14 @@ from .helpers import parse_timestamp, Error, uid2user, user2uid, gid2group, grou
|
|||
make_queue, TerminatedQueue
|
||||
|
||||
ITEMS_BUFFER = 1024 * 1024
|
||||
CHUNK_MIN = 1024
|
||||
CHUNK_MAX = 10 * 1024 * 1024
|
||||
WINDOW_SIZE = 0xfff
|
||||
CHUNK_MASK = 0xffff
|
||||
|
||||
ZEROS = b'\0' * CHUNK_MAX
|
||||
CHUNK_MIN_EXP = 10 # 2**10 == 1kiB
|
||||
CHUNK_MAX_EXP = 23 # 2**23 == 8MiB
|
||||
HASH_WINDOW_SIZE = 0xfff # 4095B
|
||||
HASH_MASK_BITS = 16 # results in ~64kiB chunks statistically
|
||||
|
||||
# defaults, use --chunker-params to override
|
||||
CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)
|
||||
|
||||
utime_supports_fd = os.utime in getattr(os, 'supports_fd', {})
|
||||
utime_supports_follow_symlinks = os.utime in getattr(os, 'supports_follow_symlinks', {})
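Review note: the new defaults are stored as exponents and bit counts rather than byte sizes. The sketch below shows how they translate, using the constants from the hunk above. The helper `describe_chunker_params` is hypothetical, and the mask computation is inferred from the old `CHUNK_MASK = 0xffff` matching `HASH_MASK_BITS = 16`; the actual `Chunker` constructor is not shown in this part of the diff.

```python
# Defaults copied from the hunk above.
CHUNK_MIN_EXP = 10       # 2**10 == 1kiB
CHUNK_MAX_EXP = 23       # 2**23 == 8MiB
HASH_MASK_BITS = 16      # ~64kiB chunks statistically
HASH_WINDOW_SIZE = 0xfff

CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)


def describe_chunker_params(params):
    """Hypothetical helper: convert exponent-style params to byte sizes."""
    chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size = params
    return {
        'min_chunk_bytes': 1 << chunk_min_exp,
        'max_chunk_bytes': 1 << chunk_max_exp,
        # Assumption: the mask has the low hash_mask_bits bits set.
        'hash_mask': (1 << hash_mask_bits) - 1,
        'target_chunk_bytes': 1 << hash_mask_bits,
        'window_bytes': hash_window_size,
    }
```

Note that the maximum chunk size changes from the old `10 * 1024 * 1024` to `2**23` bytes, since only powers of two can be expressed as an exponent.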
|
||||
|
|
@ -72,12 +74,12 @@ class DownloadPipeline:
|
|||
class ChunkBuffer:
|
||||
BUFFER_SIZE = 1 * 1024 * 1024
|
||||
|
||||
def __init__(self, key):
|
||||
def __init__(self, key, chunker_params=CHUNKER_PARAMS):
|
||||
self.buffer = BytesIO()
|
||||
self.packer = msgpack.Packer(unicode_errors='surrogateescape')
|
||||
self.chunks = []
|
||||
self.key = key
|
||||
self.chunker = Chunker(WINDOW_SIZE, CHUNK_MASK, CHUNK_MIN, CHUNK_MAX,self.key.chunk_seed)
|
||||
self.chunker = Chunker(self.key.chunk_seed, *chunker_params)
|
||||
|
||||
def add(self, item):
|
||||
self.buffer.write(self.packer.pack(StableDict(item)))
|
||||
|
|
@ -107,8 +109,8 @@ class ChunkBuffer:
|
|||
|
||||
class CacheChunkBuffer(ChunkBuffer):
|
||||
|
||||
def __init__(self, cache, key, stats):
|
||||
super(CacheChunkBuffer, self).__init__(key)
|
||||
def __init__(self, cache, key, stats, chunker_params=CHUNKER_PARAMS):
|
||||
super(CacheChunkBuffer, self).__init__(key, chunker_params)
|
||||
self.cache = cache
|
||||
self.stats = stats
|
||||
|
||||
|
|
@ -317,7 +319,8 @@ class Archive:
|
|||
|
||||
|
||||
def __init__(self, repository, key, manifest, name, cache=None, create=False,
|
||||
checkpoint_interval=300, numeric_owner=False, progress=False):
|
||||
checkpoint_interval=300, numeric_owner=False, progress=False,
|
||||
chunker_params=CHUNKER_PARAMS):
|
||||
self.cwd = os.getcwd()
|
||||
self.key = key
|
||||
self.repository = repository
|
||||
|
|
@ -333,8 +336,8 @@ class Archive:
|
|||
self.pipeline = DownloadPipeline(self.repository, self.key)
|
||||
if create:
|
||||
self.pp = ParallelProcessor(self)
|
||||
self.items_buffer = CacheChunkBuffer(self.cache, self.key, self.stats)
|
||||
self.chunker = Chunker(WINDOW_SIZE, CHUNK_MASK, CHUNK_MIN, CHUNK_MAX, self.key.chunk_seed)
|
||||
self.items_buffer = CacheChunkBuffer(self.cache, self.key, self.stats, chunker_params)
|
||||
self.chunker = Chunker(self.key.chunk_seed, *chunker_params)
|
||||
if name in manifest.archives:
|
||||
raise self.AlreadyExists(name)
|
||||
self.last_checkpoint = time.time()
|
||||
|
|
@ -350,6 +353,7 @@ class Archive:
|
|||
raise self.DoesNotExist(name)
|
||||
info = self.manifest.archives[name]
|
||||
self.load(info[b'id'])
|
||||
self.zeros = b'\0' * (1 << chunker_params[1])
|
||||
|
||||
def close(self):
|
||||
if self.pp:
|
||||
|
|
@ -475,12 +479,7 @@ class Archive:
|
|||
except OSError:
|
||||
pass
|
||||
mode = item[b'mode']
|
||||
if stat.S_ISDIR(mode):
|
||||
if not os.path.exists(path):
|
||||
os.makedirs(path)
|
||||
if restore_attrs:
|
||||
self.restore_attrs(path, item)
|
||||
elif stat.S_ISREG(mode):
|
||||
if stat.S_ISREG(mode):
|
||||
if not os.path.exists(os.path.dirname(path)):
|
||||
os.makedirs(os.path.dirname(path))
|
||||
# Hard link?
|
||||
|
|
@ -501,7 +500,7 @@ class Archive:
|
|||
with open(path, 'wb') as fd:
|
||||
ids = [c[0] for c in item[b'chunks']]
|
||||
for data in self.pipeline.fetch_many(ids, is_preloaded=True):
|
||||
if sparse and ZEROS.startswith(data):
|
||||
if sparse and self.zeros.startswith(data):
|
||||
# all-zero chunk: create a hole in a sparse file
|
||||
fd.seek(len(data), 1)
|
||||
else:
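Review note: `self.zeros.startswith(data)` tests whether a chunk consists only of zero bytes, which works as long as the zero buffer is at least as long as the largest possible chunk (hence `1 << chunker_params[1]`). A self-contained sketch of the same idea follows; the function `write_sparse` and its arguments are illustrative stand-ins, not borg's API.

```python
def write_sparse(path, chunks, max_chunk_size):
    """Write chunks to path, seeking over all-zero chunks to leave holes."""
    # A buffer of zeros at least as long as any chunk; an all-zero chunk
    # is always a prefix of it.
    zeros = b'\0' * max_chunk_size
    with open(path, 'wb') as fd:
        for data in chunks:
            if zeros.startswith(data):
                # all-zero chunk: skip ahead instead of writing
                fd.seek(len(data), 1)
            else:
                fd.write(data)
        # Extend the file to the final position in case it ends in a hole.
        fd.truncate(fd.tell())
```

Whether the skipped ranges actually become holes on disk depends on the filesystem; the file content reads back the same either way.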
|
||||
|
|
@ -510,11 +509,11 @@ class Archive:
|
|||
fd.truncate(pos)
|
||||
fd.flush()
|
||||
self.restore_attrs(path, item, fd=fd.fileno())
|
||||
elif stat.S_ISFIFO(mode):
|
||||
if not os.path.exists(os.path.dirname(path)):
|
||||
os.makedirs(os.path.dirname(path))
|
||||
os.mkfifo(path)
|
||||
self.restore_attrs(path, item)
|
||||
elif stat.S_ISDIR(mode):
|
||||
if not os.path.exists(path):
|
||||
os.makedirs(path)
|
||||
if restore_attrs:
|
||||
self.restore_attrs(path, item)
|
||||
elif stat.S_ISLNK(mode):
|
||||
if not os.path.exists(os.path.dirname(path)):
|
||||
os.makedirs(os.path.dirname(path))
|
||||
|
|
@ -523,6 +522,11 @@ class Archive:
|
|||
os.unlink(path)
|
||||
os.symlink(source, path)
|
||||
self.restore_attrs(path, item, symlink=True)
|
||||
elif stat.S_ISFIFO(mode):
|
||||
if not os.path.exists(os.path.dirname(path)):
|
||||
os.makedirs(os.path.dirname(path))
|
||||
os.mkfifo(path)
|
||||
self.restore_attrs(path, item)
|
||||
elif stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
|
||||
os.mknod(path, item[b'mode'], item[b'rdev'])
|
||||
self.restore_attrs(path, item)
|
||||
|
|
@ -700,6 +704,7 @@ class Archive:
|
|||
|
||||
@staticmethod
|
||||
def list_archives(repository, key, manifest, cache=None):
|
||||
# expensive! see also Manifest.list_archive_infos.
|
||||
for name, info in manifest.archives.items():
|
||||
yield Archive(repository, key, manifest, name, cache=cache)
|
||||
|
||||
|
|
|
|||
|
|
@ -13,7 +13,7 @@ import textwrap
|
|||
import traceback
|
||||
|
||||
from . import __version__
|
||||
from .archive import Archive, ArchiveChecker
|
||||
from .archive import Archive, ArchiveChecker, CHUNKER_PARAMS
|
||||
from .repository import Repository
|
||||
from .cache import Cache
|
||||
from .key import key_creator
|
||||
|
|
@ -21,7 +21,7 @@ from .helpers import Error, location_validator, format_time, format_file_size, \
|
|||
format_file_mode, ExcludePattern, exclude_path, adjust_patterns, to_localtime, timestamp, \
|
||||
get_cache_dir, get_keys_dir, format_timedelta, prune_within, prune_split, \
|
||||
Manifest, remove_surrogates, update_excludes, format_archive, check_extension_modules, Statistics, \
|
||||
is_cachedir, bigint_to_int
|
||||
is_cachedir, bigint_to_int, ChunkerParams
|
||||
from .remote import RepositoryServer, RemoteRepository
|
||||
|
||||
|
||||
|
|
@ -101,10 +101,12 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
t0 = datetime.now()
|
||||
repository = self.open_repository(args.archive, exclusive=True)
|
||||
manifest, key = Manifest.load(repository)
|
||||
key.compression_level = args.compression
|
||||
cache = Cache(repository, key, manifest, do_files=args.cache_files)
|
||||
archive = Archive(repository, key, manifest, args.archive.archive, cache=cache,
|
||||
create=True, checkpoint_interval=args.checkpoint_interval,
|
||||
numeric_owner=args.numeric_owner, progress=args.progress)
|
||||
numeric_owner=args.numeric_owner, progress=args.progress,
|
||||
chunker_params=args.chunker_params)
|
||||
try:
|
||||
# Add cache dir to inode_skip list
|
||||
skip_inodes = set()
|
||||
|
|
@ -171,9 +173,6 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
# Entering a new filesystem?
|
||||
if restrict_dev and st.st_dev != restrict_dev:
|
||||
return
|
||||
# Ignore unix sockets
|
||||
if stat.S_ISSOCK(st.st_mode):
|
||||
return
|
||||
status = None
|
||||
if stat.S_ISREG(st.st_mode):
|
||||
try:
|
||||
|
|
@ -199,6 +198,9 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
status = archive.process_fifo(path, st)
|
||||
elif stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
|
||||
status = archive.process_dev(path, st)
|
||||
elif stat.S_ISSOCK(st.st_mode):
|
||||
# Ignore unix sockets
|
||||
return
|
||||
else:
|
||||
self.print_error('Unknown file type: %s', path)
|
||||
return
|
||||
|
|
@ -287,8 +289,8 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
stats.print_('Deleted data:', cache)
|
||||
else:
|
||||
print("You requested to completely DELETE the repository *including* all archives it contains:")
|
||||
for archive in sorted(Archive.list_archives(repository, key, manifest), key=attrgetter('ts')):
|
||||
print(format_archive(archive))
|
||||
for archive_info in manifest.list_archive_infos(sort_by='ts'):
|
||||
print(format_archive(archive_info))
|
||||
print("""Type "YES" if you understand this and want to continue.\n""")
|
||||
if input('Do you want to continue? ') == 'YES':
|
||||
repository.destroy()
|
||||
|
|
@ -357,8 +359,8 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
item[b'group'] or item[b'gid'], size, format_time(mtime),
|
||||
remove_surrogates(item[b'path']), extra))
|
||||
else:
|
||||
for archive in sorted(Archive.list_archives(repository, key, manifest), key=attrgetter('ts')):
|
||||
print(format_archive(archive))
|
||||
for archive_info in manifest.list_archive_infos(sort_by='ts'):
|
||||
print(format_archive(archive_info))
|
||||
return self.exit_code
|
||||
|
||||
def do_info(self, args):
|
||||
|
|
@ -383,11 +385,10 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
repository = self.open_repository(args.repository, exclusive=True)
|
||||
manifest, key = Manifest.load(repository)
|
||||
cache = Cache(repository, key, manifest, do_files=args.cache_files)
|
||||
archives = list(sorted(Archive.list_archives(repository, key, manifest, cache),
|
||||
key=attrgetter('ts'), reverse=True))
|
||||
archives = manifest.list_archive_infos(sort_by='ts', reverse=True) # just a ArchiveInfo list
|
||||
if args.hourly + args.daily + args.weekly + args.monthly + args.yearly == 0 and args.within is None:
|
||||
self.print_error('At least one of the "within", "hourly", "daily", "weekly", "monthly" or "yearly" '
|
||||
'settings must be specified')
|
||||
self.print_error('At least one of the "within", "keep-hourly", "keep-daily", "keep-weekly", '
|
||||
'"keep-monthly" or "keep-yearly" settings must be specified')
|
||||
return 1
|
||||
if args.prefix:
|
||||
archives = [archive for archive in archives if archive.name.startswith(args.prefix)]
|
||||
|
|
@ -415,7 +416,7 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
self.print_verbose('Would prune: %s' % format_archive(archive))
|
||||
else:
|
||||
self.print_verbose('Pruning archive: %s' % format_archive(archive))
|
||||
archive.delete(stats)
|
||||
Archive(repository, key, manifest, archive.name, cache).delete(stats)
|
||||
if to_delete and not args.dry_run:
|
||||
manifest.write()
|
||||
repository.commit()
|
||||
|
|
@ -519,8 +520,12 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
parser = argparse.ArgumentParser(description='Borg %s - Deduplicated Backups' % __version__)
|
||||
subparsers = parser.add_subparsers(title='Available commands')
|
||||
|
||||
serve_epilog = textwrap.dedent("""
|
||||
This command starts a repository server process. This command is usually not used manually.
|
||||
""")
|
||||
subparser = subparsers.add_parser('serve', parents=[common_parser],
|
||||
description=self.do_serve.__doc__)
|
||||
description=self.do_serve.__doc__, epilog=serve_epilog,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
subparser.set_defaults(func=self.do_serve)
|
||||
subparser.add_argument('--restrict-to-path', dest='restrict_to_paths', action='append',
|
||||
metavar='PATH', help='restrict repository access to PATH')
|
||||
|
|
@@ -625,6 +630,14 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
|
|||
metavar='yyyy-mm-ddThh:mm:ss',
|
||||
help='manually specify the archive creation date/time (UTC). '
|
||||
'alternatively, give a reference file/directory.')
|
||||
subparser.add_argument('--chunker-params', dest='chunker_params',
|
||||
type=ChunkerParams, default=CHUNKER_PARAMS,
|
||||
metavar='CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE',
|
||||
help='specify the chunker parameters. default: %d,%d,%d,%d' % CHUNKER_PARAMS)
|
||||
subparser.add_argument('-C', '--compression', dest='compression',
|
||||
type=int, default=0, metavar='N',
|
||||
help='select compression algorithm and level. 0..9 is supported and means zlib '
|
||||
'level 0 (no compression, fast, default) .. zlib level 9 (high compression, slow).')
|
||||
subparser.add_argument('archive', metavar='ARCHIVE',
|
||||
type=location_validator(archive=True),
|
||||
help='archive to create')
|
||||
|
|
@@ -856,19 +869,19 @@ def main():
|
|||
try:
|
||||
exit_code = archiver.run(sys.argv[1:])
|
||||
except Error as e:
|
||||
traceback.print_exc()
|
||||
archiver.print_error(e.get_message())
|
||||
archiver.print_error(e.get_message() + "\n%s" % traceback.format_exc())
|
||||
exit_code = e.exit_code
|
||||
except RemoteRepository.RPCError as e:
|
||||
print(e)
|
||||
archiver.print_error('Error: Remote Exception.\n%s' % str(e))
|
||||
exit_code = 1
|
||||
except Exception:
|
||||
archiver.print_error('Error: Local Exception.\n%s' % traceback.format_exc())
|
||||
exit_code = 1
|
||||
except KeyboardInterrupt:
|
||||
traceback.print_exc()
|
||||
archiver.print_error('Error: Keyboard interrupt')
|
||||
archiver.print_error('Error: Keyboard interrupt.\n%s' % traceback.format_exc())
|
||||
exit_code = 1
|
||||
else:
|
||||
if exit_code:
|
||||
archiver.print_error('Exiting with failure status due to previous errors')
|
||||
if exit_code:
|
||||
archiver.print_error('Exiting with failure status due to previous errors')
|
||||
sys.exit(exit_code)
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
|||
149
borg/cache.py
|
|
@@ -1,11 +1,14 @@
|
|||
from configparser import RawConfigParser
|
||||
from .remote import cache_if_remote
|
||||
import errno
|
||||
import msgpack
|
||||
import os
|
||||
import sys
|
||||
import threading
|
||||
from binascii import hexlify
|
||||
import shutil
|
||||
import tarfile
|
||||
import tempfile
|
||||
|
||||
from .key import PlaintextKey
|
||||
from .helpers import Error, get_cache_dir, decode_dict, st_mtime_ns, unhexlify, UpgradableLock, int_to_bigint, \
|
||||
|
|
@@ -98,7 +101,9 @@ class Cache:
|
|||
with open(os.path.join(self.path, 'config'), 'w') as fd:
|
||||
config.write(fd)
|
||||
ChunkIndex().write(os.path.join(self.path, 'chunks').encode('utf-8'))
|
||||
with open(os.path.join(self.path, 'files'), 'w') as fd:
|
||||
with open(os.path.join(self.path, 'chunks.archive'), 'wb') as fd:
|
||||
pass # empty file
|
||||
with open(os.path.join(self.path, 'files'), 'wb') as fd:
|
||||
pass # empty file
|
||||
|
||||
def destroy(self):
|
||||
|
|
@@ -153,6 +158,7 @@ class Cache:
|
|||
os.mkdir(txn_dir)
|
||||
shutil.copy(os.path.join(self.path, 'config'), txn_dir)
|
||||
shutil.copy(os.path.join(self.path, 'chunks'), txn_dir)
|
||||
shutil.copy(os.path.join(self.path, 'chunks.archive'), txn_dir)
|
||||
shutil.copy(os.path.join(self.path, 'files'), txn_dir)
|
||||
os.rename(os.path.join(self.path, 'txn.tmp'),
|
||||
os.path.join(self.path, 'txn.active'))
|
||||
|
|
@@ -194,6 +200,7 @@ class Cache:
|
|||
if os.path.exists(txn_dir):
|
||||
shutil.copy(os.path.join(txn_dir, 'config'), self.path)
|
||||
shutil.copy(os.path.join(txn_dir, 'chunks'), self.path)
|
||||
shutil.copy(os.path.join(txn_dir, 'chunks.archive'), self.path)
|
||||
shutil.copy(os.path.join(txn_dir, 'files'), self.path)
|
||||
os.rename(txn_dir, os.path.join(self.path, 'txn.tmp'))
|
||||
if os.path.exists(os.path.join(self.path, 'txn.tmp')):
|
||||
|
|
@@ -202,37 +209,139 @@ class Cache:
|
|||
self._do_open()
|
||||
|
||||
def sync(self):
|
||||
"""Initializes cache by fetching and reading all archive indicies
|
||||
"""Re-synchronize chunks cache with repository.
|
||||
|
||||
If present, uses a compressed tar archive of known backup archive
|
||||
indices, so it only needs to fetch infos from repo and build a chunk
|
||||
index once per backup archive.
|
||||
If out of sync, the tar gets rebuilt from known + fetched chunk infos,
|
||||
so it has complete and current information about all backup archives.
|
||||
Finally, it builds the master chunks index by merging all indices from
|
||||
the tar.
|
||||
|
||||
Note: compression (esp. xz) is very effective in keeping the tar
|
||||
relatively small compared to the files it contains.
|
||||
"""
|
||||
def add(id, size, csize):
|
||||
in_archive_path = os.path.join(self.path, 'chunks.archive')
|
||||
out_archive_path = os.path.join(self.path, 'chunks.archive.tmp')
|
||||
|
||||
def open_in_archive():
|
||||
try:
|
||||
count, size, csize = self.chunks[id]
|
||||
self.chunks[id] = count + 1, size, csize
|
||||
tf = tarfile.open(in_archive_path, 'r')
|
||||
except OSError as e:
|
||||
if e.errno != errno.ENOENT:
|
||||
raise
|
||||
# file not found
|
||||
tf = None
|
||||
except tarfile.ReadError:
|
||||
# empty file?
|
||||
tf = None
|
||||
return tf
|
||||
|
||||
def open_out_archive():
|
||||
for compression in ('xz', 'bz2', 'gz'):
|
||||
# xz needs py 3.3, bz2 and gz also work on 3.2
|
||||
try:
|
||||
tf = tarfile.open(out_archive_path, 'w:'+compression, format=tarfile.PAX_FORMAT)
|
||||
break
|
||||
except tarfile.CompressionError:
|
||||
continue
|
||||
else: # shouldn't happen
|
||||
tf = None
|
||||
return tf
|
||||
|
||||
def close_archive(tf):
|
||||
if tf:
|
||||
tf.close()
|
||||
|
||||
def delete_in_archive():
|
||||
os.unlink(in_archive_path)
|
||||
|
||||
def rename_out_archive():
|
||||
os.rename(out_archive_path, in_archive_path)
|
||||
|
||||
def add(chunk_idx, id, size, csize, incr=1):
|
||||
try:
|
||||
count, size, csize = chunk_idx[id]
|
||||
chunk_idx[id] = count + incr, size, csize
|
||||
except KeyError:
|
||||
self.chunks[id] = 1, size, csize
|
||||
self.begin_txn()
|
||||
print('Initializing cache...')
|
||||
self.chunks.clear()
|
||||
unpacker = msgpack.Unpacker()
|
||||
repository = cache_if_remote(self.repository)
|
||||
for name, info in self.manifest.archives.items():
|
||||
archive_id = info[b'id']
|
||||
chunk_idx[id] = incr, size, csize
|
||||
|
||||
def transfer_known_idx(archive_id, tf_in, tf_out):
|
||||
archive_id_hex = hexlify(archive_id).decode('ascii')
|
||||
tarinfo = tf_in.getmember(archive_id_hex)
|
||||
archive_name = tarinfo.pax_headers['archive_name']
|
||||
print('Already known archive:', archive_name)
|
||||
f_in = tf_in.extractfile(archive_id_hex)
|
||||
tf_out.addfile(tarinfo, f_in)
|
||||
return archive_name
|
||||
|
||||
def fetch_and_build_idx(archive_id, repository, key, tmp_dir, tf_out):
|
||||
chunk_idx = ChunkIndex()
|
||||
cdata = repository.get(archive_id)
|
||||
data = self.key.decrypt(archive_id, cdata)
|
||||
add(archive_id, len(data), len(cdata))
|
||||
data = key.decrypt(archive_id, cdata)
|
||||
add(chunk_idx, archive_id, len(data), len(cdata))
|
||||
archive = msgpack.unpackb(data)
|
||||
if archive[b'version'] != 1:
|
||||
raise Exception('Unknown archive metadata version')
|
||||
decode_dict(archive, (b'name',))
|
||||
print('Analyzing archive:', archive[b'name'])
|
||||
for key, chunk in zip(archive[b'items'], repository.get_many(archive[b'items'])):
|
||||
data = self.key.decrypt(key, chunk)
|
||||
add(key, len(data), len(chunk))
|
||||
print('Analyzing new archive:', archive[b'name'])
|
||||
unpacker = msgpack.Unpacker()
|
||||
for item_id, chunk in zip(archive[b'items'], repository.get_many(archive[b'items'])):
|
||||
data = key.decrypt(item_id, chunk)
|
||||
add(chunk_idx, item_id, len(data), len(chunk))
|
||||
unpacker.feed(data)
|
||||
for item in unpacker:
|
||||
if b'chunks' in item:
|
||||
for chunk_id, size, csize in item[b'chunks']:
|
||||
add(chunk_id, size, csize)
|
||||
add(chunk_idx, chunk_id, size, csize)
|
||||
archive_id_hex = hexlify(archive_id).decode('ascii')
|
||||
file_tmp = os.path.join(tmp_dir, archive_id_hex).encode('utf-8')
|
||||
chunk_idx.write(file_tmp)
|
||||
tarinfo = tf_out.gettarinfo(file_tmp, archive_id_hex)
|
||||
tarinfo.pax_headers['archive_name'] = archive[b'name']
|
||||
with open(file_tmp, 'rb') as f:
|
||||
tf_out.addfile(tarinfo, f)
|
||||
os.unlink(file_tmp)
|
||||
|
||||
def create_master_idx(chunk_idx, tf_in, tmp_dir):
|
||||
chunk_idx.clear()
|
||||
for tarinfo in tf_in:
|
||||
archive_id_hex = tarinfo.name
|
||||
tf_in.extract(archive_id_hex, tmp_dir)
|
||||
chunk_idx_path = os.path.join(tmp_dir, archive_id_hex).encode('utf-8')
|
||||
archive_chunk_idx = ChunkIndex.read(chunk_idx_path)
|
||||
for chunk_id, (count, size, csize) in archive_chunk_idx.iteritems():
|
||||
add(chunk_idx, chunk_id, size, csize, incr=count)
|
||||
os.unlink(chunk_idx_path)
|
||||
|
||||
self.begin_txn()
|
||||
print('Synchronizing chunks cache...')
|
||||
# XXX we have to do stuff on disk due to lacking ChunkIndex api
|
||||
with tempfile.TemporaryDirectory() as tmp_dir:
|
||||
repository = cache_if_remote(self.repository)
|
||||
out_archive = open_out_archive()
|
||||
in_archive = open_in_archive()
|
||||
if in_archive:
|
||||
known_ids = set(unhexlify(hexid) for hexid in in_archive.getnames())
|
||||
else:
|
||||
known_ids = set()
|
||||
archive_ids = set(info[b'id'] for info in self.manifest.archives.values())
|
||||
print('Rebuilding archive collection. Known: %d Repo: %d Unknown: %d' % (
|
||||
len(known_ids), len(archive_ids), len(archive_ids - known_ids), ))
|
||||
for archive_id in archive_ids & known_ids:
|
||||
transfer_known_idx(archive_id, in_archive, out_archive)
|
||||
close_archive(in_archive)
|
||||
delete_in_archive() # free disk space
|
||||
for archive_id in archive_ids - known_ids:
|
||||
fetch_and_build_idx(archive_id, repository, self.key, tmp_dir, out_archive)
|
||||
close_archive(out_archive)
|
||||
rename_out_archive()
|
||||
print('Merging collection into master chunks cache...')
|
||||
in_archive = open_in_archive()
|
||||
create_master_idx(self.chunks, in_archive, tmp_dir)
|
||||
close_archive(in_archive)
|
||||
print('Done.')
|
||||
|
||||
def add_chunk(self, id, data, stats):
|
||||
if not self.txn_active:
|
||||
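The merge step at the end of the new `sync()` can be illustrated in isolation. The following is a minimal sketch, not the code from this commit: plain dicts stand in for `ChunkIndex`, and `merge_indices` plus the sample chunk ids are hypothetical names chosen for illustration. It shows how per-archive reference counts are summed into one master index by passing each entry's count as `incr`.

```python
def add(chunk_idx, chunk_id, size, csize, incr=1):
    """Add `incr` references to chunk_id, creating the entry if it is missing."""
    try:
        count, size, csize = chunk_idx[chunk_id]
        chunk_idx[chunk_id] = count + incr, size, csize
    except KeyError:
        chunk_idx[chunk_id] = incr, size, csize


def merge_indices(per_archive_indices):
    """Build a master index (hypothetical helper) from per-archive indices.

    Each index maps chunk id -> (refcount, size, csize); dicts are used here
    as a stand-in for the on-disk ChunkIndex files stored in the tar.
    """
    master = {}
    for archive_idx in per_archive_indices:
        for chunk_id, (count, size, csize) in archive_idx.items():
            add(master, chunk_id, size, csize, incr=count)
    return master


# Two made-up archives that share chunk b'c1'.
archive_a = {b'c1': (2, 100, 60), b'c2': (1, 50, 30)}
archive_b = {b'c1': (1, 100, 60), b'c3': (4, 10, 8)}
master = merge_indices([archive_a, archive_b])
print(master[b'c1'])  # reference counts of the shared chunk are summed
```

Because each per-archive index already carries its own counts, an archive that is already known does not have to be fetched from the repository again; only its stored index is merged.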
|
|
|
|||
|
|
@@ -20,8 +20,11 @@ cdef extern from "_chunker.c":
|
|||
cdef class Chunker:
|
||||
cdef _Chunker *chunker
|
||||
|
||||
def __cinit__(self, window_size, chunk_mask, min_size, max_size, seed):
|
||||
self.chunker = chunker_init(window_size, chunk_mask, min_size, max_size, seed & 0xffffffff)
|
||||
def __cinit__(self, seed, chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size):
|
||||
min_size = 1 << chunk_min_exp
|
||||
max_size = 1 << chunk_max_exp
|
||||
hash_mask = (1 << hash_mask_bits) - 1
|
||||
self.chunker = chunker_init(hash_window_size, hash_mask, min_size, max_size, seed & 0xffffffff)
|
||||
|
||||
def chunkify(self, fd, fh=-1):
|
||||
"""
|
||||
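The new `__cinit__` signature takes exponents and a bit count instead of raw sizes and a mask. As a rough sketch of that conversion in plain Python (the helper name `derive_chunker_settings` and the sample values are assumptions for illustration, not necessarily the project defaults):

```python
def derive_chunker_settings(chunk_min_exp, chunk_max_exp, hash_mask_bits):
    """Translate exponent-style parameters into the values the C chunker gets."""
    min_size = 1 << chunk_min_exp           # smallest allowed chunk, in bytes
    max_size = 1 << chunk_max_exp           # largest allowed chunk, in bytes
    hash_mask = (1 << hash_mask_bits) - 1   # low bits of the rolling hash to test
    return min_size, max_size, hash_mask


# Illustrative values only.
settings = derive_chunker_settings(10, 23, 16)
print(settings)
```

Expressing the sizes as powers of two keeps the command-line values short and guarantees that the mask is a contiguous run of low bits.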
|
|
|
|||
|
|
@@ -11,7 +11,9 @@ cdef extern from "_hashindex.c":
|
|||
HashIndex *hashindex_read(char *path)
|
||||
HashIndex *hashindex_init(int capacity, int key_size, int value_size)
|
||||
void hashindex_free(HashIndex *index)
|
||||
void hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize, long long *unique_size, long long *unique_csize)
|
||||
void hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize,
|
||||
long long *unique_size, long long *unique_csize,
|
||||
long long *total_unique_chunks, long long *total_chunks)
|
||||
int hashindex_get_size(HashIndex *index)
|
||||
int hashindex_write(HashIndex *index, char *path)
|
||||
void *hashindex_get(HashIndex *index, void *key)
|
||||
|
|
@@ -179,9 +181,11 @@ cdef class ChunkIndex(IndexBase):
|
|||
return iter
|
||||
|
||||
def summarize(self):
|
||||
cdef long long total_size, total_csize, unique_size, unique_csize
|
||||
hashindex_summarize(self.index, &total_size, &total_csize, &unique_size, &unique_csize)
|
||||
return total_size, total_csize, unique_size, unique_csize
|
||||
cdef long long total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks
|
||||
hashindex_summarize(self.index, &total_size, &total_csize,
|
||||
&unique_size, &unique_csize,
|
||||
&total_unique_chunks, &total_chunks)
|
||||
return total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks
|
||||
|
||||
|
||||
cdef class ChunkKeyIterator:
|
||||
|
|
|
|||
|
|
@@ -1,5 +1,6 @@
|
|||
import argparse
|
||||
import binascii
|
||||
from collections import namedtuple
|
||||
import grp
|
||||
import msgpack
|
||||
import os
|
||||
|
|
@@ -122,6 +123,18 @@ class Manifest:
|
|||
self.id = self.key.id_hash(data)
|
||||
self.repository.put(self.MANIFEST_ID, self.key.encrypt(data))
|
||||
|
||||
def list_archive_infos(self, sort_by=None, reverse=False):
|
||||
# inexpensive Archive.list_archives replacement if we just need .name, .id, .ts
|
||||
ArchiveInfo = namedtuple('ArchiveInfo', 'name id ts')
|
||||
archives = []
|
||||
for name, values in self.archives.items():
|
||||
ts = parse_timestamp(values[b'time'].decode('utf-8'))
|
||||
id = values[b'id']
|
||||
archives.append(ArchiveInfo(name=name, id=id, ts=ts))
|
||||
if sort_by is not None:
|
||||
archives = sorted(archives, key=attrgetter(sort_by), reverse=reverse)
|
||||
return archives
|
||||
|
||||
|
||||
def prune_within(archives, within):
|
||||
multiplier = {'H': 1, 'd': 24, 'w': 24*7, 'm': 24*31, 'y': 24*365}
|
||||
|
|
@@ -164,11 +177,14 @@ class Statistics:
|
|||
self.usize += csize
|
||||
|
||||
def print_(self, label, cache):
|
||||
total_size, total_csize, unique_size, unique_csize = cache.chunks.summarize()
|
||||
total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks = cache.chunks.summarize()
|
||||
print()
|
||||
print(' Original size Compressed size Deduplicated size')
|
||||
print('%-15s %20s %20s %20s' % (label, format_file_size(self.osize), format_file_size(self.csize), format_file_size(self.usize)))
|
||||
print('All archives: %20s %20s %20s' % (format_file_size(total_size), format_file_size(total_csize), format_file_size(unique_csize)))
|
||||
print()
|
||||
print(' Unique chunks Total chunks')
|
||||
print('Chunk index: %20d %20d' % (total_unique_chunks, total_chunks))
|
||||
|
||||
def show_progress(self, item=None, final=False):
|
||||
if not final:
|
||||
|
|
@@ -300,6 +316,11 @@ def timestamp(s):
|
|||
raise ValueError
|
||||
|
||||
|
||||
def ChunkerParams(s):
|
||||
window_size, chunk_mask, chunk_min, chunk_max = s.split(',')
|
||||
return int(window_size), int(chunk_mask), int(chunk_min), int(chunk_max)
|
||||
|
||||
|
||||
def is_cachedir(path):
|
||||
"""Determines whether the specified path is a cache directory (and
|
||||
therefore should potentially be excluded from the backup) according to
|
||||
|
|
|
|||
|
|
@@ -53,6 +53,7 @@ class KeyBase:
|
|||
|
||||
def __init__(self):
|
||||
self.TYPE_STR = bytes([self.TYPE])
|
||||
self.compression_level = 0
|
||||
|
||||
def id_hash(self, data):
|
||||
"""Return HMAC hash using the "id" HMAC key
|
||||
|
|
@@ -83,7 +84,7 @@ class PlaintextKey(KeyBase):
|
|||
return sha256(data).digest()
|
||||
|
||||
def encrypt(self, data):
|
||||
return b''.join([self.TYPE_STR, zlib.compress(data)])
|
||||
return b''.join([self.TYPE_STR, zlib.compress(data, self.compression_level)])
|
||||
|
||||
def decrypt(self, id, data):
|
||||
if data[0] != self.TYPE:
|
||||
|
|
@@ -115,7 +116,7 @@ class AESKeyBase(KeyBase):
|
|||
return HMAC(self.id_key, data, sha256).digest()
|
||||
|
||||
def encrypt(self, data):
|
||||
data = zlib.compress(data)
|
||||
data = zlib.compress(data, self.compression_level)
|
||||
self.enc_cipher.reset()
|
||||
data = b''.join((self.enc_cipher.iv[8:], self.enc_cipher.encrypt(data)))
|
||||
hmac = HMAC(self.enc_hmac_key, data, sha256).digest()
|
||||
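Both `encrypt()` variants now pass `self.compression_level` to `zlib.compress`. A small standalone sketch of what the level changes, using only the standard library (the sample data is made up):

```python
import zlib

data = b'borg' * 1000  # highly repetitive sample input

stored = zlib.compress(data, 0)  # level 0: no compression, data is only framed
packed = zlib.compress(data, 9)  # level 9: best compression, slowest

# Decompression does not need to know which level was used.
assert zlib.decompress(stored) == data
assert zlib.decompress(packed) == data
print(len(data), len(stored), len(packed))
```

Since the level is not needed for decompression, objects written with different levels can coexist and be read back the same way.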
|
|
|
|||
|
|
@@ -141,7 +141,10 @@ class RemoteRepository:
|
|||
self.r_fds = [self.stdout_fd]
|
||||
self.x_fds = [self.stdin_fd, self.stdout_fd]
|
||||
|
||||
version = self.call('negotiate', 1)
|
||||
try:
|
||||
version = self.call('negotiate', 1)
|
||||
except ConnectionClosed:
|
||||
raise Exception('Server immediately closed connection - is Borg installed and working on the server?')
|
||||
if version != 1:
|
||||
raise Exception('Server insisted on using unsupported protocol version %d' % version)
|
||||
self.id = self.call('open', location.path, create)
|
||||
|
|
|
|||
|
|
@@ -14,6 +14,7 @@ from .lrucache import LRUCache
|
|||
|
||||
MAX_OBJECT_SIZE = 20 * 1024 * 1024
|
||||
MAGIC = b'BORG_SEG'
|
||||
MAGIC_LEN = len(MAGIC)
|
||||
TAG_PUT = 0
|
||||
TAG_DELETE = 1
|
||||
TAG_COMMIT = 2
|
||||
|
|
@@ -281,8 +282,8 @@ class Repository:
|
|||
continue
|
||||
try:
|
||||
objects = list(self.io.iter_objects(segment))
|
||||
except (IntegrityError, struct.error):
|
||||
report_error('Error reading segment {}'.format(segment))
|
||||
except IntegrityError as err:
|
||||
report_error('Error reading segment {}: {}'.format(segment, err))
|
||||
objects = []
|
||||
if repair:
|
||||
self.io.recover_segment(segment, filename)
|
||||
|
|
@@ -481,7 +482,7 @@ class LoggedIO:
|
|||
os.mkdir(dirname)
|
||||
self._write_fd = open(self.segment_filename(self.segment), 'ab')
|
||||
self._write_fd.write(MAGIC)
|
||||
self.offset = 8
|
||||
self.offset = MAGIC_LEN
|
||||
return self._write_fd
|
||||
|
||||
def get_fd(self, segment):
|
||||
|
|
@@ -504,19 +505,26 @@ class LoggedIO:
|
|||
def iter_objects(self, segment, include_data=False):
|
||||
fd = self.get_fd(segment)
|
||||
fd.seek(0)
|
||||
if fd.read(8) != MAGIC:
|
||||
raise IntegrityError('Invalid segment header')
|
||||
offset = 8
|
||||
if fd.read(MAGIC_LEN) != MAGIC:
|
||||
raise IntegrityError('Invalid segment magic')
|
||||
offset = MAGIC_LEN
|
||||
header = fd.read(self.header_fmt.size)
|
||||
while header:
|
||||
crc, size, tag = self.header_fmt.unpack(header)
|
||||
try:
|
||||
crc, size, tag = self.header_fmt.unpack(header)
|
||||
except struct.error as err:
|
||||
raise IntegrityError('Invalid segment entry header [offset {}]: {}'.format(offset, err))
|
||||
if size > MAX_OBJECT_SIZE:
|
||||
raise IntegrityError('Invalid segment object size')
|
||||
rest = fd.read(size - self.header_fmt.size)
|
||||
raise IntegrityError('Invalid segment entry size [offset {}]'.format(offset))
|
||||
length = size - self.header_fmt.size
|
||||
rest = fd.read(length)
|
||||
if len(rest) != length:
|
||||
raise IntegrityError('Segment entry data short read [offset {}]: expected: {}, got {} bytes'.format(
|
||||
offset, length, len(rest)))
|
||||
if crc32(rest, crc32(memoryview(header)[4:])) & 0xffffffff != crc:
|
||||
raise IntegrityError('Segment checksum mismatch')
|
||||
raise IntegrityError('Segment entry checksum mismatch [offset {}]'.format(offset))
|
||||
if tag not in (TAG_PUT, TAG_DELETE, TAG_COMMIT):
|
||||
raise IntegrityError('Invalid segment entry header')
|
||||
raise IntegrityError('Invalid segment entry tag [offset {}]'.format(offset))
|
||||
key = None
|
||||
if tag in (TAG_PUT, TAG_DELETE):
|
||||
key = rest[:32]
|
||||
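The checks that `iter_objects` performs on each segment entry can be tried out on an in-memory buffer. This is a simplified sketch under stated assumptions: the header layout `'<IIB'` (crc, size, tag) is assumed because `header_fmt` is not shown in this hunk, and `pack_entry` / `parse_entry` and the local `IntegrityError` class are hypothetical helpers written for illustration.

```python
import struct
from zlib import crc32

HEADER_FMT = struct.Struct('<IIB')  # ASSUMED layout: crc32, size, tag
MAX_OBJECT_SIZE = 20 * 1024 * 1024
TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2


class IntegrityError(Exception):
    """Local stand-in for the project's IntegrityError."""


def pack_entry(tag, payload=b''):
    """Build one entry; the CRC covers size, tag and payload (not itself)."""
    size = HEADER_FMT.size + len(payload)
    size_and_tag = struct.pack('<IB', size, tag)
    crc = crc32(payload, crc32(size_and_tag)) & 0xffffffff
    return struct.pack('<I', crc) + size_and_tag + payload


def parse_entry(buf, offset=0):
    """Validate one entry at `offset` and return (tag, payload)."""
    header = buf[offset:offset + HEADER_FMT.size]
    try:
        crc, size, tag = HEADER_FMT.unpack(header)
    except struct.error as err:
        raise IntegrityError('Invalid entry header [offset {}]: {}'.format(offset, err))
    if size > MAX_OBJECT_SIZE:
        raise IntegrityError('Invalid entry size [offset {}]'.format(offset))
    length = size - HEADER_FMT.size
    rest = buf[offset + HEADER_FMT.size:offset + size]
    if len(rest) != length:
        raise IntegrityError('Short read [offset {}]: expected {}, got {} bytes'.format(
            offset, length, len(rest)))
    if crc32(rest, crc32(header[4:])) & 0xffffffff != crc:
        raise IntegrityError('Checksum mismatch [offset {}]'.format(offset))
    if tag not in (TAG_PUT, TAG_DELETE, TAG_COMMIT):
        raise IntegrityError('Invalid entry tag [offset {}]'.format(offset))
    return tag, bytes(rest)


entry = pack_entry(TAG_PUT, b'k' * 32 + b'some data')
tag, payload = parse_entry(entry)

damaged = bytearray(entry)
damaged[-1] ^= 0xff  # flip bits in the last payload byte
try:
    parse_entry(bytes(damaged))
    detected = False
except IntegrityError:
    detected = True
```

Reporting the offset in every error message, as the new code does, makes it much easier to locate the damaged entry inside a segment file.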
|
|
|
|||
|
|
@@ -12,7 +12,7 @@ import unittest
|
|||
from hashlib import sha256
|
||||
|
||||
from .. import xattr
|
||||
from ..archive import Archive, ChunkBuffer, CHUNK_MAX
|
||||
from ..archive import Archive, ChunkBuffer, CHUNK_MAX_EXP
|
||||
from ..archiver import Archiver
|
||||
from ..cache import Cache
|
||||
from ..crypto import bytes_to_long, num_aes_blocks
|
||||
|
|
@@ -213,7 +213,7 @@ class ArchiverTestCase(ArchiverTestCaseBase):
|
|||
sparse_support = sys.platform != 'darwin'
|
||||
filename = os.path.join(self.input_path, 'sparse')
|
||||
content = b'foobar'
|
||||
hole_size = 5 * CHUNK_MAX # 5 full chunker buffers
|
||||
hole_size = 5 * (1 << CHUNK_MAX_EXP) # 5 full chunker buffers
|
||||
with open(filename, 'wb') as fd:
|
||||
# create a file that has a hole at the beginning and end (if the
|
||||
# OS and filesystem supports sparse files)
|
||||
|
|
@@ -400,9 +400,9 @@ class ArchiverTestCase(ArchiverTestCaseBase):
|
|||
self.cmd('extract', '--dry-run', self.repository_location + '::test')
|
||||
self.cmd('check', self.repository_location)
|
||||
name = sorted(os.listdir(os.path.join(self.tmpdir, 'repository', 'data', '0')), reverse=True)[0]
|
||||
with open(os.path.join(self.tmpdir, 'repository', 'data', '0', name), 'r+') as fd:
|
||||
with open(os.path.join(self.tmpdir, 'repository', 'data', '0', name), 'r+b') as fd:
|
||||
fd.seek(100)
|
||||
fd.write('XXXX')
|
||||
fd.write(b'XXXX')
|
||||
self.cmd('check', self.repository_location, exit_code=1)
|
||||
|
||||
def test_readonly_repository(self):
|
||||
|
|
|
|||
|
|
@@ -1,27 +1,27 @@
|
|||
from io import BytesIO
|
||||
|
||||
from ..chunker import Chunker, buzhash, buzhash_update
|
||||
from ..archive import CHUNK_MAX
|
||||
from ..archive import CHUNK_MAX_EXP
|
||||
from . import BaseTestCase
|
||||
|
||||
|
||||
class ChunkerTestCase(BaseTestCase):
|
||||
|
||||
def test_chunkify(self):
|
||||
data = b'0' * int(1.5 * CHUNK_MAX) + b'Y'
|
||||
parts = [bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(data))]
|
||||
data = b'0' * int(1.5 * (1 << CHUNK_MAX_EXP)) + b'Y'
|
||||
parts = [bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(data))]
|
||||
self.assert_equal(len(parts), 2)
|
||||
self.assert_equal(b''.join(parts), data)
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(b''))], [])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fooba', b'rboobaz', b'fooba', b'rboobaz', b'fooba', b'rboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fo', b'obarb', b'oob', b'azf', b'oobarb', b'oob', b'azf', b'oobarb', b'oobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'ar', b'boobazfoob', b'ar', b'boobazfoob', b'ar', b'boobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boo', b'bazfo', b'obar', b'boo', b'bazfo', b'obar', b'boobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foo', b'barboobaz', b'foo', b'barboobaz', b'foo', b'barboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boobazfo', b'obar', b'boobazfo', b'obar', b'boobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'arboobaz', b'foob', b'arboobaz', b'foob', b'arboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b''))], [])
|
||||
self.assert_equal([bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fooba', b'rboobaz', b'fooba', b'rboobaz', b'fooba', b'rboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(1, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fo', b'obarb', b'oob', b'azf', b'oobarb', b'oob', b'azf', b'oobarb', b'oobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'ar', b'boobazfoob', b'ar', b'boobazfoob', b'ar', b'boobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(0, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
|
||||
self.assert_equal([bytes(c) for c in Chunker(1, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boobazfo', b'obar', b'boobazfo', b'obar', b'boobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'arboobaz', b'foob', b'arboobaz', b'foob', b'arboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(0, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
|
||||
self.assert_equal([bytes(c) for c in Chunker(1, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarbo', b'obazfoobar', b'boobazfo', b'obarboobaz'])
|
||||
self.assert_equal([bytes(c) for c in Chunker(2, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz', b'foobarboobaz', b'foobarboobaz'])
|
||||
|
||||
def test_buzhash(self):
|
||||
self.assert_equal(buzhash(b'abcdefghijklmnop', 0), 3795437769)
|
||||
|
|
|
|||
4
docs/_themes/local/static/local.css_t
vendored
|
|
@@ -161,8 +161,8 @@ p.admonition-title:after {
|
|||
}
|
||||
|
||||
div.note {
|
||||
background-color: #0f5;
|
||||
border-bottom: 2px solid #d22;
|
||||
background-color: #002211;
|
||||
border-bottom: 2px solid #22dd22;
|
||||
}
|
||||
|
||||
div.seealso {
|
||||
|
|
|
|||
|
|
@@ -51,7 +51,7 @@ Which file types, attributes, etc. are *not* preserved?
|
|||
recreate them in any case). So, don't panic if your backup misses a UDS!
|
||||
* The precise on-disk representation of the holes in a sparse file.
|
||||
Archive creation has no special support for sparse files, holes are
|
||||
backed up up as (deduplicated and compressed) runs of zero bytes.
|
||||
backed up as (deduplicated and compressed) runs of zero bytes.
|
||||
Archive extraction has optional support to extract all-zero chunks as
|
||||
holes in a sparse file.
|
||||
|
||||
|
|
|
|||
|
|
@@ -62,21 +62,60 @@ Some of the steps detailled below might be useful also for non-git installs.
|
|||
# optional: for unit testing
|
||||
apt-get install fakeroot
|
||||
|
||||
# install virtualenv tool, create and activate a virtual env
|
||||
apt-get install python-virtualenv
|
||||
virtualenv --python=python3 borg-env
|
||||
source borg-env/bin/activate # always do this before using!
|
||||
|
||||
# install some dependencies into virtual env
|
||||
pip install cython # to compile .pyx -> .c
|
||||
pip install tox pytest # optional, for running unit tests
|
||||
pip install sphinx # optional, to build the docs
|
||||
|
||||
# get |project_name| from github, install it
|
||||
git clone |git_url|
|
||||
|
||||
apt-get install python-virtualenv
|
||||
virtualenv --python=python3 borg-env
|
||||
source borg-env/bin/activate # always before using!
|
||||
|
||||
# install borg + dependencies into virtualenv
|
||||
pip install cython # compile .pyx -> .c
|
||||
pip install tox pytest # optional, for running unit tests
|
||||
pip install sphinx # optional, to build the docs
|
||||
cd borg
|
||||
pip install -e . # in-place editable mode
|
||||
|
||||
# optional: run all the tests, on all supported Python versions
|
||||
fakeroot -u tox
|
||||
|
||||
|
||||
Korora / Fedora 21 installation (from git)
|
||||
------------------------------------------
|
||||
Note: this uses latest, unreleased development code from git.
|
||||
While we try not to break master, there are no guarantees on anything.
|
||||
|
||||
Some of the steps detailed below might also be useful for non-git installs.
|
||||
|
||||
.. parsed-literal::
|
||||
# Python 3.x (>= 3.2) + Headers, Py Package Installer
|
||||
sudo dnf install python3 python3-devel python3-pip
|
||||
|
||||
# we need OpenSSL + Headers for Crypto
|
||||
sudo dnf install openssl-devel openssl
|
||||
|
||||
# ACL support Headers + Library
|
||||
sudo dnf install libacl-devel libacl
|
||||
|
||||
# optional: lowlevel FUSE py binding - to mount backup archives
|
||||
sudo dnf install python3-llfuse fuse
|
||||
|
||||
# optional: for unit testing
|
||||
sudo dnf install fakeroot
|
||||
|
||||
# get |project_name| from github, install it
|
||||
git clone |git_url|
|
||||
|
||||
dnf install python3-virtualenv
|
||||
virtualenv --python=python3 borg-env
|
||||
source borg-env/bin/activate # always before using!
|
||||
|
||||
# install borg + dependencies into virtualenv
|
||||
pip install cython # compile .pyx -> .c
|
||||
pip install tox pytest # optional, for running unit tests
|
||||
pip install sphinx # optional, to build the docs
|
||||
cd borg
|
||||
pip install -e . # in-place editable mode
|
||||
|
||||
# optional: run all the tests, on all supported Python versions
|
||||
fakeroot -u tox
|
||||
|
|
|
|||
|
|
@@ -6,38 +6,43 @@ Internals
|
|||
|
||||
This page documents the internal data structures and storage
|
||||
mechanisms of |project_name|. It is partly based on `mailing list
|
||||
discussion about internals`_ and also on static code analysis. It may
|
||||
not be exactly up to date with the current source code.
|
||||
discussion about internals`_ and also on static code analysis.
|
||||
|
||||
It may not be exactly up to date with the current source code.
|
||||
|
||||
Repository and Archives
|
||||
-----------------------
|
||||
|
||||
|project_name| stores its data in a `Repository`. Each repository can
|
||||
hold multiple `Archives`, which represent individual backups that
|
||||
contain a full archive of the files specified when the backup was
|
||||
performed. Deduplication is performed across multiple backups, both on
|
||||
data and metadata, using `Segments` chunked with the Buzhash_
|
||||
algorithm. Each repository has the following file structure:
|
||||
data and metadata, using `Chunks` created by the chunker using the Buzhash_
|
||||
algorithm.
|
||||
|
||||
Each repository has the following file structure:
|
||||
|
||||
README
|
||||
simple text file describing the repository
|
||||
simple text file telling that this is a |project_name| repository
|
||||
|
||||
config
|
||||
description of the repository, includes the unique identifier. also
|
||||
acts as a lock file
|
||||
repository configuration and lock file
|
||||
|
||||
data/
|
||||
directory where the actual data (`segments`) is stored
|
||||
directory where the actual data is stored
|
||||
|
||||
hints.%d
|
||||
undocumented
|
||||
hints for repository compaction
|
||||
|
||||
index.%d
|
||||
cache of the file indexes. those files can be regenerated with
|
||||
``check --repair``
|
||||
repository index
|
||||
|
||||
|
||||
Config file
|
||||
-----------
|
||||
|
||||
Each repository has a ``config`` file which which is a ``INI``
|
||||
formatted file which looks like this::
|
||||
Each repository has a ``config`` file which is an ``INI``-style file
|
||||
and looks like this::
|
||||
|
||||
[repository]
|
||||
version = 1
|
||||
|
|
@@ -48,20 +53,35 @@ formatted file which looks like this::
|
|||
This is where the ``repository.id`` is stored. It is a unique
|
||||
identifier for repositories. It will not change if you move the
|
||||
repository around so you can make a local transfer then decide to move
|
||||
the repository in another (even remote) location at a later time.
|
||||
the repository to another (even remote) location at a later time.
|
||||
|
||||
|project_name| will do a POSIX read lock on that file when operating
|
||||
|project_name| will do a POSIX read lock on the config file when operating
|
||||
on the repository.
|
||||
|
||||
|
||||
Keys
|
||||
----
|
||||
The key to address the key/value store is usually computed like this:
|
||||
|
||||
key = id = id_hash(unencrypted_data)
|
||||
|
||||
The id_hash function is:
|
||||
|
||||
* sha256 (no encryption keys available)
|
||||
* hmac-sha256 (encryption keys available)
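The key computation above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name ``id_hash`` is taken from the text, while the ``id_key`` parameter and the sample values are assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional


def id_hash(data: bytes, id_key: Optional[bytes] = None) -> bytes:
    """Return a 32-byte id for a piece of unencrypted data.

    Without a key this is plain SHA-256, with a key it is HMAC-SHA256.
    The id_key parameter is a hypothetical stand-in for the repository's id key.
    """
    if id_key is None:
        return hashlib.sha256(data).digest()
    return hmac.new(id_key, data, hashlib.sha256).digest()


# Same data, two different ids depending on whether a key is available.
plain_id = id_hash(b"some chunk data")
keyed_id = id_hash(b"some chunk data", id_key=b"\x01" * 32)
```

Because the id depends only on the unencrypted data (and the key), identical data always maps to the same key in the key/value store, which is what makes deduplication possible.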
|
||||
|
||||
|
||||
Segments and archives
|
||||
---------------------
|
||||
|
||||
|project_name| is a "filesystem based transactional key value
|
||||
store". It makes extensive use of msgpack_ to store data and, unless
|
||||
A |project_name| repository is a filesystem based transactional key/value
|
||||
store. It makes extensive use of msgpack_ to store data and, unless
|
||||
otherwise noted, data is stored in msgpack_ encoded files.
|
||||
|
||||
Objects referenced by a key (256bits id/hash) are stored inline in
|
||||
files (`segments`) of size approx 5MB in ``repo/data``. They contain:
|
||||
Objects referenced by a key are stored inline in files (`segments`) of approx.
|
||||
5MB size in numbered subdirectories of ``repo/data``.
|
||||
|
||||
They contain:
|
||||
|
||||
* header size
|
||||
* crc
|
||||
|
|
@ -77,21 +97,26 @@ Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. A segment file is
|
|||
basically a transaction log where each repository operation is
|
||||
appended to the file. So if an object is written to the repository a
|
||||
``PUT`` tag is written to the file followed by the object id and
|
||||
data. And if an object is deleted a ``DELETE`` tag is appended
|
||||
data. If an object is deleted a ``DELETE`` tag is appended
|
||||
followed by the object id. A ``COMMIT`` tag is written when a
|
||||
repository transaction is committed. When a repository is opened any
|
||||
``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
|
||||
discarded since they are part of a partial/uncommitted transaction.
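The commit rule can be illustrated with a small replay function. This is a sketch under simplifying assumptions: log entries are modelled as ``(tag, key, data)`` tuples in memory rather than as the on-disk segment format, and the name ``replay`` is made up for this example.

```python
def replay(entries):
    """Replay (tag, key, data) log entries and return the committed state.

    PUT/DELETE operations only become visible once a COMMIT follows them.
    Anything after the last COMMIT is discarded.
    """
    committed = {}   # state as of the last COMMIT
    working = {}     # state including not yet committed operations
    for tag, key, data in entries:
        if tag == "PUT":
            working[key] = data
        elif tag == "DELETE":
            working.pop(key, None)
        elif tag == "COMMIT":
            committed = dict(working)
        else:
            raise ValueError("unknown tag: %r" % (tag,))
    return committed


log = [
    ("PUT", b"a", b"1"),
    ("PUT", b"b", b"2"),
    ("COMMIT", None, None),
    ("DELETE", b"a", None),   # part of an uncommitted transaction
    ("PUT", b"c", b"3"),      # no COMMIT follows, so this is dropped
]
state = replay(log)
```

After the replay, ``state`` still contains ``a`` and ``b`` and does not contain ``c``, because the trailing operations were never committed.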
|
||||
|
||||
The manifest is an object with an id of only zeros (32 bytes), that
|
||||
references all the archives. It contains:
|
||||
|
||||
The manifest
|
||||
------------
|
||||
|
||||
The manifest is an object with an all-zero key that references all the
|
||||
archives.
|
||||
It contains:
|
||||
|
||||
* version
|
||||
* list of archives
|
||||
* list of archive infos
|
||||
* timestamp
|
||||
* config
|
||||
|
||||
Each archive contains:
|
||||
Each archive info contains:
|
||||
|
||||
* name
|
||||
* id
|
||||
|
|
@ -102,21 +127,21 @@ each time.
|
|||
|
||||
The archive metadata does not contain the file items directly. Only
|
||||
references to other objects that contain that data. An archive is an
|
||||
object that contain metadata:
|
||||
object that contains:
|
||||
|
||||
* version
|
||||
* name
|
||||
* items list
|
||||
* list of chunks containing item metadata
|
||||
* cmdline
|
||||
* hostname
|
||||
* username
|
||||
* time
|
||||
|
||||
Each item represents a file or directory or
|
||||
symlink is stored as an ``item`` dictionary that contains:
|
||||
Each item represents a file, directory or other fs item and is stored as an
|
||||
``item`` dictionary that contains:
|
||||
|
||||
* path
|
||||
* list of chunks
|
||||
* list of data chunks
|
||||
* user
|
||||
* group
|
||||
* uid
|
||||
|
|
@ -135,124 +160,136 @@ it and it is reset every time an inode's metadata is changed.
|
|||
All items are serialized using msgpack and the resulting byte stream
|
||||
is fed into the same chunker used for regular file data and turned
|
||||
into deduplicated chunks. The reference to these chunks is then added
|
||||
to the archive metadata. This allows the archive to store many files,
|
||||
beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.
|
||||
to the archive metadata.
|
||||
|
||||
A chunk is an object as well, of course. The chunk id is either
|
||||
HMAC-SHA256_, when encryption is used, or a SHA256_ hash otherwise.
|
||||
A chunk is stored as an object as well, of course.
|
||||
|
||||
Hints are stored in a file (``repo/hints``) and contain:
|
||||
|
||||
* version
|
||||
* list of segments
|
||||
* compact
|
||||
|
||||
Chunks
|
||||
------
|
||||
|
||||
|project_name| uses a rolling checksum with Buzhash_ algorithm, with
|
||||
window size of 4095 bytes (`0xFFF`), with a minimum of 1024, and triggers when
|
||||
the last 16 bits of the checksum are null, producing chunks of 64kB on
|
||||
average. All these parameters are fixed. The buzhash table is altered
|
||||
by XORing it with a seed randomly generated once for the archive, and
|
||||
stored encrypted in the keyfile.
|
||||
|project_name| uses a rolling hash computed by the Buzhash_ algorithm, with a
|
||||
window size of 4095 bytes (`0xFFF`), with a minimum chunk size of 1024 bytes.
|
||||
It triggers (chunks) when the last 16 bits of the hash are zero, producing
|
||||
chunks of 64kiB on average.
|
||||
|
||||
Indexes
|
||||
-------
|
||||
The buzhash table is altered by XORing it with a seed randomly generated once
|
||||
for the archive, and stored encrypted in the keyfile.
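To make the chunking rule more concrete, here is a small pure-Python sketch of content-defined chunking with a buzhash-style rolling hash. It is not the project's chunker: all function names are invented for this example, the base table is generated from a fixed pseudo-random source, and the parameters are much smaller than the real ones (window 4095, minimum 1024, 16 mask bits) so that it runs quickly.

```python
import random

MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def make_table(seed):
    """Build a 256-entry table and XOR every entry with the seed."""
    rng = random.Random(0)  # fixed base table, an assumption of this sketch
    return [rng.getrandbits(32) ^ (seed & MASK32) for _ in range(256)]


def buzhash(window, table):
    """Hash a whole window from scratch."""
    h = 0
    for byte in window:
        h = rol32(h, 1) ^ table[byte]
    return h


def buzhash_update(h, out_byte, in_byte, window_size, table):
    """Slide the window by one byte: remove out_byte, add in_byte."""
    return rol32(h, 1) ^ rol32(table[out_byte], window_size) ^ table[in_byte]


def chunkify(data, table, min_size=64, max_size=1024, mask_bits=7, window_size=31):
    """Cut data where the low mask_bits of the rolling hash are zero.

    Requires min_size >= window_size so the first window fits in the chunk.
    """
    mask = (1 << mask_bits) - 1
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_size, len(data))
        cut = end                      # fall back to max size / end of data
        pos = start + min_size         # earliest allowed cut position
        if pos < end:
            h = buzhash(data[pos - window_size:pos], table)
            while pos < end:
                if h & mask == 0:
                    cut = pos
                    break
                h = buzhash_update(h, data[pos - window_size], data[pos],
                                   window_size, table)
                pos += 1
        chunks.append(data[start:cut])
        start = cut
    return chunks


table = make_table(seed=0x1234)
data = bytes(random.Random(1).getrandbits(8) for _ in range(20000))
chunks = chunkify(data, table)
```

With ``mask_bits`` bits required to be zero, a cut is triggered on average about every ``2**mask_bits`` bytes past the minimum size, which mirrors how 16 mask bits lead to chunks of about 64kiB.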
|
||||
|
||||
There are two main indexes: the chunk lookup index and the repository
|
||||
index. There is also the file chunk cache.
|
||||
|
||||
The chunk lookup index is stored in ``cache/chunk`` and is indexed on
|
||||
the ``chunk hash``. It contains:
|
||||
Indexes / Caches
|
||||
----------------
|
||||
|
||||
* reference count
|
||||
* size
|
||||
* ciphered size
|
||||
|
||||
The repository index is stored in ``repo/index.%d`` and is also
|
||||
indexed on ``chunk hash`` and contains:
|
||||
|
||||
* segment
|
||||
* offset
|
||||
|
||||
The repository index files are random access but those files can be
|
||||
recreated if damaged or lost using ``check --repair``.
|
||||
|
||||
Both indexes are stored as hash tables, directly mapped in memory from
|
||||
the file content, with only one slot per bucket, but that spreads the
|
||||
collisions to the following buckets. As a consequence the hash is just
|
||||
a start position for a linear search, and if the element is not in the
|
||||
table the index is linearly crossed until an empty bucket is
|
||||
found. When the table is full at 90% its size is doubled, when it's
|
||||
empty at 25% its size is halfed. So operations on it have a variable
|
||||
complexity between constant and linear with low factor, and memory
|
||||
overhead varies between 10% and 300%.
|
||||
|
||||
The file chunk cache is stored in ``cache/files`` and is indexed on
|
||||
the ``file path hash`` and contains:
|
||||
The files cache is stored in ``cache/files`` and is indexed on the
|
||||
``file path hash``. At backup time, it is used to quickly determine whether we
|
||||
need to chunk a given file (or whether it is unchanged and we already have all
|
||||
its pieces).
|
||||
It contains:
|
||||
|
||||
* age
|
||||
* inode number
|
||||
* size
|
||||
* mtime_ns
|
||||
* chunks hashes
|
||||
* file inode number
|
||||
* file size
|
||||
* file mtime_ns
|
||||
* file content chunk hashes
|
||||
|
||||
The inode number is stored to make sure we distinguish between
|
||||
different files, as a single path may not be unique across different
|
||||
archives in different setups.
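The "is this file unchanged?" decision can be sketched as follows. This is only an illustration of the idea: the ``FileEntry`` type and the ``known_chunks`` function are hypothetical, the cache is a plain dict keyed by path instead of by a path hash, and the ``age`` field is left out.

```python
import os
import tempfile
from typing import Dict, List, NamedTuple, Optional


class FileEntry(NamedTuple):
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: List[bytes]


def known_chunks(cache: Dict[str, FileEntry], path: str) -> Optional[List[bytes]]:
    """Return the cached chunk ids if the file looks unchanged, else None."""
    entry = cache.get(path)
    if entry is None:
        return None
    st = os.stat(path)
    if (st.st_ino, st.st_size, st.st_mtime_ns) == (entry.inode, entry.size, entry.mtime_ns):
        return entry.chunk_ids
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    st = os.stat(path)
    cache = {path: FileEntry(st.st_ino, st.st_size, st.st_mtime_ns, [b"id-1"])}
    unchanged = known_chunks(cache, path)   # metadata matches the cache entry
    with open(path, "ab") as f:
        f.write(b" world")                  # the size changes
    changed = known_chunks(cache, path)     # metadata no longer matches
```

When the lookup returns a list of chunk ids, the file does not need to be read and chunked again. When it returns ``None``, the file has to be processed as usual.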
|
||||
|
||||
The file chunk cache is stored as a python associative array storing
|
||||
python objects, which generate a lot of overhead. This takes around
|
||||
240 bytes per file without the chunk list, to be compared to at most
|
||||
64 bytes of real data (depending on data alignment), and around 80
|
||||
bytes per chunk hash (vs 32), with a minimum of ~250 bytes even if
|
||||
only one chunk hash.
|
||||
The files cache is stored as a python associative array storing
|
||||
python objects, which generates a lot of overhead.
|
||||
|
||||
Indexes memory usage
|
||||
--------------------
|
||||
The chunks cache is stored in ``cache/chunks`` and is indexed on the
|
||||
``chunk id_hash``. It is used to determine whether we already have a specific
|
||||
chunk, to count references to it and also for statistics.
|
||||
It contains:
|
||||
|
||||
Here is the estimated memory usage of |project_name| when using those
|
||||
indexes.
|
||||
* reference count
|
||||
* size
|
||||
* encrypted/compressed size
|
||||
|
||||
Repository index
|
||||
40 bytes x N ~ 200MB (If a remote repository is
|
||||
used this will be allocated on the remote side)
|
||||
The repository index is stored in ``repo/index.%d`` and is indexed on the
|
||||
``chunk id_hash``. It is used to determine a chunk's location in the repository.
|
||||
It contains:
|
||||
|
||||
Chunk lookup index
|
||||
44 bytes x N ~ 220MB
|
||||
* segment (that contains the chunk)
|
||||
* offset (where the chunk is located in the segment)
|
||||
|
||||
File chunk cache
|
||||
probably 80-100 bytes x N ~ 400MB
|
||||
The repository index file is random access.
|
||||
|
||||
Hints are stored in a file (``repo/hints.%d``).
|
||||
It contains:
|
||||
|
||||
* version
|
||||
* list of segments
|
||||
* compact
|
||||
|
||||
hints and index can be recreated if damaged or lost using ``check --repair``.
|
||||
|
||||
The chunks cache and the repository index are stored as hash tables with
only one slot per bucket; colliding entries spill over into the following
buckets. As a consequence, the hash is just a start position for a linear
search, and if the element is not in the table, the index is scanned linearly
until an empty bucket is found.
|
||||
|
||||
When the hash table is 90% full, its size is doubled. When it is only 25%
full, its size is halved. So operations on it have a variable complexity
between constant and linear (with a low factor), and memory overhead
varies between 10% and 300%.
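The lookup and growth behaviour described above corresponds to a classic open-addressing table with linear probing. The following is a simplified sketch, not the project's hash index: the class name is invented, entries are Python tuples instead of fixed-size binary records, and deletion and shrinking are left out to keep it short.

```python
class LinearProbingTable:
    """Hash table with one slot per bucket and linear probing."""

    MAX_LOAD = 0.90  # grow when more than 90% of the buckets are used

    def __init__(self, capacity=8):
        self.buckets = [None] * capacity
        self.used = 0

    def _probe(self, key):
        """Return the index of key's bucket, or of the first empty bucket."""
        i = hash(key) % len(self.buckets)   # the hash is only a start position
        while self.buckets[i] is not None and self.buckets[i][0] != key:
            i = (i + 1) % len(self.buckets)
        return i

    def get(self, key, default=None):
        entry = self.buckets[self._probe(key)]
        return default if entry is None else entry[1]

    def put(self, key, value):
        i = self._probe(key)
        if self.buckets[i] is None:
            self.used += 1
        self.buckets[i] = (key, value)
        if self.used > self.MAX_LOAD * len(self.buckets):
            self._resize(2 * len(self.buckets))

    def _resize(self, capacity):
        entries = [e for e in self.buckets if e is not None]
        self.buckets = [None] * capacity
        self.used = 0
        for key, value in entries:
            self.put(key, value)


table = LinearProbingTable()
for i in range(100):
    # value stands in for e.g. (segment, offset) of a chunk
    table.put(("id-%d" % i).encode(), (i, i * 10))
```

A failed lookup walks from the start position to the next empty bucket, which is why keeping the table from filling up completely matters for lookup cost.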
|
||||
|
||||
|
||||
Indexes / Caches memory usage
|
||||
-----------------------------
|
||||
|
||||
Here is the estimated memory usage of |project_name|:
|
||||
|
||||
chunk_count ~= total_file_size / 65536
|
||||
|
||||
repo_index_usage = chunk_count * 40
|
||||
|
||||
chunks_cache_usage = chunk_count * 44
|
||||
|
||||
files_cache_usage = total_file_count * 240 + chunk_count * 80
|
||||
|
||||
mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
|
||||
= total_file_count * 240 + total_file_size / 400
|
||||
|
||||
All units are Bytes.
|
||||
|
||||
This assumes that every chunk is referenced exactly once and that the typical chunk size is 64kiB.
|
||||
|
||||
If a remote repository is used the repo index will be allocated on the remote side.
|
||||
|
||||
E.g. backing up a total count of 1Mi files with a total size of 1TiB:
|
||||
|
||||
mem_usage = 1 * 2**20 * 240 + 1 * 2**40 / 400 = 2.8GiB
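The estimate above can be reproduced with a few lines of Python. The function name and its parameters are chosen for this sketch only. The per-entry byte counts are the ones given in the formulas.

```python
def estimate_mem_usage(total_file_count, total_file_size, chunk_size=65536):
    """Estimate index / cache memory usage in bytes, following the formulas above."""
    chunk_count = total_file_size / chunk_size
    repo_index_usage = chunk_count * 40
    chunks_cache_usage = chunk_count * 44
    files_cache_usage = total_file_count * 240 + chunk_count * 80
    return repo_index_usage + chunks_cache_usage + files_cache_usage


# 1Mi files with a total size of 1TiB, as in the example above
usage = estimate_mem_usage(total_file_count=2**20, total_file_size=2**40)
usage_gib = usage / 2**30
print("%.1f GiB" % usage_gib)
```

The per-chunk cost is 40 + 44 + 80 = 164 bytes for every 65536 bytes of data, which is where the ``total_file_size / 400`` approximation comes from.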
|
||||
|
||||
Note: there is a commandline option to switch off the files cache. This saves
some memory, but |project_name| then needs to read and chunk all files on every backup.
|
||||
|
||||
In the above we assume 350GB of data that we divide on an average 64KB
|
||||
chunk size, so N is around 5.3 million.
|
||||
|
||||
Encryption
|
||||
----------
|
||||
|
||||
AES_ is used with CTR mode of operation (so no need for padding). A 64
|
||||
bits initialization vector is used, a `HMAC-SHA256`_ is computed
|
||||
on the encrypted chunk with a random 64 bits nonce and both are stored
|
||||
in the chunk. The header of each chunk is : ``TYPE(1)`` +
|
||||
``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``. Encryption and HMAC use
|
||||
two different keys.
|
||||
AES_ is used in CTR mode (so no need for padding). A 64bit initialization
|
||||
vector is used, a `HMAC-SHA256`_ is computed on the encrypted chunk with a
|
||||
random 64bit nonce and both are stored in the chunk.
|
||||
The header of each chunk is: ``TYPE(1)`` + ``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``.
|
||||
Encryption and HMAC use two different keys.
|
||||
|
||||
In AES CTR mode you can think of the IV as the start value for the
|
||||
counter. The counter itself is incremented by one after each 16 byte
|
||||
block. The IV/counter is not required to be random but it must NEVER be
|
||||
reused. So to accomplish this |project_name| initializes the encryption counter
|
||||
to be higher than any previously used counter value before encrypting
|
||||
new data.
|
||||
In AES CTR mode you can think of the IV as the start value for the counter.
|
||||
The counter itself is incremented by one after each 16 byte block.
|
||||
The IV/counter is not required to be random but it must NEVER be reused.
|
||||
So to accomplish this |project_name| initializes the encryption counter to be
|
||||
higher than any previously used counter value before encrypting new data.
|
||||
|
||||
To reduce payload size only 8 bytes of the 16 bytes nonce is saved in
|
||||
the payload, the first 8 bytes are always zeroes. This does not affect
|
||||
security but limits the maximum repository capacity to only 295
|
||||
exabytes (2**64 * 16 bytes).
|
||||
To reduce payload size, only 8 bytes of the 16 byte nonce are saved in the
payload; the first 8 bytes are always zeros. This does not affect security but
limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes).
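The chunk layout and the nonce expansion can be sketched like this. The names ``EncryptedChunk``, ``split_chunk`` and ``full_nonce`` are made up for the example, and the sample bytes are arbitrary. HMAC verification and decryption are intentionally left out, because the text does not spell out exactly which bytes the HMAC covers.

```python
from typing import NamedTuple


class EncryptedChunk(NamedTuple):
    chunk_type: int
    mac: bytes
    nonce: bytes
    ciphertext: bytes


def split_chunk(blob: bytes) -> EncryptedChunk:
    """Split a stored chunk into TYPE(1) + HMAC(32) + NONCE(8) + CIPHERTEXT."""
    if len(blob) < 41:
        raise ValueError("chunk too short for TYPE(1) + HMAC(32) + NONCE(8)")
    return EncryptedChunk(blob[0], blob[1:33], blob[33:41], blob[41:])


def full_nonce(stored_nonce: bytes) -> bytes:
    """Rebuild the 16 byte nonce: 8 zero bytes followed by the 8 stored bytes."""
    return b"\x00" * 8 + stored_nonce


blob = bytes([2]) + b"\xaa" * 32 + (5).to_bytes(8, "big") + b"payload"
chunk = split_chunk(blob)
counter_start = int.from_bytes(full_nonce(chunk.nonce), "big")

# 2**64 counter values, each covering one 16 byte AES block
max_capacity_bytes = 2**64 * 16
```

The capacity figure follows directly from the counter width: ``max_capacity_bytes`` is ``2**68`` bytes, which is roughly 295 exabytes.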
|
||||
|
||||
Encryption keys are either a passphrase, passed through the
|
||||
``BORG_PASSPHRASE`` environment or prompted on the commandline, or
|
||||
stored in automatically generated key files.
|
||||
Encryption keys are either derived from a passphrase or kept in a key file.
|
||||
The passphrase is passed through the ``BORG_PASSPHRASE`` environment variable
|
||||
or prompted for in interactive use.
|
||||
|
||||
Key files
|
||||
---------
|
||||
|
|
@ -274,22 +311,20 @@ enc_key
|
|||
the key used to encrypt data with AES (256 bits)
|
||||
|
||||
enc_hmac_key
|
||||
the key used to HMAC the resulting AES-encrypted data (256 bits)
|
||||
the key used to HMAC the encrypted data (256 bits)
|
||||
|
||||
id_key
|
||||
the key used to HMAC the above chunks, the resulting hash is
|
||||
stored out of band (256 bits)
|
||||
the key used to HMAC the plaintext chunk data to compute the chunk's id
|
||||
|
||||
chunk_seed
|
||||
the seed for the buzhash chunking table (signed 32 bit integer)
|
||||
|
||||
Those fields are processed using msgpack_. The utf-8 encoded phassphrase
|
||||
is encrypted with PBKDF2_ and SHA256_ using 100000 iterations and a
|
||||
random 256 bits salt to give us a derived key. The derived key is 256
|
||||
bits long. A `HMAC-SHA256`_ checksum of the above fields is generated
|
||||
with the derived key, then the derived key is also used to encrypt the
|
||||
above pack of fields. Then the result is stored in a another msgpack_
|
||||
formatted as follows:
|
||||
Those fields are processed using msgpack_. The utf-8 encoded passphrase
|
||||
is processed with PBKDF2_ (SHA256_, 100000 iterations, random 256 bit salt)
|
||||
to give us a derived key. The derived key is 256 bits long.
|
||||
A `HMAC-SHA256`_ checksum of the above fields is generated with the derived
|
||||
key, then the derived key is also used to encrypt the above pack of fields.
|
||||
Then the result is stored in another msgpack_ formatted as follows:
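The key derivation step can be reproduced with the Python standard library. This sketch only covers deriving the 256 bit key from the passphrase. The function name and the sample passphrase are assumptions, and the HMAC and encryption of the packed fields are not shown.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 100000) -> bytes:
    """Derive a 256 bit key from a passphrase using PBKDF2 with SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(32)  # random 256 bit salt
key = derive_key("correct horse battery staple", salt)
```

The salt has to be stored alongside the encrypted fields, since the same passphrase with a different salt yields a different derived key.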
|
||||
|
||||
version
|
||||
currently always an integer, 1
|
||||
|
|
@ -315,3 +350,9 @@ The resulting msgpack_ is then encoded using base64 and written to the
|
|||
key file, wrapped using the standard ``textwrap`` module with a header.
|
||||
The header is a single line with a MAGIC string, a space and a hexadecimal
|
||||
representation of the repository id.
|
||||
|
||||
|
||||
Compression
|
||||
-----------
|
||||
|
||||
Currently, zlib level 6 is used as compression.
|
||||
|
|
|
|||
116
docs/misc/create_chunker-params.txt
Normal file
|
|
@ -0,0 +1,116 @@
|
|||
About borg create --chunker-params
|
||||
==================================
|
||||
|
||||
--chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE
|
||||
|
||||
CHUNK_MIN_EXP and CHUNK_MAX_EXP give the exponent N of the 2^N minimum and
|
||||
maximum chunk size. Required: CHUNK_MIN_EXP < CHUNK_MAX_EXP.
|
||||
|
||||
Defaults: 10 (2^10 == 1KiB) minimum, 23 (2^23 == 8MiB) maximum.
|
||||
|
||||
HASH_MASK_BITS is the number of least-significant bits of the rolling hash
|
||||
that need to be zero to trigger a chunk cut.
|
||||
Recommended: CHUNK_MIN_EXP + X <= HASH_MASK_BITS <= CHUNK_MAX_EXP - X, X >= 2
|
||||
(this allows the rolling hash some freedom to make its cut at a place
determined by the window's contents rather than by the min./max. chunk size).
|
||||
|
||||
Default: 16 (statistically, chunks will be about 2^16 == 64kiB in size)
|
||||
|
||||
HASH_WINDOW_SIZE: the size of the window used for the rolling hash computation.
|
||||
Default: 4095B
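To see what a given parameter string means in bytes, a small helper like the following can be used. It is an illustration only: ``parse_chunker_params`` is a hypothetical name, not the project's own parser, and the ``recommended`` flag encodes the recommendation above with X == 2.

```python
def parse_chunker_params(spec):
    """Turn 'CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE' into sizes."""
    chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size = (
        int(part) for part in spec.split(",")
    )
    if not chunk_min_exp < chunk_max_exp:
        raise ValueError("CHUNK_MIN_EXP must be smaller than CHUNK_MAX_EXP")
    return {
        "min_size": 2 ** chunk_min_exp,
        "max_size": 2 ** chunk_max_exp,
        "target_size": 2 ** hash_mask_bits,   # statistical chunk size
        "window_size": hash_window_size,
        # recommendation from the text, with X == 2
        "recommended": chunk_min_exp + 2 <= hash_mask_bits <= chunk_max_exp - 2,
    }


defaults = parse_chunker_params("10,23,16,4095")
fixed_offset = parse_chunker_params("16,23,31,4095")
```

For the defaults this gives a 1KiB minimum, an 8MiB maximum and a 64kiB target size. A mask of 31 bits falls outside the recommended range, which matches its use for (almost) fixed-offset cutting.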
|
||||
|
||||
|
||||
Trying it out
|
||||
=============
|
||||
|
||||
I backed up a VM directory to demonstrate how different chunker parameters
|
||||
influence repo size, index size / chunk count, compression, deduplication.
|
||||
|
||||
repo-sm: ~64kiB chunks (16 bits chunk mask), min chunk size 1kiB (2^10B)
|
||||
(these are attic / borg 0.23 internal defaults)
|
||||
|
||||
repo-lg: ~1MiB chunks (20 bits chunk mask), min chunk size 64kiB (2^16B)
|
||||
|
||||
repo-xl: 8MiB chunks (2^23B max chunk size), min chunk size 64kiB (2^16B).
|
||||
The number of chunk mask bits was set to 31, so it (almost) never triggers.
|
||||
This degrades the rolling hash based dedup to a fixed-offset dedup
|
||||
as the cutting point is now (almost) always the end of the buffer
|
||||
(at 2^23B == 8MiB).
|
||||
|
||||
The repo index size is an indicator for the RAM needs of Borg.
|
||||
In this special case, the total RAM needs are about 2.1x the repo index size.
|
||||
As you can see, the index of repo-sm is 16x larger than that of repo-lg, which corresponds
|
||||
to the ratio of the different target chunk sizes.
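The 16x figure can be checked against the index file sizes shown in the ``ls -l`` listings of this document. The variable names are made up for this sketch, and the 2.1x RAM factor is the one quoted above for this special case.

```python
# index file sizes in bytes, taken from the `ls -l` output in this document
index_sm = 20971538   # repo-sm, ~64kiB chunks
index_lg = 1310738    # repo-lg, ~1MiB chunks

index_ratio = index_sm / index_lg
target_chunk_ratio = 2**20 / 2**16   # 1MiB vs. 64kiB target chunk size

# rough total RAM needs for this data set, using the 2.1x factor from the text
ram_sm = 2.1 * index_sm
ram_lg = 2.1 * index_lg
```

Both ratios come out at about 16, so the index size (and with it the RAM need) scales inversely with the target chunk size.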
|
||||
|
||||
Note: RAM needs were not a problem in this specific case (37GB data size).
|
||||
But just imagine, you have 37TB of such data and much less than 42GB RAM,
|
||||
then you'd definitely want the "lg" chunker params so you only need
|
||||
2.6GB RAM. Or even bigger chunks than shown for "lg" (see "xl").
|
||||
|
||||
You also see compression works better for larger chunks, as expected.
|
||||
Deduplication works worse for larger chunks, also as expected.
|
||||
|
||||
small chunks
|
||||
============
|
||||
|
||||
$ borg info /extra/repo-sm::1
|
||||
|
||||
Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 10,23,16,4095 /extra/repo-sm::1 /home/tw/win
|
||||
Number of files: 3
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 37.12 GB 14.81 GB 12.18 GB
|
||||
All archives: 37.12 GB 14.81 GB 12.18 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 378374 487316
|
||||
|
||||
$ ls -l /extra/repo-sm/index*
|
||||
|
||||
-rw-rw-r-- 1 tw tw 20971538 Jun 20 23:39 index.2308
|
||||
|
||||
$ du -sk /extra/repo-sm
|
||||
11930840 /extra/repo-sm
|
||||
|
||||
large chunks
|
||||
============
|
||||
|
||||
$ borg info /extra/repo-lg::1
|
||||
|
||||
Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,20,4095 /extra/repo-lg::1 /home/tw/win
|
||||
Number of files: 3
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 37.10 GB 14.60 GB 13.38 GB
|
||||
All archives: 37.10 GB 14.60 GB 13.38 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 25889 29349
|
||||
|
||||
$ ls -l /extra/repo-lg/index*
|
||||
|
||||
-rw-rw-r-- 1 tw tw 1310738 Jun 20 23:10 index.2264
|
||||
|
||||
$ du -sk /extra/repo-lg
|
||||
13073928 /extra/repo-lg
|
||||
|
||||
xl chunks
|
||||
=========
|
||||
|
||||
$ borg info /extra/repo-xl::1
|
||||
Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,31,4095 /extra/repo-xl::1 /home/tw/win
|
||||
Number of files: 3
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 37.10 GB 14.59 GB 14.59 GB
|
||||
All archives: 37.10 GB 14.59 GB 14.59 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 4319 4434
|
||||
|
||||
$ ls -l /extra/repo-xl/index*
|
||||
-rw-rw-r-- 1 tw tw 327698 Jun 21 00:52 index.2011
|
||||
|
||||
$ du -sk /extra/repo-xl/
|
||||
14253464 /extra/repo-xl/
|
||||
|
||||
130
docs/misc/create_compression.txt
Normal file
|
|
@ -0,0 +1,130 @@
|
|||
data compression
|
||||
================
|
||||
|
||||
borg create --compression N repo::archive data
|
||||
|
||||
Currently, borg only supports zlib compression. There are plans to expand this
|
||||
to other, faster or better compression algorithms in the future.
|
||||
|
||||
N == 0 -> zlib level 0 == very quick, no compression
|
||||
N == 1 -> zlib level 1 == quick, low compression
|
||||
...
|
||||
N == 9 -> zlib level 9 == slow, high compression
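The effect of the level on output size can be tried out with Python's ``zlib`` module. This is just a quick experiment on artificial, highly repetitive sample data chosen for the sketch, so the numbers are not comparable to the measurements in this document.

```python
import zlib

# artificial sample data, an assumption of this sketch
data = b"the quick brown fox jumps over the lazy dog\n" * 2000

# compressed size in bytes for a few zlib levels
sizes = {level: len(zlib.compress(data, level)) for level in (0, 1, 6, 9)}

for level, size in sorted(sizes.items()):
    print("level %d: %d -> %d bytes" % (level, len(data), size))
```

Level 0 only wraps the data in the zlib container, so its output is slightly larger than the input. This is consistent with the "Compressed size" being a bit larger than the "Original size" in the ``--compression 0`` measurements.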
|
||||
|
||||
Measurements made on a Haswell Ultrabook, SSD storage, Linux.
|
||||
|
||||
|
||||
Example 1: lots of relatively small text files (linux kernel src)
|
||||
-----------------------------------------------------------------
|
||||
|
||||
N == 1 does a good job here: the additional time needed for compression is
made up for by having to write less data to storage (compare with N == 0).
|
||||
|
||||
N == 6 is also quite ok, a little slower, a little less repo size.
|
||||
6 was the old default of borg.
|
||||
|
||||
High compression levels only give a little more compression, but take a lot
|
||||
of cpu time.
|
||||
|
||||
$ borg create --stats --compression 0
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 50.40 seconds
|
||||
Number of files: 72890
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 1.17 GB 1.18 GB 1.01 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 70263 82309
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 1
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 49.29 seconds
|
||||
Number of files: 72890
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 1.17 GB 368.62 MB 295.22 MB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 70280 82326
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 5
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 59.99 seconds
|
||||
Number of files: 72890
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 1.17 GB 331.70 MB 262.20 MB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 70290 82336
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 6
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 1 minutes 13.64 seconds
|
||||
Number of files: 72890
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 1.17 GB 328.79 MB 259.56 MB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 70279 82325
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 9
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 3 minutes 1.58 seconds
|
||||
Number of files: 72890
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 1.17 GB 326.57 MB 257.57 MB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 70292 82338
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
Example 2: large VM disk file (sparse file)
|
||||
-------------------------------------------
|
||||
|
||||
The file's directory size is 80GB, but a lot of it is sparse (and reads as
|
||||
zeros).
|
||||
|
||||
$ borg create --stats --compression 0
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 13 minutes 48.47 seconds
|
||||
Number of files: 1
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 80.54 GB 80.55 GB 10.87 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 147307 177109
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 1
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 15 minutes 31.34 seconds
|
||||
Number of files: 1
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 80.54 GB 6.68 GB 5.67 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 147309 177111
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
$ borg create --stats --compression 6
|
||||
------------------------------------------------------------------------------
|
||||
Duration: 18 minutes 57.54 seconds
|
||||
Number of files: 1
|
||||
|
||||
Original size Compressed size Deduplicated size
|
||||
This archive: 80.54 GB 6.19 GB 5.44 GB
|
||||
|
||||
Unique chunks Total chunks
|
||||
Chunk index: 147307 177109
|
||||
------------------------------------------------------------------------------
|
||||
|
|
@ -50,6 +50,9 @@ Examples
|
|||
NAME="root-`date +%Y-%m-%d`"
|
||||
$ borg create /mnt/backup::$NAME / --do-not-cross-mountpoints
|
||||
|
||||
# Backup huge files with little chunk management overhead
|
||||
$ borg create --chunker-params 19,23,21,4095 /mnt/backup::VMs /srv/VMs
|
||||
|
||||
|
||||
.. include:: usage/extract.rst.inc
|
||||
|
||||
|
|
|
|||