
18 commits

Author SHA1 Message Date
Thomas Waldmann
5cb47cbedd hashindex: explain hash_sizes 2016-01-14 14:39:59 +01:00
Thomas Waldmann
083f5e31ef hashindex: fix upper limit
use num_buckets (== fully use what we currently have allocated)
2016-01-14 14:39:59 +01:00
Thomas Waldmann
09665805e8 move func defs to avoid implicit declaration compiler warning 2016-01-14 14:39:59 +01:00
Thomas Waldmann
91cde721b4 hashindex: minor refactor
- rename BUCKET_(LOWER|UPPER)_LIMIT to HASH_(MIN|MAX)_LOAD
   as these values are usually called the hash table's minimum/maximum load factors.
- remove MAX_BUCKET_SIZE (not used)
- regroup/reorder definitions
2016-01-14 14:39:59 +01:00
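A minimal sketch of how the two load factors could translate into resize thresholds, computed from the current num_buckets as 083f5e31ef describes. HASH_MAX_LOAD uses the 75% value from 610300c1ce. The 25% HASH_MIN_LOAD and the helper names upper_limit, lower_limit and needs_grow are assumptions for illustration, not borg's actual code.

```c
/* Hypothetical load factors: 75% is the value from 610300c1ce,
   25% for the minimum is an assumption. */
#define HASH_MIN_LOAD 0.25
#define HASH_MAX_LOAD 0.75

/* Most used buckets allowed before the table should grow. */
static int upper_limit(int num_buckets)
{
    return (int)(num_buckets * HASH_MAX_LOAD);
}

/* Fewest used buckets allowed before the table may shrink. */
static int lower_limit(int num_buckets)
{
    return (int)(num_buckets * HASH_MIN_LOAD);
}

/* Hypothetical check run before inserting a new key. */
static int needs_grow(int num_entries, int num_buckets)
{
    return num_entries >= upper_limit(num_buckets);
}
```

With 1031 buckets this gives an upper limit of 773 entries and a lower limit of 257.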
Thomas Waldmann
d88df3edc6 hashtable size follows a growth policy, fixes #527
also: refactor / dedupe some code into functions
2016-01-14 14:39:59 +01:00
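The growth policy can be sketched as a lookup in a fixed table of bucket counts, which appears to be the role of the hash_sizes explained in 5cb47cbedd. The table below holds hypothetical primes, each roughly twice the previous one; they are not borg's actual values, and size_idx, fit_size and grow_size are illustrative names.

```c
/* Hypothetical table of prime bucket counts, each roughly double the previous.
   Illustrative values only, not borg's actual hash_sizes. */
static const int hash_sizes[] = {1031, 2053, 4099, 8209, 16411};
#define NUM_HASH_SIZES ((int)(sizeof(hash_sizes) / sizeof(hash_sizes[0])))

/* Index of the smallest entry >= n, or the last index if n exceeds all entries. */
static int size_idx(int n)
{
    int i;
    for (i = 0; i < NUM_HASH_SIZES - 1; i++) {
        if (hash_sizes[i] >= n)
            break;
    }
    return i;
}

/* Bucket count to use when at least n buckets are wanted (clamped to the largest entry). */
static int fit_size(int n)
{
    return hash_sizes[size_idx(n)];
}

/* Next larger bucket count when the table must grow (stays at the largest entry). */
static int grow_size(int current)
{
    return hash_sizes[size_idx(current + 1)];
}
```

For example, fit_size(1000) picks 1031 and grow_size(1031) moves to 2053; once the last entry is reached, this sketch simply stays there.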
Thomas Waldmann
720fc49498 hashindex_add C implementation
this was previously the loop body of hashindex_merge, but we also need it to be callable from Cython/Python code.

this saves some cycles, esp. if the key is already present in the index.
2015-12-07 19:13:58 +01:00
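A toy illustration of the add-or-update idea behind hashindex_add and the merge loop of hashindex_merge (a1e039ba21): one probe sequence either finds the key and adds to its refcount in place, or finds the empty bucket where the key gets inserted. This is not borg's implementation; the fixed-size table, linear probing, the toy_* names and the lack of deletion markers are all simplifying assumptions.

```c
#include <stdint.h>
#include <string.h>

#define KEY_SIZE 32      /* hypothetical fixed key size, e.g. a 256-bit chunk id */
#define NUM_BUCKETS 1031 /* hypothetical prime bucket count */

typedef struct {
    int used;
    unsigned char key[KEY_SIZE];
    uint32_t refcount;
} toy_bucket;

typedef struct {
    toy_bucket buckets[NUM_BUCKETS];
} toy_index;

/* Start slot: first 4 key bytes modulo the bucket count (keys assumed to be random hashes). */
static uint32_t start_slot(const unsigned char *key)
{
    uint32_t h;
    memcpy(&h, key, sizeof(h));
    return h % NUM_BUCKETS;
}

/* Bucket holding key, or the empty bucket where it would go; -1 if full and absent. */
static int lookup(toy_index *idx, const unsigned char *key)
{
    uint32_t i = start_slot(key);
    for (int n = 0; n < NUM_BUCKETS; n++, i = (i + 1) % NUM_BUCKETS) {
        toy_bucket *b = &idx->buckets[i];
        if (!b->used || memcmp(b->key, key, KEY_SIZE) == 0)
            return (int)i;
    }
    return -1;
}

/* Add refcount to an existing entry, or insert a new one. 0 on success, -1 if full. */
static int toy_add(toy_index *idx, const unsigned char *key, uint32_t refcount)
{
    int i = lookup(idx, key);
    if (i < 0)
        return -1;
    toy_bucket *b = &idx->buckets[i];
    if (b->used) {
        b->refcount += refcount; /* key present: update in place, no second lookup */
    } else {
        b->used = 1;
        memcpy(b->key, key, KEY_SIZE);
        b->refcount = refcount;
    }
    return 0;
}

/* Merge: add every entry of src into dst, summing refcounts of shared keys. */
static int toy_merge(toy_index *dst, const toy_index *src)
{
    for (int i = 0; i < NUM_BUCKETS; i++) {
        const toy_bucket *b = &src->buckets[i];
        if (b->used && toy_add(dst, b->key, b->refcount) < 0)
            return -1;
    }
    return 0;
}

/* Refcount stored for key, or 0 if absent. */
static uint32_t toy_get(toy_index *idx, const unsigned char *key)
{
    int i = lookup(idx, key);
    return (i >= 0 && idx->buckets[i].used) ? idx->buckets[i].refcount : 0;
}
```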
Thomas Waldmann
610300c1ce misc. hash table tuning
BUCKET_UPPER_LIMIT: 90% load degrades hash table performance severely,
so I lowered that to 75% (which is a usual value - Java uses 75%, Python uses 66%).
I chose the higher of the two values because we also should not consume too much
memory, considering that RAM usage is already rather high.

MIN_BUCKETS: I can't explain why, but benchmarks showed that choosing 2^N as the
table size severely degrades performance (by 3 orders of magnitude!). So a prime
start value improves this a lot, even though we later still use the grow-by-2x algorithm.
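The commit leaves the 2^N slowdown unexplained. One common explanation, offered here as an assumption rather than a diagnosis of borg's benchmark, is that h % 2^N keeps only the low bits of the hash, so hash values that share their low bits collide, while a prime modulus mixes in all bits. The hypothetical distinct_buckets() below counts how many buckets the hash values i * stride occupy.

```c
#include <stdint.h>
#include <string.h>

/* Count the distinct buckets hit by the hash values i * stride (i = 0 .. n-1)
   when the bucket is chosen as hash % num_buckets. Returns -1 if num_buckets is too large. */
static int distinct_buckets(uint32_t num_buckets, uint32_t stride, int n)
{
    static unsigned char hit[1 << 16]; /* one flag per bucket */
    int distinct = 0;

    if (num_buckets > sizeof(hit))
        return -1;
    memset(hit, 0, num_buckets);
    for (int i = 0; i < n; i++) {
        uint32_t b = ((uint32_t)i * stride) % num_buckets;
        if (!hit[b]) {
            hit[b] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With 1000 such values and a stride of 1024, a table of 1024 buckets puts every value into bucket 0, while a table of 1031 buckets (a prime sharing no factor with 1024) spreads them over 1000 distinct buckets.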

hashindex_resize: removed the hashindex_get() call, as we already know that the values
are located at address key + key_size.
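The hashindex_resize change relies on the bucket layout: if each bucket stores its key_size key bytes immediately followed by the value bytes, a bucket's value is simply at key + key_size, with no second lookup needed. A minimal sketch of that address arithmetic; the bucket_layout struct and the helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical flat bucket array: each bucket is key_size key bytes
   followed directly by value_size value bytes. */
typedef struct {
    unsigned char *buckets;
    int key_size;
    int value_size;
    int bucket_size; /* key_size + value_size */
} bucket_layout;

/* Address of bucket idx; its key starts there. */
static unsigned char *bucket_addr(const bucket_layout *l, int idx)
{
    return l->buckets + (size_t)idx * (size_t)l->bucket_size;
}

/* The value sits right behind the key, so pointer arithmetic alone reaches it. */
static unsigned char *value_addr(const bucket_layout *l, unsigned char *key)
{
    return key + l->key_size;
}
```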

hashindex_init: do not calloc X*Y elements of size 1, but rather X elements of size Y.
Makes the code simpler, not sure if it affects performance.
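The calloc change as a sketch: passing the element count and the element size separately states the intent directly, and C library implementations typically also check the multiplication for overflow. alloc_buckets is a hypothetical helper; bucket_size stands for whatever key size plus value size the index uses.

```c
#include <stdlib.h>

/* Allocate num_buckets zero-filled buckets of bucket_size bytes each.
   Before: calloc(num_buckets * bucket_size, 1); after: calloc(num_buckets, bucket_size). */
static unsigned char *alloc_buckets(size_t num_buckets, size_t bucket_size)
{
    return calloc(num_buckets, bucket_size);
}
```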

The tests needed fixing, as the resulting hashtable blob is now of course different due
to the above changes, so its SHA hash changed.
2015-12-01 21:18:58 +01:00
Thomas Waldmann
7247043db0 get rid of C compiler warnings, fixes #391 2015-11-21 22:08:30 +01:00
Thomas Waldmann
d779057b79 fix issue with negative "all archives" size, fixes #165
This fixes an infrequent problem where (refcount * chunksize) overflowed an int32_t.
chunksize is always <= 8 MiB and usually around 64 KiB (with default chunker params).
Thus, this happened only for high refcounts and/or unusually big chunks.
2015-08-29 04:46:13 +02:00
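The arithmetic behind the overflow: with chunks of up to 8 MiB (2^23 bytes), a refcount of only 256 gives 2^31 bytes, one more than INT32_MAX. A minimal sketch of the usual remedy, widening one operand before multiplying; total_size is a hypothetical helper and not necessarily how the commit fixed it.

```c
#include <stdint.h>

/* Bytes referenced by a chunk: widening refcount first makes the
   multiplication happen in 64-bit arithmetic, so it cannot wrap at 2^31. */
static uint64_t total_size(uint32_t refcount, uint32_t chunksize)
{
    return (uint64_t)refcount * chunksize;
}
```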
Thomas Waldmann
e06b0b3612 use C99's uintmax_t and %ju format
whatever size_t and off_t are, they should fit in there
2015-08-12 01:04:03 +02:00
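A sketch of the printing idiom: because the widths of size_t and off_t vary between platforms, both are cast to uintmax_t, the widest unsigned integer type, and printed with %ju. format_sizes is hypothetical, and the cast assumes the offset is non-negative (off_t itself is signed).

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h> /* off_t (POSIX) */

/* Format a size_t and an off_t without knowing their widths.
   Assumes offset >= 0, because off_t is signed and is cast to an unsigned type here. */
static int format_sizes(char *buf, size_t len, size_t count, off_t offset)
{
    return snprintf(buf, len, "count=%ju offset=%ju",
                    (uintmax_t)count, (uintmax_t)offset);
}
```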
Thomas Waldmann
197ca9c0d3 C merge code: cast to correct pointer type, silences warning 2015-08-09 16:19:53 +02:00
Thomas Waldmann
a1e039ba21 reimplement the chunk index merging in C
the Python code could take a rather long time, and likely most of it was spent converting data from Python to C and back.
2015-08-06 23:32:53 +02:00
Thomas Waldmann
6d0a00496a determine and report chunk counts in chunks index
borg info repo::archive now reports the unique chunks count and the total chunks count

also: use index->key_size instead of hardcoded value
2015-06-19 23:53:23 +02:00
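Assuming the unique count is the number of entries in the chunks index and the total count sums their refcounts, a chunk referenced three times counts once in the former and three times in the latter. A minimal sketch over a hypothetical array of entries; chunk_entry and count_chunks are illustrative names, not borg's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk index entry: how many times the chunk is referenced. */
typedef struct {
    uint32_t refcount;
} chunk_entry;

/* unique: number of distinct chunks; total: number of references, duplicates included. */
static void count_chunks(const chunk_entry *entries, size_t n,
                         uint64_t *unique, uint64_t *total)
{
    *unique = 0;
    *total = 0;
    for (size_t i = 0; i < n; i++) {
        *unique += 1;
        *total += entries[i].refcount;
    }
}
```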
Thomas Waldmann
614261604e don't hardcode MAGIC length 2015-06-02 02:41:23 +02:00
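One way to avoid hardcoding the MAGIC length is to derive it from the string literal with sizeof, minus one for the terminating NUL, so the length stays correct if the magic ever changes. The magic value and has_magic below are hypothetical placeholders, not the real index file signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical magic string, not the real file signature. */
#define MAGIC "IDXMAGIC"
/* sizeof includes the trailing NUL of the literal, hence the - 1. */
#define MAGIC_LEN (sizeof(MAGIC) - 1)

/* Non-zero if the header starts with MAGIC. */
static int has_magic(const unsigned char *header, size_t len)
{
    return len >= MAGIC_LEN && memcmp(header, MAGIC, MAGIC_LEN) == 0;
}
```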
Thomas Waldmann
926454c0d8 explicitly specify binary mode to open binary files
on POSIX OSes, it doesn't make a difference, but it is cleaner and also good for portability.
2015-05-31 17:57:45 +02:00
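A minimal sketch of reading a file in binary mode; read_binary is a hypothetical helper, not borg's code. On Windows C runtimes, text mode translates CRLF to LF on input and may treat 0x1A (Ctrl-Z) as end of file, which would corrupt a binary index.

```c
#include <stdio.h>

/* Read up to len bytes from a binary file. "rb" behaves like "r" on POSIX systems,
   but prevents newline and Ctrl-Z translation on Windows C runtimes. */
static size_t read_binary(const char *path, unsigned char *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, len, f);
    fclose(f);
    return n;
}
```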
Thomas Waldmann
776bb9fabc hashindex: improve error messages 2015-05-31 17:48:19 +02:00
Thomas Waldmann
91e10fec5f Merge branch 'master' of github.com:jborg/attic 2015-05-31 17:37:02 +02:00
Thomas Waldmann
78bfc58b47 rename package directory to borg 2015-05-22 17:48:54 +02:00
Renamed from attic/_hashindex.c