borgbackup/borg
Thomas Waldmann 8834f6fdbd chunker: do not buzhash if not needed, fixes #1021
For small remainders of files (the last chunk), we do not need to buzhash if it
is already clear that there is not enough data left for a cut (we want chunks of
at least min_size).

Small files are handled by the same code: as they only give 1 chunk, that chunk
is also the last chunk (see above).

See "Cases" considerations below.

For big files, we do not need to buzhash the first min_size bytes of a chunk -
we do not want to cut there anyway, so we can start buzhashing at offset
min_size.
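
The idea of starting at offset min_size can be sketched with a simplified
model. This is not the code from _chunker.c: the parameter values, the table
and the helper names (rol32, window_hash, chunk_lengths) are made up for
illustration, and the window hash is recomputed from scratch instead of being
rolled. It only shows that skipping the first min_size bytes gives the same
cuts while evaluating fewer window hashes.

```python
import random

# Illustrative parameters only (assumptions, not borg's defaults).
WINDOW_SIZE = 16
MIN_SIZE = 64
MASK = (1 << 5) - 1  # cut where the low 5 bits of the window hash are zero

# Stand-in for a buzhash table: 256 fixed pseudo-random 32-bit values.
_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def window_hash(data, offset):
    """Hash data[offset:offset + WINDOW_SIZE] from scratch (no rolling update)."""
    result = 0
    for i in range(WINDOW_SIZE):
        result ^= rol32(TABLE[data[offset + i]], WINDOW_SIZE - 1 - i)
    return result


def chunk_lengths(data, skip_min_size):
    """Return (list of chunk lengths, number of window hashes evaluated)."""
    lengths = []
    hashes = 0
    start = 0
    while start < len(data):
        remaining = len(data) - start
        # With skipping, the first candidate cut offset is MIN_SIZE.
        n = MIN_SIZE if skip_min_size else 0
        cut = remaining  # default: the rest of the data is the last chunk
        while n + WINDOW_SIZE < remaining:
            h = window_hash(data, start + n)
            hashes += 1
            # A cut below MIN_SIZE is never accepted, whatever the hash says.
            if n >= MIN_SIZE and (h & MASK) == 0:
                cut = n
                break
            n += 1
        lengths.append(cut)
        start += cut
    return lengths, hashes
```

In this model, both modes return identical chunk lengths for the same input;
only the number of evaluated hashes differs.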

Cases (before this change)
--------------------------

- A) remaining <= window_size:

  - would do 2 chunker_fill calls (both line 253) and trigger eof with the 2nd call
  - no buzhashing
  - result is 1 <remaining> length chunk

- B) window_size < remaining <= min_size:

  - the chunker would do 1 chunker_fill call (line 253) that would read the entire remaining file (but not trigger eof yet)
  - would compute all possible remaining - window_size + 1 buzhashes, but without a chance for a cut,
    because there is also the n < min_size condition
  - would do another chunker_fill call (line 282), but not get more data, so loop ends
  - result is 1 <remaining> length chunk

- C) remaining > min_size:

  - normal chunking
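
The three cases above can be written down as a small helper. This is only a
sketch of the case analysis, not code from the chunker: classify_before is a
hypothetical name, it assumes window_size < min_size, and the sizes used in
the usage note are illustrative values.

```python
def classify_before(remaining, window_size, min_size):
    """Return (case, number of buzhashes computed without any chance of a cut).

    Models the behaviour described for the code *before* this change.
    Assumes window_size < min_size.
    """
    if remaining <= window_size:
        # Case A: too little data for even one window, nothing is hashed.
        return "A", 0
    if remaining <= min_size:
        # Case B: every possible window is hashed, but n < min_size
        # forbids a cut, so all of that work is wasted.
        return "B", remaining - window_size + 1
    # Case C: normal chunking; the hash count depends on the data.
    return "C", None
```

For example, with an assumed window_size of 4095 and min_size of 524288,
classify_before(300000, 4095, 524288) returns ("B", 295906): almost 300000
hash computations for a result that was certain from the start.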

Cases (after this change)
-------------------------

- A) similar to above A), but up to remaining < min_size + window_size + 1,
  so it does not buzhash if there is no chance for a cut.

- B) see C) above
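
The new boundary between the two cases can be expressed as a single
comparison. The helper below is a hypothetical illustration of that condition
only (classify_after is not a function in borg), and the sizes in the usage
note are assumed values.

```python
def classify_after(remaining, window_size, min_size):
    """Return "A" (no buzhashing, one chunk) or "B" (normal chunking).

    Models the case split described for the code *after* this change.
    """
    if remaining < min_size + window_size + 1:
        # No cut is possible, so the remainder becomes one chunk unhashed.
        return "A"
    return "B"
```

With an assumed window_size of 4095 and min_size of 524288, a remainder of
528383 bytes is still case A, while 528384 bytes is the first size that gets
buzhashed.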
2016-05-22 01:18:16 +02:00
testsuite chunker: do not buzhash if not needed, fixes #1021 2016-05-22 01:18:16 +02:00
__init__.py propperly handle borg._version using setuptools_scm 2015-08-22 15:54:40 +02:00
__main__.py cosmetic source cleanup (flake8) 2016-01-30 21:32:45 +01:00
_chunker.c chunker: do not buzhash if not needed, fixes #1021 2016-05-22 01:18:16 +02:00
_hashindex.c refcounting: use uint32_t, protect against overflows, fix merging for BE 2016-04-14 23:38:56 +02:00
archive.py move Statistics class to archive module, avoid cyclic import 2016-05-18 23:59:47 +02:00
archiver.py Merge branch '1.0-maint' 2016-05-20 22:48:57 +02:00
cache.py add a bin_to_hex helper and some properties 2016-04-23 22:42:56 +02:00
chunker.pyx chunker: do not buzhash if not needed, fixes #1021 2016-05-22 01:18:16 +02:00
compress.pyx remove misc. compat code not needed for py 3.4+ 2016-01-24 15:16:05 +01:00
constants.py Improve LoggedIO write performance, make commit mechanism more solid 2016-05-14 22:47:18 +02:00
crypto.pyx remove openssl RAND_bytes from crypto.pyx 2016-05-09 04:14:50 +02:00
fuse.py borg mount: cache partially read data chunks 2016-04-23 18:05:22 +02:00
hash_sizes.py hashtable size follows a growth policy, fixes #527 2016-01-14 14:39:59 +01:00
hashindex.pyx ChunkIndex.add: overwrite current (c)size w/ new values 2016-04-17 00:37:40 +02:00
helpers.py add bigint coding, allow None as user/group 2016-05-21 03:35:07 +02:00
item.py disallow setting unknown attributes, use StableDict as .as_dict() result 2016-05-21 03:35:07 +02:00
key.py Merge branch '1.0-maint' 2016-05-20 22:48:57 +02:00
locking.py do not sleep for >60s while waiting for lock, fixes #773 2016-03-19 21:19:30 +01:00
logger.py Update logging parser to allow remotes to pass logger name 2016-05-18 14:58:43 -04:00
lrucache.py Merge branch 'master' into lrucache 2015-08-14 10:59:21 +01:00
platform.py add swidth call to determine string width on terminal 2016-05-18 17:40:04 +02:00
platform_base.py add swidth call to determine string width on terminal 2016-05-18 17:40:04 +02:00
platform_darwin.pyx create new platform_posix module 2016-05-18 17:40:04 +02:00
platform_freebsd.pyx create new platform_posix module 2016-05-18 17:40:04 +02:00
platform_linux.pyx create new platform_posix module 2016-05-18 17:40:04 +02:00
platform_posix.pyx create new platform_posix module 2016-05-18 17:40:04 +02:00
remote.py Merge branch '1.0-maint' 2016-05-20 22:48:57 +02:00
repository.py Merge branch '1.0-maint' 2016-05-20 22:48:57 +02:00
selftest.py Add self tests 2016-04-28 00:06:19 +02:00
shellpattern.py Add shell-style pattern syntax 2016-01-21 16:07:24 +01:00
upgrader.py add a bin_to_hex helper and some properties 2016-04-23 22:42:56 +02:00
xattr.py Fix capabilities extraction on Linux 2016-04-16 23:52:27 +02:00