Added a check that compares the size of a new chunk to the stored size of the
already existing chunk in storage that has the same id_hash value.
An exception is raised if there is a size mismatch.
This could happen if:
- the stored size is somehow incorrect (corruption or a software bug)
- we found a hash collision for the id_hash (for sha256, this is very unlikely)
This fix may not be perfect yet, but it is better than nothing.
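A minimal sketch of the idea, not the actual attic/borg code: the names
chunk_index, add_chunk and ChunkSizeMismatch are illustrative assumptions,
and the index is modeled as a plain dict mapping chunk id -> (refcount, size).

    class ChunkSizeMismatch(Exception):
        """Raised when a chunk's size differs from the size stored for its id."""

    def add_chunk(chunk_index, chunk_id, data):
        size = len(data)
        if chunk_id in chunk_index:
            refcount, stored_size = chunk_index[chunk_id]
            if stored_size != size:
                # Either the stored size is wrong (corruption / software bug)
                # or we hit an id_hash collision; neither must go unnoticed.
                raise ChunkSizeMismatch(
                    'chunk %s: stored size %d != new size %d'
                    % (chunk_id.hex(), stored_size, size))
            # Same id, same size: just reference the existing chunk again.
            chunk_index[chunk_id] = (refcount + 1, stored_size)
        else:
            chunk_index[chunk_id] = (1, size)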
A comment by Ernest0x (see https://github.com/jborg/attic/issues/232 ):
@ThomasWaldmann your patch did the job.
attic check --repair did the repairing and attic delete deleted the archive.
Thanks.
That said, however, I am not sure if the best place to put the check is where
you put it in the patch. For example, the check operation uses a custom msgpack
unpacker class named "RobustUnpacker", which does try to check for a correct
format (see the comment: "Abort early if the data does not look like a
serialized dict"), but it seems it does not catch my case. The relevant code
in 'cache.py', on the other hand, uses msgpack's Unpacker class.
Re-synchronize the chunks cache with the repository.
If present, a compressed tar archive of known backup archive indices is used,
so chunk information only needs to be fetched from the repository and a chunk
index built once per backup archive.
If out of sync, the tar gets rebuilt from known + fetched chunk information,
so it has complete and current information about all backup archives.
Finally, the master chunks index is built by merging all indices from
the tar.
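A hypothetical sketch of the final merge step, assuming each per-archive index
read from the tar maps a chunk id to a (refcount, size) pair; the names below
are illustrative, not the actual implementation:

    def build_master_index(archive_indices):
        """Merge per-archive chunk indices into one master chunks index.

        archive_indices: iterable of dicts mapping chunk id -> (refcount, size).
        """
        master = {}
        for index in archive_indices:
            for chunk_id, (refcount, size) in index.items():
                if chunk_id in master:
                    total_refcount, stored_size = master[chunk_id]
                    # Sizes for the same id are expected to agree across archives.
                    master[chunk_id] = (total_refcount + refcount, stored_size)
                else:
                    master[chunk_id] = (refcount, size)
        return master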
Note: compression (esp. xz) is very effective in keeping the tar
relatively small compared to the files it contains.
Use Python >= 3.3 to get better compression with xz;
there is a fallback to bz2 or gz when xz is not supported.
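A hedged sketch (under the assumption that the cache tar is written with the
standard tarfile module) of picking the best available compression; the
function name is made up for illustration:

    import tarfile

    def open_cache_tar_for_writing(path):
        # Prefer xz (needs Python >= 3.3 with the lzma module), then bz2, then gzip.
        for comp in ('xz', 'bz2', 'gz'):
            try:
                return tarfile.open(path, 'w:' + comp)
            except tarfile.CompressionError:
                continue
        # Last resort: an uncompressed tar.
        return tarfile.open(path, 'w')

Reading the cache back does not need to know which compression was chosen:
tarfile.open(path, 'r:*') auto-detects xz, bz2, gz or plain tar.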