FileReader.read: detect zero-filled slices from CH_DATA blocks

When FileReader.read() sliced a large CH_DATA block (read at 1MB
granularity) into smaller block_size chunks (e.g. 4096 bytes), zero-filled
slices were returned as CH_DATA chunks containing only zero bytes instead
of as CH_ALLOC.

Add a zeros.startswith(result) check before returning a CH_DATA chunk,
converting all-zero slices to CH_ALLOC. This ensures sparse-aware
consumers correctly identify allocated-but-zero regions regardless of
whether the file was read with sparse=True or sparse=False.
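
A minimal, self-contained sketch of the check described above. It is not
the actual FileReader code: the Chunk tuple, the CH_* values, MAX_SLICE and
the classify() helper are hypothetical stand-ins, and only the
zeros.startswith(result) idiom is taken from this change.

```python
from typing import NamedTuple, Optional

# Hypothetical stand-ins for the real constants and Chunk type.
CH_DATA = "data"
CH_ALLOC = "alloc"


class Chunk(NamedTuple):
    data: Optional[bytes]
    size: int
    allocation: str


MAX_SLICE = 4096  # assumed upper bound on the slice size (block_size)
zeros = bytes(MAX_SLICE)  # preallocated all-zero buffer, created once


def classify(result) -> Chunk:
    """Classify a slice cut out of a block that was read as CH_DATA."""
    # zeros.startswith(result) is true only if result consists solely of
    # zero bytes and is not longer than the zeros buffer.
    if zeros.startswith(result):
        return Chunk(None, len(result), CH_ALLOC)
    # Anything else (including slices longer than zeros) stays CH_DATA.
    return Chunk(bytes(result), len(result), CH_DATA)
```

In this sketch an empty slice would also be classified as CH_ALLOC, and a
slice longer than the zeros buffer is always kept as CH_DATA, so the buffer
has to be at least as large as the biggest slice for the check to be
effective.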
Thomas Waldmann 2026-05-08 23:58:54 +02:00
parent 9b8cfc7d83
commit 827b82938f
GPG key ID: 243ACFA951F78E01

@@ -327,7 +327,12 @@ class FileReader:
         # Determine the allocation type of the resulting chunk
         if has_data:
-            # If any chunk was CH_DATA, the result is CH_DATA
+            # If any chunk was CH_DATA, check if the result is all zeros.
+            # This can happen when a large CH_DATA block (read at read_size granularity)
+            # contains both real data and zero-filled regions, and we are slicing out
+            # a zero-filled portion at the block_size granularity.
+            if zeros.startswith(result):
+                return Chunk(None, size=bytes_read, allocation=CH_ALLOC)
             return Chunk(bytes(result), size=bytes_read, allocation=CH_DATA)
         elif has_hole:
             # If any chunk was CH_HOLE (and none were CH_DATA), the result is CH_HOLE