FileReader.read: detect zero-filled slices from CH_DATA blocks

When FileReader.read() sliced a large CH_DATA block (read at 1MB granularity) into smaller block_size chunks (e.g. 4096 bytes), a slice consisting entirely of zero bytes was returned as CH_DATA carrying those zeros instead of as CH_ALLOC. Add a zeros.startswith(result) check before returning a CH_DATA chunk, so that all-zero slices are converted to CH_ALLOC. This ensures sparse-aware consumers correctly identify allocated-but-zero regions regardless of whether the file was read with sparse=True or sparse=False.
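
The check described above can be illustrated in isolation. The following is a minimal sketch in plain Python, not the Cython code from reader.pyx: the Chunk namedtuple, the string values of CH_DATA and CH_ALLOC, the size of the zeros buffer, and the classify_slice helper are all assumptions made for illustration. It relies only on the fact that zeros.startswith(result) is true exactly when result contains nothing but zero bytes and is no longer than zeros.

```python
from collections import namedtuple

# Hypothetical stand-ins for the names used in the diff (values are illustrative).
CH_DATA, CH_ALLOC = "data", "alloc"
Chunk = namedtuple("Chunk", "data size allocation")

# Assumption: a shared all-zero buffer at least as long as any slice we test.
zeros = bytes(1024 * 1024)


def classify_slice(result):
    """Return a CH_ALLOC chunk for an all-zero slice, else a CH_DATA chunk."""
    bytes_read = len(result)
    # True only if every byte of result is zero and len(result) <= len(zeros).
    # Note: an empty slice also matches, so callers should handle EOF earlier.
    if zeros.startswith(result):
        return Chunk(None, bytes_read, CH_ALLOC)
    return Chunk(bytes(result), bytes_read, CH_DATA)


# One large "data" block holding real data followed by a zero-filled region,
# sliced at a 4096-byte block size.
block_size = 4096
block = b"x" * block_size + bytes(block_size)
chunks = [classify_slice(block[i:i + block_size])
          for i in range(0, len(block), block_size)]
```

Here the first slice keeps its data and stays CH_DATA, while the second slice carries no data and is reported as CH_ALLOC with its size preserved.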
2026-06-11 01:41:57 -04:00 · 2026-05-08 23:58:54 +02:00 · 2026-05-08 23:58:54 +02:00 · 827b82938f
commit 827b82938f
parent 9b8cfc7d83
1 changed file with 6 additions and 1 deletion
--- a/src/borg/chunkers/reader.pyx
+++ b/src/borg/chunkers/reader.pyx
@@ -327,7 +327,12 @@ class FileReader:

        # Determine the allocation type of the resulting chunk
        if has_data:
-            # If any chunk was CH_DATA, the result is CH_DATA
+            # If any chunk was CH_DATA, check if the result is all zeros.
+            # This can happen when a large CH_DATA block (read at read_size granularity)
+            # contains both real data and zero-filled regions, and we are slicing out
+            # a zero-filled portion at the block_size granularity.
+            if zeros.startswith(result):
+                return Chunk(None, size=bytes_read, allocation=CH_ALLOC)
            return Chunk(bytes(result), size=bytes_read, allocation=CH_DATA)
        elif has_hole:
            # If any chunk was CH_HOLE (and none were CH_DATA), the result is CH_HOLE