While not planned, it's also not completely impossible that a tree node
might get additional top-level fields. As the tree iterator is built
with a strict expectation of the top-level fields, this would result in
a parsing error. Future-proof the code by simply skipping unknown
fields.
The previous implementation stored the whole tree in a map and used it
for checking overlap between trees. This is now replaced with the
DualTreeIterator, which iterates over two trees in parallel and returns
the merge stream in order. In case of overlap between both trees, it
returns both nodes at the same time. Otherwise, only a single node is
returned.
The TreeNodeIterator decodes nodes while iterating over a tree blob.
This should reduce peak memory usage as now only the serialized tree
blob and a single node have to alive at the same time. Using the
iterator has implications for the error handling however. Now it is
necessary that all loops that iterate through a tree check for errors
before using the node returned by the iterator.
The other change is that it is no longer possible to iterate over a tree
multiple times. Instead it must be loaded a second time. This only
affects the tree rewriting code.
The tree.Nodes will be replaced by an iterator to loads and serializes
tree node ondemand. Thus, the processing moves from StreamTrees into the
callback. Schedule them onto the workers used by StreamTrees for proper
load distribution.
Always serialize trees via TreeJSONBuilder. Add a wrapper called
TreeWriter which combines serialization and saving the tree blob in the
repository. In the future, TreeJSONBuilder will have to upload tree
chunks while the tree is still serialized. This will a wrapper like
TreeWriter, so add it right now already.
The archiver.treeSaver still directly uses the TreeJSONBuilder as it
requires special handling.
data.TestCreateSnapshot which is used in particular by TestFindUsedBlobs
and TestFindUsedBlobs could generate trees with duplicate file names.
This is invalid and going forward will result in an error.
The new method combines both step into a single wrapper function. Thus
it ensures that both are always called in pairs. As an additional
benefit this slightly reduces the boilerplate to upload blobs.