cmd/restic/cmd_rewrite.go:
introduction of include filters for this command:
- add include filters, add error checking code
- add new parameter 'keepEmptyDirectoryFunc' to 'walker.NewSnapshotSizeRewriter()',
so empty directories have to be kept to keep the directory structure intact
- add parameter 'keepEmptySnapshot' to 'filterAndReplaceSnapshot()' to keep snapshots
intact when nothing is to be included
- introduce helper function 'gatherIncludeFilters()' and 'gatherExcludeFilters()' to
keep code flow clean
cmd/restic/cmd_rewrite_integration_test.go:
add several new tests around the 'include' functionality
internal/filter/include.go:
this is where is include filter is defined
internal/walker/rewriter.go:
- struct RewriteOpts gains field 'KeepEmtpyDirectory', which is a 'NodeKeepEmptyDirectoryFunc()'
which defaults to nil, so that al subdirectories are kept
- function 'NewSnapshotSizeRewriter()' gains the parameter 'keepEmptyDirecoryFilter' which
controls the management of empty subdirectories in case of include filters active
internal/data/tree.go:
gains a function Count() for checking the number if node elements in a newly built tree
internal/walker/rewriter_test.go:
function 'NewSnapshotSizeRewriter()' gets an additional parameter nil to keeps things happy
cmd/restic/cmd_repair_snapshots.go:
function 'filterAndReplaceSnapshot()' gets an additional parameter 'keepEmptySnapshot=nil'
doc/045_working_with_repos.rst:
gets to mention include filters
changelog/unreleased/issue-4278:
the usual announcement file
git rebase master -i produced this
restic rewrite include - keep linter happy
cmd/restic/cmd_rewrite_integration_test.go:
linter likes strings.Contain() better than my strings.Index() >= 0
The TreeNodeIterator decodes nodes while iterating over a tree blob.
This should reduce peak memory usage as now only the serialized tree
blob and a single node have to alive at the same time. Using the
iterator has implications for the error handling however. Now it is
necessary that all loops that iterate through a tree check for errors
before using the node returned by the iterator.
The other change is that it is no longer possible to iterate over a tree
multiple times. Instead it must be loaded a second time. This only
affects the tree rewriting code.
Always serialize trees via TreeJSONBuilder. Add a wrapper called
TreeWriter which combines serialization and saving the tree blob in the
repository. In the future, TreeJSONBuilder will have to upload tree
chunks while the tree is still serialized. This will a wrapper like
TreeWriter, so add it right now already.
The archiver.treeSaver still directly uses the TreeJSONBuilder as it
requires special handling.
It was only used in two places:
- stats: apparently as a minor performance optimization, which is
unlikely to be important
- find: filtered directories would be ignored. However, this
optimization missed that it is possible that two directories have the
exact same content. Such directories would be incorrectly ignored too.
Example:
```
mkdir test test/a test/b
restic backup test
restic find latest test/b
-> incorrectly does not return anything
```
Thus, remove the functionality as it's apparently too complex to use
correctly.
This adds support for caching already rewritten trees, handling of load
errors and disabling the check that the serialization doesn't lead to
data loss.
The more generic RewriteNode callback replaces the SelectByName and
PrintExclude functions. The main part of this change is a preparation to
allow using the TreeRewriter for the `repair snapshots` command.
In principle, the JSON format of Tree objects is extensible without
requiring a format change. In order to not loose information just play
it safe and reject rewriting trees for which we could loose data.
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
Load tree blobs with more than 50MB only from a single goroutine. Very
large tree blobs with for example 400 MB size can otherwise require
roughly 1GB * streamTreeParallelism memory.