* sometimes, the report function may absorb the error and return nil; in those cases the `bar.Add(1)` call would execute even if the file deletion had failed
* feat(backends/s3): add warmup support before repacks and restores
This commit introduces basic support for transitioning pack files stored
in cold storage to hot storage on S3 and S3-compatible providers.
To prevent unexpected behavior for existing users, the feature is gated
behind new flags (an example invocation follows the list):
- `s3.enable-restore`: opt-in flag (defaults to false)
- `s3.restore-days`: number of days for the restored objects to remain
in hot storage (defaults to `7`)
- `s3.restore-timeout`: maximum time to wait for a single restoration
  (defaults to `1 day`)
- `s3.restore-tier`: retrieval tier at which the restore will be
  processed (defaults to `Standard`)
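For example, the new options might be passed like this (illustrative
invocation using restic's standard `-o key=value` extended-option
syntax and the alpha feature flag described under Limitations):

    RESTIC_FEATURES=s3-restore restic -o s3.enable-restore=true \
        -o s3.restore-days=7 -o s3.restore-tier=Standard \
        restore latest --target /tmp/restore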
As restoration times can be lengthy, this implementation preemptively
restores the selected packs to avoid repeated restore delays during
downloads. This is slightly sub-optimal, as packs could be processed
out-of-order (as soon as each one is transitioned), but that would add
too much complexity for a marginal gain in speed.
To maintain simplicity and prevent resource exhaustion with large
numbers of packs, no new concurrency mechanisms or goroutines were
added. This just hooks gracefully into the existing routines.
**Limitations:**
- Tests against the backend were not written due to the lack of cold
storage class support in MinIO. Testing was done manually on
Scaleway's S3-compatible object storage. If necessary, we could
explore testing with LocalStack or mocks, though this requires further
discussion.
- Currently, this feature only warms up before restores and repacks
(prune/copy), as those are the two main use-cases I came across.
Support for other commands may be added in future iterations, as long
as affected packs can be calculated in advance.
- The feature is gated behind a new alpha `s3-restore` feature flag to
make it explicit that the feature is still wet behind the ears.
- There is no explicit user notification for ongoing pack restorations.
While I think it is not necessary because of the opt-in flag, showing
some notice may improve usability (but would probably require major
refactoring in the progress bar which I didn't want to start). Another
possibility would be to add a flag to send restore requests and fail
early.
See https://github.com/restic/restic/issues/3202
* ui: warn user when files are warming up from cold storage
* refactor: remove the PacksWarmer struct
Managing multiple handles directly in the backend is easier, and it may
open the door to reducing the number of requests made to the backend in
the future.
The size comparison for `--max-unused` only accounted for unused but not
for duplicate data. For repositories with a large amount of duplicates,
this can result in a situation where no data gets pruned even though the
amount of unused data is much higher than specified.
Those methods now only allow modifying snapshots. Internal data types
used by the repository are now read-only. The repository-internal code
can bypass the restrictions by wrapping the repository in an
`internalRepository` type.
The restriction itself is implemented by using a new datatype
`WriteableFileType` in the `SaveUnpacked` and `RemoveUnpacked` methods.
This statically ensures that code cannot bypass the access restrictions.
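A minimal sketch of the idea (types simplified; not the exact restic
definitions):

    type FileType string

    const (
        SnapshotFile FileType = "snapshot"
        IndexFile    FileType = "index"
    )

    // WriteableFileType only exposes constants for the file types that
    // code outside the repository internals may modify.
    type WriteableFileType FileType

    const WriteableSnapshotFile = WriteableFileType(SnapshotFile)

    type Repository struct{}

    // SaveUnpacked accepts only a WriteableFileType, so writing e.g. an
    // index file from outside the repository code does not compile.
    func (r *Repository) SaveUnpacked(t WriteableFileType, buf []byte) error {
        return r.saveUnpacked(FileType(t), buf)
    }

    // saveUnpacked is the repository-internal path that accepts any
    // FileType; it is reachable via the internalRepository wrapper.
    func (r *Repository) saveUnpacked(t FileType, buf []byte) error {
        _, _ = t, buf // placeholder body
        return nil
    }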
The test changes are somewhat noisy as some of them modify repository
internals and therefore require some way to bypass the access
restrictions. This works by capturing an `internalRepository` or
`Backend` when creating the Repository using a test helper function.
* Refactor ea and sd helpers to use go-winio
Import go-winio and, instead of copying the functions that encode/decode extended attributes and enable process privileges for security descriptors, call the functions defined in go-winio.
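A rough sketch of the go-winio calls now used instead (Windows-only;
the wrapper functions here are illustrative, not restic's actual
helpers):

    //go:build windows

    package fs

    import "github.com/Microsoft/go-winio"

    // roundTripEA decodes raw extended-attribute bytes and encodes them
    // back, via go-winio rather than locally copied helpers.
    func roundTripEA(raw []byte) ([]byte, error) {
        eas, err := winio.DecodeExtendedAttributes(raw)
        if err != nil {
            return nil, err
        }
        return winio.EncodeExtendedAttributes(eas)
    }

    // enablePrivileges enables the process privileges needed to read
    // and write security descriptors.
    func enablePrivileges() error {
        return winio.EnableProcessPrivileges([]string{
            winio.SeBackupPrivilege,
            winio.SeRestorePrivilege,
        })
    }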
* ui: restore --delete indicates number of deleted files
* adds a new field `FilesDeleted` to the State struct and the JSON and text progress updaters (an illustrative JSON line follows below)
* increments the `FilesDeleted` count when a deleted file is reported
* ui: collect the files to be deleted, delete them, then update the count after deletion
* docs: update scripting output fields for restore command
ui: report deleted directories and refactor function name to ReportDeletion
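With `--json`, a status line could then carry the new counter along
these lines (illustrative only; the field name is inferred from the
`FilesDeleted` struct field, and other fields are omitted):

    {"message_type": "summary", "files_restored": 12, "files_deleted": 3}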
Only the `Sys()` value from os.FileInfo is kept as field `sys` to
support Windows. The os.FileInfo removal ensures that, for values like
`ModTime` that existed in both data structures, there is no longer any
confusion about which value is actually used.
Depending on the changed packages, the VSS tests from ./cmd/restic and
the fs package may overlap in time. This causes the snapshot creation to
fail. Add retries in that case.
The actual implementation still relies on file paths, but with the
abstraction layer in place, an FS implementation can ensure atomic file
accesses in the future.
Extended attributes and security descriptors apparently cannot be
retrieved from a vss volume. Fix the volume check to correctly detect
vss volumes and completely disable extended attributes for such volumes.
Paths that only contain the volume shadow copy snapshot name require
special treatment. These paths must end with a slash for regular file
operations to work.
Previously, nodeRestoreTimestamps would do something like

    if node.Type == restic.NodeTypeSymlink {
        return nodeRestoreSymlinkTimestamps(...)
    }
    return syscall.UtimesNano(...)
where nodeRestoreSymlinkTimestamps was either a no-op or a
reimplementation of syscall.UtimesNano that handles symlinks, with some
repeated converting between timestamp types. The Linux implementation
was a bit clumsy, requiring three syscalls to set the timestamps.
In this new setup, there is a function utimesNano that has three
implementations (the Linux variant is sketched after the list):
* on Linux, it's a modified syscall.UtimesNano that uses
AT_SYMLINK_NOFOLLOW and AT_FDCWD so it can handle any type in a single
call;
* on other Unix platforms, it just calls the syscall function after
skipping symlinks;
* on Windows, it's the modified UtimesNano that was previously called
nodeRestoreSymlinkTimestamps, except with different arguments.
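A sketch of the Linux variant (identifiers simplified from the actual
code):

    package node

    import (
        "time"

        "golang.org/x/sys/unix"
    )

    // utimesNano sets atime and mtime with a single syscall;
    // AT_SYMLINK_NOFOLLOW makes it operate on symlinks themselves, so
    // no per-type special casing is needed.
    func utimesNano(path string, atime, mtime time.Time) error {
        ts := []unix.Timespec{
            unix.NsecToTimespec(atime.UnixNano()),
            unix.NsecToTimespec(mtime.UnixNano()),
        }
        return unix.UtimesNanoAt(unix.AT_FDCWD, path, ts, unix.AT_SYMLINK_NOFOLLOW)
    }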
Previously, NodeFromFileInfo used the original file path to create the
node, which also meant that extended metadata was read from there
instead of within the vss snapshot.
Add two new test cases, TestBackendAzureAccountToken and
TestBackendAzureContainerToken, that ensure that the authorization using
both types of token works.
This introduces two new environment variables,
RESTIC_TEST_AZURE_ACCOUNT_SAS and RESTIC_TEST_AZURE_CONTAINER_SAS, that
contain the tokens to use when testing restic. If an environment
variable is missing, the related test is skipped.
Ignore AuthorizationFailure caused by using a container level SAS/SAT
token when calling GetProperties during the Create() call. This is because the
GetProperties call expects an Account Level token, and the container
level token simply lacks the appropriate permissions. Suppressing the
authorization failure is OK because, if the token is actually invalid,
this is caught elsewhere when we try to actually use the token to do
work.
This does not produce exactly the same messages, as it inserts newlines
instead of "; ". But given how long our error messages can be, that
might be a good thing.
One place that could have used IDSet.Clone was reinventing it, using a
conversion to a list, a sort, and a conversion back to a map.
Also, use the stdlib "maps" package to implement as much of IDSet as
possible. This requires changing one caller, which assumed that cloning
nil would return a non-nil IDSet.
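The Clone method itself now reduces to a stdlib call (sketch with a
simplified ID type):

    import "maps"

    type ID [32]byte

    type IDSet map[ID]struct{}

    func (s IDSet) Clone() IDSet {
        // Note: maps.Clone(nil) returns nil, which is why the one
        // caller that assumed a non-nil result had to be adjusted.
        return maps.Clone(s)
    }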
This changes Dumper.writeNode to spawn loader goroutines as needed
instead of as a pool. The code is shorter, fewer goroutines are spawned
for small files, and crash dumps (also for unrelated errors) should be
smaller.
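The general pattern looks something like this (sketch only; `loadBlob`
and the surrounding names are stand-ins, and the real code must
additionally hand the loaded blobs to the writer in order):

    // load the blobs using bounded, on-demand goroutines rather than a
    // long-lived worker pool
    wg, ctx := errgroup.WithContext(ctx)
    wg.SetLimit(numLoaders)
    for _, blob := range blobs {
        blob := blob // redundant as of Go 1.22
        wg.Go(func() error { return loadBlob(ctx, blob) })
    }
    return wg.Wait()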
A particular node should always be represented by a single instance.
This is necessary to allow the fuse library to assign a stable nodeId to
a node. macOS Sonoma trips over the previous, unstable behavior when
using fuse-t.
Now, a snapshot is only marked as oldest if it's the last in the list
AND its value matches the last seen value for that bucket.
Also, updated the corresponding golden files for the tests.
Depending on parameters the paths in a snapshot do not directly
correspond to real paths on the filesystem. Therefore, reject funcs must
use the FS interface to work correctly.
The temp files used by the packer manager are either deleted after
creation (unix) or marked as delete-on-close (windows). Thus, no
explicit cleanup is necessary.
The retry code path did not filter `ERROR_NOT_SUPPORTED`. Just call the
original function a second time to correctly follow the low privilege
code path.
Calling `Load()` twice for an atomic variable can return different
values each time. This resulted in trying to read the security
descriptor with high privileges, but then not entering the code path to
switch to low privileges when another thread had already done so
concurrently.
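Condensed, the fix looks like this (the flag and the readWith*
functions are illustrative stand-ins):

    var lowPriv atomic.Bool

    func readSecurityDescriptor(path string) error {
        // Load once and branch on the local result; two separate
        // Load() calls could observe different values if another
        // goroutine flips the flag in between.
        if lowPriv.Load() {
            return readWithLowPrivileges(path)
        }
        return readWithHighPrivileges(path)
    }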
Failed locking attempts were immediately retried up to three times
without any delay between the retries. With the reworked backend
retries, there is also no delay between retries if a lock file is not
found while checking for other locks. This is a problem if a backend
requires a few seconds to reflect file deletions in its file listings.
To work around this, introduce a short, exponentially increasing delay
between the retries. The number of retries is now increased to 4, which
results in delays of 5, 10 and 20 seconds between them.
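The resulting retry loop is roughly (sketch; names illustrative):

    delay := 5 * time.Second
    for attempt := 0; attempt < 4; attempt++ {
        if tryLock() {
            return nil
        }
        if attempt < 3 {
            time.Sleep(delay) // 5s, 10s, then 20s
            delay *= 2
        }
    }
    return errTooManyRetries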
When the context used for a load operation is canceled, the result is
always an error, independent of whether the file could be retrieved from
the backend. Do not spuriously trip the circuit breaker in this case.
The old behavior was problematic when trying to lock a repository: when
`Lock.checkForOtherLocks` listed multiple lock files in parallel and one
of them failed to load, all other loads were canceled. This cancelation
was remembered by the circuit breaker, causing subsequent locking
retries to fail.
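The fix boils down to a guard along these lines (sketch; names
illustrative):

    if err != nil && ctx.Err() == nil {
        // a canceled context always yields an error and says nothing
        // about the backend's health, so only count genuine failures
        breaker.markFailure()
    }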
The HTTP client can only retry HTTP2 requests after receiving a GOAWAY
response if it can rewind the body. As we use a custom data type,
explicitly provide an implementation of `GetBody`.
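A sketch of the fix, assuming the request body is available as a byte
slice (restic's actual data type differs):

    func newUploadRequest(ctx context.Context, url string, buf []byte) (*http.Request, error) {
        req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, bytes.NewReader(buf))
        if err != nil {
            return nil, err
        }
        // http.NewRequest fills in GetBody automatically for
        // *bytes.Reader bodies; a custom body type must provide it
        // explicitly so the HTTP/2 client can rewind and resend the
        // request after a GOAWAY.
        req.GetBody = func() (io.ReadCloser, error) {
            return io.NopCloser(bytes.NewReader(buf)), nil
        }
        return req, nil
    }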
* removes files from the cache which are no longer in the repository, including index files, snapshot files and pack files
cache: fix ids set initialisation with NewIDSet()
Files were not included in the backup if the extended metadata for the
file could not be read. This is rather drastic. Instead, settle on
returning a warning but still including the file in the backup.
This keeps backwards compatibility with the previous empty structs.
And maybe we'd want to put other fields into the inner struct later,
rather than the outer message.
Previously, an error JSON fragment would look like:

    {"message_type": "error", "error": {}}
This is because encoding/json cannot marshal an error interface.
Instead, we now call .Error() to get the string value.
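The shape of the fix is roughly the following (sketch; struct names are
illustrative, not necessarily restic's):

    type errorObject struct {
        Message string `json:"message"`
    }

    type errorUpdate struct {
        MessageType string      `json:"message_type"` // always "error"
        Error       errorObject `json:"error"`
    }

    func newErrorUpdate(err error) errorUpdate {
        // err.Error() yields a plain string that encoding/json can
        // marshal, unlike the bare error interface value.
        return errorUpdate{
            MessageType: "error",
            Error:       errorObject{Message: err.Error()},
        }
    }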