diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index e9639fa7f..dc09fa326 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -158,9 +158,11 @@ such obsolete entries is called sparse, while a segment containing no such entri
 Since writing a ``DELETE`` tag does not actually delete any data and
 thus does not free disk space any log-based data store will need a
 compaction strategy (somewhat analogous to a garbage collector).
+
 Borg uses a simple forward compacting algorithm,
 which avoids modifying existing segments.
-Compaction runs when a commit is issued (unless the :ref:`append_only_mode` is active).
+Compaction runs when a commit is issued with ``compact=True`` parameter, e.g.
+by the ``borg compact`` command (unless the :ref:`append_only_mode` is active).
 One client transaction can manifest as multiple physical transactions,
 since compaction is transacted, too, and Borg does not distinguish between the two::

@@ -197,9 +199,9 @@ The 1.1.x series writes version 2 of the format and reads either version.
 When reading a version 1 hints file, Borg 1.1.x will
 read all sparse segments to determine their sparsity.

-This process may take some time if a repository is kept in the append-only mode,
-which causes the number of sparse segments to grow. Repositories not in append-only
-mode have no sparse segments in 1.0.x, since compaction is unconditional.
+This process may take some time if a repository has been kept in append-only mode
+or ``borg compact`` has not been used for a longer time, which both has caused
+the number of sparse segments to grow.

 Compaction processes sparse segments from oldest to newest; sparse segments
 which don't contain enough deleted data to justify compaction are skipped. This
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index 7d100e56d..70176dbfa 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -59,7 +59,7 @@ Also helpful:
 - if you use LVM: use a LV + a filesystem that you can resize later and have
   some unallocated PEs you can add to the LV.
 - consider using quotas
-- use `prune` regularly
+- use `prune` and `compact` regularly

 .. [1] This failsafe can fail in these circumstances:

@@ -105,8 +105,10 @@ Some files which aren't necessarily needed in this backup are excluded. See
 :ref:`borg_patterns` on how to add more exclude options.

 After the backup this script also uses the :ref:`borg_prune` subcommand to keep
-only a certain number of old archives and deletes the others in order to preserve
-disk space.
+only a certain number of old archives and deletes the others.
+
+Finally, it uses the :ref:`borg_compact` subcommand to remove deleted objects
+from the segment files in the repository to preserve disk space.

 Before running, make sure that the repository is initialized as documented in
 :ref:`remote_repos` and that the script has the correct permissions to be executable
@@ -176,17 +178,24 @@ backed up and that the ``prune`` command is keeping and deleting the correct bac

     prune_exit=$?

+    # actually free repo disk space by compacting segments
+
+    borg compact
+
+    compact_exit=$?
+
     # use highest exit code as global exit code
     global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))
+    global_exit=$(( compact_exit > global_exit ? compact_exit : global_exit ))

     if [ ${global_exit} -eq 1 ];
     then
-        info "Backup and/or Prune finished with a warning"
+        info "Backup, Prune and/or Compact finished with a warning"
     fi

     if [ ${global_exit} -gt 1 ];
     then
-        info "Backup and/or Prune finished with an error"
+        info "Backup, Prune and/or Compact finished with an error"
     fi

     exit ${global_exit}
diff --git a/docs/usage/delete.rst b/docs/usage/delete.rst
index bf3e2f54b..02fe0a04a 100644
--- a/docs/usage/delete.rst
+++ b/docs/usage/delete.rst
@@ -6,6 +6,8 @@ Examples

     # delete a single backup archive:
     $ borg delete /path/to/repo::Monday
+    # actually free disk space:
+    $ borg compact /path/to/repo

     # delete all archives whose names begin with the machine's hostname followed by "-"
     $ borg delete --prefix '{hostname}-' /path/to/repo
diff --git a/docs/usage/notes.rst b/docs/usage/notes.rst
index e5a912bfa..4e190c213 100644
--- a/docs/usage/notes.rst
+++ b/docs/usage/notes.rst
@@ -148,16 +148,51 @@ Now, let's see how to restore some LVs from such a backup. ::

     $ borg extract --stdout /path/to/repo::arch dev/vg0/home-snapshot > /dev/vg0/home

+.. _separate_compaction:
+
+Separate compaction
+~~~~~~~~~~~~~~~~~~~
+
+Borg does not auto-compact the segment files in the repository at commit time
+(at the end of each repository-writing command) any more.
+
+This is new since borg 1.2.0 and requires borg >= 1.2.0 on client and server.
+
+This causes a similar behaviour of the repository as if it was in append-only
+mode (see below) most of the time (until ``borg compact`` is invoked or an
+old client triggers auto-compaction).
+
+This has some notable consequences:
+
+- repository space is not freed immediately when deleting / pruning archives
+- commands finish quicker
+- repository is more robust and might be easier to recover after damages (as
+  it contains data in a more sequential manner, historic manifests, multiple
+  commits - until you run ``borg compact``)
+- user can choose when to run compaction (it should be done regularly, but not
+  neccessarily after each single borg command)
+- user can choose from where to invoke ``borg compact`` to do the compaction
+  (from client or from server, it does not need a key)
+- less repo sync data traffic in case you create a copy of your repository by
+  using a sync tool (like rsync, rclone, ...)
+
+You can manually run compaction by invoking the ``borg compact`` command.
+
 .. _append_only_mode:

-Append-only mode
-~~~~~~~~~~~~~~~~
+Append-only mode (forbid compaction)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A repository can be made "append-only", which means that Borg will never overwrite or
-delete committed data (append-only refers to the segment files, but borg will also
-reject to delete the repository completely). This is useful for scenarios where a
-backup client machine backups remotely to a backup server using ``borg serve``, since
-a hacked client machine cannot delete backups on the server permanently.
+A repository can be made "append-only", which means that Borg will never
+overwrite or delete committed data (append-only refers to the segment files,
+but borg will also reject to delete the repository completely).
+
+If ``borg compact`` command is used on a repo in append-only mode, there
+will be no warning or error, but no compaction will happen.
+
+append-only is useful for scenarios where a backup client machine backups
+remotely to a backup server using ``borg serve``, since a hacked client machine
+cannot delete backups on the server permanently.

 To activate append-only mode, set ``append_only`` to 1 in the repository config::

diff --git a/docs/usage/prune.rst b/docs/usage/prune.rst
index 028f83004..6b0ac84b1 100644
--- a/docs/usage/prune.rst
+++ b/docs/usage/prune.rst
@@ -23,6 +23,8 @@ first so you will see what it would do without it actually doing anything.
     # Same as above but only apply to archive names starting with the hostname
     # of the machine followed by a "-" character:
     $ borg prune -v --list --keep-daily=7 --keep-weekly=4 --prefix='{hostname}-' /path/to/repo
+    # actually free disk space:
+    $ borg compact /path/to/repo

     # Keep 7 end of day, 4 additional end of week archives,
     # and an end of month archive for every month:
diff --git a/src/borg/archiver.py b/src/borg/archiver.py
index f3b58c4c2..0dd8b37c7 100644
--- a/src/borg/archiver.py
+++ b/src/borg/archiver.py
@@ -2311,6 +2311,7 @@ class Archiver:
         # It will replace the entire :ref:`foo` verbatim.
         rst_plain_text_references = {
             'a_status_oddity': '"I am seeing ‘A’ (added) status for a unchanged file!?"',
+            'separate_compaction': '"Separate compaction"',
         }

         def process_epilog(epilog):
@@ -3220,9 +3221,13 @@ class Archiver:

         delete_epilog = process_epilog("""
         This command deletes an archive from the repository or the complete repository.
-        Disk space is reclaimed accordingly. If you delete the complete repository, the
-        local cache for it (if any) is also deleted. Alternatively, you can delete just
-        the local cache with the ``--cache-only`` option.
+
+        Important: When deleting archives, repository disk space is **not** freed until
+        you run ``borg compact``.
+
+        If you delete the complete repository, the local cache for it (if any) is
+        also deleted. Alternatively, you can delete just the local cache with the
+        ``--cache-only`` option.

         When using ``--stats``, you will get some statistics about how much data was
         deleted - the "Deleted data" deduplicated size there is most interesting as
@@ -3376,8 +3381,12 @@ class Archiver:

         prune_epilog = process_epilog("""
         The prune command prunes a repository by deleting all archives not matching
-        any of the specified retention options. This command is normally used by
-        automated backup scripts wanting to keep a certain number of historic backups.
+        any of the specified retention options.
+
+        Important: Repository disk space is **not** freed until you run ``borg compact``.
+
+        This command is normally used by automated backup scripts wanting to keep a
+        certain number of historic backups.

         Also, prune automatically removes checkpoint archives (incomplete archives left
         behind by interrupted backup runs) except if the checkpoint is the latest
@@ -3564,6 +3573,8 @@ class Archiver:

         This is an *experimental* feature. Do *not* use this on your only backup.

+        Important: Repository disk space is **not** freed until you run ``borg compact``.
+
         ``--exclude``, ``--exclude-from``, ``--exclude-if-present``, ``--keep-exclude-tags``,
         and PATH have the exact same semantics as in "borg create". If PATHs are specified the
         resulting archive will only contain files from these PATHs.
@@ -3592,10 +3603,9 @@ class Archiver:

         With ``--target`` the original archive is not replaced, instead a new archive is created.

-        When rechunking space usage can be substantial, expect at least the entire
-        deduplicated size of the archives using the previous chunker params.
-        When recompressing expect approx. (throughput / checkpoint-interval) in space usage,
-        assuming all chunks are recompressed.
+        When rechunking (or recompressing), space usage can be substantial - expect
+        at least the entire deduplicated size of the archives using the previous
+        chunker (or compression) params.

         If you recently ran borg check --repair and it had to fix lost chunks with all-zero
         replacement chunks, please first run another backup for the same data and re-run
@@ -3697,6 +3707,16 @@ class Archiver:

         compact_epilog = process_epilog("""
         This command frees repository space by compacting segments.
+
+        Use this regularly to avoid running out of space - you do not need to use this
+        after each borg command though.
+
+        borg compact does not need a key, so it is possible to invoke it from the
+        client or also from the server.
+
+        Depending on the amount of segments that need compaction, it may take a while.
+
+        See :ref:`separate_compaction` in Additional Notes for more details.
         """)
         subparser = subparsers.add_parser('compact', parents=[common_parser], add_help=False,
                                           description=self.do_compact.__doc__,