Removing Snapshots
22 Aug 2016
The most often requested feature for restic is the ability to remove snapshots from the repository. Sometimes, restic was even (rightfully) criticised for not having such a function.
After about three months of work, PR #518 was merged into the master branch a
few days ago. This pull request brings two new commands to restic: forget and
prune. They allow you not only to remove a single snapshot manually, but also
to specify a policy according to which restic automatically removes snapshots
(so you don’t have to bother with them). The remainder of this post gives a
short introduction on how you can use the new commands to implement your own
strategy for limiting the growth of the restic repository.
For all of the following commands, the repository location and password have
been written to the environment variables RESTIC_REPOSITORY and RESTIC_PASSWORD
so that the commands can be run directly. This is how to do it:
$ export RESTIC_REPOSITORY=/tmp RESTIC_PASSWORD=geheim
Please note that this feature is not yet contained in any released version of restic. You need to compile the code from the current (as of 22 August 2016) master branch yourself.
Removing a single snapshot
Let’s suppose you have a restic repository and ran a backup at 5:00 in the morning each day this year.
Running the snapshots command shows you around 235 snapshots:
$ restic snapshots
ID Date Host Directory
----------------------------------------------------------------------
6e001a58 2016-01-01 04:00:00 mopped /home/fd0/tmp/data
62a86121 2016-01-02 04:00:00 mopped /home/fd0/tmp/data
ee891602 2016-01-03 04:00:00 mopped /home/fd0/tmp/data
[...]
d0267dbb 2016-08-20 05:00:00 mopped /home/fd0/tmp/data
6261b96f 2016-08-21 05:00:00 mopped /home/fd0/tmp/data
5116cfdc 2016-08-22 05:00:00 mopped /home/fd0/tmp/data
The forget command allows removing snapshots. When a snapshot ID, such as
6e001a58 for the first snapshot made on 1 January 2016, is specified as the
argument of the command, that snapshot is deleted from the repository:
$ restic forget 6e001a58
removed snapshot 6e001a58
$ restic snapshots
ID Date Host Directory
----------------------------------------------------------------------
62a86121 2016-01-02 04:00:00 mopped /home/fd0/tmp/data
ee891602 2016-01-03 04:00:00 mopped /home/fd0/tmp/data
a07ffe20 2016-01-04 04:00:00 mopped /home/fd0/tmp/data
[...]
A snapshot in a restic repository is really just a pointer to the data that
was present when the snapshot was made. Removing a snapshot does not remove the
data from the repository. Only when the prune command is run is unreferenced
(and therefore unneeded) data removed:
$ restic prune
counting files in repo
building new index for repo
[0:00] 100.00% 1 / 1 files
repository contains 1 packs (8 blobs) with 3.003 MiB bytes
processed 8 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 234 snapshots
[0:00] 100.00% 234 / 234 snapshots
found 8 of 8 data blobs still in use
will rewrite 0 packs
creating new index
[0:00] 100.00% 2 / 2 files
saved new index as 14a7838d
done
In this example prune finished quickly, but checking the references for each
blob of data can take longer. Restic combines several blobs of data into
so-called “pack” files. When a pack file is found to contain some data that is
still referenced and other data that isn’t needed any more, restic creates a
new pack file, writes the needed data to it, and then removes the original pack
file. This process can also take some time.
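The pack handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not restic's actual implementation: the Pack class, the plan_prune function, and the blob and pack IDs are hypothetical, and the mapping from snapshots to blobs is made up for the example. The sketch also assumes that a pack containing no referenced data at all is simply deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pack:
    """Hypothetical stand-in for a pack file: an ID plus the blobs it holds."""
    pack_id: str
    blobs: frozenset


def plan_prune(snapshots, packs):
    """Classify packs as keep / rewrite / delete.

    snapshots: dict mapping snapshot ID -> set of blob IDs it references
    packs:     list of Pack
    """
    # Every blob referenced by at least one remaining snapshot is still needed.
    used = set()
    for blobs in snapshots.values():
        used |= blobs

    keep, rewrite, delete = [], [], []
    for pack in packs:
        needed = pack.blobs & used
        if needed == pack.blobs:
            keep.append(pack.pack_id)     # fully referenced: leave untouched
        elif needed:
            rewrite.append(pack.pack_id)  # mixed: copy needed blobs to a new pack
        else:
            delete.append(pack.pack_id)   # assumption: unreferenced packs are dropped
    return keep, rewrite, delete


# Made-up example data: two remaining snapshots and three packs.
snapshots = {"62a86121": {"b1", "b2"}, "ee891602": {"b2", "b3"}}
packs = [
    Pack("p1", frozenset({"b1", "b2"})),
    Pack("p2", frozenset({"b3", "b4"})),
    Pack("p3", frozenset({"b5"})),
]
print(plan_prune(snapshots, packs))
```

Here p1 is kept as it is, p2 would be rewritten because only b3 is still referenced, and p3 would be removed.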
Applying an expire policy
Removing a single snapshot is useful, but not very convenient. Let’s check out
the specific parameters of the forget command:
$ restic forget --help
[...]
Help Options:
-h, --help Show this help message
[forget command options]
-l, --keep-last= keep the last n snapshots
-H, --keep-hourly= keep the last n hourly snapshots
-d, --keep-daily= keep the last n daily snapshots
-w, --keep-weekly= keep the last n weekly snapshots
-m, --keep-monthly= keep the last n monthly snapshots
-y, --keep-yearly= keep the last n yearly snapshots
--hostname= only forget snapshots for the given hostname
-n, --dry-run do not delete anything, just print what would be done
The most important parameter is --dry-run, which will only print the snapshots
that would be removed according to the policy set by the other parameters.
The basic idea is that you run forget with parameters that tell restic which
snapshots you want to keep. Restic then goes through the list of snapshots and
removes those that do not match the policy.
Let’s try this with a simple policy: restic should keep the last seven daily snapshots, the last eight weekly snapshots, and one monthly snapshot for each of the last 24 months:
$ restic forget --dry-run --keep-daily 7 --keep-weekly 8 --keep-monthly 24
keep 21 snapshots:
ID Date Host Directory
----------------------------------------------------------------------
5116cfdc 2016-08-22 05:00:00 mopped /home/fd0/tmp/data
6261b96f 2016-08-21 05:00:00 mopped /home/fd0/tmp/data
d0267dbb 2016-08-20 05:00:00 mopped /home/fd0/tmp/data
e7e18480 2016-08-19 05:00:00 mopped /home/fd0/tmp/data
b2fd97b2 2016-08-18 05:00:00 mopped /home/fd0/tmp/data
9743b40d 2016-08-17 05:00:00 mopped /home/fd0/tmp/data
3ef3007b 2016-08-16 05:00:00 mopped /home/fd0/tmp/data
3c3f7ad4 2016-08-15 05:00:00 mopped /home/fd0/tmp/data
b471d6eb 2016-08-14 05:00:00 mopped /home/fd0/tmp/data
0f2f3b55 2016-08-07 05:00:00 mopped /home/fd0/tmp/data
47fe0a0f 2016-07-31 05:00:00 mopped /home/fd0/tmp/data
0d7b57eb 2016-07-24 05:00:00 mopped /home/fd0/tmp/data
c94ee5ac 2016-07-17 05:00:00 mopped /home/fd0/tmp/data
fc48f6b6 2016-07-10 05:00:00 mopped /home/fd0/tmp/data
5e9fe6d2 2016-07-03 05:00:00 mopped /home/fd0/tmp/data
774c5721 2016-06-26 05:00:00 mopped /home/fd0/tmp/data
d9b9c5b2 2016-05-31 05:00:00 mopped /home/fd0/tmp/data
446e6030 2016-04-30 05:00:00 mopped /home/fd0/tmp/data
6f86935a 2016-03-31 05:00:00 mopped /home/fd0/tmp/data
1722682f 2016-02-29 04:00:00 mopped /home/fd0/tmp/data
dd2bbbf9 2016-01-31 04:00:00 mopped /home/fd0/tmp/data
remove 213 snapshots:
ID Date Host Directory
----------------------------------------------------------------------
f3da855f 2016-08-13 05:00:00 mopped /home/fd0/tmp/data
347274f4 2016-08-12 05:00:00 mopped /home/fd0/tmp/data
f314dd1f 2016-08-11 05:00:00 mopped /home/fd0/tmp/data
[...]
a07ffe20 2016-01-04 04:00:00 mopped /home/fd0/tmp/data
ee891602 2016-01-03 04:00:00 mopped /home/fd0/tmp/data
62a86121 2016-01-02 04:00:00 mopped /home/fd0/tmp/data
You can see that when this command is run without --dry-run, restic will
remove a lot of snapshots (213 of 234):
$ restic forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24
snapshots for host mopped, directories /home/fd0/tmp/data:
keep 21 snapshots:
ID Date Host Directory
----------------------------------------------------------------------
5116cfdc 2016-08-22 05:00:00 mopped /home/fd0/tmp/data
6261b96f 2016-08-21 05:00:00 mopped /home/fd0/tmp/data
d0267dbb 2016-08-20 05:00:00 mopped /home/fd0/tmp/data
[...]
remove 213 snapshots:
[...]
$ restic prune
counting files in repo
building new index for repo
[0:00] 100.00% 1 / 1 files
repository contains 1 packs (8 blobs) with 3.003 MiB bytes
processed 8 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 21 snapshots
[0:00] 100.00% 21 / 21 snapshots
found 8 of 8 data blobs still in use
will rewrite 0 packs
creating new index
[0:00] 50.00% 1 / 2 files
saved new index as 504caa39
done
Afterwards, the list of snapshots is a lot shorter:
$ restic snapshots
ID Date Host Directory
----------------------------------------------------------------------
dd2bbbf9 2016-01-31 04:00:00 mopped /home/fd0/tmp/data
1722682f 2016-02-29 04:00:00 mopped /home/fd0/tmp/data
6f86935a 2016-03-31 05:00:00 mopped /home/fd0/tmp/data
446e6030 2016-04-30 05:00:00 mopped /home/fd0/tmp/data
d9b9c5b2 2016-05-31 05:00:00 mopped /home/fd0/tmp/data
774c5721 2016-06-26 05:00:00 mopped /home/fd0/tmp/data
5e9fe6d2 2016-07-03 05:00:00 mopped /home/fd0/tmp/data
fc48f6b6 2016-07-10 05:00:00 mopped /home/fd0/tmp/data
c94ee5ac 2016-07-17 05:00:00 mopped /home/fd0/tmp/data
0d7b57eb 2016-07-24 05:00:00 mopped /home/fd0/tmp/data
47fe0a0f 2016-07-31 05:00:00 mopped /home/fd0/tmp/data
0f2f3b55 2016-08-07 05:00:00 mopped /home/fd0/tmp/data
b471d6eb 2016-08-14 05:00:00 mopped /home/fd0/tmp/data
3c3f7ad4 2016-08-15 05:00:00 mopped /home/fd0/tmp/data
3ef3007b 2016-08-16 05:00:00 mopped /home/fd0/tmp/data
9743b40d 2016-08-17 05:00:00 mopped /home/fd0/tmp/data
b2fd97b2 2016-08-18 05:00:00 mopped /home/fd0/tmp/data
e7e18480 2016-08-19 05:00:00 mopped /home/fd0/tmp/data
d0267dbb 2016-08-20 05:00:00 mopped /home/fd0/tmp/data
6261b96f 2016-08-21 05:00:00 mopped /home/fd0/tmp/data
5116cfdc 2016-08-22 05:00:00 mopped /home/fd0/tmp/data
How does restic find the snapshots to remove?
It is important to know how forget filters the list of snapshots, so we’ll go
through this in detail now. First, restic lists all snapshots and splits the
list into separate lists, one for each combination of host name and directories
that have been saved. In our example above, just one host name (mopped) and one
directory (/home/fd0/tmp/data) were saved, so that makes just one list to go
through.
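The splitting step can be pictured as a simple grouping by key. The following Python sketch is an illustration only: the function group_snapshots, the dictionary keys "id", "host" and "paths", and the second host name are hypothetical, and sorting the directories to build the key is an assumption rather than something taken from restic's code.

```python
from collections import defaultdict


def group_snapshots(snapshots):
    """Split snapshots into one list per (host, directories) combination.

    snapshots: iterable of dicts with the hypothetical keys "id", "host", "paths"
    """
    groups = defaultdict(list)
    for snap in snapshots:
        # Sort the paths so the same set of directories always yields the same key.
        key = (snap["host"], tuple(sorted(snap["paths"])))
        groups[key].append(snap)
    return dict(groups)


# Two snapshots from the example host plus one from a made-up second host.
example = [
    {"id": "6261b96f", "host": "mopped", "paths": ["/home/fd0/tmp/data"]},
    {"id": "5116cfdc", "host": "mopped", "paths": ["/home/fd0/tmp/data"]},
    {"id": "00000000", "host": "otherhost", "paths": ["/home/fd0/tmp/data"]},
]
for key, members in group_snapshots(example).items():
    print(key, [s["id"] for s in members])
```

Each resulting list would then be filtered by the policy on its own, so snapshots from one host never count against the limits of another.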
Restic then sorts the list from newest to oldest snapshot and does the following, in exactly this order:
When --keep-last is set, e.g. to the value 10, the newest ten snapshots are
kept and removed from the list.
When --keep-hourly is set, e.g. to the value 4, restic finds the four most
recent hours in which a snapshot was created. For each of those hours, it marks
the last snapshot as to be kept and flags the others for removal. It then
removes all the snapshots for these hours from the list.
It’s easier than it sounds. Consider the following snapshots in a repo:
$ restic snapshots
ID Date Host Directory
----------------------------------------------------------------------
dbd30e0e 2016-08-22 03:00:00 mopped /home/fd0/tmp/data
45e789ca 2016-08-22 03:53:08 mopped /home/fd0/tmp/data
c0411b71 2016-08-22 04:00:00 mopped /home/fd0/tmp/data
1f782cb4 2016-08-22 04:13:23 mopped /home/fd0/tmp/data
62df5e1e 2016-08-22 04:18:23 mopped /home/fd0/tmp/data
0b9fe168 2016-08-22 05:23:00 mopped /home/fd0/tmp/data
0fe0dcfe 2016-08-22 18:08:17 mopped /home/fd0/tmp/data
d221a465 2016-08-22 19:24:00 mopped /home/fd0/tmp/data
98fb9f00 2016-08-22 19:53:23 mopped /home/fd0/tmp/data
When running forget --keep-hourly 4, restic first finds the two snapshots at
19:24:00 and 19:53:23. Both fall into the same hour (starting at 19:00:00 and
ending at 19:59:59), and restic will only keep the last snapshot for this hour.
This means that 98fb9f00 is kept and d221a465 is removed. The next hour that
has a snapshot starts at 18:00:00, the one after that at 05:00:00, and so on.
This is the result of running forget --keep-hourly 4:
$ restic forget --dry-run --keep-hourly 4
keep 4 snapshots:
ID Date Host Directory
----------------------------------------------------------------------
62df5e1e 2016-08-22 04:18:23 mopped /home/fd0/tmp/data
0b9fe168 2016-08-22 05:23:00 mopped /home/fd0/tmp/data
0fe0dcfe 2016-08-22 18:08:17 mopped /home/fd0/tmp/data
98fb9f00 2016-08-22 19:53:23 mopped /home/fd0/tmp/data
remove 5 snapshots:
ID Date Host Directory
----------------------------------------------------------------------
dbd30e0e 2016-08-22 03:00:00 mopped /home/fd0/tmp/data
45e789ca 2016-08-22 03:53:08 mopped /home/fd0/tmp/data
c0411b71 2016-08-22 04:00:00 mopped /home/fd0/tmp/data
1f782cb4 2016-08-22 04:13:23 mopped /home/fd0/tmp/data
d221a465 2016-08-22 19:24:00 mopped /home/fd0/tmp/data
When --keep-daily is set, e.g. to the value 7, restic applies an approach
similar to --keep-hourly: it goes through the list and finds the last seven
days on which at least one snapshot was made. For each day, it keeps the last
snapshot made on that day, flags the others for removal, and removes all
snapshots for those days from the list.
The options --keep-weekly, --keep-monthly and --keep-yearly are applied in the
same way.
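The procedure described in this section can be written down as a short Python sketch. It follows the prose above and is not restic's actual code: the names BUCKETS and apply_policy are hypothetical, snapshots are reduced to (ID, timestamp) pairs, and weeks are assumed to be ISO weeks running from Monday to Sunday.

```python
from datetime import datetime

# Each rule maps a timestamp to the bucket (hour, day, ...) it falls into.
# Assumption: weeks are ISO weeks (Monday to Sunday).
BUCKETS = {
    "hourly": lambda t: (t.year, t.month, t.day, t.hour),
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: tuple(t.isocalendar())[:2],
    "monthly": lambda t: (t.year, t.month),
    "yearly": lambda t: (t.year,),
}


def apply_policy(snapshots, last=0, **rules):
    """Return (keep, remove) for one list of (snapshot_id, datetime) pairs.

    rules: keyword arguments such as hourly=4 or daily=7.
    """
    remaining = sorted(snapshots, key=lambda s: s[1], reverse=True)  # newest first
    keep = []

    # keep-last: the newest n snapshots are kept and leave the list.
    keep.extend(remaining[:last])
    remaining = remaining[last:]

    # The rules are applied in a fixed order, regardless of keyword order.
    for name in ("hourly", "daily", "weekly", "monthly", "yearly"):
        n = rules.get(name, 0)
        if n <= 0:
            continue
        bucket_of = BUCKETS[name]
        selected = []  # the n most recent buckets that contain a snapshot
        for snap_id, when in remaining:
            bucket = bucket_of(when)
            if bucket in selected:
                continue  # an older snapshot in an already selected bucket
            if len(selected) == n:
                break
            selected.append(bucket)
            # Newest-first order means this is the last snapshot in its bucket.
            keep.append((snap_id, when))
        # All snapshots in the selected buckets leave the list.
        remaining = [s for s in remaining if bucket_of(s[1]) not in selected]

    kept_ids = {snap_id for snap_id, _ in keep}
    remove = [s for s in snapshots if s[0] not in kept_ids]
    return keep, remove


def ts(text):
    """Parse a timestamp in the format printed by the snapshots command."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# The nine snapshots from the hourly example above.
hourly_example = [
    ("dbd30e0e", ts("2016-08-22 03:00:00")),
    ("45e789ca", ts("2016-08-22 03:53:08")),
    ("c0411b71", ts("2016-08-22 04:00:00")),
    ("1f782cb4", ts("2016-08-22 04:13:23")),
    ("62df5e1e", ts("2016-08-22 04:18:23")),
    ("0b9fe168", ts("2016-08-22 05:23:00")),
    ("0fe0dcfe", ts("2016-08-22 18:08:17")),
    ("d221a465", ts("2016-08-22 19:24:00")),
    ("98fb9f00", ts("2016-08-22 19:53:23")),
]
keep, remove = apply_policy(hourly_example, hourly=4)
print("keep:  ", [snap_id for snap_id, _ in keep])
print("remove:", [snap_id for snap_id, _ in remove])
```

For the nine snapshots of the hourly example, the sketch keeps 98fb9f00, 0fe0dcfe, 0b9fe168 and 62df5e1e and flags the other five for removal, which matches the dry-run output shown above. Because every rule removes the snapshots of its buckets from the list before the next rule runs, a snapshot that was passed over by --keep-hourly cannot be picked up again by --keep-daily.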
Conclusion
This article described an easy way to remove a single snapshot and also explained how to apply an expire policy for snapshots. This allows regularly removing snapshots from the repository to limit its growth.
The functions to remove snapshots and unneeded data from the repository are new. Please report an issue if you notice any odd behavior or find bugs.