Tarsnap - Online backups for the truly paranoid

Including and excluding files

You may select which files are included in backups, either globally or on a per-directory / per-file basis.

What to back up

Tarsnap can be used to back up your entire system by pointing it to /, but this can lead to unnecessarily large backups. We can't tell you exactly what to back up, but we can give you some questions and advice to consider.

The fundamental question is: "if I lost this data, how much effort would it take to recover it?"

Operating system files

  • If you are running a standard operating system like Ubuntu x.y or FreeBSD z.w, then it might not be necessary to back up those files — they are easy to reinstall.
  • On the other hand, if you are running a heavily-tweaked operating system with a great deal of manually-installed software, then perhaps it would be a good idea to back up everything — although this will incur a large initial archive, future archives will be much smaller thanks to deduplication.

Temporary files

  • By their very definition, temporary files are not worth backing up. However, simply excluding *tmp* may cause problems; for example, it would affect tmpfile_handler.c and webapp_add_user.tmpl. We therefore recommend:
    exclude */tmp/*
  • There are a few other common patterns for files and directories that are likely not worth backing up: memory dumps, the OS X user cache directory, and the GNOME virtual filesystem:
    exclude *.core
    exclude /Users/*/Library/Caches
    exclude /home/*/.gvfs/
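
To see why */tmp/* is safer than *tmp*, the two patterns can be compared against a few sample paths. This is only a rough sketch: it uses the shell's own case globbing as a stand-in for Tarsnap's pattern matching (an assumption, and the two may differ in corner cases), and the matches helper and the sample paths are made up for illustration.

```shell
#!/bin/sh
# matches PATTERN PATH: succeed if PATH matches the glob PATTERN.
# Hypothetical helper; shell globbing only approximates Tarsnap's matcher.
matches() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Report which of the two candidate patterns would catch each sample path.
for path in src/tmpfile_handler.c web/webapp_add_user.tmpl var/tmp/session.dat; do
    for pattern in '*tmp*' '*/tmp/*'; do
        if matches "$pattern" "$path"; then
            echo "$pattern would exclude $path"
        fi
    done
done
```

In this sketch the broad pattern catches all three paths, while the narrower one catches only the file that actually lives inside a tmp directory.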

Don't use your own compression

Don't apply any compression (gzip, bzip2, zip, tar.gz, etc.) to your data — Tarsnap itself will compress data after it performs deduplication. If you compress a file, it will interfere with deduplication, meaning that if you change that file and re-compress it, Tarsnap will upload (almost) all of that compressed file again. It is more efficient (and saves you money!) to let Tarsnap handle the deduplication and compression.

GNU gzip has a --rsyncable option which attempts to compress a file while retaining some amount of deduplication ability. This is not as efficient as using Tarsnap's normal deduplication and compression, but if you must use your own compression for files which are likely to be modified, using --rsyncable could lessen the increase in storage space that you would otherwise incur.

How to back up "live" files and filesystems

For normal home use, Tarsnap needs no special handling. But when used on a production server, there are a few additional considerations.

Backing up a database

  • Don't attempt to back up a "live" database. Use your database software to create a static dump (ideally a text file).
  • Don't compress the database dump. Tarsnap's deduplication (followed by its compression) is more efficient than attempting to compress the database dump directly.
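
As an illustration of the two points above, a dump-then-archive step might look roughly like the following. This is a sketch under assumptions: it assumes PostgreSQL and its pg_dump tool (other databases have their own dump commands), and the backup_database helper, the database name, and the dump directory are placeholders rather than anything Tarsnap provides.

```shell
#!/bin/sh
# Sketch: dump a database to plain text, then archive the dump uncompressed.
# Assumes PostgreSQL's pg_dump; the names and paths are placeholders.
backup_database() {
    db_name=$1
    dump_dir=$2
    dump_file="$dump_dir/$db_name.sql"

    # pg_dump writes plain-text SQL by default; no gzip step is added,
    # so Tarsnap can deduplicate against earlier dumps.
    pg_dump "$db_name" > "$dump_file" || return 1

    # Archive the static dump rather than the live database files.
    tarsnap -c -f "$db_name-$(date +%Y-%m-%d_%H-%M-%S)" "$dump_file"
}

# Example (hypothetical database and directory):
# backup_database mydb /var/backups
```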

Backing up a live filesystem snapshot

A filesystem snapshot is a copy of a filesystem at a particular time. This allows an "atomic" archive to be created, even if files on the filesystem are modified while Tarsnap is running. For an example of the problems which can arise from "non-atomic" archives, consider the following directory layout:

a/
b/lots-of-data.dat
c/important-file.txt

Suppose that Tarsnap processed the directories in the order a, b, c; however, while Tarsnap was archiving b, the user moved c/important-file.txt into a. When Tarsnap read c, important-file.txt was no longer in that directory, so it was not archived at all! A similar scenario (in which Tarsnap processed the directories in reverse order) could result in important-file.txt being archived in both directories. These concerns also apply to a single file being modified in place: we could end up with half of the old version of the file and half of the new version.

In order to avoid these problems, filesystem snapshots were developed in the 1990s. Given a normal read-write (RW) filesystem, a filesystem (FS) snapshot is created as a read-only (RO) filesystem, then deleted once the archive is created:

filesystem (RW) --+--> use as normal
                  |
           (create FS snapshot)
                  |
                  +--> FS snapshot (RO) --> run tarsnap --> delete FS snapshot

Please consult the documentation for your operating system or filesystem management software to learn how to create and delete filesystem snapshots.

There is a potential race condition when backing up from a filesystem snapshot which could result in Tarsnap being unable to detect some modifications to a file. If you would like to run Tarsnap on filesystem snapshots, please read about the --snaptime argument.
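
The create / archive / delete cycle could be scripted along these lines. This is a sketch, not a recommended procedure: it assumes ZFS (where a snapshot of a dataset appears under .zfs/snapshot/ in its mount point), and the backup_from_snapshot helper and all names passed to it are hypothetical. The file given to --snaptime is created before the snapshot so that its modification time predates it.

```shell
#!/bin/sh
# Sketch: archive from a read-only ZFS snapshot, then delete the snapshot.
# Assumes ZFS; the dataset, mount point and names are placeholders.
backup_from_snapshot() {
    dataset=$1        # e.g. tank/home
    mountpoint=$2     # e.g. /tank/home
    snap_name=$3      # e.g. tarsnap-backup
    stamp_file=$4     # file whose mtime must predate the snapshot

    # Create the timestamp file BEFORE taking the snapshot.
    touch "$stamp_file" || return 1
    zfs snapshot "$dataset@$snap_name" || return 1

    # Read from the snapshot, not from the live filesystem.
    tarsnap -c --snaptime "$stamp_file" \
        -f "$snap_name-$(date +%Y-%m-%d_%H-%M-%S)" \
        -C "$mountpoint/.zfs/snapshot/$snap_name" .
    status=$?

    # Delete the snapshot whether or not the archive succeeded.
    zfs destroy "$dataset@$snap_name"
    return $status
}

# Example (hypothetical names):
# backup_from_snapshot tank/home /tank/home tarsnap-backup /root/tarsnap-snaptime
```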

You can print statistics about all archives with:

tarsnap --print-stats -f '*'

Copying an archive

If you wish to create an identical copy of an archive (for example, having identical archives named backup-2016-01-01-daily, backup-2016-01-01-weekly, backup-2016-01-01-monthly), we recommend that you use:

tarsnap -c -f backup-2016-01-01-weekly @@backup-2016-01-01-daily

This will download some metadata (the list of blocks in backup-2016-01-01-daily), but this is a relatively small amount of bandwidth.

Alternatively, you could simply create a second archive right after creating the first one; our deduplication algorithm will ensure that no data will be uploaded unnecessarily. This avoids downloading metadata, but if your files have changed between creating the first and second (or subsequent) archives, then your "copy" would not be an exact copy.
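
If several copies are wanted at once (say, weekly and monthly), the @@ form can be wrapped in a small loop. The copy_archive helper below is hypothetical and shown only as a sketch; it simply repeats the command given above for each requested name.

```shell
#!/bin/sh
# Sketch: create one or more named copies of an existing archive.
# copy_archive is a hypothetical helper, not part of Tarsnap.
copy_archive() {
    source_archive=$1
    shift
    for copy_name in "$@"; do
        # @@NAME tells tarsnap to read the contents of archive NAME.
        tarsnap -c -f "$copy_name" "@@$source_archive" || return 1
    done
}

# Example:
# copy_archive backup-2016-01-01-daily \
#     backup-2016-01-01-weekly backup-2016-01-01-monthly
```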

Write-only keys

tarsnap-keymgmt creates new keys with (optionally) reduced permissions. In particular, it can create "write-only" keys:

tarsnap-keymgmt --outkeyfile write-only-key.txt -w ~/tarsnap-main-key-file.txt

This allows you to create a write-only key (without a passphrase) which is used to create archives automatically, and a key with more permissions (requiring a passphrase) which is used for restoring or deleting archives. In this system, if an intruder breaks into your server, she would be able to halt your backups (or add new archives with faulty data), but not delete your existing archives.
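
One possible way to set up that pair of keys is sketched below. The make_split_keys helper and the output file names are placeholders; the second command assumes you want the full key to carry read, write and delete permissions, and --passphrased will prompt interactively for the passphrase.

```shell
#!/bin/sh
# Sketch: derive an unattended write-only key and a passphrase-protected
# full key from one main key file. File names are placeholders.
make_split_keys() {
    main_key=$1
    out_dir=$2

    # Write-only key, no passphrase: suitable for automatic backups.
    tarsnap-keymgmt --outkeyfile "$out_dir/write-only-key.txt" \
        -w "$main_key" || return 1

    # Key with read, write and delete permissions, protected by a passphrase.
    tarsnap-keymgmt --outkeyfile "$out_dir/full-key-passphrased.txt" \
        -r -w -d --passphrased "$main_key"
}

# Example (hypothetical paths):
# make_split_keys ~/tarsnap-main-key-file.txt /root
# Automatic backups would then select the write-only key, for instance with:
# tarsnap --keyfile /root/write-only-key.txt -c -f ARCHIVE-NAME /MY/DATADIR
```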

If you want to keep your full keys on a different (more secure) system and only use them there, or use keys on multiple systems for any other reason, things get more complicated.

Monitoring stats without access to root's cachedir

Many users run tarsnap as the root user, but this means that they cannot monitor their usage with an unprivileged user account. There are two ways to allow user username to view the statistics:

  • As root, configure the system with:
    touch /var/log/tarsnap-output.log
    chown username /var/log/tarsnap-output.log
    Then redirect the statistics (which Tarsnap prints to standard error) to that file when creating an archive:
    tarsnap --print-stats -c OPTIONS >/var/log/tarsnap-output.log 2>&1
  • If you use sudo, you can allow an unprivileged user account to run a specific command as root by adding this to your sudoers file:
    username    ALL = (root) NOPASSWD: /usr/local/bin/tarsnap --print-stats
    (adjust the directory as appropriate)

Checking which file Tarsnap is processing

The tarsnap binary responds to the SIGUSR1 POSIX signal (and SIGINFO on platforms which support it) by printing the current file. For example, running this command in one terminal:

tarsnap --dry-run -c ~/src/tarsnap

Then entering this command in a different terminal:

killall -SIGUSR1 tarsnap

Will produce output similar to this in the first terminal:

adding home/td/src/tarsnap/build/tarsnap-keygen (196608 / 637689 bytes)

On BSD systems (including OS X), SIGINFO can be sent to the foreground process by pressing ^T (CTRL-T) in its terminal.

Cleanly stopping a Tarsnap upload

If you use ^C (CTRL-C) to stop Tarsnap while it is uploading a new archive, you will lose all progress made since the last checkpoint. To stop cleanly, use ^Q (CTRL-Q) or send the SIGQUIT signal to tell Tarsnap to create a truncated archive. The truncated archive will have ".part" appended to its name.

Receiving emails from making backups

If you are running the /root/tarsnap-backup.sh backup script described in Simple usage, then you may wish to modify it to send you an email with the status, particularly if you are running it automatically with cron:

#!/bin/sh

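# User variables
email=my_email@example.net
# Create a private temporary file rather than using a predictable name in /tmp
tarsnap_output_filename=$(mktemp /tmp/tarsnap-output.XXXXXX) || exit 1

# Run backup, capturing both standard output and standard error
tarsnap -c \
	-f "$(uname -n)-$(date +%Y-%m-%d_%H-%M-%S)" \
	/MY/DATADIR >"$tarsnap_output_filename" 2>&1
tarsnap_status=$?

# Send email, with the subject reflecting tarsnap's exit status
if [ "$tarsnap_status" -eq 0 ]; then
	subject="Tarsnap backup success"
else
	subject="Tarsnap backup FAILURE"
fi
mail -s "$subject" "$email" < "$tarsnap_output_filename"
rm -f "$tarsnap_output_filename"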

Naturally, you will want to modify the email variable and the /MY/DATADIR directory name(s).

Setting up command-line mail

Sending email with mail will only function if you have configured a Mail Transfer Agent (MTA) such as sendmail, postfix, or ssmtp. If your system does not have an MTA already set up, then we recommend trying ssmtp (also known as sSMTP), which is an extremely simple MTA and thus is easier to configure.

More information

There are many other options available with tarsnap. All of the information on this page, and more, can be found in the man pages.