Tarsnap - Online backups for the truly paranoid

Frequently Asked Questions about Tarsnap

Questions are split into four sections: getting started, legal/accounting/administrative, technical details, and often requested features. If you have a question that is not answered here, please contact us.

Getting started

Legal / accounting / administrative

Technical details

Requested features

Getting started

I tried to register but I never received the confirmation email. What's wrong?

It's probably stuck in a spam filter somewhere. Tarsnap sends email via Amazon SES, which is usually successful in delivering email (more so than when Tarsnap mail was sent directly, at least); but convincing everyone to accept email from you is incredibly difficult these days.

If you can't convince your spam filters to let Tarsnap email through, try registering from an email address at a different domain.

I've been using Tarsnap for a few hours, but I can't see any usage shown on the web interface. What's going on?

All Tarsnap accounting is currently done daily at approximately midnight UTC. Wait until after that point and your usage should be visible. (Payments, however, should show up immediately.)

Does Tarsnap run on Windows?

Only via Cygwin.

What happens when my account runs out of money?

You will be sent an email when your account balance falls below 7 days' worth of storage costs, warning you that you should probably add more money to your account soon. If your account balance falls below zero, you will lose access to Tarsnap, an email will be sent to inform you of this, and a 7-day countdown will start; if your account balance is still below zero after 7 days, your account may be deleted (along with any data you have stored) at our discretion. (If you can't add money yet but will be able to later, contact us and explain the situation. We're reasonable people and simply knowing that you're alive and haven't forgotten that you were using Tarsnap is very helpful.)

Why did my daily storage usage cost change when I haven't uploaded or deleted any data?

Tarsnap's storage cost is priced per month, and different months have different numbers of days. In addition, although the pricing is defined in picodollars (10^-12 dollars), Tarsnap computes the cost per byte-day of storage for each month in attodollars (10^-18 dollars).
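
As a rough illustration of why the daily figure moves, the sketch below divides a monthly price by the number of days in the month. It assumes a price of 250 picodollars per byte-month (that is, $0.25 / GB with 1 GB = 10^9 bytes) and truncating integer division; how Tarsnap actually rounds is not stated here, and the function name is made up for this example.

```shell
# Hypothetical helper: attodollars per byte-day for a month with $1 days.
# Assumes 250 picodollars per byte-month = 250,000,000 attodollars,
# and truncating integer division (Tarsnap's real rounding may differ).
attodollars_per_byte_day() {
    echo $(( 250000000 / $1 ))
}

attodollars_per_byte_day 30   # 30-day month
attodollars_per_byte_day 31   # 31-day month
```

Under these assumptions a 30-day month gives 8333333 attodollars per byte-day and a 31-day month gives 8064516, so the same stored data costs slightly less per day in a longer month.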

Do you accept bitcoins as payment?

Yes, starting in late March 2014 you can fund your Tarsnap account using bitcoins, which will be converted to US dollars at the current exchange rate.

I've forgotten my tarsnap account password; how can I reset it?

Please contact the author.

I've lost the key file for a machine; how can I delete its data so that I'm not stuck paying for it forever?

Please contact the author.

I've lost the key file for a machine (or I have the key file, but I've forgotten the passphrase I set on it); how can I read my archives?

You can't. Your key file contains the only copy of the cryptographic keys needed to decrypt your data; if you lose them there is no way to get your data back.

Technical details

If Tarsnap costs $0.25 / GB of storage, how is it possible to store "archives adding up to several terabytes" while paying less than $10/month?

Please see our page about deduplication efficiency.

Since the cost depends on "encoded bytes", how can I predict how much Tarsnap will cost before signing up?

Starting with tarsnap 1.0.36, you can test the deduplication and compression without an account:

tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c /MY/DATADIR

This will produce output in the form:

tarsnap: Performing dry-run archival without keys
         (sizes may be slightly inaccurate)
tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                               2.2 GB           1.8 GB
  (unique data)                            2.1 GB           1.7 GB
This archive                               2.2 GB           1.8 GB
New data                                   2.1 GB           1.7 GB

The value which matters for the cost is "(unique data) -- Compressed size", which represents the "encoded bytes" that are stored on the Tarsnap servers. In the above example, this is 1.7 GB, so it will cost approximately $0.43 (= 1.7 GB * $0.25 / GB) to upload the data, and $0.43 per month for storage.
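
The same arithmetic can be scripted. The sketch below reads statistics in the format of the sample output above, picks out the "(unique data)" line, and multiplies the compressed size by a price per GB. The function name is hypothetical, and it only handles sizes reported in GB, as in the sample.

```shell
# Hypothetical helper: estimate the cost in USD from humanized --print-stats
# output on stdin. $1 is the price per GB (default 0.25).
# Only handles a "(unique data)" line whose compressed size is given in GB.
estimate_cost() {
    awk -v price="${1:-0.25}" '
        /\(unique data\)/ && $NF == "GB" {
            # The second-to-last field is the compressed size.
            printf "%.3f\n", $(NF - 1) * price
            exit
        }'
}
```

For the sample output this prints 0.425. To feed it from a dry run, something like "tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c /MY/DATADIR 2>&1 | estimate_cost" should work, assuming the statistics are written to standard error on your version.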

Note that deduplication is most effective when creating multiple snapshots (e.g., daily backups), so it will not help much for the initial snapshot. We have a few examples of deduplication with multiple snapshots.

Is Tarsnap storage reliable?

Yes. Data archived via Tarsnap is stored on the Amazon S3 storage service (the original version, not the "reduced redundancy" version introduced in 2010).

Why doesn't Tarsnap --list-archives print archives in alphabetical (or chronological) order?

The archive metadata which contains Tarsnap archive names and creation times is encrypted, so it's impossible for the Tarsnap client code to figure out in what order the archives should be listed until it downloads and decrypts the metadata. Once it has done so, it might as well print the information immediately; if you want a particular order, sort(1) is your friend.
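
A minimal sketch of both orderings follows. The function names are made up for this example, and the chronological version assumes that "tarsnap --list-archives -v" prints each archive name followed by a tab and a creation time in "YYYY-MM-DD HH:MM:SS" form; check the output of your version before relying on it.

```shell
TAB=$(printf '\t')

# Alphabetical order by archive name (reads the listing on stdin).
sort_by_name() {
    LC_ALL=C sort
}

# Chronological order: sort on everything after the first tab.
# Assumes "name<TAB>YYYY-MM-DD HH:MM:SS" lines, which sort correctly as text.
sort_by_time() {
    LC_ALL=C sort -t "$TAB" -k 2
}
```

Typical use would be "tarsnap --list-archives | sort_by_name" or "tarsnap --list-archives -v | sort_by_time".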

Can I move my Tarsnap setup to a new computer?

Yes, no problem! Tarsnap doesn't care about the physical hardware; only the data, keyfile, and cache directory that it is given to work with.

To confirm that everything is set up correctly, we recommend the following check after you have copied your data to the new system: create an archive on the old system with --print-stats, transfer the cache directory to the new system, then run Tarsnap there with the --dry-run --print-stats options. The "New data" size should be quite small (consisting of archive metadata), and the "This archive" size should be approximately the same as in the old statistics. (Small differences are normal: machines can present metadata slightly differently, and can list files within a directory in a different order, which can alter the compression efficiency.)
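
The check could be scripted roughly as follows. This is only a sketch: the function names, archive names, and data path are placeholders, and copying the keyfile and cache directory between machines is left to whatever tool you prefer, since their locations depend on your configuration.

```shell
# TARSNAP can be overridden (e.g. TARSNAP=echo) to preview the commands.
TARSNAP=${TARSNAP:-tarsnap}

# On the OLD machine: create a real archive named $1 from path $2
# and print its statistics for later comparison.
archive_on_old_machine() {
    "$TARSNAP" -c -f "$1" --print-stats "$2"
}

# On the NEW machine, after copying the data, keyfile, and cache directory:
# simulate creating archive $1 from path $2 without storing anything.
check_on_new_machine() {
    "$TARSNAP" -c --dry-run -f "$1" --print-stats "$2"
}
```

For example, run "archive_on_old_machine pre-move /MY/DATADIR" on the old system, copy the cache directory across, then run "check_on_new_machine post-move /MY/DATADIR" on the new one and compare the "This archive" and "New data" lines.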

How can I investigate network problems?

We have a series of tips about debugging Tarsnap network problems.

Requested features

Is there an option to avoid storing data on US servers?

Due to concerns about the privacy of personal data and industrial espionage, some organizations would prefer (or even mandate) that their data not be stored on servers which reside in the United States of America.

From a purely technical standpoint, there is no benefit to this. Tarsnap encrypts all data before it leaves a computer. It therefore does not matter where that data is stored; attackers (be they criminals or governments) cannot decrypt the data. However, we realize that while the technical staff in an organization may understand our encryption, policy-makers may still be hesitant, especially if they face personal liability if their customers' data is stolen. Adding this feature could also make some security checklists and paperwork easier to fill out; for example "is any data stored outside of the EU?".

We are hoping to add the option to avoid US-based servers, but any change to our infrastructure and customer data must be handled extremely carefully. At the moment we are not announcing any estimated time of completion for this feature.

Is there an option to print which files have been modified since the previous backup?

Not directly. Because Tarsnap's deduplication happens after files have been squished together into a tar stream and that tar stream has been split into blocks, it is not feasible to track backwards to figure out which file a particular new block came from. For that matter, you can get blocks which contain pieces from several different files, or blocks of data could appear in multiple files.

A more technical answer is that the deduplication is done at a different layer from the crawl-a-directory-tree-and-generate-a-tarball code. In essence the layers are:

  • bsdtar code, which crawls a directory tree and feeds files to the
  • libarchive code, which generates a stream of tar and feeds it to the
  • multitape code, which splits the stream into several sub-streams and uses the
  • chunkifier code, which splits each sub-stream into chunks and sends them to the
  • chunk deduplication code, which looks at each chunk to decide if it's new.

(And underneath this all is the transactional storage layer, the request protocol layer, the network connection protocol layer, and the underlying non-blocking network I/O code.)

Feeding information back from the chunk deduplication code up to the bsdtar code is theoretically possible, but the necessary code would be complex and would risk introducing bugs.
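
To make the layering concrete, here is a toy model of chunk-level deduplication. It is not Tarsnap's algorithm: it uses fixed-size chunks and cksum(1) checksums purely for illustration, and the function names are invented. The point is that the deduplication step only ever sees anonymous chunks of a byte stream, with no file names attached.

```shell
# Print one checksum line per fixed-size chunk of stdin.
# $1 is the chunk size in bytes (default 4096). Toy model only.
chunk_sums() {
    dir=$(mktemp -d) || return 1
    split -b "${1:-4096}" -a 4 - "$dir/chunk."
    for f in "$dir"/chunk.*; do
        [ -e "$f" ] && cksum < "$f"
    done
    rm -rf "$dir"
}

# Count distinct chunks of stdin whose checksum is not listed in file $2.
# $2 must hold previously seen checksums, sorted with LC_ALL=C sort -u.
count_new_chunks() {
    chunk_sums "$1" | LC_ALL=C sort -u | LC_ALL=C comm -23 - "$2" | wc -l
}
```

Feeding it a tar stream, for instance "tar -cf - /MY/DATADIR | count_new_chunks 4096 known.sums", reports how many chunks are new, but nothing in the pipeline records which file each chunk came from.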

One trick for tracking the changed files (other than the obvious "find . -mtime -1d", assuming daily backups) is to run tarsnap with a small value for --maxbw-rate (e.g., --maxbw-rate 50000) and then send it a SIGUSR1 every second to check which file Tarsnap is processing. This will prompt tarsnap to repeatedly print its current progress, and when it slows down dramatically you've found a place where it is finding lots of new data which it needs to upload.
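
The signalling part of this trick might look like the sketch below. The helper name is hypothetical, and the archive name and path in the usage comment are placeholders.

```shell
# Send SIGUSR1 to process $1 every $2 seconds (default 1) until it exits.
# The loop ends as soon as kill reports that the process is gone.
poke_until_done() {
    while kill -USR1 "$1" 2>/dev/null; do
        sleep "${2:-1}"
    done
}

# Example use:
#   tarsnap -c -f nightly --maxbw-rate 50000 /MY/DATADIR &
#   poke_until_done $!
```

Note that the "-mtime -1d" form of the find command mentioned above is BSD syntax; with GNU find the equivalent is "find . -mtime -1".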

Why doesn't Tarsnap use AWS Glacier?

Amazon Glacier is a cloud storage service aimed at backing up large amounts of data where it may be accessed slowly and infrequently. However, we cannot store some files in this "cold storage" and other files in the regular Amazon S3 service while retaining Tarsnap's deduplication abilities. It would theoretically be possible to mark all the files stored with a particular key as being frozen ("glaciated"?), but the implementation would require reworking a great deal of the Tarsnap server code. We would like to support this, but at the moment we are not announcing any estimated time of completion for this feature.

For more details, see why Tarsnap doesn't use Glacier.