Unexpectedly popular: svnbackup

My svnbackup script is the second-most popular page on my site, after the PyMOTW home page, and search terms such as “svn backup” and “svn backup script” regularly top the list of traffic sources for my site. The link to svnbackup doesn’t appear on the first page of Google’s search results, which I take as a sign that this problem isn’t well understood or well solved (otherwise, why would so many people page through the search results to find it?).

Requirements

I created svnbackup to manage off-site backups of my svn repository and the repository we run at work. The requirements were pretty basic:

  • Incremental backups.
  • Easy to restore.
  • Safe if the backups were running during a transaction.

Both repositories use FSFS to store the repository contents, so according to some reports it would be safe to simply rsync (or otherwise back up) the raw repository data as long as no transaction was in progress, or possibly even if one was. It turns out not to be very difficult to do a safer backup, though, so that’s what I went with.

Solution

The obvious solution is to use “svnadmin dump” to extract the revision history from the repository. svnbackup.sh is a wrapper around svnadmin that produces reasonably sized chunks for the backups. The only real problems I had to solve were how to track what had already been backed up and how to move the backup output off-site.
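To make the chunking idea concrete, here is a minimal Python sketch. It is not the actual svnbackup.sh, which is a shell script. It assumes the last backed-up revision is already known (with -1 meaning nothing has been backed up yet), splits the remaining revisions into fixed-size ranges, and builds one “svnadmin dump REPOS -r START:END --incremental” command per range. The function names and the 100-revision chunk size are hypothetical choices for illustration.

```python
import subprocess
from typing import List, Tuple

# Hypothetical chunk size: the maximum number of revisions per dump file.
CHUNK_SIZE = 100


def plan_chunks(last_backed_up: int, youngest: int,
                chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Split the revisions after last_backed_up, up to and including
    youngest, into inclusive (start, end) ranges of at most chunk_size.
    A last_backed_up of -1 (assumed convention) starts at revision 0."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunks = []
    start = last_backed_up + 1
    while start <= youngest:
        end = min(start + chunk_size - 1, youngest)
        chunks.append((start, end))
        start = end + 1
    return chunks


def dump_command(repos: str, start: int, end: int) -> List[str]:
    """Build the svnadmin dump argument list for one revision range.
    --incremental makes the first revision in the range a diff against
    the previous revision rather than a full copy of the tree, so the
    chunks can be loaded back one after another."""
    return ["svnadmin", "dump", repos, "-r", f"{start}:{end}", "--incremental"]


def dump_chunk(repos: str, start: int, end: int, outfile: str) -> None:
    """Run svnadmin dump for one range and write the dump stream to outfile."""
    with open(outfile, "wb") as out:
        subprocess.run(dump_command(repos, start, end), stdout=out, check=True)
```

Because every range is dumped with --incremental, the files have to be restored in order, for example by feeding each one to “svnadmin load” against a freshly created repository.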

Tracking the last revision number that has been backed up is easy with a simple text file on the svn server. If that file is lost, the worst that happens is that the next run backs up the entire repository again. That can be time-consuming, but it is not destructive. Copying the files off-site is handled via scp.
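The bookkeeping described above might look like the following Python sketch. It is hypothetical rather than taken from svnbackup.sh: the state-file format (a single revision number), the -1 fallback meaning “start over from revision 0”, and the helper names are all assumptions. “svnlook youngest” reports the newest revision in a repository, and scp_command only builds the argument list for the copy.

```python
import subprocess
from pathlib import Path
from typing import List


def read_last_revision(state_file: Path) -> int:
    """Return the last revision recorded as backed up, or -1 if the state
    file is missing, unreadable, or empty. -1 (an assumed convention)
    makes the next run back up the whole repository from revision 0."""
    try:
        return int(state_file.read_text().strip())
    except (OSError, ValueError):
        return -1


def write_last_revision(state_file: Path, revision: int) -> None:
    """Record revision as backed up. Writing a temporary file and renaming
    it over the old one makes an interrupted write less likely to leave a
    truncated state file behind."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(f"{revision}\n")
    tmp.replace(state_file)


def youngest_revision(repos: str) -> int:
    """Ask svnlook for the newest revision number in the repository."""
    result = subprocess.run(["svnlook", "youngest", repos],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def scp_command(files: List[str], destination: str) -> List[str]:
    """Build an scp argument list copying dump files to a remote
    destination such as 'backup@offsite.example.com:svn-dumps/'
    (a hypothetical host and path)."""
    return ["scp", *files, destination]
```

Updating the state file only after the scp copy succeeds keeps the failure mode the post describes: if anything goes wrong, the next run repeats work instead of skipping revisions.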

Alternatives

The alternatives offer more or different options. I like Python, and obviously use it a lot, but I’m not sure I would have used it as a shell script replacement the way the folks at collab.net did. On the other hand, I didn’t care that my tool doesn’t run on Windows (though it might, with Cygwin), and theirs does. If I had found their tool when I needed it, I probably would not have written my own, since the features are largely the same. Their off-site support uses FTP rather than scp, but it looks like adding scp support would be straightforward.