10.2. Backing Up the Repository

I could tell you a story about a friend of mine losing vital data to a hard-drive crash, at the most inconvenient of times. I could tell you the story of a colleague losing weeks' worth of work because someone erased the wrong partition. Or, I could even tell you a story about the time I lost a college term paper because lightning struck my dorm. I won't, though, because you undoubtedly have your own stories of data loss and no desire to hear about someone else's misfortune. Because of your own personal experience, you almost certainly have no need for a lecture on the importance of regular backups. So, I won't give you one. Instead, I'll just show you how you can ensure that you don't lose your repository at a most inopportune moment.

10.2.1. Hotcopying the Repository

The best way to back up an entire Subversion repository is with the svnadmin hotcopy command. The hotcopy command ensures that the repository gets copied in a manner that is safe, even if other users are accessing the repository during the copy. If you are using a Berkeley DB (BDB) database backend, the svnadmin hotcopy command is critical (unless you can be absolutely certain the repository will not be accessed during the copy). If you just use a standard filesystem copy of the repository, even a simple read access could cause corruption in a BDB repository. If you are using an FSFS (filesystem-based backend) repository instead, the consequences of copying without using the svnadmin hotcopy command are not as dire, but you could end up with a bad revision file if the copy occurs in the middle of a commit. The safest choice in most instances is to always use svnadmin hotcopy to copy your repository.

Performing a hotcopy with svnadmin hotcopy is easy. All you need to do is run the command with the path of the repository, followed by the path where the copy should be created.

 $ svnadmin hotcopy /srv/myrepos /mnt/backup/myrepos.backup

If you want Subversion to clean out unnecessary log files when it makes the copy, you can pass the --clean-logs option.
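As a rough sketch, the earlier hotcopy command with that option added could be wrapped as follows. The function name hotcopy_clean is my own invention for illustration, and the paths in the comments are the same example paths used above. Note that the log files in question are Berkeley DB log files, so the option only matters for BDB repositories.

```shell
#!/bin/sh
# Sketch: hotcopy a repository and ask svnadmin to clean out unneeded
# Berkeley DB log files. hotcopy_clean is a hypothetical helper name.
hotcopy_clean() {
    src="$1"    # repository to copy, e.g. /srv/myrepos
    dest="$2"   # destination, e.g. /mnt/backup/myrepos.backup

    # Refuse to overwrite an existing backup directory.
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    svnadmin hotcopy --clean-logs "$src" "$dest"
}

# Example (not run here):
#   hotcopy_clean /srv/myrepos /mnt/backup/myrepos.backup
```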

The Subversion source also includes a convenience Python script that you can use to perform hotcopies, named hot-backup.py. The hot-backup.py script takes the repository to back up, and a directory where the backup should be created. It then creates a new hotcopy named after the copied repository with the HEAD revision of that repository appended, so that multiple backups won't overwrite each other.

 $ hot-backup.py /srv/myrepos /mnt/backup/
 Beginning hot backup of '/srv/myrepos'.
 Youngest revision is 265
 Backing up repository to '/mnt/backup/myrepos-265' ...
 Done.
 $ ls /mnt/backup/
 myrepos-10  myrepos-146  myrepos-265

10.2.2. Dumping the Repository

Another way to back up a Subversion repository is through use of the svnadmin dump command, which dumps the contents of a repository into a text file that can later be used to populate another repository. Dumping the repository is not nearly as efficient as performing a hotcopy, but it does have its advantages. For instance, repositories can be dumped incrementally, so each dump doesn't need to contain the entire repository.

Suppose that you have a repository located at /srv/svnrepos that you want to back up. If you run the dump command with no options other than the name of the repository to dump, it will dump the entire contents of every revision of that repository to standard out (i.e., your console screen).

 $ svnadmin dump /srv/svnrepos
 --- Snipped Massive Amounts of Output ---

That's probably not what you want (unless you can read and memorize really fast). Instead, you want to add one more thing and redirect the output into a file, as with the following example. That way, the only thing output to the console will be the revisions that it has processed, as it progresses (which are sent to standard error, not standard out).

 $ svnadmin dump /srv/svnrepos > svnrepos-091504.dump
 * Dumped revision 0.
 * Dumped revision 1.
 * Dumped revision 2.
 --- Snipped Output ---
 * Dumped revision 1487.
 * Dumped revision 1488.
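If you run full dumps regularly, you will probably want the date stamp in the file name generated for you and the progress messages kept somewhere other than the console. The following is a minimal sketch of that idea, not a finished tool. The helper names dump_file_name and dump_full are hypothetical, and the month-day-year stamp simply mirrors the file name in the example above.

```shell
#!/bin/sh
# Sketch: write a full dump to a date-stamped file, keeping the progress
# messages (which svnadmin sends to standard error) in a separate log.
# dump_file_name and dump_full are hypothetical helper names.

# Build a name such as svnrepos-091504.dump from a repository path.
dump_file_name() {
    repos="$1"
    stamp="${2:-$(date +%m%d%y)}"   # optional explicit stamp, else today
    echo "$(basename "$repos")-${stamp}.dump"
}

# Dump every revision of REPOS into OUTDIR.
dump_full() {
    repos="$1"
    outdir="$2"
    out="${outdir}/$(dump_file_name "$repos")"
    # stdout carries the dump itself; stderr carries "* Dumped revision N."
    svnadmin dump "$repos" > "$out" 2> "${out}.log"
}

# Example (not run here):
#   dump_full /srv/svnrepos /mnt/backup
```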

Daily backups of your whole repository can get to be pretty big, pretty fast. Given that your old revisions will never change (unless you mess with revision properties), that may mean a lot of data is getting backed up with excessive redundancy. And even though storage space is cheap, that may mean a lot of wasted time and space, which does eventually add up. Subversion does have a solution, though, in the form of incremental dumps.

If you dump a repository with the --incremental option, and a range of revisions, it will only dump those revisions, such that multiple incremental backups can later be run consecutively to perform a full restore on the repository. In other words, if I dump the first 50 revisions of a repository and then later dump the next 75 into a different file, I can completely restore the first 125 by loading the first dump followed by the second dump.

 $ svnadmin dump -r 3:5 --incremental /srv/svnrepos > svnrepos-r3-r5.dump
 * Dumped revision 3.
 * Dumped revision 4.
 * Dumped revision 5.
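To make the "consecutive dumps" idea concrete, here is a small sketch that names each dump file after the range it covers, so the correct load order is obvious later. The helper name dump_range and the paths in the comments are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: dump a repository in consecutive incremental pieces.
# dump_range is a hypothetical helper; paths are illustrative.
dump_range() {
    repos="$1"; first="$2"; last="$3"; outdir="$4"
    svnadmin dump --incremental -r "${first}:${last}" "$repos" \
        > "${outdir}/$(basename "$repos")-r${first}-r${last}.dump"
}

# Example (not run here): two dumps that together cover revisions 0-125,
# then a restore that loads them oldest first.
#   dump_range /srv/svnrepos 0 50 /mnt/backup
#   dump_range /srv/svnrepos 51 125 /mnt/backup
#   svnadmin create /srv/restored
#   svnadmin load /srv/restored < /mnt/backup/svnrepos-r0-r50.dump
#   svnadmin load /srv/restored < /mnt/backup/svnrepos-r51-r125.dump
```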

As I mentioned, though, there is a downside to doing all of your backups incrementally, because changes to revision properties in previously archived revisions won't be backed up. There are a number of ways that you can work around this issue, and still make use of incremental backups.

Don't do anything. By default, revision properties are immutable. If you never allow revision properties to be changed, you never have to worry about a revision property change being lost at a later date. If you do allow some revision properties to be changed by allowing them in a hook script, you can add specific logic to your pre-revprop-change script that will only allow properties of unarchived revisions to be modified (see Chapter 11 for more information).
This has the advantage of being the easiest solution to deal with, but it means you lose the ability to make backdated changes to things like log messages if an error is found. If you use revision properties to store custom data that makes past revision properties volatile, this may also be impractical.
Explicitly rearchive changed revisions. If a developer makes a change to a revision property in a revision that has been previously archived, make it that developer's responsibility to inform an administrator and trigger a re-archiving of that block of incremental revisions. For example, if the archive on June 15th contained revisions 38 through 75, and a developer changes the log message for revision 46, you would then want to recreate the June 15th archive for revisions 38 through 75 and replace the old archive file.
If changes to archived revision properties are rare, this may be the best solution. It allows you to take advantage of incremental dumps to save time and space, while providing a procedure for modifying archived revisions if necessary. The downside, though, is that it is the developer's responsibility to make sure everything stays in sync. If the developer forgets to note the change, it could easily cause problems in the future, long after everyone involved has forgotten what change was made.
Make periodic full backups, in addition to more frequent incremental dumps. For instance, you might make incremental backups every night, but make a full repository backup every week or month. This reduces the chance of losing a change to archived revision properties, while still reducing your backup load significantly. Of course, you still run the risk of losing a change if a crash occurs before the next full backup, so you should weigh the risk carefully before relying on this approach.
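The first workaround, allowing property changes only on revisions that have not been archived yet, might look something like the following in a pre-revprop-change hook. This is only a sketch under stated assumptions: the function name revprop_change_allowed is hypothetical, and it assumes that the number of the first unarchived revision is kept in a begin.rev file inside the repository directory, which is the convention the backup script later in this section happens to use. It also assumes you only want to permit changes to svn:log.

```shell
#!/bin/sh
# Sketch of pre-revprop-change logic: allow svn:log edits only on
# revisions that have not been archived yet.
# Assumption: the first unarchived revision number is stored in
# REPOS/begin.rev (a convention, not something Subversion maintains).
revprop_change_allowed() {
    repos="$1"; rev="$2"; propname="$3"

    # Only log messages may ever be changed.
    [ "$propname" = "svn:log" ] || return 1

    # No marker file means nothing has been archived yet.
    [ -f "${repos}/begin.rev" ] || return 0

    first_unarchived=$(cat "${repos}/begin.rev")
    [ "$rev" -ge "$first_unarchived" ]
}

# In an actual hook, Subversion passes REPOS REV USER PROPNAME ACTION:
#   revprop_change_allowed "$1" "$2" "$4" || {
#       echo "Revision $2 is archived; its properties are frozen." >&2
#       exit 1
#   }
```

A hook that exits with a nonzero status causes Subversion to reject the property change, so the archived dump files stay in step with the repository.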

10.2.3. Automating Your Backups

There is an innumerable variety of automated backup tools available to the discerning systems administrator, many very expensive, complex, or feature rich (some are even all three). For the administrator of a small and/or open source project, though, these heavyweight backup tools are often overkill. Therefore, in this section, I will show you how to do simple automated incremental backups of your Subversion repository, using just cron, and the Subversion dump command on a UNIX-like system. If you are using Windows instead, there are similar options available to you.

An Incremental Dumping Script

The first thing you need is a script that will create an incremental repository dump, starting with where your last incremental backup left off. Creating the incremental dump is easy, but retaining state from one dump to the next can be a little trickier. The following example script shows the solution that I used for my own company's Subversion repository backups. First, it mounts the backup server (via Samba) and performs a hotcopy of the repository. Then, it performs a dump of the revisions that have been committed since the last backup and sends the dump file to an offsite backup server.

 #!/bin/sh
 # Makes a backup of a subversion repository
 # Usage: backup_subversion.sh REPOS

 SAMBASHARE="//backupsvr/subversion"
 SAMBAPASSWD="backupPasswd"
 MOUNTPOINT="/mnt/backup_subversion"
 REPOSBASE="/srv/repositories"
 REPOS="${1}"
 OFFSITE="backupusr@offsitebackup.example.org"

 # Mount the samba shared backup server
 /bin/mount -t smbfs -o password=${SAMBAPASSWD} "${SAMBASHARE}" "${MOUNTPOINT}"

 # Remove the old "yesterday" backup (from two days ago)
 /bin/rm -rf "${MOUNTPOINT}/${REPOS}.yesterday"

 # Rename yesterday's backup
 /bin/mv "${MOUNTPOINT}/${REPOS}" "${MOUNTPOINT}/${REPOS}.yesterday"

 # Perform a hotcopy of the repository
 /usr/bin/svnadmin hotcopy "${REPOSBASE}/${REPOS}" "${MOUNTPOINT}/${REPOS}"

 # Unmount the samba share
 /bin/umount "${MOUNTPOINT}"

 # Get (and save) some information about the revisions for
 #  the incremental backup
 /usr/bin/svnlook youngest "${REPOSBASE}/${REPOS}" > "${REPOSBASE}/${REPOS}/end.rev"
 BEGIN=`cat "${REPOSBASE}/${REPOS}/begin.rev"`
 END=`cat "${REPOSBASE}/${REPOS}/end.rev"`

 # begin.rev holds the last dumped revision plus one, so if it is past
 #  the youngest revision, there's nothing new to send offsite
 if [ "${BEGIN}" -gt "${END}" ]; then exit 0; fi

 # Make the incremental dump of the changes made to the
 #  repository since the last backup.
 /usr/bin/svnadmin dump --incremental -r ${BEGIN}:${END} \
         "${REPOSBASE}/${REPOS}" > "/tmp/${REPOS}-${BEGIN}-${END}.dump"

 # If the dump was successful, use SCP to send the dumpfile to the offsite
 #  backup server, and only then record where the next dump should start
 if [ $? -eq 0 ]
 then
    /usr/bin/scp "/tmp/${REPOS}-${BEGIN}-${END}.dump" ${OFFSITE}:~ &&
        echo $((${END} + 1)) > "${REPOSBASE}/${REPOS}/begin.rev"
 fi
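One detail worth handling is the very first run, when no begin.rev file exists yet. The following sketch shows one way to deal with it, by falling back to revision 0, along with a check for whether there is anything new to dump. The helper names read_begin_rev and has_new_revisions are hypothetical and are not part of the script above. Because begin.rev records the last dumped revision plus one, there is nothing new whenever that number is greater than the youngest revision.

```shell
#!/bin/sh
# Sketch: read the starting revision for the next incremental dump,
# falling back to 0 when no begin.rev file exists yet (first run).
# read_begin_rev and has_new_revisions are hypothetical helper names.
read_begin_rev() {
    state_file="$1/begin.rev"
    if [ -s "$state_file" ]; then
        cat "$state_file"
    else
        echo 0
    fi
}

# Succeeds when at least one revision is waiting to be dumped.
# Usage: has_new_revisions BEGIN YOUNGEST
has_new_revisions() {
    [ "$1" -le "$2" ]
}

# Example (not run here):
#   BEGIN=$(read_begin_rev /srv/repositories/myrepos)
#   END=$(svnlook youngest /srv/repositories/myrepos)
#   has_new_revisions "$BEGIN" "$END" || exit 0
```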

Setting Up Cron

Now that you have your script for doing automatic incremental dumps, you need to set up the backups to happen automatically, using cron.

Run crontab for the user that owns the Subversion repository, with the -e option to indicate that you want to edit the file.
```
 $ crontab -u svnuser -e 
```
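Add a line to your crontab that will run the incremental backup script every night (in this case, at 3:00 AM). Substitute the location where you installed the script and the argument it expects; the example script shown earlier takes the repository's name under REPOSBASE rather than a full path.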
```
 0 3 * * * /srv/svnrepos/backup.sh /srv/svnrepos 
```
After your automated backup script is set to run, you can send the dump files it produces to a long-term archive using whatever means best fits your server setup. For instance, you might copy them to a shared network drive on a backup server or archive them to a tape drive or CD-ROM.
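If you also want the periodic full backups suggested in Section 10.2.2, one option is to have cron call a small wrapper every night and let the wrapper decide which kind of backup to make. The sketch below only shows the decision itself. The function name backup_mode and the choice of Sunday for the full backup are my own assumptions; pick whatever schedule suits your project.

```shell
#!/bin/sh
# Sketch: pick a backup mode from the day of the week, so that cron can
# call one wrapper every night. Sunday gets a full dump; other days are
# incremental. backup_mode is a hypothetical helper name.
backup_mode() {
    day="${1:-$(date +%u)}"    # 1 = Monday ... 7 = Sunday
    if [ "$day" -eq 7 ]; then
        echo full
    else
        echo incremental
    fi
}

# Example (not run here):
#   case "$(backup_mode)" in
#       full)        svnadmin dump /srv/svnrepos > /mnt/backup/full.dump ;;
#       incremental) /path/to/your/incremental-script ;;
#   esac
```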

10.2.4. Recovering

Near tragedy! Your server failed and you lost the entire Subversion repository! Fortunately, you've made regular backups, and can restore everything to the way it was last night at 3:00 AM, when the nightly backup ran (you have been making backups, right?).

If you are recovering from a backup made using svnadmin hotcopy, all you need to do to restore the backup is to copy the backup version back to your Subversion server and make sure all of the permissions/connection settings are set up properly.
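As a rough illustration, restoring a hotcopy could be scripted along these lines. The helper name restore_hotcopy, the example paths, and the svnuser account are assumptions for the sake of the sketch; the final svnadmin verify step is an extra sanity check that I have added, not something the restore strictly requires.

```shell
#!/bin/sh
# Sketch: restore a hotcopy backup by copying it into place, fixing
# ownership, and checking the result. restore_hotcopy is a hypothetical
# helper; the owner name is illustrative.
restore_hotcopy() {
    backup="$1"   # e.g. /mnt/backup/myrepos-265
    target="$2"   # e.g. /srv/myrepos
    owner="$3"    # account the server runs as, e.g. svnuser

    # Never copy over an existing repository.
    if [ -e "$target" ]; then
        echo "target already exists: $target" >&2
        return 1
    fi
    cp -Rp "$backup" "$target" || return 1
    chown -R "$owner" "$target" || return 1
    # svnadmin verify reads every revision and reports corruption.
    svnadmin verify "$target"
}

# Example (not run here):
#   restore_hotcopy /mnt/backup/myrepos-265 /srv/myrepos svnuser
```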

If you are restoring from incremental dumps, the process is a little more involved, but still reasonably easy.

1. Get your server back online, with all of the necessary Subversion software set up and restored to the correct state (if you've backed up your configuration settings, this should be an easy step).
2. Create a new empty repository for storing the data, with svnadmin create.
```
 $ svnadmin create /srv/svnrepos 
```
3. Start with your first incremental repository dump file (or your only one, if you haven't been using incremental backups) and load it back into the repository, using the svnadmin load command, with the newly created repository as an argument, and the contents of the dump file fed to it via stdin.
```
 $ cat svnrepos.dump | svnadmin load /srv/svnrepos 
```
4. If you have more repository dump files, restore them by repeating step 3 for each of the dump files, in the correct order (it's very important that you load them in order, from oldest to newest).
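With more than a handful of dump files, loading them by hand gets tedious, and a plain alphabetical listing will put a file starting at revision 126 ahead of one starting at revision 51. The following sketch sorts the files numerically by starting revision before loading them. It assumes the files are named NAME-BEGIN-END.dump, as the backup script earlier in this section names them; the helper names list_dumps_in_order and load_all_dumps are hypothetical.

```shell
#!/bin/sh
# Sketch: load a directory of incremental dumps oldest first.
# Assumes files are named NAME-BEGIN-END.dump; both function names
# are hypothetical helpers.

# Print the dump files for NAME in DIR, ordered by starting revision.
list_dumps_in_order() {
    dir="$1"; name="$2"
    for f in "$dir/$name"-*-*.dump; do
        [ -e "$f" ] || continue
        base=${f##*/}             # strip the directory part
        range=${base#"$name"-}    # leaves BEGIN-END.dump
        echo "${range%%-*} $f"    # "BEGIN path"
    done | sort -n | cut -d' ' -f2-
}

# Load every dump into REPOS, stopping at the first failure.
load_all_dumps() {
    repos="$1"; dir="$2"; name="$3"
    list_dumps_in_order "$dir" "$name" | (
        while IFS= read -r f; do
            svnadmin load "$repos" < "$f" || exit 1
        done
    )
}

# Example (not run here):
#   svnadmin create /srv/svnrepos
#   load_all_dumps /srv/svnrepos /mnt/backup svnrepos
```

Stopping at the first failed load matters here, because loading a later dump on top of an incomplete repository would only compound the problem.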

That's it. Your restored repository should be back up and running.