5.2. How BackupPC Works
The BackupPC model assigns one user to each client. This fits the usage pattern of the environment it was specifically designed for: backing up several users' PCs (hence the name). This user should typically be the person who owns the data on the machine; in the case of a large file server, it should be an administrator. BackupPC emails the owner if it cannot back up the client for longer than a configurable period, and the owner can control restores using the web interface. The following list describes how BackupPC works:
Direct to disk -
BackupPC stores all its backups directly on disk. Identical files, across any directory or client, are stored only once, which dramatically reduces the server's storage requirements. These shared files live in a disk pool. In addition to the pool, each backup is represented as a directory tree, organized first by host and then by backup, whose files are hard links into the pool. A nightly process reclaims pool space that is no longer referenced by any backup, which keeps overall disk usage from growing without bound. This process runs automatically; the administrator does not have to configure it.
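The nightly cleanup can be illustrated with a minimal Python sketch. It assumes, as a simplification, that every file in a backup tree is a hard link into a single pool directory, so a pool file whose link count has fallen to 1 is referenced by no backup and can be deleted. The function name `reclaim_unreferenced` and the directory layout in the demo are hypothetical, not BackupPC's actual code.

```python
import os
import tempfile


def reclaim_unreferenced(pool_dir: str) -> list[str]:
    """Remove pool files that no backup tree links to any more.

    Assumes every file in a backup tree is a hard link into pool_dir, so a
    pool file whose link count has dropped to 1 is referenced only by the
    pool itself and can be deleted.
    """
    removed = []
    for root, _dirs, files in os.walk(pool_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_nlink counts every directory entry pointing at this inode.
            if os.lstat(path).st_nlink == 1:
                os.remove(path)
                removed.append(path)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")
        backup = os.path.join(top, "pc", "host1", "0")
        os.makedirs(pool)
        os.makedirs(backup)
        for name in ("kept", "orphan"):
            with open(os.path.join(pool, name), "w") as f:
                f.write(name)
        # Only "kept" is still referenced by a backup tree.
        os.link(os.path.join(pool, "kept"), os.path.join(backup, "etc_hosts"))
        print(reclaim_unreferenced(pool))
```

Because the backup trees hold hard links rather than copies, expiring an old backup only lowers link counts; the space itself comes back when the cleanup pass deletes the last pool reference.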
Support for any client OS -
The server portion of BackupPC is designed to run on a Unix-style system using Perl, with mod_perl under Apache for best performance, but it can run on any web server that can execute Perl CGI scripts. (It does require either mod_perl or setuid Perl.) The server should have a large disk or RAID array for backup storage. On the client side, almost any Unix or Unix-like OS can be backed up easily. Most modern versions of the commercial Unix variants (Solaris, AIX, IRIX, HP-UX) have tar, compress, gzip, rsync, and rsh and/or ssh, either in the base distribution or available on the Web. Other Unix-like operating systems (Linux, FreeBSD, OpenBSD, NetBSD, Mac OS X) also have these tools. Windows clients can be backed up in a few different ways. If local policy prevents additional software from being installed, BackupPC can use part of the Samba suite (http://www.samba.org) to back up SMB shares on the client. If software can be installed locally, rsync together with the Cygwin tool set (http://www.cygwin.com) can be used on the client.
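The client-side reasoning above (which transfer tool fits which client) can be summarized as a small Python function. The function `choose_transfer`, its parameters, and the returned method labels are hypothetical and chosen only for illustration; they are not BackupPC's configuration interface.

```python
# Hypothetical set of client OS names treated as Unix-like for this sketch.
UNIX_LIKE = {
    "linux", "freebsd", "openbsd", "netbsd", "macos",
    "solaris", "aix", "irix", "hp-ux",
}


def choose_transfer(client_os: str, can_install_software: bool = True,
                    has_rsync: bool = True) -> str:
    """Pick a transfer method label for one client (illustrative only)."""
    os_name = client_os.lower()
    if os_name in UNIX_LIKE:
        # Unix-like clients usually ship tar, and often rsync, reached via rsh/ssh.
        return "rsync" if has_rsync else "tar"
    if os_name == "windows":
        # With Cygwin installed, rsync can run on the client; otherwise
        # fall back to reading SMB shares with Samba from the server.
        return "rsync" if can_install_software else "smb"
    raise ValueError(f"unsupported client OS: {client_os}")
```

For example, `choose_transfer("Windows", can_install_software=False)` returns `"smb"`, matching the case where local policy rules out installing Cygwin.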
Support for native tools -
BackupPC uses standard Unix tools for its tasks. This includes programs such as perl, tar, rsync, compress, gzip, bzip2, zip, apache, and samba. This makes porting the server to a new OS much smoother than trying to port C code. BackupPC does not use a database or catalog to store backup information. Instead, it uses the disk tree to store this information. This means that upgrading the operating system of the BackupPC server (or upgrading the BackupPC application itself) is painless.
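To show what relying on the disk tree instead of a database can look like, here is a minimal Python sketch that discovers a host's backups purely by scanning directories. The layout `<top>/pc/<host>/<number>` and the function `list_backups` are assumptions made for illustration.

```python
import os
import tempfile


def list_backups(top_dir: str, host: str) -> list[int]:
    """Return a host's backup numbers by scanning the directory tree.

    Assumes a hypothetical layout top_dir/pc/<host>/<number>/ in which each
    numbered directory holds one backup; no database or catalog is consulted.
    """
    host_dir = os.path.join(top_dir, "pc", host)
    if not os.path.isdir(host_dir):
        return []
    return sorted(
        int(entry.name)
        for entry in os.scandir(host_dir)
        # Skip logs or other bookkeeping files that are not backup directories.
        if entry.is_dir() and entry.name.isdecimal()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        for name in ("0", "1", "10"):
            os.makedirs(os.path.join(top, "pc", "host1", name))
        open(os.path.join(top, "pc", "host1", "LOG"), "w").close()
        print(list_backups(top, "host1"))  # [0, 1, 10]
```

Since the state lives in ordinary files and directories, there is no separate catalog to migrate or rebuild when the server's OS or the application is upgraded.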
User control of backup/restores through web interface -
The Web is the main interface for BackupPC. After the initial configuration, there is no need for command-line access to the server to administer BackupPC. The web interface is written in Perl and is designed to run either under mod_perl or as normal CGIs with setuid Perl. The interface allows users to log in and control on-demand backups and restores. The user can request a one-time full or incremental backup. If the user needs to recover a file, there are a few options. Individual files can be downloaded simply by selecting them. Groups of files or directories can be restored in place, or the user can download them as a tar file or, if configured, as a ZIP file. The user has full control over which files or directories to restore and where to restore them. A history feature shows which files changed in each directory during each backup.
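The "download as a tar file" restore option can be approximated with Python's standard `tarfile` module. The sketch assumes a hypothetical backup tree on disk and a list of relative paths the user selected; `restore_as_tar` is an illustrative name, not part of BackupPC.

```python
import io
import os
import tarfile
import tempfile


def restore_as_tar(backup_dir: str, selected: list[str]) -> bytes:
    """Package the selected paths of one backup tree as an in-memory tar archive.

    backup_dir and the relative paths in selected are hypothetical inputs;
    directories are added recursively.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for rel in selected:
            rel = os.path.normpath(rel)
            # Refuse paths that would escape the backup tree.
            if os.path.isabs(rel) or rel == ".." or rel.startswith(".." + os.sep):
                raise ValueError(f"path outside backup: {rel}")
            # arcname keeps archive member names relative to the backup root.
            tar.add(os.path.join(backup_dir, rel), arcname=rel)
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup:
        os.makedirs(os.path.join(backup, "home", "alice"))
        with open(os.path.join(backup, "home", "alice", "notes.txt"), "w") as f:
            f.write("hello\n")
        data = restore_as_tar(backup, ["home/alice"])
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            print(tar.getnames())
```

A web handler would send the returned bytes as the response body; building the archive in memory keeps the sketch short, but a real server would stream it for large restores.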
Support for DHCP and disconnected clients -
Since BackupPC references clients by hostname, if the network being backed up uses DHCP and has dynamic name resolution enabled, nothing further needs to be done for the BackupPC server to back up DHCP clients. If this is not the case and the clients are Windows machines, BackupPC can be configured to search a pool of addresses for the clients, locating them by their SMB (NetBIOS) hostname. If a client is not online during its normal backup period, the BackupPC server does not report an error unless a set period of time has elapsed since the last successful backup. At that point, the server emails the client's owner as a reminder to make sure the machine is on the network for a backup. (The server can also email any errors to the administrator.) Clients on a remote LAN can be backed up as if they were local, provided there is network connectivity between the sites; this means that clients connected via a Virtual Private Network (VPN) can be backed up. If the user does not want a backup to run at a given moment, the current backup can be canceled from the web interface. Clients can also optionally define blackout periods during which no backups run, which resolves the issue permanently. BackupPC uses ping's round-trip time to determine whether a client is on a remote network, and it won't back up the machine if the round-trip time exceeds a configurable limit.
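The scheduling rules above (a grace period before emailing the owner, a round-trip limit, and blackout windows) can be combined into one decision function. This is a minimal Python sketch; the `Policy` fields, their default values, and the function `decide` are hypothetical and do not correspond to BackupPC's actual configuration names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    # Hypothetical settings; BackupPC's real configuration names differ.
    max_ping_ms: float = 20.0       # slower round trips are treated as "remote"
    notify_after_days: float = 7.0  # grace period before emailing the owner
    blackout_start_hour: int = 7    # no backups from this hour...
    blackout_end_hour: int = 19     # ...until this hour (same day only)


def decide(ping_ms: Optional[float], hour: int, days_since_good: float,
           policy: Policy) -> str:
    """Return "backup", "skip", or "email-owner" for one scheduling pass.

    ping_ms is None when the client does not answer ping at all.
    """
    if ping_ms is None or ping_ms > policy.max_ping_ms:
        # Offline or too far away: not an error until the grace period passes.
        if days_since_good > policy.notify_after_days:
            return "email-owner"
        return "skip"
    if policy.blackout_start_hour <= hour < policy.blackout_end_hour:
        return "skip"  # inside the client's blackout window
    return "backup"
```

For example, `decide(None, 22, 8.0, Policy())` returns `"email-owner"`: the client is unreachable and its last good backup is older than the grace period. The sketch does not handle blackout windows that wrap past midnight.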
Backup pooling -
If many clients use the same OS, many duplicate files will be backed up, and keeping multiple full backups multiplies those duplicates, which increases the server's storage requirements. BackupPC stores a directory tree per client backup but checks whether each file has already been stored, from any client. If it has, BackupPC uses a hard link to point to the existing file in the common disk pool, saving a great deal of space. BackupPC can also optionally compress files to save more space. For example, on a server with nine clients (eight Linux machines and one Windows 2000 machine), backing up only system configuration and user files, the backups total 195 GB before pooling and compression, but actual disk usage is below 40 GB. This is for two full backups and two weeks of daily backups per client. Pooling of common files and compression typically reduce the server's disk storage requirements by a factor of six to eight.
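The pooling step can be sketched in Python as follows. Keying pool files on a SHA-256 digest of their full contents is an assumption made for simplicity and is not necessarily how BackupPC names its pool files; compression and hash collisions are ignored, and `store_pooled` is a hypothetical helper.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path: str) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def store_pooled(src: str, pool_dir: str, dest: str) -> bool:
    """Place src at dest as a hard link into the pool.

    Returns True if identical content was already pooled, meaning the new
    backup entry consumed no additional file data on disk.
    """
    pool_path = os.path.join(pool_dir, file_digest(src))
    already_pooled = os.path.exists(pool_path)
    if not already_pooled:
        shutil.copyfile(src, pool_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.link(pool_path, dest)  # the per-backup entry shares the pooled inode
    return already_pooled


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")
        os.makedirs(pool)
        src = os.path.join(top, "hosts")
        with open(src, "w") as f:
            f.write("127.0.0.1 localhost\n")
        # The same file backed up from two clients is stored once.
        for host in ("linux1", "linux2"):
            dest = os.path.join(top, "pc", host, "0", "etc", "hosts")
            print(host, "reused pool copy:", store_pooled(src, pool, dest))
```

Every identical copy after the first costs only a directory entry, which is why many clients running the same OS pool so well.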
Easy per-client configuration -
After the administrator has defined the site's backup policies, it is very easy for her to override any configuration option on a per-client basis. This allows great flexibility in what, when, and how to back up a client. There are no classes of clients per se, but the same effect can be achieved by symlinking each client's configuration to a master configuration for the "class."
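The layering of site policy, shared "class" settings, and per-client overrides can be sketched with plain Python dictionaries. All key names, host names, and the function `effective_config` are hypothetical; sharing one override dictionary between several clients stands in for several per-client configuration symlinks pointing at one master file.

```python
import copy

# Hypothetical site-wide policy; the key names are illustrative, not BackupPC's.
SITE_POLICY = {
    "transfer": "rsync",
    "full_every_days": 7,
    "full_backups_kept": 2,
    "paths": ["/etc", "/home"],
}

# A shared "class" override, referenced by several clients the way several
# per-client config symlinks would point at one master file.
WINDOWS_DESKTOPS = {"transfer": "smb", "paths": ["Documents and Settings"]}

CLIENT_OVERRIDES = {
    "alice-pc": WINDOWS_DESKTOPS,
    "bob-pc": WINDOWS_DESKTOPS,
    "fileserver": {"full_backups_kept": 4},
}


def effective_config(host: str) -> dict:
    """Site defaults with any per-client overrides layered on top."""
    config = copy.deepcopy(SITE_POLICY)
    # Overrides replace whole values; hosts without overrides get the defaults.
    config.update(copy.deepcopy(CLIENT_OVERRIDES.get(host, {})))
    return config


if __name__ == "__main__":
    for host in ("fileserver", "alice-pc", "newhost"):
        print(host, effective_config(host))
```

Editing `WINDOWS_DESKTOPS` changes every client that shares it at once, which is the practical benefit of the symlinked master configuration.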