Section 24.7. Can Open-Source Backup Do the Job? | Backup & Recovery: Inexpensive Backup Solutions for Open Systems

24.7. Can Open-Source Backup Do the Job?

Electronic information is stored in a number of different ways, some of which require special treatment during the backup process. Ignoring these differences can have a number of negative side effects, including:

A significant decrease in backup performance
An even larger decrease in restore performance
An inability to recover the data or system in question

How information is treated depends on how the information is stored. The standard, or most common, way that information is stored is as a file in a filesystem. The most common example of a filesystem is the "C:\" drive on Windows desktops and the "/" on Unix-based systems. Most backup products handle ordinary static files without issue, but some filesystems and datatypes can cause problems and often need to be treated specially. They include:

Very active filesystems
Filesystems with large (more than 1 TB) volumes of data
Filesystems with millions of files
Information stored inside databases
Metadata not stored in a filesystem or database
Link file structures and device files
Information stored on NAS-based filesystems
Information stored on SAN-based filesystems

Most commercial backup products have additional features to handle these datatypes, usually at an additional price. This section considers whether the open-source products covered in this book can handle the same challenges.

24.7.1. Very Active Filesystems

The basic assumption of traditional backup software is that a file is not changing while it is being backed up. In the past, IT managers put systems into single-user mode prior to performing the backup to ensure that files were static, or unchanging, during the backup process. IT managers often do not have the luxury of doing this today, so backup application vendors have developed techniques for backing up these types of files.

Constantly changing files present a special challenge to backup and recovery software applications, even commercial products. In addition, some operating systems and applications can lock files for exclusive use, preventing even backup applications from accessing them. If a file is too active or locked during backup, commercial backup systems use snapshot techniques to ensure that the files are protected. Snapshot technologies present a static view of the filesystem to the backup application. If this particular challenge is present in your Windows environment, you may want to investigate how your open-source product integrates with Windows snapshot services. If this challenge is present in your Unix or Macintosh environment, you may find that you're not able to solve it without moving to a commercial product.
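The "file is not changing" assumption can at least be checked, even without snapshot support. The following is a minimal sketch, not something taken from any product covered in this book: the helper name `copy_if_stable` is hypothetical, and it assumes that an unchanged size and modification time mean the file was not written to during the copy. It is not a snapshot and does not make the copy consistent; it only flags files that should be retried or handled another way.

```python
import os
import shutil


def copy_if_stable(src: str, dst: str) -> bool:
    """Copy src to dst and report whether src looked unchanged during the copy.

    Hypothetical helper: compares size and modification time before and
    after the copy. A False result means the copy may be inconsistent.
    """
    before = os.stat(src)
    shutil.copy2(src, dst)  # copies file contents and metadata
    after = os.stat(src)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A backup script could call this per file and queue any file that returns `False` for another attempt.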

24.7.2. Very Large Filesystems

Very large filesystems present another set of unique backup challenges. Because traditional backup software applications are filesystem-based, each filesystem or drive is backed up separately. While this method works fine with small to moderate-sized filesystems that are dozens or hundreds of gigabytes in size, it does not perform well with filesystems that are greater than 1 TB in size.

The fastest tape drives can write data at a rate of about 200 MB/s (at this writing). If you could supply a stream of data that fast, you could back up a 1 TB filesystem in under two hours. In practice, you probably couldn't keep a 200 MB/s tape drive supplied with data, so the backup would take significantly longer than that.
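The arithmetic behind that estimate can be sketched as follows. The function name and the 50 MB/s "slow feed" rate are illustrative assumptions, and decimal units (1 TB = 10^12 bytes) are used. The ideal figure works out to a little under an hour and a half, in line with the rough estimate above, while a source that can only deliver a quarter of the drive's speed takes four times as long.

```python
def backup_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to stream size_bytes at a sustained rate."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_s / 3600


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

# Best case: the filesystem keeps the 200 MB/s drive streaming.
best_case = backup_hours(1 * TB, 200 * MB)

# Assumed example: the filesystem can only deliver 50 MB/s.
slow_feed = backup_hours(1 * TB, 50 * MB)

print(f"best case: {best_case:.2f} h, slow feed: {slow_feed:.2f} h")
```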

If this particular challenge is present in your world, you may consider near-continuous data protection. A near-continuous data protection backup system uses replication techniques to maintain a copy of the data for backup purposes. Since it never has to do a full backup, it needs a lot fewer resources to run successfully. Since it's disk-based, it's also going to be able to back up and recover as fast (or as slow) as the filesystem you're trying to back up. See Chapter 7 for a discussion of three open-source near-CDP products.

24.7.3. Filesystems with Too Many Files

Filesystems with lots of files also present a set of unique challenges to IT managers. In fact, a 1 GB filesystem with five million files is as challenging to back up as a 1 TB filesystem with a few thousand files. Why? Because of the number of operations that must be performed during the backup and restore process.

The problem is even worse on the recovery side, which requires even more steps to complete. For example, the backup application must first tell the operating system that it needs to create a file; it then needs to open that file for writing and transfer the data into that file. After the restore, a series of checks is performed to verify that the data written to the filesystem is the same as the data that was backed up.

Again, each of these operations takes time, and because they are performed in sequence, they can actually bring the restore process to a screeching halt. The speed of the backup device is irrelevant. The fastest disk drive or tape drive in the world is still going to have to sit and wait while each of these operations is performed for each file.
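A rough model makes the point concrete. This is only a sketch: the 5 ms per-file overhead is an assumed figure chosen for illustration, not a measurement, and real overhead varies widely by filesystem and backup product. The model simply adds a fixed cost per file to the raw transfer time.

```python
def restore_seconds(total_bytes: float, file_count: int,
                    throughput_bytes_per_s: float,
                    per_file_overhead_s: float) -> float:
    """Estimate restore time as raw transfer time plus a fixed cost per file."""
    transfer = total_bytes / throughput_bytes_per_s
    overhead = file_count * per_file_overhead_s
    return transfer + overhead


GB, TB, MB = 10**9, 10**12, 10**6
PER_FILE = 0.005  # assumed 5 ms of create/open/verify work per file

# 1 GB spread across five million files
many_small = restore_seconds(1 * GB, 5_000_000, 200 * MB, PER_FILE)

# 1 TB spread across five thousand files
few_large = restore_seconds(1 * TB, 5_000, 200 * MB, PER_FILE)

print(f"many small files: {many_small / 3600:.1f} h")
print(f"few large files:  {few_large / 3600:.1f} h")
```

Under these assumptions the 1 GB restore is dominated almost entirely by per-file work, so a faster backup device would barely change its result.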

An alternative approach to traditional backup and restore processes is image-based backup, which bypasses the filesystem (and files) and backs up data at the block level. Unfortunately, any open-source project designed to address this challenge probably requires the drive to be unmounted during backup. If this limitation is unacceptable, you have to switch to a commercial product.

24.7.4. Information Stored in Databases

Information stored in databases can be difficult to back up because of the way it is stored, the changing nature of the datafiles, and demanding recovery point objectives, or RPOs.

Databases generally store data in files in the filesystem, but some databases store "raw" data, or datafiles, directly on disk. While storing data on raw disk can improve performance, it can make the backup environment significantly more complex.

Datafiles change constantly; therefore, the challenge is to create a consistent image of the datafile during the backup process. A variety of techniques, including cold backups, scripted hot backups, and database backup agents, can help IT managers create these images.

A cold backup is the backup of datafiles after a database has been shut down. A scripted hot backup places the database in some type of backup mode before backing up its datafiles using a regular backup program. Either method works well with most of the backup utilities covered in this book. A database backup agent interfaces directly with the database for backup purposes. At this writing, none of the open-source backup products support backing up any database using its agent. However, many of the database agents do support backing up to disk without a commercial backup product. For example, Oracle's rman, Sybase and SQL Server's dump database, and ntbackup's Exchange plug-in all support backing up to disk while the database is active. You could therefore create a disk-based backup that is then backed up by the open-source backup system you chose.
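The shape of a scripted hot backup can be sketched as a wrapper that guarantees the database leaves backup mode even when the copy step fails. Everything here is an assumption for illustration: the function names are hypothetical, and the three callables stand in for whatever database-specific commands and backup utility you actually use.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def backup_mode(begin: Callable[[], None], end: Callable[[], None]) -> Iterator[None]:
    """Enter backup mode, and always leave it, even if the backup fails."""
    begin()
    try:
        yield
    finally:
        end()


def scripted_hot_backup(begin: Callable[[], None],
                        copy_datafiles: Callable[[], None],
                        end: Callable[[], None]) -> None:
    """Run copy_datafiles while the database is held in backup mode."""
    with backup_mode(begin, end):
        copy_datafiles()


# Placeholder steps that only record the order in which they run.
log = []
scripted_hot_backup(
    lambda: log.append("begin backup mode"),
    lambda: log.append("copy datafiles"),
    lambda: log.append("end backup mode"),
)
print(log)
```

The `finally` clause is the important design choice: a database left in backup mode after a failed copy is a common way for scripted hot backups to go wrong.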

Databases generally have more demanding RPOs. While it may be acceptable to restore a word processing file to last night's backup, it is generally not acceptable to restore a database file from a backup copy that is several hours old. Databases provide transaction logs that track changes in between backups. A proper backup of a production database includes a system for backing up the transaction log during the day.

24.7.5. Information Stored on Shared Storage

Information stored on shared storage can also create extra backup requirements. Shared storage comes in two main flavors: SAN and NAS. SANs are based on the SCSI protocol and allow several systems to have block access to shared disk or tape drives. SANs typically run SCSI over either Fibre Channel or IP (iSCSI). NAS is based on the NFS or CIFS protocols, which allow multiple servers to share files across an IP network.

24.7.5.1. SAN-based filesystems

The low-cost backup products covered in this book are not going to treat SAN-based filesystems any differently than a locally attached filesystem. If you need a product that performs SAN-based backups, you need a commercial product.

24.7.5.2. NAS-based filesystems

Although the snapshot and off-site replication software offered by some NAS vendors has great recovery features, NAS filers must still be backed up at some point. All of the open-source products discussed in this book are going to back up NAS-based filesystems via a share (NFS or CIFS).