Thus far in this chapter, we have covered maintaining network hardware and data security and keeping computer components safe from harm. However, making networks secure also includes protecting the data from corruption or loss. This lesson presents an overview of the possible causes of data loss and how to protect the network against them. You will learn about systems and processes for preventing data loss.
After this lesson, you will be able to:
- Identify the reasons for implementing a backup system.
- Select a backup approach that is appropriate for a given site, including the method and schedule.
- List the considerations for implementing an uninterruptible power supply.
- Describe each of the following types of fault-tolerant systems: disk striping, disk mirroring, sector sparing, clustering.
Estimated lesson time: 45 minutes
A site disaster is defined as anything that causes you to lose your data. Many large organizations have extensive disaster-recovery plans to maintain operations and rebuild after a natural disaster such as an earthquake or a hurricane. Many, but not all, include a plan to recover the network. However, a network can incur a disastrous failure from many more sources than natural disasters. Disaster recovery for a network goes beyond replacing the physical hardware; the data must be protected as well. The causes of a network disaster, ranging from human acts to natural causes, include:
In the event of a site disaster, the downtime spent recovering data from backup storage (if you have backups) could result in a serious loss of productivity. And without backups, the consequences are even more severe, possibly resulting in significant financial losses. There are several ways to prevent or recover from data loss, including:
Any or all of these approaches can be used, depending on how valuable the data is to the organization and on the organization's budget constraints.
The simplest, most inexpensive way to avoid disastrous loss of data is to implement a schedule of periodic backups with storage offsite. Using a tape backup is still one of the few simple and economical ways to ensure that data remains safe and usable.
Experienced network engineers advise that a backup system should be the first line of defense against data loss. A secure backup strategy minimizes the risk of losing data by maintaining a current backup—copies of existing files—so that files can be recovered if harm comes to the original data. To back up data requires:
The equipment usually consists of one or more tape drives and tapes or other mass storage media. Any expense incurred in this area is likely to be minimal compared to the value of what will be saved in the event of data loss.
The rule is simple: if you cannot get along without it, back it up. Whether you back up entire disks, selected directories, or files depends on how fast you will need to be operational after losing important data. Complete backups make restoring disk configurations much easier, but can require multiple tapes if there are large amounts of data. Backing up individual files and directories might require fewer tapes, but could require the administrator to manually restore disk configurations.
Critical data should be backed up according to daily, weekly, or monthly schedules, depending on how critical the data is and how frequently it is updated. It is best to schedule backup operations during periods of low system use. Users should be notified when the backup will be performed so that they will not be using the servers during server backup.
Selecting a Tape Drive
Because the majority of backing up is done with tape drives, the first step is to select a tape drive, weighing the importance of a variety of factors, such as:
Ideally, a tape drive should have more than enough capacity to back up a network's largest server. It should also provide error detection and correction during backup and restore operations.
As listed in Table 10.2, an efficient backup policy uses a combination of methods:
Table 10.2 Backup Methods
| Method | Description |
| --- | --- |
| Full backup | Backs up and marks selected files, whether or not they have changed since the last backup. |
| Copy | Backs up all selected files without marking them as being backed up. |
| Incremental backup | Backs up and marks selected files only if they have changed since the last time they were backed up. |
| Daily copy | Backs up only those files that have been modified that day, without marking them as being backed up. |
| Differential backup | Backs up selected files only if they have changed since the last time they were backed up, without marking them as being backed up. |
Tape backups can be scheduled on a multiple-week cycle, depending on how many tapes are available. No rigid rules govern the length of the cycle. On the first day of the cycle, the administrator performs a full backup and follows with an incremental backup on succeeding days. When the entire cycle has finished, the process begins again. Another method is to schedule streaming backups throughout the day.
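The difference between the methods in Table 10.2 can be sketched with a small model built around an "archive bit" per file: the bit is set when a file changes and cleared when a backup method "marks" the file as backed up. The file names and dictionary layout here are hypothetical illustrations, not a real backup API.

```python
# Model of Table 10.2: which files each backup method copies, and
# whether the method clears the archive bit ("marks" the file).

def run_backup(files, method):
    """Return the names of files the given method would copy."""
    if method in ("full", "copy"):
        # Full and copy back up everything, changed or not.
        selected = list(files)
    else:
        # Incremental, differential, and daily-style methods back up
        # only files whose archive bit is set (changed since last mark).
        selected = [f for f in files if f["archive_bit"]]
    if method in ("full", "incremental"):
        # Only full and incremental backups mark files as backed up.
        for f in selected:
            f["archive_bit"] = False
    return [f["name"] for f in selected]

files = [{"name": "a.doc", "archive_bit": True},
         {"name": "b.xls", "archive_bit": True}]

run_backup(files, "full")           # copies both files, clears both bits
files[0]["archive_bit"] = True      # a.doc changes the next day
print(run_backup(files, "incremental"))   # only the changed file: ['a.doc']
```

Note the practical consequence: after a full backup, restoring from incrementals requires every tape in the cycle, whereas a differential backup never clears the bit, so restoring needs only the full tape plus the latest differential tape.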
Experienced administrators test the backup system before committing to it. They perform a backup, delete the information, restore the data, and attempt to use the data.
The administrator should test the backup procedures regularly to verify that what is expected to be backed up is actually being backed up. Additionally, the restore procedure should be tested to ensure that important files can be restored quickly.
Ideally, an administrator should make two copies of each tape: one to be kept onsite, and the other stored offsite in a safe place. Remember that although storing tapes in a fireproof safe can keep them from actually burning, the heat from a fire will ruin the data stored on them. After repeated usage, tapes lose the ability to store data. Replace tapes regularly to ensure a good backup.
Maintaining a log of all backups is critical for later file recovery. A copy of the log should be kept with the backup tapes, as well as at the computer site. The log should record the following information:
Tape drives can be connected to a server or to a standalone computer, and backups are initiated from the computer to which the tape drive is attached. If you run backups directly on the server, backup and restore operations can proceed very quickly because the data does not have to travel across the network.
Backing up across the network is the most efficient way to back up multiple systems; however, the backup traffic can slow the network down considerably. This degradation is one more reason why it is important to perform backups during periods of low server use.
If multiple servers reside in one location, placing a backup computer on an isolated segment can reduce backup traffic. As shown in Figure 10.8, the backup computer is then connected to a separate NIC on each server.
Figure 10.8 Network traffic is reduced by backing up to a separate segment
An uninterruptible power supply (UPS) is an automated external power supply designed to keep a server or other device running in the event of a power failure. Many UPS devices can interface with an operating system such as Microsoft Windows NT. The standard UPS provides a network with two crucial components:
The power source is usually a battery, but the UPS can also be a gasoline engine driving an AC generator.
If the power fails, the UPS notifies users of the failure and warns them to finish their tasks. It then waits a predetermined amount of time before performing an orderly system shutdown.
A good UPS system will:
As shown in Figure 10.9, the UPS is usually located between the server and a power source.
Figure 10.9 Uninterruptible power supply as a backup power source
If power is restored while the UPS is active, the UPS will notify the users that the power has returned.
The best UPS systems perform online. When the power source fails, the UPS batteries automatically take over. The process is invisible to users.
There are also stand-by UPS systems that start when power fails. These are less expensive than online systems, but are not as reliable.
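The notify-wait-shutdown sequence described above can be sketched as follows. This is a minimal model, not a vendor's UPS monitoring API; the function names (`notify_users`, `shutdown_server`) and the polling interval are hypothetical placeholders.

```python
# Sketch of the UPS sequence: on power failure, warn users, wait a
# predetermined grace period, then perform an orderly shutdown.
import time

GRACE_PERIOD_SECONDS = 300   # predetermined wait before shutdown

def notify_users(message):
    print(f"BROADCAST: {message}")

def shutdown_server():
    print("Performing orderly system shutdown")

def on_power_event(power_restored, grace_seconds=GRACE_PERIOD_SECONDS):
    """power_restored() polls the UPS; returns True when AC power is back."""
    notify_users("Power failure: save your work and log off")
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        if power_restored():
            notify_users("Power has been restored")
            return
        time.sleep(5)
    shutdown_server()
```

If line power returns before the grace period expires, users are told the power is back and no shutdown occurs; otherwise the server is brought down cleanly while the battery still has capacity.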
Answering the following questions will help the network administrator determine which UPS system best fits the needs of the network:
Fault-tolerant systems protect data by duplicating it and placing the copies on different physical sources, such as different partitions or different disks. Data redundancy allows access to data even if part of the data system fails. Redundancy is a prominent feature common to most fault-tolerant systems.
Fault-tolerant systems should never be used as replacements for regular backup of servers and local hard disks. A carefully planned backup strategy is the best insurance policy for recovering lost or damaged data.
Fault-tolerant systems offer these alternatives for data redundancy:
Fault-tolerance options are standardized and categorized into levels. These levels are known as redundant array of independent disks (RAID), formerly known as redundant array of inexpensive disks. The levels offer various combinations of performance, reliability, and cost.
Level 0—Disk Striping
Disk striping divides data into 64 K blocks and spreads it in a fixed order across all disks in an array. However, disk striping does not provide any fault tolerance, because there is no data redundancy: if any partition in the disk array fails, all data is lost.
A stripe set combines multiple areas of unformatted free space into one large logical drive, distributing data storage across all drives simultaneously. In Windows NT, a stripe set requires at least two physical drives and can use up to 32 physical drives. Stripe sets can combine areas on different types of drives, such as small computer system interface (SCSI), enhanced small device interface (ESDI), and integrated device electronics (IDE) drives.
Figure 10.10 shows three hard disks being used to create a stripe set. In this case, there is 192 K of data: the first 64 K is written to a stripe on disk 1, the second 64 K to a stripe on disk 2, and the third 64 K to a stripe on disk 3.
Figure 10.10 Disk striping combines areas on multiple drives
Disk striping has several advantages: it makes one large partition out of several small partitions, which offers better use of disk space, and spreading I/O across multiple disk controllers results in better performance.
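The round-robin block placement in Figure 10.10 can be sketched directly. This is a minimal model in which each disk is just a list of blocks, not a real device driver.

```python
# Sketch of RAID 0 striping: data is split into 64 K blocks and written
# across the disks of the array in a fixed round-robin order.

BLOCK_SIZE = 64 * 1024   # 64 K stripe blocks

def stripe_write(data, disks):
    """Distribute data across the disks, one 64 K block at a time."""
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        # Block 0 goes to disk 0, block 1 to disk 1, and so on, wrapping.
        disks[(i // BLOCK_SIZE) % len(disks)].append(block)

disks = [[], [], []]                      # a three-disk array
stripe_write(b"x" * (192 * 1024), disks)  # 192 K of data, as in Figure 10.10
print([len(d) for d in disks])            # one 64 K block lands on each disk
```

Because consecutive blocks land on different disks, reads and writes of large files can proceed on several spindles and controllers at once, which is the source of the performance gain; the loss of any one disk, however, removes every third block of every file.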
Level 1—Disk Mirroring
Disk mirroring duplicates a partition onto another physical disk, so there are always two copies of the data, each on a separate disk. Any partition can be mirrored. This strategy is the simplest way to protect a single disk against failure. Disk mirroring can be considered a form of continual backup because it maintains a fully redundant copy of a partition on another disk.
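The mirroring principle can be sketched in a few lines: every write goes to both disks, so either copy can satisfy a read if the other disk fails. The disks here are modeled as simple dictionaries of blocks, a hypothetical illustration rather than a driver implementation.

```python
# Sketch of RAID 1 mirroring: identical writes to two disks,
# reads served from whichever copy survives.

def mirrored_write(block_no, data, primary, mirror):
    primary[block_no] = data
    mirror[block_no] = data       # identical copy on the second disk

def mirrored_read(block_no, primary, mirror):
    # Read from the surviving copy if one disk has failed (None).
    disk = primary if primary is not None else mirror
    return disk[block_no]

primary, mirror = {}, {}
mirrored_write(0, b"payroll records", primary, mirror)
primary = None                    # simulate a failure of the primary disk
print(mirrored_read(0, primary, mirror))   # data survives on the mirror
```

The cost of this simplicity is capacity: every byte is stored twice, so a mirrored pair yields only half its raw disk space.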
Duplexing

Disk duplexing, as shown in Figure 10.11, consists of a mirrored pair of disks with an additional disk controller on the second drive. This reduces channel traffic and potentially improves performance. Duplexing is intended to protect against disk controller failures as well as media failures.
Figure 10.11 Disk mirroring duplicates a partition on another physical disk
Level 2—Disk Striping with ECC
When a block of data is written, the block is broken up and distributed (interleaved) across all data drives. Error-correction code (ECC) requires a larger amount of disk space than parity-checking methods, discussed under Level 3. Although this method offers marginal improvement in disk utilization, it compares poorly with level 5, discussed later.
Level 3—ECC Stored As Parity
Disk striping with ECC stored as parity is similar to level 2. The term parity refers to an error-checking procedure in which the number of 1s must always be the same—either odd or even—for each group of bits transmitted without error. In this strategy, the ECC method is replaced with a parity-checking scheme that requires only one disk to store parity data.
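The parity rule described above (the number of 1s in each group must come out even) is exactly what the XOR operation computes, which is why XOR is the standard parity calculation. A minimal sketch:

```python
# Sketch of even-parity checking: the parity byte is chosen so that,
# in every bit position, the total count of 1s across the group is even.
# XOR of the data bytes produces exactly that byte.

def parity_byte(data_bytes):
    """XOR of all data bytes; the even-parity byte for the group."""
    p = 0
    for b in data_bytes:
        p ^= b
    return p

stripe = [0b1011_0010, 0b0110_1100, 0b1111_0000]  # blocks on the data disks
p = parity_byte(stripe)
# With the parity byte included, every bit position has an even count of 1s,
# so XORing the whole group yields zero:
assert parity_byte(stripe + [p]) == 0
```

Storing only this one parity value per group is what lets level 3 get by with a single parity disk, where level 2's ECC scheme needs several check disks.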
Level 4—Disk Striping with Large Blocks
This strategy moves away from data interleaving by writing complete blocks of data to each disk in the array. The process is still known as disk striping, but is done with large blocks. A separate check disk is used to store parity information. Each time a write operation occurs, the associated parity information must be read from the check disk and modified. Because of this overhead, the block-interleaving method works better for large block operations than for transaction-based processing.
Level 5—Striping with Parity
Striping with parity is currently the most popular approach to fault-tolerant design. It supports from a minimum of three to a maximum of 32 drives and writes the parity information across all the disks in the array (the entire stripe set). The data and parity information are arranged so that the two are always on different disks.
A parity stripe block exists for each stripe (row) across the disk. The parity stripe block is used to reconstruct data for a failed physical disk. If a single drive fails, enough information is spread across the remaining disks to allow the data to be completely reconstructed.
RAID 4 stores the parity stripe block on one physical disk, whereas RAID 5 distributes it evenly across all disks.
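The reconstruction step can be sketched with XOR: because the parity block was computed as the XOR of the data blocks in a stripe, XORing the surviving blocks (data plus parity) regenerates whatever the failed disk held. The two-byte blocks below are hypothetical sample data.

```python
# Sketch of RAID 5 rebuild: XOR the surviving blocks of a stripe
# to recover the block that was on the failed drive.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

d0, d1 = b"\x10\x20", b"\x0f\x0f"   # data blocks in one stripe
parity = xor_blocks([d0, d1])       # parity block, stored on a third disk

# The disk holding d1 fails; rebuild its block from the survivors:
rebuilt = xor_blocks([d0, parity])
print(rebuilt == d1)                # True
```

The same identity works whichever single drive fails, data or parity, which is why one parity block per stripe is enough to survive the loss of any one disk in the array.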
Level 10—Mirrored Drive Arrays
RAID level 10 mirrors data across two identical RAID 0 drive arrays.
The Windows NT Server operating system offers an additional fault-tolerant feature called "sector sparing," also known as "hot fixing." The three steps of sector sparing are shown in Figure 10.12. This feature automatically adds sector-recovery capabilities to the file system while the computer is running.
Figure 10.12 Sector sparing or hot-fixing steps
If bad sectors are found during disk I/O (input/output), the fault-tolerant driver will attempt to move the data to a good sector and map out the bad sector. If the mapping is successful, the file system is not alerted. It is possible for SCSI devices to perform sector sparing, but ESDI and IDE devices cannot. Some network operating systems, such as Windows NT Server, have a utility that notifies the administrator of all sector failures and of the potential for data loss if the redundant copy also fails.
Microsoft Clustering is Microsoft's implementation of server clustering. The term "clustering" refers to a group of independent systems that work together as a single system. Fault tolerance is built into the clustering technology. Should a system within the cluster fail, the cluster software will disperse the work from the failed system to the remaining systems in the cluster. Clustering is not intended to replace current implementations of fault-tolerant systems, although it does provide an excellent enhancement.
Most advanced network operating systems offer a utility for implementing fault tolerance. In Windows NT Server, for example, the Disk Administrator program is used to configure Windows NT Server fault tolerance. The graphical interface of Disk Administrator makes it easy to configure and manage disk partitioning and fault tolerant options. If you move the disk to a different controller or change its ID, Windows NT will still recognize it as the original disk. Disk Administrator is used to create various disk configurations, including:
The term "optical drive" is a generic term that is applied to several devices. In optical technology, data is stored on a rigid disk by altering the disk's surface with a laser beam.
The use of optical drives and discs is becoming increasingly popular. As the technology evolves from the original read-only and read-write CD-ROMs to the new DVD technologies, these devices are being used more and more to store large amounts of retrievable data. Optical-drive manufacturers provide a large array of storage configurations that are either network-ready or can be used with a network server. They are an excellent choice for permanent backup. Several variations of this technology exist.
Compact discs (CD-ROMs) are the most common form of optical data storage; the ISO 9660 specification defines an international format standard for them. CD-ROMs, for the most part, only allow information to be read. The advantages of using CDs for storage are many. Their storage capacity is high (up to 650 MB of data on a 4.72-inch disc). They are portable and replaceable, and because data on a CD-ROM cannot be changed (it is read-only), files cannot be accidentally erased. Standard recording formats and inexpensive readers make CDs ideal for data storage. CD-ROMs are also available in a multisession format called "CD-recordable" (CD-R), which can be used for incremental updates and inexpensive duplication, and in a rewritable format called CD-rewritable.
The digital video disc (DVD) family of formats is replacing the CD-ROM family of formats. Digital video disc technology, also known as "digital versatile disc," is newer and, hence, relatively immature. DVD has five formats: DVD-ROM, DVD-Video, DVD-Audio, DVD-R (the "R" stands for "recordable"), and DVD-RAM. DVD-R is the format for write-once (incremental updates). It specifies 3.95 GB for single-sided discs and 7.9 GB for double-sided discs. DVD-RAM is the format for rewritable discs. It specifies 2.6 GB for single-sided discs and 5.2 GB for double-sided discs, with a disc cartridge as an option. DVD-ROMs (read-only discs) are similar to CD-ROMs and have a storage capacity of 4.7 GB (single-sided, single-layer), 9.4 GB (double-sided, single-layer), 8.5 GB (double-layer, single-sided), 17 GB (dual-layer, double-sided). These are backward-compatible with CD-audio and CD-ROM. DVD-ROM drives can play DVD-R and all the DVD formats. UDF is the file system for DVD-R.
Write once, read many (WORM) technology has helped initiate the document-imaging revolution. WORM uses laser technology to permanently alter sectors of the disc, thereby permanently writing files onto the media. Since this alteration is permanent, the device can write only once to each disc. WORM is typically employed in imaging systems in which the images are static and permanent.
Two newer rewritable optical technologies are also in use: magneto-optical (MO) and phase change rewritable (PCR) discs. MO drives are more widely accepted because the media and drive manufacturers use the same standards and their products are cross-compatible. PCR devices, however, come from one manufacturer (Matsushita/Panasonic), and the media comes from two manufacturers (Panasonic and Plasmon).
There are two versions of multifunction optical drives. One uses firmware in the drive that first determines whether a disc has been formatted for write-once or rewritable recording and then acts on that disc accordingly. In the other MO version, two entirely different media are used. The rewritable discs are conventional MO discs, but write-once media are traditional WORM media.
Trying to recover from a disaster, regardless of how it was caused, can be a terrifying experience. How successful the recovery is depends on the extent to which the network administrator has implemented disaster prevention and preparedness.
The best way to recover from a disaster is to prevent it from happening in the first place. When implementing disaster prevention:
Not all disasters can be prevented. Every jurisdiction has a disaster-preparedness plan, and many hours are spent every year in practicing for such an event. Because each community is different, recovery plans will have to take different factors into account. If, for example, you live in a flood zone, you should have a plan to protect your network from high water.
When considering disaster protection, you will need a plan for hardware, software, and data. Hardware, software applications, and operating systems can be replaced, but to do this, you first need to know exactly what assets you have. Take inventory of all hardware and software, including date of purchase, model, and serial number. (For tips on how to make such an inventory, refer to Chapter 8, Lesson 1: Choosing a Network Design.)
Physical components of a network can be easily replaced and are usually covered by some form of insurance, but data is highly vulnerable to disaster. In the case of a fire, you can replace all the computers and hardware, but not the files, drawings, and specifications for the multimillion dollar project that your organization has been preparing for the last year.
The only protection from a data-loss disaster is to implement one or more of the methods described earlier to back up data. Store your backups in a secure place, such as a bank safe deposit box, away from the network site.
To fully recover from any disaster you will need to:
The following points summarize the main elements of this lesson.
A small organization recently suffered security breaches in its peer-to-peer network. The intruder stole valuable business data. The organization's need for security became apparent, and now a modest-sized, but more secure, server-based network is in place.
The organization is located in a small California community that experiences frequent earthquakes and power outages. Your job is to plan how to avoid breaches of security and plan for disaster recovery at the same time. In this exercise, examine preventive measures the organization can take to avoid data loss due to human activities and natural disasters such as earthquakes.
List the categories of things that can put the organization's data at risk. Discuss the preventive measures and recovery plans appropriate for each kind of data loss.