There are many factors that affect back-up. For example:
Storage costs are decreasing.
Systems have to be on-line continuously.
The role of back-up has changed.
The cost per MB of primary (on-line) storage has fallen.
Seven-day, 24-hour (7 x 24) operations have become the norm in many of today’s businesses. The amount of data that must be kept on-line and available (operationally ready data) is very large and constantly increasing, and ever-higher levels of fault tolerance are required of the primary data repository. Because systems must stay on-line continuously, you can no longer take files off-line long enough to back them up.
It’s no longer just about restoring data. Operationally ready or mirrored data does not guard against data corruption and user error. The role of back-up is now taking on the responsibility for recovering
Current solutions
An alternative to tape back-up is to physically replicate or mirror all data and keep two copies on-line at all times. Because the cost of primary storage is falling, this is not as cost-prohibitive as it once was. The advantage is that the data does not have to be restored, so the second copy is available immediately. There are, however, several drawbacks to all the back-up and data availability solutions on the market today.
Network back-up creates network performance problems. Using the production network to carry back-up data as well as normal user traffic can severely overburden today’s busy network resources. Installing a separate network exclusively for back-ups can minimize this problem, but even dedicated back-up networks may become performance bottlenecks.
Off-line back-up affects data accessibility. Host processors must be quiescent during the back-up, so back-up is neither host-independent nor nondisruptive to normal data access. The time that the host is off-line for back-up must therefore be minimized, which requires extremely high-speed, continuous parallel back-up of the raw image of the data. Even then, you have only deferred the real problem: the time needed to restore the information. Restoration needs to occur at the file level, not from the full raw image, so that the most critical information can be brought back into operation first.
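The size of the off-line window follows from simple arithmetic: the window is the amount of data divided by the aggregate throughput, so running N streams in parallel divides it by roughly N. The sketch below is illustrative only; it assumes decimal units, perfectly scaling streams, and hypothetical figures (500 GB of data at 10 MB/s per stream), whereas real back-ups add setup, verification, and contention overheads. The same arithmetic governs a full raw-image restore, which is why faster back-up alone only defers the problem.

```python
def backup_window_hours(data_gb: float, stream_mb_per_s: float, streams: int) -> float:
    """Hours to copy data_gb gigabytes over `streams` parallel streams.

    Assumes decimal units (1 GB = 1000 MB), that every stream sustains
    stream_mb_per_s, and that streams scale perfectly (no contention).
    """
    if streams < 1 or stream_mb_per_s <= 0:
        raise ValueError("need at least one stream with positive throughput")
    total_mb = data_gb * 1000
    seconds = total_mb / (stream_mb_per_s * streams)
    return seconds / 3600


# Hypothetical figures: 500 GB of on-line data, 10 MB/s per stream.
for n in (1, 4, 8):
    print(f"{n} stream(s): {backup_window_hours(500, 10, n):.2f} h")
# 1 stream(s): 13.89 h
# 4 stream(s): 3.47 h
# 8 stream(s): 1.74 h
```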
Live back-ups allow data access during the back-up process, but affect performance. Many database vendors offer "live" back-up features. The downside to the live back-up is that it puts a tremendous
Mirroring doesn’t protect against user error and replication of bad data. Fully replicated on-line data sounds great, albeit at twice the cost per megabyte of a single on-line copy. But synchronizing, breaking, and resynchronizing mirrors is not a trivial process, and it slows data access while it is under way. Also, a mirror simply replicates the damage when a user deletes a critical file or a host process corrupts one, so the second copy doesn’t help. Mirroring has its place in back-up/recovery, but it cannot solve the problem by itself.
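A toy model makes the point about mirroring concrete. In the sketch below (all names are hypothetical), MirroredVolume applies every write and delete to both copies, while snapshot() takes an independent point-in-time copy. A corrupting write and an accidental delete both reach the mirror, but the earlier point-in-time copy still holds the good data.

```python
from typing import Dict


class MirroredVolume:
    """Toy synchronous mirror: every change is applied to both copies."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.mirror: Dict[str, str] = {}

    def write(self, name: str, data: str) -> None:
        self.primary[name] = data
        self.mirror[name] = data      # good or bad, the mirror gets it too

    def delete(self, name: str) -> None:
        self.primary.pop(name, None)
        self.mirror.pop(name, None)   # a user's mistake is mirrored as well

    def snapshot(self) -> Dict[str, str]:
        # Point-in-time copy: later writes to the volume do not affect it.
        return dict(self.primary)


vol = MirroredVolume()
vol.write("payroll.dat", "correct records")
vol.write("orders.db", "open orders")
backup = vol.snapshot()                  # taken before anything goes wrong

vol.write("payroll.dat", "garbage")      # corrupted by a host process
vol.delete("orders.db")                  # deleted by user error

print(vol.mirror)   # {'payroll.dat': 'garbage'}
print(backup)       # {'payroll.dat': 'correct records', 'orders.db': 'open orders'}
```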
Back-up at extremely high speed, with host-processor independence of the underlying file structures supporting the data, is required. Recovery must be available at the file level. The time that systems are off-line for back-up must be eliminated.
Mirroring, or live data replication for "hot" recovery, also has a role. For data that must always be available, highly fault-tolerant primary storage is not enough, nor is a
To achieve effective back-up and recovery, data must be decoupled from its storage space. Just as programs must be decoupled from the memory in which they execute, the stored information itself must be made independent of the storage area it occupies.
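The analogy with program memory can be sketched as a block map, much like a page table: the host addresses logical blocks, and only the storage device knows which physical block holds each one. The VirtualVolume class below is a hypothetical illustration, not Storage Computer’s or any other vendor’s design. It shows that the device can move data to different physical space while the host keeps using the same logical address.

```python
from typing import Dict, List, Optional


class VirtualVolume:
    """Hypothetical logical-to-physical block map, analogous to a page table."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical: List[Optional[bytes]] = [None] * physical_blocks
        self.free: List[int] = list(range(physical_blocks))
        self.block_map: Dict[int, int] = {}     # logical block -> physical block

    def _allocate(self) -> int:
        if not self.free:
            raise RuntimeError("no free physical blocks")
        return self.free.pop(0)

    def write(self, logical: int, data: bytes) -> None:
        # The host names only a logical block; the device picks the location.
        if logical not in self.block_map:
            self.block_map[logical] = self._allocate()
        self.physical[self.block_map[logical]] = data

    def read(self, logical: int) -> bytes:
        data = self.physical[self.block_map[logical]]
        if data is None:
            raise KeyError(logical)
        return data

    def relocate(self, logical: int) -> int:
        """Move a block to new physical space; only the map entry changes."""
        old = self.block_map[logical]
        new = self._allocate()
        self.physical[new] = self.physical[old]
        self.physical[old] = None
        self.free.append(old)
        self.block_map[logical] = new
        return new


vol = VirtualVolume(physical_blocks=4)
vol.write(7, b"invoice")          # host writes logical block 7
print(vol.block_map[7])           # 0
print(vol.relocate(7))            # 1
print(vol.read(7))                # b'invoice'
```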
It is necessary to develop techniques for journaling modified pages so that journaling can be invoked within the primary storage device without host intervention. Two separate pipes for file access must be created: one pipe active and the other dynamic. The primary storage device must
Part of the primary storage area must be set aside for data to be
Mechanisms must be put in place to allow for the back-up of data to occur directly from the primary storage area to the back-up area without host intervention. Host CPU bottlenecks and network bottlenecks are then eliminated. The net result will be faster user response times during live back-up, normal network performance levels throughout the process, and no back-up downtime.
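One common way to journal modified pages inside a storage device is copy-on-write: once a back-up starts, the first host write to any page saves that page’s original contents to a journal, and the back-up reads the journaled copy whenever one exists. The host keeps writing throughout, yet the back-up still receives a consistent image of the data as of the moment it began. The sketch below assumes this copy-on-write interpretation; the class and method names are hypothetical, and a real device would keep the journal on disk rather than in memory.

```python
from typing import Dict, List


class JournalingStore:
    """Hypothetical primary storage that journals pre-images of pages
    modified while a back-up is in progress (copy-on-write)."""

    def __init__(self, pages: List[bytes]) -> None:
        self.pages = list(pages)
        self.journal: Dict[int, bytes] = {}   # page number -> original contents
        self.backup_active = False

    def begin_backup(self) -> None:
        self.journal.clear()
        self.backup_active = True

    def write(self, page_no: int, data: bytes) -> None:
        # Host writes are never blocked. On the first change to a page
        # during a back-up, save its original contents before overwriting.
        if self.backup_active and page_no not in self.journal:
            self.journal[page_no] = self.pages[page_no]
        self.pages[page_no] = data

    def backup_read(self, page_no: int) -> bytes:
        # The back-up sees each page as it was when begin_backup() ran.
        return self.journal.get(page_no, self.pages[page_no])

    def end_backup(self) -> None:
        self.backup_active = False
        self.journal.clear()


store = JournalingStore([b"A0", b"B0", b"C0"])
store.begin_backup()
image = [store.backup_read(0)]        # back-up copies page 0
store.write(1, b"B1")                 # host keeps working mid back-up
store.write(1, b"B2")
image += [store.backup_read(i) for i in (1, 2)]
store.end_backup()
print(image)         # [b'A0', b'B0', b'C0']
print(store.pages)   # [b'A0', b'B2', b'C0']
```

A refinement would skip journaling for pages the back-up has already copied, which keeps the journal smaller on long-running back-ups.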
What about restore times? Fast, prioritized restoration of critical data requires that the user can select, at the file level, exactly which information comes back on-line first. Here again, the primary storage and its back-up software must offload that burden from the host(s) and take on the responsibility for understanding the underlying file structures of multiple heterogeneous
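File-level, prioritized restore can be sketched as ordering a back-up catalog by criticality before copying anything back. In the hypothetical example below, the catalog maps file paths to sizes in MB, a priority table marks the critical database files, and a single restore stream runs at an assumed 10 MB/s. The critical files are back on-line after 60 seconds, whereas a raw-image restore of the same 5,600 MB would leave nothing usable until all 560 seconds had elapsed.

```python
from typing import Dict, List, Tuple


def restore_schedule(catalog: Dict[str, int],
                     priority: Dict[str, int],
                     mb_per_s: float,
                     default_priority: int = 100) -> List[Tuple[str, float]]:
    """Return (path, seconds until on-line) pairs in restore order.

    Lower priority numbers are restored first; within a priority level,
    smaller files go first so that more files become available sooner.
    """
    ordered = sorted(
        catalog.items(),
        key=lambda item: (priority.get(item[0], default_priority), item[1], item[0]),
    )
    elapsed = 0.0
    schedule: List[Tuple[str, float]] = []
    for path, size_mb in ordered:
        elapsed += size_mb / mb_per_s
        schedule.append((path, elapsed))
    return schedule


catalog = {"/db/orders.dat": 400, "/home/archive.tar": 5000, "/db/customers.dat": 200}
critical = {"/db/orders.dat": 1, "/db/customers.dat": 1}
for path, t in restore_schedule(catalog, critical, mb_per_s=10):
    print(f"{path}: on-line after {t:.0f} s")
```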
How achievable is this scenario? Many back-up tools are available today. What have been missing are architectures that can support journaling within the primary storage area to enable direct, "live" back-ups with high-speed file-level restores. A few storage vendors, mostly in the mainframe arena, are providing some of these types of solutions. Now vendors such as Storage Computer, with Virtual Storage Architecture, provide an underlying structure to enable these back-up features in