Replication

On the surface, replication would appear to be the same as remote copy. The primary goal is certainly the same: to make a copy of the data at a remote location. Replication, however, does it in a different way. Instead of duplicating every I/O produced by a host at the block level, replication works at the file or application level. It produces copies of the elements of structured or unstructured objects at a predetermined time interval. The disk image may not be exactly the same as it is with remote copy, but the important data will be duplicated.

Replication is usually host based. There are replication servers and appliances, but these depend on either host-based agents or the ability to mount a file system, volume, or database remotely. This is not surprising, because replication requires that the software have intimate knowledge of the structures within the application or file system objects whose elements are being copied. Some software companies sell third-party replication products, but replication is often a built-in feature of, or an add-on to, database and e-mail applications.

Although replication is used for data protection, it is also a management tool used to duplicate key information to a variety of data stores. If a company has two call centers using the same Customer Relationship Management (CRM) software, for example, performance is better when each site operates from a local copy of the underlying database. Caller information generated at one center would need to be duplicated to the other center in case a call is routed to it (Figure 4-7). Database replication would be used to ensure that the call records from each center are duplicated in the other center.

Figure 4-7. Replication


The advantages of replication over remote copy are lower cost and lower bandwidth requirements. The cost of the software or add-on is usually much lower than for a remote copy system. Specialized hardware is rarely needed for replication, further reducing the initial costs. The overall cost of replication is also lower, because much less expensive telecommunications links can be used. T-1/E-1 WAN links (1.544 and 2.048 megabits per second, respectively) can be used effectively for replication but rarely for remote copy.
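To make the bandwidth point concrete, the following sketch estimates how long one replication cycle would occupy a T-1 or E-1 link. The line rates are the nominal figures for these circuits (1.544 and 2.048 megabits per second). The 50 MB payload and the 80 percent efficiency factor are assumptions chosen for illustration, not measurements.

```python
# Nominal line rates in bits per second for the two circuit types.
T1_BPS = 1_544_000
E1_BPS = 2_048_000


def transfer_seconds(payload_bytes: int, link_bps: int, efficiency: float = 0.8) -> float:
    """Estimate the seconds needed to send payload_bytes over a link.

    efficiency is an assumed fraction of the line rate that remains
    after protocol overhead; real links will vary.
    """
    return payload_bytes * 8 / (link_bps * efficiency)


# Hypothetical workload: 50 MB of changed data per replication cycle.
changed_bytes = 50 * 1_000_000
t1_minutes = transfer_seconds(changed_bytes, T1_BPS) / 60
e1_minutes = transfer_seconds(changed_bytes, E1_BPS) / 60
print(f"T-1: {t1_minutes:.1f} min, E-1: {e1_minutes:.1f} min")
```

Under these assumptions a cycle takes a few minutes, which is tolerable for periodic replication. A remote copy system that must forward every block-level write as it happens has no such slack.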

Replication has its limitations from a data protection point of view. It duplicates information only for a specific file system or application to a similar file system or application. Data is copied, but the positions of elements, supporting metadata, and system files are not duplicated. When database rows are copied, for example, the row may be in a different place in the database from the original. The "remote" database may also have rows in it that are not replicated to the first one. Replication is less complete than remote copy, and failover is less automatic.

Replication does not guarantee timeliness. Whereas remote copy writes data to two places simultaneously, replication waits until the initial transaction occurs and then copies the data to the remote object. For some amount of time, the two objects will differ. Data can therefore be unprotected for a period, because the objects are not fully synchronized. A failure during replication may leave the two data stores out of sync. Although most replication software allows for a complete resynchronization of the replicated system, it is time-consuming and often requires the systems to be offline.
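The out-of-sync condition can be illustrated with a small comparison routine. This is only a sketch: the two data stores are modeled as Python dictionaries keyed by row ID, and the function names and sample rows are hypothetical. Real replication products use their own consistency checks.

```python
import hashlib


def row_digest(row: tuple) -> str:
    """Hash a row's contents so two copies can be compared cheaply."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def find_drift(primary: dict, remote: dict) -> dict:
    """Return the keys that are missing remotely or whose contents differ."""
    drift = {}
    for key, row in primary.items():
        if key not in remote:
            drift[key] = "missing on remote"
        elif row_digest(row) != row_digest(remote[key]):
            drift[key] = "differs"
    return drift


# Hypothetical state after a replication pass failed partway through.
primary_rows = {1: ("Ann", "555-0100"), 2: ("Bob", "555-0101"), 3: ("Cy", "555-0102")}
remote_rows = {1: ("Ann", "555-0100"), 2: ("Bob", "555-0199")}
print(find_drift(primary_rows, remote_rows))
```

Only the rows reported here would need to be recopied, which is why a targeted check of this kind is cheaper than a complete resynchronization.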

Event-Driven Versus Timed Replication

Replication usually occurs in one of two ways: event-driven or timed. Event-driven replication happens when a specific action occurs that triggers the replication of data. Each time a new customer row is created in a database, for example, that row is copied to the remote database.
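A minimal sketch of the event-driven case follows, using two in-memory SQLite databases to stand in for the primary and remote databases. The table layout and function names are hypothetical. A production system would typically hook the event inside the database server rather than in application code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customer table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return db


primary, remote = make_db(), make_db()


def replicate_row(row_id: int) -> None:
    """Copy one customer row from the primary to the remote database."""
    row = primary.execute(
        "SELECT id, name FROM customer WHERE id = ?", (row_id,)
    ).fetchone()
    remote.execute("INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)", row)
    remote.commit()


def create_customer(name: str) -> int:
    """Insert locally, then fire the replication event for the new row."""
    cur = primary.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    primary.commit()             # the initial transaction completes first
    replicate_row(cur.lastrowid) # the event triggers the copy
    return cur.lastrowid


new_id = create_customer("Ada")
```

The local commit happens before the copy, so there is a short window in which the row exists only in the primary database.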

Timed replication happens at intervals specified by the system administrator. Data may be replicated once a day or every few minutes. It can be configured to happen during off-peak periods of network usage, which avoids the cost of leasing additional long-distance network links. It affords less protection, though. If new data in the primary object is destroyed during the period between updates, it is gone forever.
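The exposure between updates can be sketched as follows. The Store class and the run_cycle function are hypothetical stand-ins for a data store and a scheduled replication job. The scheduling itself (a nightly job, for instance) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    rows: dict = field(default_factory=dict)   # key -> value
    changed: set = field(default_factory=set)  # keys modified since the last cycle

    def write(self, key, value):
        self.rows[key] = value
        self.changed.add(key)


def run_cycle(primary: Store, remote: Store) -> int:
    """One scheduled pass: copy everything changed since the last pass."""
    pending = sorted(primary.changed)
    for key in pending:
        remote.rows[key] = primary.rows[key]
    primary.changed.clear()
    return len(pending)


primary, remote = Store(), Store()
primary.write("order-1", "paid")
run_cycle(primary, remote)            # the scheduled pass runs
primary.write("order-2", "pending")   # written after the pass

# Anything in this set would be lost if the primary failed right now.
at_risk = set(primary.rows) - set(remote.rows)
print(at_risk)
```

Shortening the interval shrinks the at-risk set but uses the network more often, which is the trade-off described above.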


Database and E-Mail Replication

Databases and e-mail systems are two common applications that use replication. Because most e-mail systems are backed by some type of database, the mechanisms are the same. Application objects such as rows, columns, tables, and sometimes code elements such as stored procedures are copied to another database.

All major Relational Database Management Systems (RDBMSs), including Oracle, SQL Server, and DB2, support replication as a feature of the database server. Open-source databases, such as PostgreSQL and MySQL, have similar replication capabilities.

Most databases support several types of replication. One-way replication, sometimes called snapshot replication, duplicates database objects in the remote database only. This is fine for data protection, but it is limited if the goals of the replication strategy also include sharing data among facilities. In its favor, one-way replication is less bandwidth intensive and takes much less time to perform. Two-way replication synchronizes objects between two databases. When the replication is complete, the two databases have the same objects. Although it is a more complete form of replication, it takes more bandwidth and additional time to perform.
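The difference between the two types can be sketched with plain dictionaries standing in for databases. Each value is a (version, payload) pair, and the rule that the higher version wins a conflict is an assumption made for this example. Actual products offer their own conflict-resolution policies.

```python
def one_way(source: dict, target: dict) -> None:
    """One-way (snapshot) pass: the target receives the source's objects."""
    target.update(source)


def two_way(a: dict, b: dict) -> None:
    """Two-way pass: afterward both sides hold the same objects.

    Each value is a (version, payload) tuple; the higher version wins,
    which is an assumed conflict rule, not one taken from any product.
    """
    for key in a.keys() | b.keys():
        candidates = [v for v in (a.get(key), b.get(key)) if v is not None]
        winner = max(candidates, key=lambda v: v[0])
        a[key] = b[key] = winner


site_a = {"cust-1": (1, "Ann"), "cust-2": (2, "Bob, new phone")}
site_b = {"cust-2": (1, "Bob"), "cust-3": (1, "Cy")}

backup = {}
one_way(site_a, backup)   # backup mirrors site_a; site_a is untouched
two_way(site_a, site_b)   # both sites now hold cust-1, cust-2, and cust-3
```

The two-way pass has to examine both sides and move data in both directions, which is where the extra bandwidth and time come from.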

Databases support replication through special SQL commands (such as Oracle's CREATE SNAPSHOT), through external processes, or through both. This makes native replication a nonportable process that is done differently from database to database. It also means that replicating from one vendor's database to another is impossible using the native tools.

Luckily, some third-party replication tools can copy data between disparate databases. These software products use a database's Application Programming Interface (API) to interact with the elements of different types of databases. This allows them to perform replication between different database systems.

File Replication

File replication copies files from one file system volume or directory to another. This is a popular method of protecting data as well as distributing files among servers. A software add-on product monitors changes in the file system and copies changed or new files to the remote file system. Although some replication software uses proprietary protocols, most use CIFS or NFS as the data transport protocol.
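The copy step can be sketched as a single pass that compares the source and destination trees. Note that this sketch polls the file system rather than monitoring it through a host-based agent, and it decides that a file has changed by comparing size and modification time. Both are simplifying assumptions, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def replicate_tree(src: Path, dst: Path) -> list:
    """Copy files that are new or changed (by size or mtime) from src to dst."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        s = path.stat()
        if target.exists():
            t = target.stat()
            if t.st_size == s.st_size and t.st_mtime >= s.st_mtime:
                continue  # unchanged since the last pass
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied
```

Calling replicate_tree with a local directory as src and a mounted CIFS or NFS share as dst would approximate the arrangement described above. A second call with no intervening changes copies nothing.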

File replication tools need to understand the underlying file system and are OS specific. There are a variety of ways that the software can be designed, but all need some type of host-based agent to monitor the file system. Host-based agents either use low-level system calls or are built into the operating system kernel.

NAS systems often have replication features built into them. Many server operating systems also have some form of rudimentary replication, often as part of clustering software features. File replication is most popular with file-oriented systems such as file servers and web server farms.

    Data Protection and Information Lifecycle Management
    ISBN: 0131927574
    Year: 2005
    Pages: 122