Data Lifecycle Management (DLM)


The twin forces of regulation and cost control have changed the way IT managers look at data. The growing awareness that money is being spent on unimportant data has driven changes in how data is managed. At the same time, regulators and lawmakers throughout the world have burdened organizations with data retention requirements. Failure to comply with these requirements can bring about fines, lawsuits, and even prison terms.

There has always been a sense that old data should be archived or removed. Most organizations had procedures, some formal and some ad hoc, for removing old data from online storage. These common practices have been extrapolated into a formal process called Data Lifecycle Management (DLM). Data Lifecycle Management describes how data is treated at different points in time. The policies for data management change as data ages and changes (Figure 7-1). These policies can then be translated into rules or scripts for applications that automate the policy.

Figure 7-1. General data lifecycle model


As is the case with all policies, each organization must define the lifecycle for its data. There is a general model, however, that most data will follow.

The lifecycle of data is defined by how often the data is accessed. Data is at its most useful, and is accessed most often, right after it is created. Data created by transaction processing applications and word processors alike has the most value shortly after its creation. At this stage, the data must be kept online and available all the time.

As the data gets older, the need to access it immediately diminishes. Data is still kept online, but guaranteed access time is no longer important. Users can wait some time to get it, if necessary. When the data is older still, the need to keep it online at all decreases until it can be removed from online storage altogether. Finally, when the data is no longer useful or when having it represents a liability to the organization, it is destroyed.

Data Lifecycle Management and Data Protection

Data Lifecycle Management is intertwined with the data protection policies of an organization. Data protection policies must take into account the lifecycle of the data to use resources cost effectively. Otherwise, data that is unimportant will be given high levels of protection, resulting in a higher cost than is necessary. Conversely, data that is extremely important may not have adequate levels of protection due to resource constraints.

Data Lifecycle Management policies also have to take into account data protection policies and systems. If the two are not synchronized, it is likely that the policies will be in conflict. It is possible to comply with a Data Lifecycle Management policy that insists that aged data be moved to less expensive storage while violating data protection policies that say all data must be protected to a high standard. Data protection systems by nature copy data to various locations on a network or off-site. This may conflict with Data Lifecycle Management policies commanding that data be completely destroyed.

By including Data Lifecycle Management as part of data protection policies, conflicts can be averted, and a more cost-effective data protection system can be implemented.

DLM Policies

Data Lifecycle Management policies are similar to data protection policies. The major difference is that the lifecycle of the data is taken into account when moving, destroying, or copying data.

Data Lifecycle Management alters data protection policies in the following ways:

  • Data retention. Policies requiring that certain data be preserved will have limits placed on the length of time that data can be kept around.

  • Data destruction. Data protection policies will be altered to include removing data, which is usually the antithesis of data protection.

  • Different levels of protection. It will be deemed adequate for some data to have different levels of protection from others, even when it is the same type of data. The age of the data is the deciding factor in the level of protection it gets. Some data will receive no protection.

In the Widget Corporation example, the e-mail retention policy insisted that all customer e-mails be protected and available all the time. In a short time, this would lead to a huge e-mail database. The backup database would grow equally large in the same amount of time. Most of the e-mails, however, would be old and close to useless. Widget Corporation would soon be spending money to buy more storage for e-mails that no one needs anymore.

The company has determined that customer e-mails are hardly ever accessed after two years and have no value after three years. The goals of the data protection policy can be amended to read as follows:


All customer and prospect e-mails must be retained for two years. After two
years, the e-mails are to be archived, and after three years, they are to be
destroyed.

The e-mail retention policies can be described in plain language as:


Name: Customer E-Mail Retention and Destruction
Policy Type: E-Mail
Data Type: E-Mail
Parent: E-Mail Policy
Description: Policy governing the retention and destruction of customer e-mail
Purpose: To support continuing business operations by ensuring that previous
e-mail communications with customers are available to Sales, Marketing, and
Customer Service.
Creation Date: MAY 4, 2004
Revision Date: APR 1, 2005
Process:
All e-mails to and from customers and potential customers (also known as
prospects) will be copied to a duplicate copy of the e-mail database as they
are received. End-users are not allowed to delete customer e-mails in any way,
including from their personal mailboxes.
The primary and duplicate e-mail database will be backed up to tape each
night; tapes will be rotated according to current IT policy (IT Tape Rotation
Policy).
Each month, a survey of the e-mail databases and tapes will be done. All
customer e-mails two years old or older will be copied to DVD-ROM. They will
then be deleted from the primary e-mail database, secondary e-mail database,
and backup tapes. All end-users are expected to delete all copies of customer
e-mails more than two years old each month.
Each month, DVD-ROMs more than a year old will be sent to a shredding
facility and destroyed.
Expected Results: All customer e-mails older than two years will be
available on DVD-ROM. E-mail older than three years will be destroyed. All
customer e-mail less than two years old will be available online at all
times.
Constraints: There is no automated end-user e-mail deletion tool. End-users
are expected to find and delete e-mails manually each month.
Assets: primary_email, secondary_email
Asset Type: Disk array
Asset: backup1
Asset Type: Backup server with attached autoloader
Asset: dvd_rom_1
Asset Type: DVD-ROM jukebox

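The age rules in this policy can be sketched in a few lines of Python. This is only an illustration of the two-year and three-year thresholds stated above, not part of Widget Corporation's policy. The names `Action` and `email_action` are hypothetical, and a year is approximated as 365 days.

```python
from datetime import date
from enum import Enum


class Action(Enum):
    KEEP_ONLINE = "keep online"
    ARCHIVE = "archive to DVD-ROM"
    DESTROY = "destroy"


# Thresholds from the policy text: archive after two years, destroy
# after three. Treating a year as 365 days is an assumption.
ARCHIVE_AFTER_DAYS = 2 * 365
DESTROY_AFTER_DAYS = 3 * 365


def email_action(received: date, today: date) -> Action:
    """Return the lifecycle action for a customer e-mail of a given age."""
    age_days = (today - received).days
    if age_days >= DESTROY_AFTER_DAYS:
        return Action.DESTROY
    if age_days >= ARCHIVE_AFTER_DAYS:
        return Action.ARCHIVE
    return Action.KEEP_ONLINE
```

A monthly survey job could call `email_action` for each message and act on the result. Checking the destruction threshold first ensures that an e-mail past three years is never merely archived.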
By including Data Lifecycle Management concepts in the e-mail data protection policy, Widget Corporation does not need to increase the size of the e-mail storage as rapidly, saving money. The most valuable e-mails are given the highest degree of protection; less valuable ones are not.

DLM Automation

The Achilles heel of policy-driven strategies is that they often require changes in human processes. System administrators have to perform certain tasks for the policy to be completed. Users have to follow certain procedures, which makes them behave differently in their daily work. Forcing people to change how they perform normal duties leads to errors in judgment, mistakes, and outright subversion of the process. When a process is inconvenient, it is followed poorly or not at all.

Automation takes the work out of complying with policies. Users and administrators use software to perform the tasks that comprise the policy. When properly configured, the software does not make mistakes or balk at tiresome tasks.

Data Lifecycle Management automation has two components: the policy engine and the data migration software. The policy engine stores and executes the tasks, references, and constraints that express the DLM policy in terms that computer systems can understand. A policy is translated into a series of commands that other components of a system can then perform. The policy engine may translate a policy that states:


Move all files from Finance to secondary storage after they are one year old.

to a command such as the following:

 moveOldFiles //Finance //FinanceBackup -365  day 

How the policy engine's rules are actually executed, as system tasks, depends on the data migration software. Data migration software may be little more than a group of scripts that execute operating system commands. It may also be very sophisticated software capable of moving data around a SAN, LAN, or WAN.

No matter how the data migration software is constructed, its purpose is to migrate data from one data store to another. To support DLM fully, data migration software needs to be able to copy, move, and delete data, based on age and physical location. Data migration software also needs to be able to support a variety of media, especially disk, tape, and optical storage such as CD-RW.
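At the simple end of that range, a script in the spirit of the moveOldFiles command might look like the following Python sketch. The function name `move_old_files` is hypothetical. The sketch assumes that a file's age is its last modification time and that the target directory is not inside the source directory.

```python
import shutil
import time
from pathlib import Path
from typing import List


def move_old_files(source: Path, target: Path, max_age_days: int) -> List[Path]:
    """Move files under source not modified in max_age_days days to target.

    The directory layout below source is preserved under target.
    Returns the new paths of the files that were moved.
    """
    cutoff = time.time() - max_age_days * 86400
    moved = []
    # Build the full listing first so that moving files does not
    # disturb the directory walk.
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            destination = target / path.relative_to(source)
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved
```

A call such as `move_old_files(Path("Finance"), Path("FinanceBackup"), 365)` would correspond to the one-year Finance policy. A production tool would also need logging, error handling, and verification of each copy before the original is removed.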

Multi-tier Storage Architectures

To control the costs of policy-based data protection, IT organizations have turned to multi-tier storage architectures. Systems based on this architecture organize storage in several tiers or stages, with the most expensive, reliable, and available storage used for the most important data. Progressively less expensive storage is deployed for less important data, with archive systems occupying the lowest tier.

This arrangement, coupled with Data Lifecycle Management software, allows data to be moved from more expensive to less expensive resources as it moves through its lifecycle. The top tier is typically composed of expensive, highly available Fibre Channel disk arrays supported by a full range of data protection systems. The next tier is often disk arrays with SATA drives, which are less expensive yet still reliable and provide reasonable performance. The final tier is filled by archive systems: tape, optical disk (CD and DVD), or both (Figure 7-2).

Figure 7-2. Multi-tier storage architecture


As data ages, it is moved to the less expensive storage system. This drives down the cost of storing the data. As the data migrates to less expensive systems, the level of protection is reduced as well. At the top tier, extensive and expensive data protection strategies such as remote copy and replication may be used, while at the lowest tier, off-site storage of CDs may be all that is done.
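The relationship between age, tier, and level of protection can be expressed as a small lookup. The following Python sketch is illustrative only. The age boundaries of one and two years and the protection listed for the middle tier are assumptions, and `Tier` and `tier_for_age` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_age_days: int  # data at least this old may be placed in this tier
    protection: str


# Ordered from most to least expensive. The age boundaries are
# hypothetical; each organization defines its own.
TIERS = [
    Tier("Fibre Channel disk array", 0, "remote copy and replication"),
    Tier("SATA disk array", 365, "nightly backup (assumed)"),
    Tier("tape or optical archive", 730, "off-site storage of media"),
]


def tier_for_age(age_days: int, tiers=TIERS) -> Tier:
    """Return the least expensive tier whose age boundary has been reached."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    chosen = tiers[0]
    for tier in tiers:
        if age_days >= tier.min_age_days:
            chosen = tier
    return chosen
```

Keeping the protection level in the same record as the tier makes it explicit that moving data down a tier also lowers the protection it receives.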

Do not confuse DLM and multi-tier storage. Hardware vendors tend to blur the lines between Data Lifecycle Management and multi-tier storage systems. DLM is a type of policy-based data management; multi-tier storage is a hardware architecture. There are many reasons to have multi-tier storage systems that have nothing to do with DLM, and DLM is not dependent on multi-tier storage. The two support each other well, but they are not the same.

Hierarchical Storage Management

Hierarchical Storage Management (HSM) is often used synonymously with both DLM and multi-tier storage. HSM is a data protection strategy for archiving data. As a storage system approaches a capacity threshold, the HSM system moves older data out to archive. As the archive system reaches capacity, very old data is moved to less expensive archive media and is eventually deleted.
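The capacity-driven behavior of HSM can be sketched as follows. This is a simplified Python illustration, not a description of any particular HSM product. The 90 percent trigger, the 70 percent target, and the names `StoredFile` and `select_for_archive` are all assumptions.

```python
from typing import List, NamedTuple


class StoredFile(NamedTuple):
    name: str
    size: int      # size in arbitrary capacity units
    age_days: int  # days since the file was last accessed


def select_for_archive(files: List[StoredFile], capacity: int,
                       threshold_pct: int = 90,
                       target_pct: int = 70) -> List[StoredFile]:
    """Pick the oldest files to archive once utilization exceeds the threshold.

    Files are chosen oldest first until utilization is at or below the
    target. Returns an empty list when utilization is at or below the
    threshold.
    """
    used = sum(f.size for f in files)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    if used * 100 <= threshold_pct * capacity:
        return []
    selected = []
    for f in sorted(files, key=lambda f: f.age_days, reverse=True):
        if used * 100 <= target_pct * capacity:
            break
        selected.append(f)
        used -= f.size
    return selected
```

Unlike a DLM rule, nothing here depends on a retention period. Migration is triggered only by how full the storage system is.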

HSM differs from multi-tier storage systems in that it is archive centric. Its purpose is to maintain capacity levels of online storage systems. Multi-tier storage systems address online data as well as archives. Data Lifecycle Management, for its part, is not centered purely on managing utilization levels. Instead, DLM is used for regulatory reasons and to manage the cost of data protection.

HSM combines aspects of both DLM and multi-tier storage architectures, but is a strategy focused on archive and capacity control.




    Data Protection and Information Lifecycle Management
    ISBN: 0131927574
    Year: 2005
    Pages: 122
