Automating ILM


It is possible to implement ILM without any software or hardware at all. ILM is, first and foremost, a process. It is a very difficult process to implement without tools, however. Tracking information and complying with ILM policies sound good until an organization realizes how tough it actually is to monitor state changes. Classification can be nearly impossible without tools to categorize existing information. The policies may say that certain actions are supposed to happen when a state change occurs, but if the change can't be detected and tracked, it might as well not have happened. If there is no way to audit changes in information, how can the organization know whether it is complying with its ILM policies? ILM is too complex to manage unless there are tools to assist in classifying, auditing, and moving information assets.

Electronic information is stored on data storage systems. No particular storage architecture is required for ILM. Instead, the ILM automation tools need to fit the overall storage architecture. If the predominant storage devices are file servers, automation needs to be designed for a file-based environment. The same is true for networked storage in a SAN or NAS environment. ILM automation is done with software. The hardware matters only in that it supports the needed software.

The areas in which automation can help are:

  • Classification. Determining class from content and metadata, especially for existing, unstructured information.

  • Auditing and tracking. Ensuring compliance with policies by tracking state changes and saving them as history.

  • Decision making. Automating decisions such as whether a state change has occurred and what actions to take based on that change.

  • Moving and copying. Shifting or duplicating information to different information paths based on policy.

  • Access control. Ensuring compliance with policies by limiting the ability to change or view information.

ILM automation is still in its infancy, as is ILM. Some areas are well addressed by products; others are just emerging. Purely storage-oriented technologies, such as information movers, content addressable storage, and access control systems, are more developed, often because they were adapted from existing products. Other technologies, such as classification tools and ILM auditing, are still very early in their product lifecycles (Figure 8-6).

Figure 8-6. ILM automation technology


Policy Engines

What makes a software system an ILM tool is a policy engine. As is the case with Data Lifecycle Management, a policy engine is needed to drive the ILM automation process. Considering the fragmented nature of ILM technology today, a complete automated ILM solution will need several policy engines. Unfortunately for the IT manager, this means having to manage duplicate ILM policies in different software systems.

The policy engine is designed to make the software behave in accordance with the ILM processes called out by the ILM policies. There are several data movers on the market, for example. A data mover counts as an ILM mover only when an ILM policy engine directs the moves and updates the state history.
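The idea of a policy engine that both directs an action and updates the state history can be illustrated with a minimal sketch. Everything named here (`InfoObject`, `Policy`, `evaluate`, the "archive-after-1y" rule, and the action label) is hypothetical and chosen for illustration; it is not taken from any particular ILM product.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class InfoObject:
    """A tracked piece of information (hypothetical model)."""
    path: str
    state: str
    age_days: int
    history: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class Policy:
    """One ILM rule: when `applies` is true, change state and request an action."""
    name: str
    applies: Callable[[InfoObject], bool]
    new_state: str
    action: str


def evaluate(obj: InfoObject, policies: List[Policy]) -> List[str]:
    """Apply each matching policy, record the state change, return the actions."""
    actions = []
    for policy in policies:
        if obj.state != policy.new_state and policy.applies(obj):
            # Record (old state, new state, policy name) so the change can be audited.
            obj.history.append((obj.state, policy.new_state, policy.name))
            obj.state = policy.new_state
            actions.append(policy.action)
    return actions


# Assumed example rule: active information older than one year is archived.
POLICIES = [
    Policy(
        name="archive-after-1y",
        applies=lambda o: o.state == "active" and o.age_days > 365,
        new_state="archived",
        action="move_to_archive_tier",
    )
]
```

A mover or access control tool would then carry out the returned actions. The point of the sketch is that the state change and its history entry are produced by the policy, not by the tool that performs the move.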

Many vendors have created ILM automation software by grafting policy engines onto their existing software. E-mail archiving products, document management, and records management tools have been converted to ILM automation tools by adding policy engines.

Search and Classification Engines

Search engines have been around the Internet since the early 1990s. Most users of the Internet see the client side of the search engine. Type in some keywords or a phrase, and up pops a list of possible web sites. Hidden from view are the sophisticated processes that search out the Internet, catalog the words in the web pages, and categorize the pages based on rules.

These search tools are now being adapted for desktop and enterprise storage systems. The engine scans a storage unit and catalogs the files on it, as well as gathers metadata. This database is then used to find information based on content and metadata. Major search engine companies, such as Yahoo!, Google, and Microsoft, have all released desktop search engine products. Enterprise-strength search engines from Copernic and X1 (which supplied the desktop product to Yahoo!) are available as well.

Although not an ILM tool per se, this type of software can be adapted to develop a classification engine. A classification engine scans information in a system, applies rules, and assigns a class to it. When ILM rules are applied to the metadata, classes of data can be derived from the existing base of metadata. Many commercial search engine products have APIs that expose the gathered metadata and could be developed into a classification engine.
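As a rough sketch of how such a classification engine might work, the following applies an ordered list of rules to metadata records of the kind a search catalog could expose. The metadata fields (`extension`, `keywords`), the class names, and the rules themselves are assumptions made for illustration, not the API of any search product.

```python
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, object]

# Ordered rules: the first predicate that matches decides the class.
# Field names and class names are hypothetical.
RULES: List[Tuple[str, Callable[[Metadata], bool]]] = [
    ("financial-record",
     lambda m: m["extension"] in {".xls", ".csv"} and "invoice" in m["keywords"]),
    ("medical-image",
     lambda m: m["extension"] == ".dcm"),
]

DEFAULT_CLASS = "unclassified"


def classify(metadata: Metadata) -> str:
    """Return the class of the first matching rule, or the default class."""
    for class_name, predicate in RULES:
        if predicate(metadata):
            return class_name
    return DEFAULT_CLASS


def classify_catalog(catalog: Dict[str, Metadata]) -> Dict[str, str]:
    """Assign a class to every entry in a catalog keyed by path."""
    return {path: classify(meta) for path, meta in catalog.items()}
```

Keeping the rules in a data structure rather than in the scanning code means the rules can change when ILM policies change without rewriting the scanner.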

Rudimentary classification engines are also part of other ILM software. Policy engines often include some form of classification scanning to accommodate existing information.

ILM Auditing and Tracking

A large part of the ILM process is designed to ensure that the information in the organization is what it is supposed to be and where it is expected to be. In structured data systems such as databases, auditing the information can be accomplished through the use of transaction logs. File systems, on the other hand, often do not track changes in information. Even in cases in which metadata tracks changes, such as whether an object has been accessed, file systems do not monitor changes in content. Opening and closing a word processor file will change metadata fields, but if nothing was done to the file's content, no real change has occurred in the information. Software that tracks changes in information, including changes in content, and produces audit reports based on ILM policies is an emerging technology.
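One way to tell a real content change from a metadata-only change is to compare a digest of the content alongside the metadata. The sketch below assumes SHA-256 as the digest and a modification time as the only metadata field; the function names and the result labels are hypothetical.

```python
import hashlib
from typing import Dict


def snapshot(content: bytes, mtime: float) -> Dict[str, object]:
    """Capture what an auditor would keep: a content digest plus metadata."""
    return {"digest": hashlib.sha256(content).hexdigest(), "mtime": mtime}


def audit_change(old: Dict[str, object], new: Dict[str, object]) -> str:
    """Classify the difference between two snapshots of the same object."""
    if old["digest"] != new["digest"]:
        return "content-changed"
    if old["mtime"] != new["mtime"]:
        # Metadata moved but the content is byte-for-byte identical,
        # e.g. a file that was opened and closed without edits.
        return "metadata-only"
    return "unchanged"
```

Saving each snapshot and the result of each comparison would give the kind of state history that an ILM audit report could be built from.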

Content Addressed Storage

Content Addressed Storage (CAS), also called Content Aware Storage, refers to specialized storage devices that locate and manage information based on its content. As information is stored on a CAS array, a hash of the content is created and stored along with the information object. The array then prevents changes to the information's content. Some systems will automatically version information if the content changes.

The advantage of CAS systems is that they prevent undetected changes to content. CAS is mostly used for fixed content, which is content that is not supposed to change. ILM policies stating that information can never be changed until it is destroyed, or insisting on versioning changes to information, benefit from storage on CAS arrays. Digitized images such as check and x-ray images are popular targets for CAS usage. Rapidly changing information like that found in databases is not a good candidate for CAS systems. CAS assumes that the content will not change and wards off any changes.

CAS also fits in well with ILM because the information is accessed based on content, not on filenames or other artificial constructs. Because ILM information paths are not dependent on any file-naming scheme, the CAS namespace assures a unique information path for all information.
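The content-addressing idea can be shown with a small in-memory sketch. It assumes SHA-256 as the hash and a Python dictionary as the backing store; commercial CAS arrays use their own hash functions and storage layouts, and the `ContentStore` class here is purely illustrative.

```python
import hashlib
from typing import Dict


class ContentStore:
    """A toy content-addressed store: the address is the hash of the content."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content and return its address. Existing objects are never overwritten."""
        address = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        """Return the content at an address, verifying it still matches its hash."""
        content = self._objects[address]
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content no longer matches its address")
        return content

    def __len__(self) -> int:
        return len(self._objects)
```

Because the address is derived from the content, a changed object necessarily gets a different address. The original stays retrievable under its old address, which is one simple way versioning can fall out of the design.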

Information Movers

One of the most common actions of an ILM policy is to move information. As the value of information declines, less expensive resources are used to house and protect the information. Something needs to move it there. Many system administrators accomplish this through simple scripts. Unfortunately, scripts tend to be static and must be rewritten if policies change. Software that does this automatically, when ILM state changes dictate an action, ensures that moves happen when they should. Some information movers are DLM systems with an ILM policy engine embedded in them.
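The difference between a static script and a policy-driven mover can be sketched as follows. The tier names, the state-to-tier table, and the `move_for_state` function are assumptions for illustration; a real mover would take the table from the policy engine instead of a hard-coded dictionary.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed policy table: which storage tier (directory) houses each ILM state.
TIER_FOR_STATE = {"active": "tier1", "inactive": "tier2", "archived": "archive"}

History = List[Tuple[str, str, str, str]]


def move_for_state(root: Path, name: str, current_tier: str,
                   state: str, history: History) -> str:
    """Move a file to the tier its state calls for and record the move."""
    target_tier = TIER_FOR_STATE[state]
    if target_tier == current_tier:
        return current_tier  # already where policy says it should be
    dest_dir = root / target_tier
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(root / current_tier / name), str(dest_dir / name))
    history.append((name, current_tier, target_tier, state))
    return target_tier


def demo():
    """Usage example in a temporary directory, so nothing real is touched."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tier1").mkdir()
        (root / "tier1" / "report.txt").write_text("q3 results")
        history: History = []
        tier = move_for_state(root, "report.txt", "tier1", "archived", history)
        moved_text = (root / "archive" / "report.txt").read_text()
        return tier, moved_text, history
```

When a policy changes, only the table changes. The move logic and the history it records stay the same, which is the main advantage over rewriting scripts by hand.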



    Data Protection and Information Lifecycle Management
    ISBN: 0131927574
    EAN: 2147483647
    Year: 2005
    Pages: 122
