Although any number of attributes can provide context to data, the most important from an ILM perspective are classification, state, and content.
Other attributes may also be important, depending on the organization and its information management needs. In many cases, they will be components of the attributes stated here.

The Anatomy of an E-Mail

A good example to consider is an e-mail object. An e-mail has a number of constituent components that make it an e-mail. First, it can be classified as an e-mail. That may be because a person can recognize it as one or because it has a MIME type of message/rfc822. It has content that can be examined for e-mail formatting and relationships related to an e-mail, such as an attachment. The object may also reside in a directory that is only for e-mails and may have a file format specific to e-mail systems. Finally, there might be state information, such as the time the object was created, headers, or similar descriptors (Figure 8-2). The object is recognized as an e-mail because it has the context of an e-mail.

Figure 8-2. Anatomy of an e-mail

Classification

Classification is a quick form of identifying what information is. This is something that humans do quite well but machines do not. For ILM, classification is the most important attribute and will drive most actions within an ILM policy.

Classes may be broad, such as Financial, Marketing, and Personnel. They may also be very specific, such as First Quarter Financial Reports. If classes are too broad, actions will be limited to only those that can take place among many different types of objects. If classes are too specific, the organization will drown in policy documents.

Classifying structured data is easy. The classes are determined by the schema. Unstructured data, on the other hand, can be very difficult to classify. Humans can do this by looking at the data ("Yep, that's our third-quarter financial report"), but computers are terrible at it. To classify unstructured data, rules-based context is overlaid on the data and stored as metadata.
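
As a rough illustration of overlaying rules-based context on unstructured data, the following Python sketch assigns a class from an object's directory and from keywords in its content, then stores the result as metadata. It is only a sketch under stated assumptions: the Rule and InfoObject types, the directory paths, and the keyword lists are all hypothetical and do not come from any particular ILM product. Rules are tried in order, so specific classes are listed before broad ones.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Rule:
    """One hypothetical classification rule: a directory plus keywords."""
    class_name: str
    directory: str        # the object must live somewhere under this directory
    keywords: tuple = ()  # every keyword must appear in the content


@dataclass
class InfoObject:
    """A piece of unstructured data with room for overlaid metadata."""
    path: str
    content: str
    metadata: dict = field(default_factory=dict)


def classify(obj, rules, default="Unclassified"):
    """Store the first matching rule's class in the object's metadata."""
    text = obj.content.lower()
    location = PurePosixPath(obj.path)
    for rule in rules:
        in_directory = PurePosixPath(rule.directory) in location.parents
        has_keywords = all(k.lower() in text for k in rule.keywords)
        if in_directory and has_keywords:
            obj.metadata["class"] = rule.class_name
            return rule.class_name
    obj.metadata["class"] = default
    return default


# Hypothetical rule set: the specific class is checked before the broad one.
RULES = [
    Rule("First Quarter Financial Reports", "/finance/reports", ("first quarter",)),
    Rule("Financial", "/finance"),
    Rule("Personnel", "/hr"),
]

report = InfoObject("/finance/reports/q1.doc", "First Quarter results and outlook")
budget = InfoObject("/finance/budget.xls", "Departmental budget figures")
note = InfoObject("/tmp/note.txt", "Lunch at noon?")

for item in (report, budget, note):
    print(item.path, "->", classify(item, RULES))
```

An object that matches no rule falls into a default class rather than being left without one, which keeps unclassified information visible to whoever maintains the rules.
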
Various attributes of the data are examined to provide a class for the information. The existence of an object in a particular directory or folder, along with keywords found in the content of the object, may be used by a rules-based system to determine its class.

Another way to classify unstructured data is through human intervention. When information is created, the person creating the information, or a designated person, can choose a class for it. Even in this case, a set of rules on how to determine a piece of information's class will be needed. Otherwise, classification will be inconsistent and useless.

State

State describes content and metadata context at a specific point in time. Changes in some component of the context indicate a change in state. ILM policies may demand that these changes in state trigger actions. The specific metadata that defines state in an ILM system is described by policies. Within ILM policies, state is the catalyst for actions. If a state change occurs, an action, prescribed by policies, must also occur.
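
The idea of a state change acting as the catalyst for a policy action can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real ILM system: the State type, the metadata keys, and the action names in POLICY are assumptions made for the example. A state is captured as a content digest plus metadata at a point in time, and two states are compared to find which components of the context changed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class State:
    """Content and metadata context captured at one point in time."""
    taken_at: datetime
    content_digest: str
    metadata: tuple  # sorted (key, value) pairs


def snapshot(content: bytes, metadata: dict, taken_at=None) -> State:
    """Capture the current state of an object."""
    return State(
        taken_at=taken_at or datetime.now(timezone.utc),
        content_digest=hashlib.sha256(content).hexdigest(),
        metadata=tuple(sorted(metadata.items())),
    )


def changed_components(old: State, new: State) -> set:
    """Return the names of the context components that differ."""
    changed = set()
    if old.content_digest != new.content_digest:
        changed.add("content")
    old_meta, new_meta = dict(old.metadata), dict(new.metadata)
    for key in old_meta.keys() | new_meta.keys():
        if old_meta.get(key) != new_meta.get(key):
            changed.add(key)
    return changed


# Hypothetical policy: the action that a change in each component calls for.
POLICY = {
    "content": "re-index",
    "class": "re-evaluate retention",
    "location": "audit move",
}


def actions_for(old: State, new: State, policy=POLICY) -> list:
    """List the policy actions triggered by the change from old to new."""
    return sorted(policy[c] for c in changed_components(old, new) if c in policy)


before = snapshot(b"draft", {"class": "Financial", "location": "/finance"})
after = snapshot(b"final", {"class": "Financial", "location": "/archive"})
print(actions_for(before, after))
```

Comparing a digest rather than the content itself keeps each captured state small, at the cost of only knowing that the content changed, not how.
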

Tracking State and History

Time is a necessary element of state, even if the timeframe is only now. It is possible to only define a current state, although it is more useful to define state in other timeframes. By tracking state over time, it is possible to accumulate a history of the information. The timeframe "now" defines a current snapshot while other timeframes define history. This is a powerful tool for managing information. By tracking state, it is possible to compare the current state against an expected state. Changes in state will help determine whether the information:

Content

The important part of any information is its content. Content is the "stuff" of information: the words in the document, the numbers in the spreadsheet, and the picture in the image. In a computer system, content is stored as data.

Much of the context of information can be derived from the content. By examining a document, clues can be found that help discern whether it is a letter to a friend or a technical manual. Humans are very efficient at performing this task, whereas computers are not. Knowledge management systems have developed very sophisticated inference engines to do what we do naturally.

Inference engines examine the content of a document to determine its meaning, usually for purposes of classification. Through the use of statistical analysis and rules-based systems, context can be rendered from the document. These systems are rudimentary compared with what human inferences can do. They often miscategorize information and need human editors to make corrections.

Search engines are similar to inference engines in that they scan content for clues as to its meaning. Unlike inference engines, search engines are more of a tool to help humans make content decisions. Often based on keywords, a search engine can provide a list of possible targets. The human then decides whether it meets the criteria for classification.

For ILM purposes, humans can do the job of deriving context from content. A person can make the decision as to what the content means. Unfortunately, this is inefficient. It is not too difficult to ask end-users to make a decision as to what the content means for newly created information. It is a daunting job to have people go through existing information and determine context from content.
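
The keyword-based search described above, in which the tool proposes possible targets and a person makes the final call, might look something like the following Python sketch. It is deliberately simple and makes several assumptions: documents are held in memory as a dictionary of hypothetical file names and text, keywords are single words, and ranking is a plain count of keyword occurrences rather than anything a production search engine would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def candidates(documents, keywords, limit=5):
    """Rank documents by keyword frequency and leave out non-matches.

    The result is only a list of possible targets; a person still
    decides whether each one meets the criteria for a class.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[w] for w in wanted)
        if score:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical documents used only for illustration.
docs = {
    "letter.txt": "Dear Sam, the weather has been lovely.",
    "manual.txt": "Installation: connect the cable. "
                  "Installation requires a configuration file.",
    "memo.txt": "Please review the configuration.",
}

print(candidates(docs, ["installation", "configuration"]))
```

Returning a short ranked list, instead of assigning a class outright, reflects the division of labor in the text: the machine narrows the field and the human decides.
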