The Importance of Context | Data Protection and Information Lifecycle Management


Data, by itself, is not very useful. Look at a single number on a page and it tells you very little. Add other numbers and symbols to form an equation, and now there is some meaning, if you know how to read the formula. Combine the formula with text that explains the formula and now there is useful information. The text and the symbols provide meaning to the numbers.

Information differs from data in that it has context. Context is other data that imparts meaning and structure to the data. For data to be useful, for it to be information, there needs to be context around it. Context acts as a catalyst, converting raw data to useful information (Figure 8-1).

Figure 8-1. Context transforms data into information.

Different Types of Context

There are several forms of context that can be applied to data. Explicit context is context that is stated directly. It exists when data has a predetermined and externally readable structure. Databases have explicit context. Their schemas are an inherent part of the overall data set, and any application that can read the database tables can also understand the meaning of the data.
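To illustrate explicit context, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the helper names `describe_table` and `read_with_context` are hypothetical. The point is that the schema travels with the data, so a generic client can recover column names and types without knowing anything about the application that wrote them.

```python
import sqlite3

# A hypothetical "orders" table: the schema is stored alongside the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL,"
    " order_date TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO orders (customer, amount, order_date) VALUES (?, ?, ?)",
    ("Acme Corp", 1250.0, "2024-03-14"),
)

def describe_table(connection, table):
    """Return (column name, declared type) pairs read from the schema itself."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must be a trusted value.
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]

def read_with_context(connection, table):
    """Pair every stored value with its column name from the schema."""
    cursor = connection.execute(f"SELECT * FROM {table}")
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]

print(describe_table(conn, "orders"))
print(read_with_context(conn, "orders"))
```

Neither helper contains any knowledge about orders. Everything it reports comes from context the database itself carries.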

Implicit context is the context that is implied by attributes of the data. A file with a .doc extension implies a document. If that file is in a directory or folder titled Marketing Plans, it is implied that the file is part of the organization's marketing planning. Clues in the document also hint at the context of the data. Specific formatting, such as letter format, titles, and formulas, provides evidence as to the meaning of the content.

Finally, there is rules-based context. Rules are a way of making implicit context explicit. Rules-based context imposes context on data based on a set of external rules. The following rule illustrates how it is possible to express rules-based context:

If the file has an extension of .doc, is in the Financials folder, and is dated after the first of the year, it is a year-to-date financial report.
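The rule above can be sketched as a predicate over file metadata. This is a minimal illustration, not a prescribed implementation. The `FileMetadata` record, the function name, and the sample paths are hypothetical. Two further choices are assumptions: "in the Financials folder" is read as "the immediate parent folder is named Financials," and "after the first of the year" is read inclusively.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath

@dataclass
class FileMetadata:
    """Hypothetical metadata record; field names are illustrative."""
    path: PurePosixPath
    modified: date

def is_ytd_financial_report(meta: FileMetadata, today: date) -> bool:
    """Rule: a .doc file in the Financials folder, dated this year, is a
    year-to-date financial report. The file's content is never inspected."""
    start_of_year = date(today.year, 1, 1)
    return (
        meta.path.suffix.lower() == ".doc"
        and meta.path.parent.name == "Financials"
        # "After the first of the year" is read inclusively (an assumption).
        and meta.modified >= start_of_year
    )

today = date(2024, 6, 30)
summary = FileMetadata(PurePosixPath("/shares/Financials/q1_summary.doc"), date(2024, 4, 2))
invite = FileMetadata(PurePosixPath("/shares/Financials/party_invite.doc"), date(2024, 5, 1))
launch = FileMetadata(PurePosixPath("/shares/Marketing Plans/launch.doc"), date(2024, 4, 2))

for f in (summary, invite, launch):
    print(f.path.name, is_ytd_financial_report(f, today))
```

The party invitation is classified as a financial report because it meets every condition of the rule. The rule imposes context from outside the file and never looks at what the file actually says.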

No matter what is actually in the files and no matter what their internal structure is, they are now considered to be financial information in report form. Information with explicit context carries that context with it, and implicit context is derived only from the data itself. Information with rules-based context has its structure imposed externally, without regard to the content of the data.

There are pros and cons to each type of context. Explicit context is easier for software to deal with. Because the context is embedded in the data structures, a computer can read it and know how to process the information. Explicit context does not work well for all types of information, however. An order can easily be depicted in a database or as a structured object because it has predetermined components, but a letter cannot because it is freeform in nature. Parts of a letter can be given explicit context, such as the address block and signature line, but the body, the most important part of the letter, cannot. There is enough context to know that it is a letter but not enough to know what the letter is about.
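The contrast between an order and a letter can be sketched with two Python dataclasses. The class names, fields, and sample values are hypothetical. Every part of `Order` has a named, typed slot, so software can compute with it directly. `Letter` can name only its conventional parts, and its `body` is an opaque string.

```python
from dataclasses import dataclass, fields

@dataclass
class Order:
    """Every component of an order has a predetermined, named slot."""
    order_id: int
    customer: str
    item: str
    quantity: int
    unit_price: float

@dataclass
class Letter:
    """Only the conventional parts of a letter get explicit context."""
    address_block: str
    signature: str
    body: str  # freeform text: the structure says nothing about its meaning

def order_total(order: Order) -> float:
    # Possible only because quantity and unit_price have explicit meaning.
    return order.quantity * order.unit_price

order = Order(1001, "Acme Corp", "widget", 12, 3.50)
letter = Letter("Acme Corp\n12 Main St.", "J. Smith", "Thank you for meeting with us last week.")

print(order_total(order))
print([f.name for f in fields(letter)])
# No equivalent function can say what letter.body is about from structure alone.
```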

Implicit context is difficult, if not impossible, for software to understand. Although research in natural language processing continues, humans are by far the best tool for determining context from content. We can look at unformatted text and tell whether we are looking at a marketing plan or a letter to a friend. Computers cannot do that well.

Rules-based context strikes a middle ground between explicit and implicit context. Almost any type of information can be described by a series of rules. There will be mistakes, however. If the rules are too broad, some information will be categorized incorrectly.

Context is what ILM leverages to make better decisions than Data Lifecycle Management (DLM) can. By providing a deeper understanding of what the data represents, context allows policies to be developed that better describe what to do with the data. Context converts raw data into information, and that is what makes ILM possible.

Metadata

The context of data is derived from other data. Data that describes other data is called metadata. Metadata is used frequently in computing. File systems, for example, use metadata to describe objects and the attributes of those objects. Humans use metadata all the time. We call them clues or hints and process them without even being aware of it.
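A file system's own metadata can be read with the standard `os.stat` call. This minimal sketch writes a throwaway file and collects a few attributes the file system records about it. The helper name `file_metadata` and the file name are hypothetical. None of these attributes comes from reading the file's content.

```python
import os
import tempfile
from datetime import datetime, timezone

def file_metadata(path):
    """Collect a few attributes the file system keeps about a file."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plan.doc")
    with open(path, "wb") as f:
        f.write(b"Q3 marketing plan draft")
    meta = file_metadata(path)

print(meta["name"], meta["extension"], meta["size_bytes"])
```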

ILM uses metadata to describe and attribute meaning to data. Rules for interpreting the metadata transform data into information that can then be managed. Data plus metadata plus rules equals information.
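The equation "data plus metadata plus rules equals information" can be sketched as a small function. The `Information` record, the `to_information` function, and the two rules are hypothetical and illustrative only. Each rule pairs a predicate over metadata with the meaning it assigns, and the first matching rule wins (an assumed policy). Data that no rule matches stays unclassified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A rule pairs a predicate over metadata with the meaning it assigns.
Rule = Tuple[Callable[[dict], bool], str]

@dataclass
class Information:
    data: bytes                    # the raw content, never inspected here
    metadata: dict                 # data that describes the data
    classification: Optional[str]  # meaning imposed by the rules, if any

def to_information(data: bytes, metadata: dict, rules: List[Rule]) -> Information:
    """Combine data, metadata, and rules; the first matching rule wins."""
    for predicate, label in rules:
        if predicate(metadata):
            return Information(data, metadata, label)
    # No rule matched: the data remains unclassified.
    return Information(data, metadata, None)

# Hypothetical rules modeled on the examples in this section.
RULES: List[Rule] = [
    (lambda m: m.get("extension") == ".doc" and m.get("folder") == "Financials",
     "financial report"),
    (lambda m: m.get("folder") == "Marketing Plans", "marketing plan"),
]

draft = to_information(b"Q3 launch timeline", {"extension": ".doc", "folder": "Marketing Plans"}, RULES)
unknown = to_information(b"\x00\x01\x02", {"extension": ".bin", "folder": "tmp"}, RULES)
print(draft.classification)    # marketing plan
print(unknown.classification)  # None
```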

Characteristics of Information

Information has several characteristics that are important to ILM. The most important characteristics are as follows:

Context. Context is additional data that provides meaning to the data.
Relationships. Information often includes relationships with other information. Sometimes it's only a casual reference. At other times it's a strong, formal link, such as a hyperlink.
Application independence. Data relies on applications for interpretation; information stands by itself. Different applications using the same data can interpret it in different ways. Information is interpreted the same way, no matter what application is using it. A printed book and an e-book are still the same information.
Determinable value. The value of information can be determined, because it has meaning.

The lifecycle of information is based on context, is affected by the lifecycles of other information, is independent of the applications that use the information, and changes along with the value of the data. Whereas DLM is a function of age, ILM is determined by context and value, of which age is a component.
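To make the DLM/ILM contrast concrete, here is a minimal sketch of two storage-tier policies. All the numbers are hypothetical assumptions chosen only for illustration: the 90-day DLM cutoff, the per-context baseline values, the two-year linear decay, and the 0.4 threshold. So are the names `Item`, `dlm_tier`, `current_value`, and `ilm_tier`. The DLM policy looks only at age. The ILM policy derives a value from context and then discounts it by age, so age is one component rather than the whole decision.

```python
from dataclasses import dataclass

@dataclass
class Item:
    age_days: int
    classification: str  # context, e.g. assigned by a rules-based classifier

# Hypothetical baseline values per context; a real policy would set these.
CONTEXT_BASELINE = {
    "year-to-date financial report": 1.0,
    "marketing plan": 0.5,
}

def dlm_tier(item: Item) -> str:
    """DLM: storage placement is a function of age alone."""
    return "archive" if item.age_days > 90 else "primary"

def current_value(item: Item) -> float:
    """Value starts from a context-based baseline and decays with age."""
    baseline = CONTEXT_BASELINE.get(item.classification, 0.1)
    age_factor = max(0.0, 1.0 - item.age_days / 730)  # linear decay over two years
    return baseline * age_factor

def ilm_tier(item: Item) -> str:
    """ILM: placement follows value, which combines context and age."""
    return "primary" if current_value(item) >= 0.4 else "archive"

report = Item(age_days=200, classification="year-to-date financial report")
scratch = Item(age_days=30, classification="unclassified")
for item in (report, scratch):
    print(item.classification, dlm_tier(item), ilm_tier(item))
```

The two policies disagree in opposite directions. DLM archives the older but still valuable report and keeps the recent but unclassified file on primary storage. The context-aware policy does the reverse.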
