In 2001, businesses captured, recorded, and stored more than an exabyte of data.[26] An exabyte (1,000,000,000,000,000,000 bytes) is a billion gigabytes, or a million times the total textual content of the U.S. Library of Congress. Most of this data is unstructured: it has little or no detailed schema describing what it means.
The task of finding meaning in this data falls largely to humans, who generally are not keeping up; there is simply too much data to interpret. This chapter is about interpretation: how people interpret the raw data in their environment, and how computers are beginning to help.
We develop a general model of how humans interpret and ascribe meaning to unstructured sensory input, as a basis for understanding how systems and tools will aid us in this process in the future. We then survey the state of the market in this area, starting with simple interpretation features such as Microsoft's Smart Tags and moving on to commercial tools that use much more sophisticated reasoning to impart meaning to unstructured documents.
[26] Peter Lyman et al., "How Much Information?" University of California, Berkeley. Available at http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html.