What is information? We know it when we see it—text, documents, pictures, audio, video, and similar entities. We also know that its purpose is, roughly speaking, to inform us. What does that mean? Generally we are informed when we are affected in some concrete way—taking action, making a decision, seeking further information, or experiencing emotion are all examples. Or, as a result of acquiring information, we may simply develop knowledge—familiarity, awareness, or understanding.
This is a human perspective, and after all, the sole rationale for information is its relevance to people. Within IT a more arcane and technical definition of information is appropriate. Information consists of recognizable patterns, like text, drawings, pictures, audio, or video, normally patterns that are conveyed to our consciousness by our six senses (hearing, sight, smell, touch, taste, and equilibrium). Considering that our senses are physical in nature, conveying information to people requires a physical medium, such as sound (as a pressure wave in the air), an image (a two-dimensional light intensity field), or video (a sequence of images typically accompanied by audio). Inevitably, the interaction between IT and human users is through these physical media, in both directions (see figure 2.1). Acquiring information from the physical world requires a sensor (like a microphone or camera), and conveying information to people requires a transducer (like an earphone, speaker, or monitor).
Figure 2.1: Information technology (IT) uses sensors and transducers to convey information to and from the material world.
Within IT, information is represented so that it can be stored, communicated, and manipulated conveniently and cost-effectively by available electronics and photonics technologies. This internal representation is usually quite different from the original form of the information as created or sensed by people.
Example People experience sound as a pressure wave in the air, but there is no available technology to store sound in that same physical form. Thus, a different representation must be used within IT. This representation has changed over the years as technology has changed. For example, the storage of sound has migrated from vinyl records (the most immediate representation of a pressure wave), to magnetic tape (still analog), to compact disks (digital representation on a specialized medium), to MP3 files (digital representation on any storage medium) stored in a computer.
In IT now and for the foreseeable future, the dominant representation of all forms of information utilizes bits (short for "binary digits"). A bit assumes one of two values: 0 or 1. Bits are ideal because they can be stored simply by a switch ("off" or "on"). A collection of bits representing information is known as data. It is remarkable, and perhaps not altogether obvious, that most forms of information that we commonly encounter can be represented in this way and thus can be captured in the IT infrastructure. This includes documents, pictures, audio, and video.
Example A simple representation of an image is to define a rectangular array of square pixels, each pixel having a value representing the intensity of light in a small partition of the image (see figure 2.2). The value of the pixel can in turn be represented by n bits, where n is typically 8 (for monochrome) to 24 (for color). These n bits can represent 2^n distinct intensities (2^8 = 256 to 2^24 = 16,777,216). The image in figure 2.2 is 200 pixels tall by 300 pixels wide, and the intensity of each pixel is represented by 8 bits, so 200 × 300 × 8 = 480,000 bits total are required. The number of bits can be reduced using compression, which eliminates visually unimportant redundancies in the data.
Figure 2.2: An image can be represented by square pixels, the intensity of each represented by eight bits.
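The arithmetic in the image example can be checked with a short sketch. The function names below are illustrative only and do not come from any particular imaging library; the sketch assumes an uncompressed image with the same number of bits for every pixel.

```python
def raw_image_bits(height_px: int, width_px: int, bits_per_pixel: int) -> int:
    """Bits needed to store an uncompressed image, one value per pixel."""
    return height_px * width_px * bits_per_pixel


def distinct_levels(bits_per_pixel: int) -> int:
    """Number of distinct values that n bits can represent (2 to the power n)."""
    return 2 ** bits_per_pixel


# The image of figure 2.2: 200 pixels tall, 300 pixels wide, 8 bits per pixel.
print(raw_image_bits(200, 300, 8))   # 480000
print(distinct_levels(8))            # 256
print(distinct_levels(24))           # 16777216
```

Compression is deliberately left out of this sketch; it would reduce the stored size below the raw figure computed here.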
We have used the term representation without defining it. In a representation, information is temporarily replaced by data for purposes of storage, manipulation, and communication, in such a manner that it can later be recovered in its original form, at least in an adequate approximation.
Example Sound can be captured by an audio sensor (microphone), converted to an internal representation as an MP3 file (which contains a large collection of bits) for purposes of storage, and later recovered by a computer driving an audio transducer (speaker). MP3 is a specific representation, one that is particularly apropos for today's IT. Often (as in this example) the representation only yields an approximation to the original, hopefully good enough so that the user can't tell the difference. The more data used to represent, the more accurate the approximation.
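The claim that more data gives a more accurate approximation can be illustrated with a minimal sketch. It uses plain uniform quantization of a sine wave, which is an assumption made for simplicity and is not how MP3 works; the helper names are hypothetical.

```python
import math


def quantize(x: float, n_bits: int) -> float:
    """Round x (assumed to lie in [-1, 1]) to the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)          # spacing between adjacent levels
    index = round((x + 1.0) / step)    # index of the nearest level
    return index * step - 1.0


def max_error(n_bits: int, samples: int = 1000) -> float:
    """Worst-case difference between a sampled sine wave and its quantized version."""
    worst = 0.0
    for k in range(samples):
        x = math.sin(2 * math.pi * k / samples)
        worst = max(worst, abs(x - quantize(x, n_bits)))
    return worst


# Each additional bit roughly halves the worst-case error.
for n in (2, 4, 8, 16):
    print(n, max_error(n))
```

In this sketch the error can never exceed half the spacing between levels, so it shrinks as the number of bits per sample grows.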
The term digital is often used to describe this data representation of information. Actually digital describes a representation using any discrete alphabet, where bits are the specific case of a binary alphabet.
Example Nature uses the DNA molecule as its representation for conveying information about the composition and processes of an organism from one generation to another. The information is conveyed by sequences of nucleotide bases chosen from four possibilities: adenine (A), thymine (T), cytosine (C), and guanine (G). This representation is digital because it is discrete, but it uses an alphabet with four elements rather than two. Nature could have used bits, for example, 00 for A, 01 for T, 10 for C, 11 for G.
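The two-bit mapping suggested in the DNA example can be written out as a small sketch. The mapping is the one given in the text; the function names and the sample sequence are made up for illustration.

```python
# Mapping from the text: 00 for A, 01 for T, 10 for C, 11 for G.
BASE_TO_BITS = {"A": "00", "T": "01", "C": "10", "G": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence: str) -> str:
    """Replace each symbol of the four-letter alphabet with its two-bit code."""
    return "".join(BASE_TO_BITS[base] for base in sequence)


def decode(bits: str) -> str:
    """Recover the original sequence by reading the bit string two bits at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = encode("GATTACA")
print(encoded)           # 11000101001000
print(decode(encoded))   # GATTACA
```

Because the mapping is one-to-one, decoding recovers the sequence exactly, which is what makes the binary form a valid representation of the four-symbol one.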
Digital representation is distinct from analog representation, in which information is represented by continuous quantities (like the pressure of a sound wave, which is continuous in both time and amplitude). A digital representation (which is always expressible using bits) is advantageous for several reasons. The processing, storage, and communication of bits are easily and cheaply realized by the available electronics and communication technologies. A uniform digital representation allows all forms of information to be freely mixed within a common infrastructure. Most of the infrastructure is not even concerned with what those bits represent.
Example In an operating system, a file is a collection of data. The representation of the data is preserved by some means (for example, a file extension like .doc or .jpg), so that an application can know how to interpret the file. The operating system itself ignores the representation as the file is stored and retrieved from disk and transported across the network. Files using different representations can be freely mixed in all these contexts.
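The point that data means nothing until a representation is chosen can be shown with a brief sketch. The four byte values are arbitrary, and the three interpretations (text, integer, pixel intensities) are assumptions picked for illustration rather than real file formats.

```python
import struct

# Four bytes of data. Nothing in the bytes themselves says what they represent.
data = bytes([72, 105, 33, 10])

# Interpretation 1: ASCII text.
as_text = data.decode("ascii")

# Interpretation 2: one unsigned 32-bit integer, most significant byte first.
as_int = struct.unpack(">I", data)[0]

# Interpretation 3: four 8-bit pixel intensities.
as_pixels = list(data)

# Infrastructure that stores or transports the data can treat it as opaque bytes.
stored = bytes(data)

print(repr(as_text))
print(as_int)
print(as_pixels)
print(stored == data)   # True
```

Only the application that knows the intended representation needs to pick one of these interpretations; the storage step is the same for all of them.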
Another advantage of a digital representation has profound implications for the economic properties of information, and even raises complex policy issues.
An important distinction is between the copying and replication of information. Copying can be approximate, while replication is exact. This is where analog and digital representations differ in a fundamental way. Copying an analog representation inevitably adds noise and distortion, and once introduced, they cannot be removed. Physical media also introduce similar impairments to a digital representation, but because the representation is discrete it is usually possible to remove any impairment as long as the original bits can be detected accurately. On the other hand, the original conversion of an analog representation to digital involves some quantization impairment, although that can be rendered imperceptible.
Example When sound is captured and represented by an MP3 file, the audio that is later played back is a copy of the original: its pressure wave is a good enough approximation so that people cannot distinguish it from the original. That MP3 file can be replicated by simply copying the bits in the file to a new file. In that process, if the bits are detected accurately in the face of any noise and distortion, a representation results that is identical to the original. Each such replica, when converted back to sound, will be virtually identical. A replica can itself be replicated and still remain faithful to the original. In contrast, analog representations (like cassette magnetic tapes) can only be imperfectly copied—a copy of a copy of a copy begins to suffer increasingly significant impairments through accumulated noise and distortion.
Each time a replica is made of a replica, it is called a new generation (see figure 2.3), and it is identical to the first generation. In contrast, with an analog representation slightly imperfect copies will add imperfections to the previous generation, and later generations will be noticeably and increasingly impaired.
Figure 2.3: Successive generations result from replication or copying.
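The difference between successive generations of copies and of replicas can be simulated with a minimal sketch. The noise model (bounded uniform noise of peak 0.2 added at every generation) and the 0.5 decision threshold are assumptions chosen so that every bit can be detected correctly; real media behave in more complicated ways.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
NOISE = 0.2              # assumed peak noise per generation, below the 0.5 decision margin


def noisy(values):
    """Model a physical medium that adds a small disturbance to every stored value."""
    return [v + rng.uniform(-NOISE, NOISE) for v in values]


def analog_copy(values):
    """Analog copying: the noise becomes part of the next generation."""
    return noisy(values)


def digital_replicate(bits):
    """Digital replication: the medium is just as noisy, but each bit is detected again."""
    return [1 if v > 0.5 else 0 for v in noisy(bits)]


def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


original_bits = [rng.randint(0, 1) for _ in range(1000)]
analog = [float(b) for b in original_bits]
digital = list(original_bits)
analog_errors = []

for generation in range(1, 11):
    analog = analog_copy(analog)
    digital = digital_replicate(digital)
    analog_errors.append(rms_error(analog, original_bits))

print(digital == original_bits)              # True: the tenth generation is still exact
print(analog_errors[0], analog_errors[-1])   # analog error after 1 and after 10 generations
```

Under these assumptions the digital replica is exact at every generation because the noise never pushes a value across the threshold, while the analog error accumulates from one generation to the next.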
In principle, information can be accurately preserved over time—even very long periods of time, like thousands of years—by periodically making a replica of information stored on a physically deteriorating medium onto a new medium. This replication is called regeneration. However, there is an important caveat here. Regenerated replicas are only data—to recover the information requires knowledge of the representation.
Example A document authored in Microsoft Word (.doc file) can be preserved over time by periodic regeneration, but its interpretation (like viewing or printing) will require a compatible version of Word. Will Word 3000 be able to read Word 2000 .doc files? If we preserve a replica of Word 2000, also by periodic regeneration, will it run on Windows 3000? One response to these issues is the standardization of representations (see chapter 7).
Regeneration is a technical underpinning of networks. Communication over long distances is fundamentally noisy and unreliable, but the delivery of an accurate replica of data is still possible because of periodic regeneration.
Regeneration is a double-edged sword. It reduces the manufacturing and distribution costs for information suppliers relative to analog representations. It also enables piracy, the unauthorized mass replication and sale of information that violates its creator's intellectual property rights. This poses considerable challenges for policymakers, lawyers, and law enforcers (see chapter 8). This problem is considerably less acute for analog because successive generations are progressively more impaired.
Important economic characteristics of digitally represented information follow from these technical characteristics (Shapiro and Varian 1999b). The replication, storage, and communication of data are inexpensive, so digital information has supply economies of scale: fixed creation costs that are typically much larger than the ongoing manufacturing and distribution costs. (Highly volatile information like stock prices is different.) Digital information is nonrival in use: an arbitrary number of users can be given replicas of the information, equivalent in every way, without interference. One user can create a replica and pass it to another user, absent technological obstacles (like copy protection; see chapter 8). This is in contrast to many material goods, where a user would have difficulty in creating replicas.
All information (digital or not) is an experience good: its quality and utility are best judged by experiencing it. This puts a premium on the recommender: someone who applies judgment or expertise to advise on the quality or utility of information. Alternative mechanisms can familiarize users with information, such as free trials or samples.
IT encourages the creation and dissemination of a prodigious amount of information. Users can experience information overload, but fortunately IT also provides valuable tools to deal with this, including search engines and indexes that catalog the content of large information repositories. Creators of information can also assist with indexes (lists of available topics), hyperlinks (pointers to related information), and metadata (descriptions of information content). Information suppliers face commensurate challenges in attracting the attention of consumers in the face of the large mass of available information.
Social approaches involving users and enabled by the Internet can be helpful, too. Users can publish reviews on an information supplier's site, and other users can rate those reviewers. This is an example of a recommender system (Resnick and Varian 1997), a software application that aggregates and analyzes the recommendations of many users. A viral approach encourages users to notify other users of valuable information.
The dictionary defines transducer as a device that converts energy from one form to another. Strictly speaking, a sensor is a transducer, too (it converts from mechanical or optical energy to electrical); however, engineers generally reserve transducer for the conversion from electrical energy. The specific case of a transducer that converts to mechanical energy is an actuator.
The usage of these terms is somewhat variable and inconsistent. For example, data is also commonly applied to information, such as that acquired in a scientific experiment, that has been subject to minimum interpretation.
Information conveyed through touch, taste, smell, and equilibrium may be difficult (although not impossible) to represent digitally. The observation that most information can be represented by data and an understanding of the implications of this are generally attributed to Claude Shannon, the originator of information theory. He developed a powerful mathematical theory on the number of bits required to represent information as well as the capability of communication channels to convey it (Shannon and Weaver 1949). One of the classic results of this theory is that a collection of bits can serve as a representation of information in common forms with an arbitrary accuracy (as more bits are used) and no loss of generality.
In contrast, the form of representation is necessarily much more variable in an analog medium. For example, with magnetic tape as an analog storage medium, the ways in which audio and video are represented are necessarily quite different. The video requires special synchronization pulses to distinguish the start and end of individual images, something not required for audio.
This terminology is often applied loosely in practice. For example, in a computer system, replicating a file might be called copying. Digital representations of analog media (like audio or video) might be replicated (making a replica of the digital representation) or copied (converted to analog, copied in analog form, and reconverted to digital). These nuances become extremely important in digital rights management (see chapter 8).