So far we've been talking about sound as it exists in the real world, but most important to us is learning how to manipulate that sound digitally. To translate sound into digital form, or to translate digital information back into sound, we need the assistance of transducers. A transducer is a device that converts one kind of energy into another.
A thermometer is a simple transducer: it transforms heat into a visual display, such as a moving needle. An audio transducer transforms sound waves in the air into some other form, such as an electrical signal. Your ears are themselves audio transducers, transforming sound into nerve impulses that can be sent to the brain.
Early audio transducer technology, like Edison's 1877 tinfoil phonograph, converted sound energy into mechanical energy. These devices directly translated the vibrations in air into grooves in a foil-wrapped cylinder. Inserting a stylus into the groove on the cylinder and attaching the stylus to a megaphone-type horn reversed the direction of the transducer, translating the grooves back into sound.
Today's most common audio transducers (microphones, headphones, and speakers) convert sound into electrical energy or vice versa. Instead of cutting grooves directly into a storage medium as Edison's device did, these transducers translate vibrations into electrical signals via intermediate components such as moving magnetic coils. (See Chapter 6 for details on exactly how microphones do this.) Once in electrical form, the signals can be processed directly in sound hardware, stored magnetically (as magnetized metal particles on audio tape), or, via specialized circuits, converted into numerical data that can be transmitted digitally to computers and other digital devices. In the opposite direction, electrical signals can be converted mechanically into acoustical vibrations; for example, an amplified electrical voltage will drive the cone of a loudspeaker in and out, producing sound.
Once sound has been transformed into an electrical signal by a microphone, various devices can transmit and store a representation of the sound in either analog or digital form. Even though the title of this book is Real World Digital Audio, not Real World Analog Audio, you'll use both digital and analog devices in your digital studio, especially since microphones, headphones, and speakers all require conversion of digital audio to analog electrical signals.
Analog and digital equipment both work with transducers that convert sound to voltage and vice versa. Analog circuitry transmits this voltage as a continuous signal that directly represents sound. As with the grooves on a record, any small variation is directly equivalent to the fluctuations in air pressure that constitute sound. Digital circuitry starts with an analog voltage stage, but converts this signal into a numeric form that represents a series of snapshots of the original continuous voltage level. The digital signals are still transmitted as voltages, but minor changes in voltage don't matter, because the signal is encoded and interpreted as a series of zeros and ones (binary numbers ). Let's look at each of these in turn to better understand what that means.
Something is an analog of something else if the two are comparable in some way. The term analog in relation to audio refers to a continuously varying electrical signal that represents the original variations in pressure of a sound. In reality, the signal is an approximate, not perfect, representation of the original source because of the limitations of real-world transducers (in this case, microphones), but the variations in voltage (the analog signal) will look approximately like the original variations in air pressure (the sound) (Figure 1.16).
Analog audio signals are employed by a variety of devices, such as:
Microphones and speakers (including those with digital converters; fundamentally, a mic or speaker is still an analog device)
Standard audio connections, including headphone jacks, the RCA phono plugs on a stereo, and ¼" TRS connectors
Analog musical instruments, such as analog synthesizers and electric guitars, and their amps
Analog audio processors, including spring reverbs and analog effect units
Analog recording equipment, including turntables and cassette and reel-to-reel tape decks
Since analog electrical audio signals are neatly comparable to the original sound, why would you ever need anything else? The answer is that analog audio has its downsides. First, because it represents sound using a continuously fluctuating voltage level, any variation in that level (like small amounts of electrical interference) will be heard when the electrical signal is passed through another transducer (in this case a loudspeaker) and once more becomes sound. In other words, analog signals are susceptible to noise and loss of quality when being transmitted and copied. Second, you're limited in the ways you can process analog signals: you can't take advantage of the many capabilities of the microprocessors in your computer and other hardware. Since computers are essentially very sophisticated arithmetic machines, you can't use them to work with sound unless you can convert analog audio signals into numbers. The solution to all of these issues is to use a stream of numbers to represent the analog signal instead of using the analog signal directly.
Any audio system that uses numbers to store, process, and transmit data is called digital. The word digital refers to digits, or fingers, because the simplest way to convert observed phenomena to number values is to count on your fingers. To convert the continuous range of analog voltage into numeric form, digital equipment uses a device called an analog-to-digital (A/D) converter.
To create discrete numerical values, the converter measures the voltage level at regular time intervals, a process called sampling (Figure 1.17). You can see a representation of digital samples up close in audio software that has sample-level zooming, as shown in Figure 1.18.
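To make the idea of sampling concrete, here's a minimal Python sketch (our own illustration, not part of any audio library) that takes regular amplitude snapshots of a sine wave, much as an A/D converter samples a continuous voltage:

```python
import math

def sample_signal(freq_hz, sample_rate_hz, duration_s):
    """Take amplitude 'snapshots' of a sine wave at regular intervals,
    the way an A/D converter samples a continuous voltage."""
    n_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(n_samples)]

# A 1 Hz tone sampled ten times per second for one second: ten snapshots.
snapshots = sample_signal(1, 10, 1.0)
print(len(snapshots))  # 10
```

Each entry in the list is one sample: a single numeric measurement of the signal level at one instant in time.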
The A/D converter converts the incoming signal into digital form, after which it can be processed or stored in the computer. But we also need a way to get that digital data back into the form of an analog voltage. Otherwise, we won't be able to drive headphones and speakers to convert it back into real-world sound. A digital-to-analog (D/A) converter (sometimes called a DAC) does just the opposite of what the A/D does. The D/A converts digital data back into continuous analog voltage.
Why you need both digital and analog: You can't convert directly between sound and digital information. Transducers like microphones, headphones, and speakers ultimately need to convert sound waves to and from continuous analog voltages. Likewise, many audio processes can only operate on digital signals, and computers can only directly process digital sound.
Devices can be considered digital devices if they contain A/D and/or D/A capability or if they work with digital audio directly. These devices include:
Computer audio interfaces and stand-alone converters
Digital effects, such as digital reverb units, multi-effects processors, guitar amp simulators, and many others
Digital mixers and playback and recording devices, including portable digital recorders like MiniDisc and DAT
Digital instruments, including nearly all MIDI-based synthesizers, new guitars with onboard digital capabilities, and other devices that are designed to interface with a computer
Digital connections, including the digital-audio connection that runs from a consumer DVD player to a surround sound receiver
Uneducated consumers often think the word "digital" means something is high-quality. In fact, saying something is "digital" says no more about the quality of the sound it will produce than saying it is "electrical," as you've no doubt already discovered from experiencing poor-quality digital cable and mobile phone reception. Digital media solves some problems: for example, no noise will be introduced after recording, as it is with analog recorders. But digital recording introduces new problems. Most significant of these is the issue of data loss: since analog-to-digital conversion samples the signal as discrete levels, it removes some of the information present in the original signal. Whether or not that lost data is noticeable is a function of how detailed the sampling is.
Two elements work together to define the sampling process: how often the converter takes a sample (the sampling rate) and how accurately it can represent that sample as a number (the bit depth). It's important in any digital capture device, including any audio device, to record enough information to provide an accurate record of the source signal. The sample rate you choose determines how much frequency range you can record, and the bit depth determines how accurately you can record changes in the level of the analog signal being sampled, which impacts dynamic range and thus the amount of residual noise in the signal.
The sampling rate is how often the A/D converter measures the signal level; the samples are roughly analogous to a series of snapshots. If the converter takes ten samples of the signal each second, it has a sampling rate of 10 Hz.
The frequency range of an A/D converter is determined by its sample rate, but probably not in the way you'd assume. The highest frequency you can capture is only half the sample rate: a sample rate of 10 Hz can capture a maximum frequency of 5 Hz, not 10 Hz. The reason is that unless you take at least two samples per oscillation (one for the up phase and one for the down), some of the oscillations are lost.
Aliasing occurs when the highest frequency being sampled is higher than the highest frequency that can accurately be captured by the A/D converter. Aliasing adds unwanted distortion to the audio signal by artificially lowering the frequency of high partials. Aliasing can occur in a digital audio system as a result of a poorly designed A/D converter, but you're far more likely to hear it when synthesizing high notes using a software-based digital synthesizer. If the synthesizer doesn't use anti-aliasing technology, high notes are likely to turn into random clusters of tone that have no relation to the key you're playing.
To avoid aliasing, you need the sampling rate to be at least twice as great as the highest frequency you want to capture. Why twice as high? When a sound has a high frequency, the peaks and troughs in its waveform will be close together. If the sampling rate isn't fast enough to capture every peak and every trough, the digitized version of the sound will have a different waveform than the original (Figure 1.19). If the frequency of what you're sampling or synthesizing is greater than half the sample rate, the original frequency will be lost. When you listen to the tone that was recorded or synthesized, you'll hear a different frequency, as shown in Figure 1.19. This new frequency, introduced by the sampling process, is often called a foldover frequency or an alias, because the higher frequency, beyond the range of the converter, is folded over into a lower frequency that lies within the range of the converter.
Researchers at Bell Labs were familiar with this problem as early as the 1920s, and the principle was later formalized as the Nyquist-Shannon sampling theorem. The theorem is simple: to properly sample frequency x, you need a sample rate of at least twice x. (The maximum frequency that can be sampled without aliasing at a given sample rate is thus called the Nyquist frequency.) Why must the sampling rate be twice as fast as the highest frequency to be recorded? Because each period of a regular waveform includes both an up and a down oscillation. If the A/D converter takes fewer than two samples per period, it can't capture the full oscillation. In order to capture each "up" and each "down" state, you need to take at least two samples each period. Thus the sampling rate has to be twice the highest frequency to be recorded (Figure 1.20).
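The foldover arithmetic itself is simple enough to sketch in a few lines of Python (the function name is ours, for illustration only): a frequency above the Nyquist limit reflects back down by its distance from the nearest multiple of the sample rate.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Return the frequency actually heard after sampling: frequencies
    above the Nyquist limit (half the sample rate) fold back down."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

# At a 10 Hz sample rate the Nyquist frequency is 5 Hz, so a 6 Hz tone
# folds over to 4 Hz, and a 9 Hz tone folds over to 1 Hz.
print(alias_frequency(6, 10))  # 4
print(alias_frequency(4, 10))  # 4 -- already below Nyquist, unchanged
```

Notice how 6 Hz and 4 Hz become indistinguishable after sampling at 10 Hz; that ambiguity is exactly what the anti-aliasing filter described below is there to prevent.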
According to the Nyquist-Shannon theorem, to sample frequencies up to the upper range of human hearing (about 22,000 Hz), you'll need a sample rate of about 44,000 Hz, which is, not coincidentally, very close to the standard sample rate for commercial audio CDs, 44,100 Hz.
That obviously enables you to sample frequencies at the top of the range of your hearing, but what happens when the frequencies in the signal reaching the A/D converter exceed the maximum frequency limit of 22 kHz? Left untouched, they would fold over into the audible spectrum as distortion, so A/D converters incorporate an anti-aliasing filter that removes these high partials before the audio is converted to digital form.
The sampling rate tells us how an A/D converter works in time, and thus how it captures frequency information (the x-axis of the waveform diagrams). The bit depth determines the amount of detail that can be recorded about the incoming level of the signal (the y-axis of the diagrams).
With each sample, the A/D converter must measure the incoming signal level and assign it one of a discrete set of numbers. For instance, if the converter can record a whole number between one and eight (that is, rounding off each value it records to 1, 2, 3, 4, 5, 6, 7, or 8), then its bit depth is 3 bits. (A binary bit has two values, either off or on; it's three bits because 2³ = 8: you need three on or off positions in order to count from one to eight.) The converter is limited to these discrete values. It can't record that the signal is between two and three; it must round off to one or the other. Needless to say, 3 bits are not nearly enough; 8-bit and 12-bit converters were used in many early digital audio devices, and today 16-bit and 24-bit converters are the most common. With each bit added, the number of possible sound pressure levels that can be stored doubles: 16-bit audio has over 65,000 possible levels of resolution; 24-bit has over 16 million. You've probably experienced what happens when you reduce bit depth by talking on a mobile phone: the sound becomes noisier, harsher, and less clear.
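Here's a rough Python sketch of the rounding a low-bit-depth converter performs. The quantize function is our own illustration (not how real converter hardware is implemented), assuming signal values normalized to the range -1.0 to 1.0:

```python
def quantize(value, bits):
    """Round a signal value in the range -1.0..1.0 to the nearest of the
    2**bits evenly spaced levels a converter of that bit depth can store."""
    levels = 2 ** bits                  # e.g. 2**3 = 8 levels for 3 bits
    step = 2.0 / (levels - 1)           # spacing between adjacent levels
    code = round((value + 1.0) / step)  # which level number is nearest
    return -1.0 + code * step

# A 3-bit converter can't store 0.3 exactly; it must round to a nearby level.
print(round(quantize(0.3, 3), 3))  # 0.429
print(2 ** 16)   # 65536 levels at 16 bits
print(2 ** 24)   # 16777216 levels at 24 bits
```

The gap between 0.3 and the stored 0.429 is a quantization error; with 65,536 or 16 million levels instead of eight, those gaps become vanishingly small.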
The direct impact of the bit depth on the captured signal is on dynamic range: the greater the bit depth, the greater the range of dynamics or amplitude levels you can capture before (at the lowest amplitude level) the signal is submerged in the background noise. Dynamic range is obviously important, given the level of dynamic range our ears can perceive. But its real significance is that, when the number of possible dynamic levels is limited by the use of a converter with a lower bit depth, the measurement of the analog signal becomes inaccurate. We hear the errors created by this rounding off of the numbers, called quantization errors, as noise.
If we increase the dynamic range of the digital audio system by using a greater bit depth, we effectively reduce the amount of background noise in the system: the difference between the loudest signal that the system can handle and the residual noise is greater. This ratio of distinguishable signal information to background noise is called the signal-to-noise ratio . The greater the bit depth, the greater the dynamic range, and the higher the signal-to-noise ratio (sometimes abbreviated "s/n") of the system.
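The relationship between bit depth and dynamic range follows a handy rule of thumb: each added bit buys roughly 6 dB. The arithmetic can be sketched in Python (the function name is ours; this is the idealized figure for a perfect converter, ignoring real-world noise sources):

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of an ideal converter in decibels:
    20 * log10(number of levels), or about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16)))  # 96  -- CD-quality audio
print(round(dynamic_range_db(24)))  # 144
```

This is why 16-bit systems are commonly quoted as having about 96 dB of dynamic range, and 24-bit systems about 144 dB.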
Digital audio resolution is measured in terms of sampling frequency (related to sound frequency range and measured in kHz) and bit depth (related to amplitude and measured in bits). These values are roughly equivalent to image resolution and color depth in digital graphics. Any numbers are theoretically possible for these values, and you can mix and match sampling frequency and bit depth, but the settings you'll encounter most often are:
16-bit, 44.1 kHz: The standard for Red Book CD Audio, the commercial audio CD format. Also used for consumer CD-Rs, and the most common default for computer audio software.
16-bit, 48 kHz: The standard for digital video (DV), commercial DVD videos, and most digital-broadcast video.
24-bit, 96 kHz: The emerging higher-resolution format increasingly supported by computer audio software and hardware, although not yet a widely adopted standard in the consumer marketplace for listening to music.
A good minimum standard for many types of recording is 16-bit/44.1 kHz, because it's the output quality of commercial audio CDs, and its sampling frequency can record up to the top range of human hearing. (Generally, you'd use 48 kHz sound to match the output sample rate for standard video, not to get a 2 kHz increase in Nyquist frequency, something you're highly unlikely to notice as an improvement in audio quality.)
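One practical consequence of these settings is storage cost, and the data rates are easy to work out. This little Python sketch (our own helper, not an audio API) does the arithmetic for uncompressed audio:

```python
def data_rate_bytes_per_sec(sample_rate_hz, bit_depth, channels):
    """Uncompressed audio data rate: samples per second x bytes per
    sample x number of channels."""
    return sample_rate_hz * (bit_depth // 8) * channels

# Stereo CD audio: 44,100 samples/s x 2 bytes x 2 channels.
print(data_rate_bytes_per_sec(44100, 16, 2))  # 176400 bytes/s
# 24-bit/96 kHz stereo needs over three times as much space.
print(data_rate_bytes_per_sec(96000, 24, 2))  # 576000 bytes/s
```

At roughly 10 MB per stereo minute for CD-quality audio, versus over 30 MB per minute at 24-bit/96 kHz, the higher-resolution formats discussed next come at a real cost in disk space.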
It may seem counterintuitive that you'd ever want to work with audio capable of manipulating frequencies above the highest frequency you can hear. There are three reasons why you might want sampling rates up to 96 kHz or greater, however. The first reason, although it's hotly debated, is that unheard frequencies above 22 kHz may have an impact on sound in the audible spectrum, making audio output at 96 kHz sound better or more accurate than 44.1 kHz. That's generally a matter of opinion: some claim they can hear this, others can't. The second reason is more concrete: some digital audio algorithms, particularly those associated with number-intensive processes like time stretching and pitch shifting, can achieve better results when they start out with more data.
The third reason is equally important: although it's debatable whether high frequencies directly influence the audible spectrum in a significant way, phase distortion introduced by the anti-aliasing filter is much less likely to occur in an audible frequency range when the sampling rate is higher. The absence of this distortion can result in a subtle but noticeable change in perceived clarity. This doesn't mean you should immediately start recording everything at 96 kHz, especially since that will be costly in hard disk space and processing power. But it does mean there is some difference between the sampling frequencies, and that there's a reason why professional studios pay good money for equipment that can operate at higher sampling frequencies.
Regardless, you know you can get decent results working with a minimum of 16-bit, 44.1 kHz audio. If you do, factors like mic choice and placement, signal level, and other recording quality issues are far more likely to impact the audio quality of your recordings than using a higher sample frequency or greater bit depth.
Chapter 10 covers how best to export audio for sharing and distribution. But since you're likely to be manipulating audio file formats long before then, here's a basic overview of the file formats for digital audio you'll encounter most often.
The most critical distinction in audio formats is between compressed audio and uncompressed audio. Compressed audio reduces the amount of data stored in order to save space on hard drives and other recording media, and to speed transmission times over the Internet. To do this, it removes information that is less critical to your ear for hearing the source material. If the compression algorithm has no impact whatever on the sound (that is, if the original sound can be reconstructed perfectly from the compressed file), the compression is said to be lossless.
Unfortunately, most audio compression formats are lossy, meaning that when they remove data to save space, they also reduce the quality of the recording: information critical to the sound is lost. This may be in the form of sound in certain frequency ranges that is weakened or cut completely, or in the form of artifacts: noise and other unwanted sounds that are added to the signal in the process of compression. Once you've lost sound information due to lossy compression, there's no way to restore that data, so you'll nearly always want to maintain an uncompressed version of any important recording. Also, many audio applications are not compatible with compressed formats.
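A quick way to see what "lossless" means in practice: Python's built-in zlib module is a general-purpose lossless compressor (not an audio codec, so this is only an analogy), and a round trip through it reconstructs the input byte for byte:

```python
import zlib

# A stand-in for raw sample data: a repeating byte pattern.
original = bytes(range(256)) * 64
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: every byte of the original comes back exactly,
# even though the compressed version is much smaller.
print(restored == original)           # True
print(len(compressed) < len(original))  # True
```

A lossy codec like MP3, by contrast, would return something that sounds similar but is not bit-identical, and the discarded information can never be recovered.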
Commercial audio CDs and the WAV and AIFF file formats are generally uncompressed. Formats like MP3 and MP4, RealAudio, Windows Media, and the Apple AAC files purchased from online music stores like MSN Music, Napster, and the iTunes Music Store are all lossy.
Tip: When in doubt, save a file as WAV on the PC and AIFF on the Mac. It's the safest bet for compatibility with the vast majority of programs. If given a choice, choose "uncompressed WAV" and "interleaved AIFF."
The other element to consider when choosing an audio file format is whether the file is mono or stereo, and how the stereo data is stored. For stereo files, some programs offer a choice of interleaved files, which store left and right audio tracks in a single file, or split files, which divide the file into one file for the left channel and one for the right. MOTU's Digital Performer (www.motu.com) on the Mac, for example, requires separate left and right files, but most programs prefer the more common interleaved files. Surround files introduce still more issues, and file storage is evolving. For more on surround mixing and sharing, see Chapter 10.
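The difference between interleaved and split storage is easy to sketch in Python. These helper functions are our own illustration, treating samples as plain numbers rather than reading an actual file:

```python
def deinterleave(frames):
    """Split an interleaved stereo stream [L0, R0, L1, R1, ...] into
    separate left and right channel lists (the 'split file' layout)."""
    return frames[0::2], frames[1::2]

def interleave(left, right):
    """Weave left and right channels back into a single interleaved stream."""
    return [s for pair in zip(left, right) for s in pair]

left, right = deinterleave([10, 90, 11, 91, 12, 92])
print(left)                     # [10, 11, 12]
print(interleave(left, right))  # [10, 90, 11, 91, 12, 92]
```

Either layout holds exactly the same samples; the choice only affects how programs like Digital Performer expect the data to be arranged on disk.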
Although recorded digital audio can be manipulated in many ways, the data contained in the digital audio depicts only the sound(s) of the instrument(s) being played. The recording contains no information about which notes were played or what physical gestures the musicians made to perform them. In the centuries before digital technology entered the picture, it was useful to record musical sound and to transcribe musical performances as written scores. In a similar way, digital technology gives us a way of recording and transmitting musicians' performance gestures in digital form.
MIDI (the Musical Instrument Digital Interface) is a digital protocol with which we can describe musical events and physical gestures, and record and transmit them in a standardized format between devices and computers. Chapter 8 is dedicated exclusively to MIDI, but here's a quick overview.
MIDI data acts as a control language, enabling hardware and software to send and receive musical performance information in real time. The MIDI specification involves three separate elements:
A file format: If you've worked with "MIDI files," you've been using the common file format specified for storing MIDI information. You don't have to have a MIDI file to use MIDI (you can also use it as a live control protocol or even store MIDI data in a nonstandard file format), but the standard file format is a convenient way to store and exchange MIDI data.
A protocol specification: MIDI is a standard way of describing music in a digital form that can be understood by hardware and software. Musicians already have a language for music: one musician can say "middle C" or "B minor" or "eighth-note," and other musicians will know what that means. Likewise, a standardized protocol allows different devices to speak the same language when talking about musical events.
A standardized interface: MIDI requires a physical interface and cabling between hardware units for controlling other devices in real time. This interface is often a USB or FireWire connection, although special-purpose MIDI connectors are still common, as shown in Figure 1.21.
MIDI and digital audio are often used side by side in computer software, and both can be used to produce music that we can listen to, so newcomers aren't always clear on the profound differences between them. It's important to understand that they're entirely different technologies. MIDI data is much more compact and far easier to edit, but MIDI only produces sound when it's sent to a synthesizer (either hardware or software) or some other type of electronic instrument. The instrument produces the actual sound: MIDI only gives the instrument instructions about what notes to play and so on. MIDI is roughly analogous to a printed score: it makes no sound until it's played by a musician. You need a MIDI instrument to render the "score" of MIDI as sound, just as you need a musician to render a written score as audible music.
Some of the basic, common kinds of data MIDI transmits include the following:
Note messages tell the receiving instrument which notes to play ( Figure 1.22 ). Note messages also contain information about key velocity, which indicates how hard the musician struck the key on the keyboard.
Controller data represents physical movements, like turning a knob, sliding a fader, moving a wheel, or depressing a pedal.
System messages include events like start and stop messages for synchronizing equipment, and device-specific information; for example, information that tells a particular Yamaha model how to configure itself.
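To see how compact MIDI is, here's a Python sketch of building a Note On message by hand. The helper function is our own, but the byte layout (a status byte of 0x90 plus the channel number, followed by note number and key velocity) is part of the MIDI specification:

```python
def note_on(channel, note, velocity):
    """Build the three bytes of a MIDI Note On message:
    status (0x90 + channel), note number, key velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

# Middle C (note number 60) on channel 1 (numbered 0), struck fairly hard.
msg = note_on(0, 60, 100)
print([hex(b) for b in msg])  # ['0x90', '0x3c', '0x64']
```

Three bytes describe an entire note event; compare that with the tens of thousands of bytes per second needed to record the same note as digital audio.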
MIDI can be used for controlling software and instruments expressively in real time. MIDI can also be used for controlling lighting and video, and for controlling the playback of many types of audio software (not just synthesizers). It's even used in unusual applications like collecting data from sensors and controlling robotic mechanisms.
Now that you have a basic grounding in how sound and digital audio work, you're ready to tackle more advanced concepts. You'll find a knowledge of acoustics particularly helpful when dealing with recording in Chapter 6, effects processing in Chapter 7, synthesis in Chapter 9, and mixing in Chapter 10. In the meantime, now that you have a basic handle on the terminology, the next step is to get your digital studio working, the subject of the next three chapters.
Theory overload? If all of this textbook-style material has left you itching for some hands-on experience, take an excursion to Chapter 5 for quick tutorials in loop-based song production, or try some of the hands-on exercises later in the book.