Sound Basics


You live in an analog world. The data you receive from the world around you is presented as a continuous stream of information that your eyes, ears, and brain interpret. When sound or video is recorded for use on a computer, it has to be captured in discrete units or bits. By its very nature, digital information cannot be continuous. That presents some challenges.

Note

Think about the differences between an analog watch and a digital watch. An analog watch has hands that are in continuous motion and that can represent every possible increment of time. A digital watch, on the other hand, can display time only in very specific increments: you go from one number to the next without ever seeing any of the values between the two numbers.


What you perceive as sound are vibrations traveling through the air in the form of sound waves. The sound waves reach your eardrum, which in turn begins to vibrate. Sound waves have both frequency and amplitude.

Frequency is how many waves of a sound pass a given point in one second; it is measured in hertz (Hz). The higher the frequency, the higher you perceive the pitch of that sound to be. Lower frequency sounds are heard as lower pitches. The distance from the peak of one of the waves to the peak of the next wave is known as the wavelength (see Figure 13.1).

Figure 13.1. One complete cycle for a sound wave has both an upper limit and a lower limit. The greater the height or amplitude of the wave, the louder the sound. The length between the peaks is known as the wavelength.


The height of the sound wave, its amplitude, determines its loudness. Loudness is expressed in terms of decibels. Human speech is typically at about 60 decibels. Music is usually played at 30 to 100 decibels. (Of course, if you're younger than 20 years old, 100 decibels is probably the lower range.)

Sound versus Vision

Think for a moment about the differences in the way you perceive motion and the way you perceive sound. Your eye and brain retain a visual impression for about 1/30 of a second, depending on how bright the image is. How many times have you looked at a bright light, closed your eyes, and still been able to "see" the image? You could see it because the retina in your eye holds on to some of the information with which it was stimulated. This ability to retain an image is known as persistence of vision. Because of this, your eyes have a relatively slow response to change. In fact, your eye/brain combination can't really register pulses that come at you any faster than 50 or 60 times a second. You can take advantage of that when producing visual content.

Persistence of vision is what makes the visual sampling rate of 24 frames per second used in motion pictures work. In fact, you can use a lower visual sampling rate and still get satisfactory results. However, you can't do that with sound. The lowest frequency that most humans can hear is about 20Hz, and the highest is about 20,000Hz. What all this means is that your ear is reacting to a stimulus that changes between 20 and 20,000 times per second. Unlike your eyes, your ears have a rapid response to change.

Note

You actually can recognize sounds below the 20Hz range, but rather than hear them, you feel them as vibrations.


That's why sound adds so much weight to a file. To retain fidelity, you have to sample sound at a much higher rate than you would sample images. You can't fake out the human ear in the same way you can the eye. The sound has to be exactly right. The quality of a sound is controlled by both the sampling rate and the bit depth at which it is recorded.

Sampling Rates and Bit Depth in Sound

The quality of a sound is determined by a combination of the rate at which it is sampled and how many bits are used to store information about the sound. The sampling rate is how many times per second the sound was sampled during recording; it is measured in Hz. In Flash, the sampling rates you'll see range from 5kHz (5,000 samples per second) to 44.1kHz (44,100 samples per second). The bit depth is how many bits of information are used to store each sample. The higher the bit depth, the more detailed your sound will be. The lower the bit depth, the more noise you can expect in your file. Bit depths in Flash range from 4-bit to 16-bit.

Bit depth and the quality of a sound have a direct correlation, as shown in Table 13.1. The higher the bit depth, the more information you can save about a sound. The more information you can save, the higher the quality of the sound.

Table 13.1. The Correlation of Bit Depth and Sound

Bit Depth    Pieces of Information Stored per Sample    Quality
16           65,536                                     CD
12           4,096                                      Near CD
8            256                                        FM radio
4            16                                         Acceptable for music
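
If you're curious where the middle column of Table 13.1 comes from, each value is simply 2 raised to the bit depth: 16 bits gives 2^16, or 65,536, distinct values per sample. The following short Python sketch (purely illustrative; Python isn't part of Flash) reproduces the column:

# Each bit depth in Table 13.1 allows 2^bits distinct values per sample.
for bits in (16, 12, 8, 4):
    print(f"{bits}-bit audio: {2 ** bits:,} possible values per sample")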

If humans can hear sounds only between 20 and 20,000Hz, why do you see sampling rates that are so much higher than that? According to the Nyquist sampling theorem, your audio sampling rate must be at least twice the highest frequency (think Hz) of the sound you are trying to capture to reproduce its full frequency range. Don't sweat the theory too much. In English, it just means that sound waves have an upper and lower limit (see Figure 13.1), and you need to sample each sound wave at least twice per cycle to capture both. Thus, to capture all the sounds that a human can hear, with the highest frequency being 20,000Hz, you need to sample at a rate of at least 40,000Hz.
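
As a minimal sketch of that rule (again in Python, purely for illustration; the function name is made up for this example):

# Nyquist rule of thumb: sample at least twice the highest frequency you
# want to capture.
def nyquist_minimum_rate(highest_frequency_hz):
    return 2 * highest_frequency_hz

print(nyquist_minimum_rate(20_000))  # 40000 -- which is why CD audio uses 44.1kHz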

Table 13.2 lists the five standard sampling rates that are recognized by most audio cards. The higher the sampling rate, the higher the quality of the sound.

Table 13.2. Standard Sampling Rates and Related Quality

Sampling Rate    Quality
48kHz            Digital Audio Tape (DAT) quality
44.1kHz          CD quality
22.050kHz        FM radio quality
11.025kHz        High-quality voice and music clips
5kHz             Simple, short sounds

Now that you've been introduced to bit depths and sampling rates, the next logical step is to look at how you can manipulate those two aspects of sounds to keep your Flash file sizes as small as possible.

Keeping File Size Small: Voice versus Music

Your speaking voice has far less variation in pitch extremes than a piece of music. In general, you can assume an upper frequency range for speech of about 4,000 to 5,000Hz. So, what kind of sampling rate would you need to capture the full tonal range of the human voice? Referring again to the Nyquist theorem, a rate of 8,000 to 10,000Hz or greater accurately reproduces the sound you are recording. Thus, you should get good voice quality at a standard 11kHz sampling rate.
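
Here is that arithmetic spelled out as a rough Python sketch (the 5,000Hz figure is just the upper bound assumed above):

# Speech tops out around 5,000Hz, so by the Nyquist rule you need roughly
# twice that as a sampling rate.
speech_top_hz = 5_000
required_rate_hz = 2 * speech_top_hz
print(required_rate_hz)            # 10000
print(required_rate_hz <= 11_025)  # True -- the standard 11.025kHz rate covers it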

In addition, any time you speak, you naturally have pauses between words and gaps between sentences. These pauses and gaps are where any noise or hissing would show up in your audio file. To reduce noise, you need to record sound at a relatively high bit depth.

Music, on the other hand, has a far greater range in pitch than the spoken voice. Because its range of frequency is much greater, music needs a higher sampling rate. Music, however, doesn't usually have gaps between the sounds; it plays along smoothly. Because you don't have to worry as much about noise in a recording of music (some of the noise is masked naturally), you can record music at a lower bit depth.

To recap, for human speech, you need a low to moderate sampling rate and a relatively high bit depth. For music, you need a high sampling rate and a relatively low bit depth. These tradeoffs are important because you can use them to help keep your file size down.

For both music and voice, you can cut file size in half by cutting the sampling rate or the bit depth in half. Another way to cut the size of your sound file in half is to use mono sound instead of stereo.

In fact, you can calculate the size of a sound file with a simple formula that takes into account all the issues discussed so far in this chapter:

file size = (number of seconds x number of channels x sampling rate x number of bits) ÷ 8

Thus, if you have a 30-second sound clip that was recorded in stereo at a sampling rate of 44kHz and a bit depth of 16, your file size works out as follows (dividing by 8 converts bits to bytes):

5280KB (5,280,000 bytes) = 30 seconds x 2 channels x 44,000 samples per second x 16 bits ÷ 8

Wow. You can see that very high quality sound files get really big.

If you convert the sound from two channels to one, the file size drops to 2640KB. Go even further: if it's a voice recording, you're probably safe dropping the sampling rate to 11kHz. That saves you even more in file size. Now you're down to 660KB. That's a huge savings in file size with relatively little loss in quality.
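
If you want to play with these numbers yourself, here is a small Python sketch of the formula above (the function name and the 1KB = 1,000 bytes convention are mine, chosen to match the figures in this chapter):

# Uncompressed sound file size: seconds x channels x sampling rate x bits,
# divided by 8 to convert bits to bytes, then by 1,000 to get kilobytes.
def sound_file_size_kb(seconds, channels, sample_rate_hz, bit_depth):
    size_bytes = seconds * channels * sample_rate_hz * bit_depth / 8
    return size_bytes / 1_000

print(sound_file_size_kb(30, 2, 44_000, 16))  # 5280.0 -- stereo at 44kHz
print(sound_file_size_kb(30, 1, 44_000, 16))  # 2640.0 -- the same clip in mono
print(sound_file_size_kb(30, 1, 11_000, 16))  # 660.0  -- mono voice at 11kHz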

Now take a look at the types of sound you can bring into Flash and how you add those sounds to your movie.


