Now that you understand bandwidth and bit rate and are familiar with how they figure into the streaming media phases, let’s talk about how to calculate them for both audio and video.
When capturing audio and converting it to a digital form, you take discrete measurements of the audio signal at different points in time. These measurements are called samples. The more samples you take, the higher your sampling rate. Higher sampling rates usually mean better audio quality because the digitized version will more closely match the original. To put this in context, the sampling rate for a compact disc is 44,100 samples per second. This is denoted as 44.1 kilohertz (KHz).
Before we start calculating bit rate, however, there are two additional concepts to consider when thinking about audio: bit depth and channels. Bit depth refers to the number of bits that are used to store data about a particular sample. A greater bit depth means that more data is available about each sample, and more data leads to higher quality. CDs have a bit depth of 16 bits.
Channels, the final concept in the audio equation, are the number of discreet signals in an audio file or stream. Stereo, for example, has two channels, while mono has one. This is important because the number of channels figures into the size of your audio file.
Now for the audio bit rate calculations:
Samples per second * bit depth per sample * number of channels = total bits per second (bps) Total bits per second / 1024 bits per kilobit = total Kbps
The bit rate of a CD, which in technical terms is uncompressed pulse code modulation (PCM) audio, would be as follows:
44,100 * 16 * 2 = 1,411,200 bps 1,411,200 / 1024 = 1,378 Kbps
According to our calculations, an uncompressed CD audio file would stream at 1,378 Kbps, while a typical 28.8 or 56 Kbps dial-up modem can stream up to approximately 32 Kbps. The dial-up modem would bog down before you hear a single note. Obviously, this kind of user experience is unacceptable.
To create the perception of motion, the brain automatically adds or fills in missing information. It does this first through a concept known as persistence of vision, where a visual stimulus continues to be registered by the brain for a very short time after the stimulus ends. Secondly, it takes advantage of what is known as the phi function. According to the phi function, when two adjacent lights alternately flash on and off, the brain interprets the flashing as a single light shifting back and forth. This is because we tend to fill in gaps between closely spaced objects of vision.
Motion pictures take advantage of these two phenomena to suggest the appearance of movement. Video content, for example, is a collection of static images rendered so quickly that the images appear to be in motion. Flip-page animations are also based on this concept. By making a series of drawings on separate pages and flipping them quickly with your thumb, you can make a picture seem to move.
In video, the images are called frames, and the speed at which they are displayed is measured in frames per second (fps). The higher the fps, the smoother the motion appears. Generally, the minimum fps needed to display smooth motion is about 30 fps. For high-motion content, you’ll need 60 fps.
| Note | The 30 fps quoted above represents the National Television System Committee (NTSC) standard. Phase Alternating Line (PAL) is the European standard and uses 25 fps. | 
|  | 
The number of lines in a frame and the number of frames broadcast per second are determined by three main television broadcast standards. The standards are in place to ensure that broadcast signals are compatible with the television sets made to receive them. Most countries or regions use one of three standards. Each standard, however, is incompatible with the others.
NTSC is used by many countries or regions in North and South America and Asia. It specifies 525 horizontal lines per frame, and uses a frame rate of 30 fps.
PAL is used in most European countries or regions (except France and Russia). It specifies 625 lines per frame, and uses a frame rate of 25 fps. PAL also uses a wider channel and wider video bandwidth than NTSC.
Sequential Couleur Avec Memoire (SECAM) was implemented in France in the early 1960s. It specifies 625 lines per frame, and uses a frame rate of 25 fps. Like PAL, SECAM also uses a wider channel and wider video bandwidth than NTSC.
Both the NTSC and PAL standards use interlaced video signals.
|  | 
Each frame of video is composed of between 525 and 625 horizontal lines. Broadcast television uses a variation called interlaced frames in which every frame is composed of two fields. Each field contains every other line of the television frame, or half the image. One field contains the odd lines, the other the even lines. When displaying video, an NTSC television draws one field every 1/60th of a second, and PAL televisions display one field every 1/50th of a second. When video is converted to a digital format by using a capture card, the two interlaced fields are combined into a complete frame and rendered at 30 fps or 60 fields per second. This process is called deinterlacing and is explained further in chapter 2.
As you can see, the frames per second calculation is of primary importance and contributes directly to the bit rate required to stream video. Of equal importance is image resolution. If you think of image resolution as the number of pixels that combine to create a picture, then you understand that the more pixels you use, the better your image quality will be. But high-resolution video also creates large files, takes up more bandwidth, and requires more system resources to encode and render.
To determine the bit rate of uncompressed video, use these formulas:
Video resolution * video frame rate = Total pixels per second Total pixels per second * [12 bits per pixel (based on the YUV pixel format, which is more efficient for streaming)] = total bps Total bps / 1024 = total Kbps Total Kbps / 1024 = total Mbps
Let’s say you want to figure the bit rate of the VHS movie that we discussed earlier. Its resolution is 640 x 480, and its frame rate is 30 frames per second.
640 * 480 * 30 = 9,216,000 pixels per second 9,216,000 pixels * 12 bits per pixel = 110,592,000 bps 110,592,000 bits / 1024 = 108,000 Kbps 108,000 Kbps / 1024 = 105 Mbps
You’re not going to have much luck streaming 105 Mbps over any Internet connection that’s available today. Add to that the audio bit rate that we calculated earlier so you can have a video signal with an accompanying sound track, and we’re talking about a lot of bits! Enter the encoder.
Encoders, you’ll recall, compress digitized audio and video and encode them into formats suitable for streaming. How much compression is applied depends on your desired audio and video quality and the bandwidth available to you. In broadband scenarios, you can go with higher resolutions because you have the bandwidth to accommodate them. But in dial-up scenarios, you must make tradeoffs in order to deliver lower-bit-rate content at an acceptable level of quality. Once you know how much bandwidth you have to work with, you can use your encoder to compress the content sufficiently to fit within that bandwidth.
|  | 
There are different methods for representing color and intensity information in a video image. The video format that a file uses to store this information is also known as the pixel format. When you convert a file to Windows Media Format, some pixel formats are recommended over others in order to maintain high content quality. The two major types of pixel formats are RGB and YUV.
Color video is comprised of three primary colors: red, green, and blue. When encoding using the RGB pixel format, 8 bits are allocated to each of the red, green, and blue values of every pixel for a total of 24 bits per pixel. The human eye is not capable of discerning all of the subtle variations in color, so bits are being expended when there really is no benefit to doing so.
The YUV pixel format is a color-encoding scheme that divides the spectrum of color into luminance (the black-and-white component of a video signal that controls the light intensity) and chrominance (the color component of the video signal). The human eye is less sensitive to color variations than to intensity variations, so YUV allows the encoding of luminance (Y) information at full bandwidth and chrominance (UV) information at half bandwidth. This means that 16 bits or fewer are allocated per pixel rather than the 24 bits allocated by the RGB pixel format, making YUV a more efficient pixel format for streaming.
Different YUV formats use different sampling techniques. These techniques vary in both the direction of the sampling and the frequency. They also differ in the ratio of luminance-to-chrominance sampling of a video signal. For example, when encoding using the YUY2 (4:2:2) sampling method, for every four pixels, each pixel is sampled for its luminance value. Two of the pixels are then sampled for their blue value, and the other two are sampled for their red value. Another scheme is 4:2:0, in which two bits are sampled for their blue value and none are sampled for the red value. This method uses less bandwidth and requires less storage space, but produces a slightly lower-quality video signal.
The following list contains the recommended pixel formats (in order of preference) for encoding:
IYUV/I420 (planar 4:2:0 or 4:1:1).
YV12 (planar 4:2:0 or 4:1:1).
YUY2 (packed 4:2:2). 
UYVY (packed 4:2:2)
YVYU (packed 4:2:2)
RGB 24
RGB 32
RGB 15/16
YVU9 (planar 16:1:1)
RGB 8
|  | 
