Codecs and Compression | Apple Pro Training Series. Optimizing Your Final Cut Pro System. A Technical Guide to Real-World Post-Production

Uncompressed standard-definition video's data rate is 270 Megabits/second (about 34 Megabytes/second); 1080i60 HDTV runs at 1.2 Gigabits/second (150 Megabytes/second). That's a lot of bits, so many video formats, both on tape and inside your Mac, use compression to get the data rate down to something more manageable. A codec is the bit of software and/or hardware that manages this task; "codec" is short for COmpressor/DECompressor.
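The Megabit-to-Megabyte figures above are simple divisions by eight. As a quick sanity check, here is a minimal Python sketch; the function name is made up for illustration, and the rates are the ones quoted in the text.

```python
def megabytes_per_second(megabits_per_second):
    """Convert a data rate from Megabits/second to Megabytes/second (8 bits per byte)."""
    return megabits_per_second / 8


# Rates quoted in the text.
sd_rate = megabytes_per_second(270)    # uncompressed standard definition
hd_rate = megabytes_per_second(1200)   # 1080i60 HDTV at 1.2 Gigabits/second

print(f"SD: {sd_rate} MB/s, HD: {hd_rate} MB/s")
```

The SD result is 33.75, which the text rounds to "about 34."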

Of course, there are tape formats and capture cards to record and play back uncompressed video, too; we still refer to the part of the system that records and retrieves the bits from disk as a codec, hiding our terminological imprecision with the dodgy excuse that in these cases, "codec" stands for COder/DECoder.

Even analog video, when recorded on a computer, requires a codec. First, the analog video is converted to digital using a bit of hardware called (not surprisingly) an analog-to-digital converter: ADC or A/D for short. Once in the digital domain, the codec handles putting the bits on disk (with or without further compression) and retrieving them for processing or output. On playback, digital is transformed into analog with a digital-to-analog converter: DAC or D/A.

Codecs and compression let us trade off quality against data rate. Roughly speaking, image quality increases with increasing data rate, but only roughly speaking. For example, the DV codec's data rate is 25 Megabits/second, about one-tenth the rate of uncompressed SDTV, yet its images are perfectly acceptable for most purposes. An MPEG-2 version of the same picture, played off DVD or over the air as digital television, might run one-quarter the data rate of DV, yet look as good. Although the MPEG-2 version might be four times more efficient in terms of compression, it's also much harder to produce, requiring far more processing time and often two passes through the codec.

Codecs achieve compression using three different methods:

Reducing the Data to Compress

By throwing away some of the image prior to compression, you reduce the overall data rate. For example, NTSC DV records only 480 out of the 486 lines the 601 spec calls for (on playback, the top four and bottom two lines are filled in with black), and only half of the 4:2:2 color samples are recorded, using 4:1:1 in NTSC and 4:2:0 in PAL. FCP's OfflineRT setting, which uses a Photo JPEG codec, throws away over three-quarters of the picture: it keeps only 320 samples horizontally and 240 vertically, compared to the 720 x 480 DV NTSC picture.
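To see how much these reductions save, you can simply count samples. The Python sketch below is illustrative only: the function name and the divisor table are assumptions made for this example, and it counts one luma sample per pixel plus two color components at the stated subsampling.

```python
# How many pixels share one sample of each color component (illustrative table).
CHROMA_DIVISOR = {"4:4:4": 1, "4:2:2": 2, "4:1:1": 4, "4:2:0": 4}


def samples_per_frame(width, height, scheme):
    """Count luma plus color samples in one frame for a given sampling scheme."""
    luma = width * height
    chroma = 2 * (luma // CHROMA_DIVISOR[scheme])  # two color components
    return luma + chroma


full = samples_per_frame(720, 480, "4:2:2")   # 601-style sampling
dv = samples_per_frame(720, 480, "4:1:1")     # NTSC DV sampling

# Fraction of the 720 x 480 pixel count that a 320 x 240 picture keeps.
offline_fraction = (320 * 240) / (720 * 480)

print(full, dv, round(offline_fraction, 3))
```

Going from 4:2:2 to 4:1:1 halves the color samples and leaves three-quarters of the total sample count, while the 320 x 240 picture keeps about 22 percent of the pixels, which matches the "over three-quarters" thrown away.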

You can also reduce the number of bits per pixel recorded. Most codecs record 8 bits per pixel (actually, 8 bits per luma or color sample), giving up to 256 gradations in brightness or color, although the 601-compatible codecs use a more constrained 219-step scale from 16 to 235 in luma. (The values outside that range accommodate "superwhite" and "superblack" excursions of the video signal.) Some allow 10, 12, or 16 bits per pixel for a smoother gradation, desirable when doing extreme color correction or high-caliber film work.
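The arithmetic behind these numbers is easy to check. The Python sketch below counts the gradations available at a given bit depth and maps a full-range 8-bit value onto the 16 to 235 luma scale. The straight linear scaling with rounding is an assumption made for illustration, not a description of how any particular codec or converter does it, and both function names are hypothetical.

```python
def gradations(bits):
    """Number of distinct values a sample of the given bit depth can hold."""
    return 2 ** bits


def full_to_video_range(value):
    """Map a full-range 8-bit value (0..255) linearly onto the 16..235 luma scale.

    Assumption: plain linear scaling with rounding, for illustration only.
    """
    return 16 + round(value * 219 / 255)


print(gradations(8), gradations(10))
print(full_to_video_range(0), full_to_video_range(255))
```

Values below 16 and above 235 remain available for the superblack and superwhite excursions mentioned above.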

Note

Most digital videotape formats, including all digital camcorders (except Digital Betacam camcorders), record 8 bits per pixel, even if the camcorder is capable of outputting 10-bit signals on its SDI connector.

Spatial Compression

Most codecs that advertise compression get much of their efficiency from reducing spatial redundancy: most of the time, one pixel is not very different from its immediate neighbors.

Lossless codecs perform the compression with no loss of information: the decompressed picture is identical to the source picture. These codecs usually use a form of run-length coding, detecting runs of pixels with the same value and replacing them with a pixel value and a repeat count. Lossless compression is common on computers (it's used in the zip and sit file formats), but it's rarely used for video because it usually doesn't compress more than 2:1, yet it's computationally intensive. You will see it occasionally in workflows requiring lossless performance yet wanting some compression for data transmission and storage reasons, for example Apple's "Animation" and TheoryLLC's "Microcosm" codecs.
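Run-length coding as described here can be sketched in a few lines of Python. This is a toy illustration of the idea on one row of pixel values, not the scheme any of the named codecs actually uses; the function names are made up for the example.

```python
def rle_encode(pixels):
    """Replace each run of identical values with a (value, repeat count) pair."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((value, 1))               # start a new run
    return runs


def rle_decode(runs):
    """Expand (value, repeat count) pairs back into the original pixel row."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = [16, 16, 16, 16, 235, 235, 128]
encoded = rle_encode(row)
assert rle_decode(encoded) == row   # lossless: the round trip is exact
print(encoded)
```

A row with long runs shrinks nicely, but a noisy row where every pixel differs from its neighbor produces one pair per pixel and actually grows, which is one reason lossless ratios on real video stay modest.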

Lossy codecs throw away information; the decompressed picture is not identical to the original, but ideally it's visually very close. Lossy codecs offer much higher compression ratios than lossless ones, anywhere from 4:1 to over 100:1 (albeit with a more heavily degraded image at the high end). Such codecs often perform a basis transformation, a mathematical conversion from the spatial domain to a different domain in which the spatial redundancy can be more easily detected and removed.

Note

You can think of basis transformations as doing to pixels what the RGB to YUV transform does to colors: they are just different ways of representing data to allow more convenient processing. Indeed, the RGB/YUV conversion is itself a basis transform.

The most common such conversion is the Discrete Cosine Transform (DCT) used in JPEG, MPEG, and DV formats, which converts pixels into frequency components. There are also codecs based on wavelets, including the codec used in the Immix VideoCube NLE of the early 1990s and the Cineform HD codec on the PC platform.

DCT-based codecs get their compression by throwing away frequency components in each DCT block (typically an 8 x 8 pixel area) that are close to zero. As more frequency components are discarded, more DCT artifacts appear in the decompressed image, the most common being "mosquito noise," spurious light and dark pixels "buzzing" around sharply-defined edges and transitions, such as sharp-edged text.
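The idea can be sketched in plain Python: transform an 8 x 8 block into frequency components, zero out the ones near zero, and transform back. This is a deliberately simplified illustration. It uses a direct (slow) orthonormal DCT and a single threshold, whereas real codecs use fast transforms and quantization tables; the function names and the threshold are assumptions made for this example.

```python
import math

N = 8  # block size


def _scale(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i, k):
    """Cosine basis value for sample position i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT of an N x N block (pixels to frequency components)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2-D DCT (frequency components back to pixels)."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def discard_small(coeffs, threshold):
    """Zero every component whose magnitude is below the threshold."""
    return [[c if abs(c) >= threshold else 0.0 for c in row] for row in coeffs]


# A flat gray block has only one significant component (the average level).
flat = [[100] * N for _ in range(N)]
kept = discard_small(dct_2d(flat), 1.0)
print(sum(1 for row in kept for c in row if c != 0.0), "component(s) kept")
```

Try a block with a hard vertical edge instead of a flat one and raise the threshold: the reconstructed pixels near the edge start to overshoot and undershoot, which is the mechanism behind mosquito noise.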

Wavelet-based codecs tend to result in softer images as compression is increased, perhaps with some "ripples" of faintly repeated detail dispersed across the picture. Although some people feel that wavelets give better images than DCTs do, wavelets are much harder to compute, and in real-time video work, processor time is precious.

Temporal Compression

So far, the techniques we've discussed all apply to single frames: each compressed frame stands on its own, with no reference to previous or following frames. This intraframe compression is ideal for editing since any frame can be easily retrieved.

Temporal compression exploits redundancies across time, since most frames are fairly similar to the ones just before and after. MPEG codecs take advantage of this by compressing groups of pictures (GOPs) together. Each GOP has one I-frame (intraframe-compressed) which stands on its own, and several P-frames (predicted) and B-frames (bidirectional) that encode only the differences between themselves and their neighbors.

More Info

"Understanding GOPs and Frame Types", starting on page 140 of Compressor 2.x Help, or page 87 of Compressor 1.x Help (In Compressor, choose Help > Compressor Help), has an excellent discussion of GOP structure and the different frame types.

The B- and P-frames are only a fraction of the size of the I-frames, so long-GOP compression winds up being about three to five times more efficient than I-frame-only, or intraframe-only, compression for the same level of picture quality. MPEG-1, -2, and -4 all use long-GOP compression for DVD, digital television transmission, HDV, and Web video. (MPEG-2 and -4 also have intraframe-only modes.)
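A toy Python sketch shows why difference frames are so much smaller than stand-alone ones. It stores the first frame whole, like an I-frame, and stores each later frame only as the list of pixels that changed from the frame before it, loosely like a P-frame. It is an assumption-laden simplification: there are no B-frames, no motion compensation, and no spatial compression, and the function names are invented for the example.

```python
def encode_gop(frames):
    """Store the first frame whole and later frames as (index, new value) changes."""
    i_frame = list(frames[0])
    deltas = []
    previous = frames[0]
    for frame in frames[1:]:
        changes = [(i, new) for i, (old, new) in enumerate(zip(previous, frame))
                   if old != new]
        deltas.append(changes)
        previous = frame
    return i_frame, deltas


def decode_frame(i_frame, deltas, n):
    """Rebuild frame n by applying the first n sets of changes to the I-frame."""
    frame = list(i_frame)
    for changes in deltas[:n]:
        for index, value in changes:
            frame[index] = value
    return frame


# Three tiny four-pixel "frames" that differ by one pixel each time.
frames = [[10, 10, 10, 10],
          [10, 10, 12, 10],
          [10, 11, 12, 10]]
i_frame, deltas = encode_gop(frames)
assert decode_frame(i_frame, deltas, 2) == frames[2]
print(i_frame, deltas)
```

Each difference frame here holds a single change instead of four pixel values. Note also that rebuilding the last frame meant walking through every frame before it.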

The drawback is that accessing any single frame may require decompressing some or all of an entire GOP. Although long-GOP compression is great for getting the most efficiency out of a bitstream, it's computationally expensive and makes random access and off-speed play modes very difficult. FCP 5 can use long-GOP media in the form of HDV clips, but working smoothly and efficiently with HDV calls for a fast, powerful G5. (FCP also lets you capture HDV using the I-frame-only Apple Intermediate Codec, which requires less CPU power to decode on the fly, but takes up more space on disk.)
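The cost of random access can be estimated with a rough Python sketch. It assumes a simplified model in which every GOP starts with an I-frame and each later frame depends only on the frame before it. Real MPEG streams with B-frames are more complicated, and the 15-frame GOP length below is just an example value, so treat the numbers as illustrative.

```python
def frames_to_decode(frame_index, gop_length):
    """Frames that must be decoded to display one frame, counting from its GOP's I-frame.

    Simplified model: each frame depends only on the one before it within the GOP.
    """
    return frame_index % gop_length + 1


gop_length = 15  # example value, not taken from the text
costs = [frames_to_decode(n, gop_length) for n in range(gop_length)]
print("average frames decoded per random access:", sum(costs) / len(costs))

# With intraframe-only compression every frame is its own "GOP" of length 1.
print("intraframe-only:", frames_to_decode(22, 1))
```

Under these assumptions a random jump into a long-GOP clip costs several frames of decoding on average, while an I-frame-only codec always decodes exactly one.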