Examining the DV Format | Programming Microsoft DirectShow for Digital Video and Television (Pro-Developer)

Examining the DV Format

Before going any further into the details of DV camcorders and DirectShow, it's important to cover the basics of the digital video, or DV, format. DV is a specification for video and audio, just as JPEG and GIF are standards for images and WAV is a standard for audio. DV is markedly different from other video formats you might be familiar with, such as MPEG-2 (used on DVDs) or Windows Media, because the DV format was designed with an eye toward capture and editing of high-quality images. DV has been optimized for production, not distribution. DV files are huge: a minute of DV runs to about 230 megabytes! Why is it so fat? Many video compression formats, such as MPEG-2, employ the technology of keyframes. A keyframe can be thought of as a snapshot of the video stream. Successive video frames can refer back to the keyframe to help construct their contents, so they don't have to specify everything in the frame. This functionality saves lots of data storage, but it also means that editing the file is much harder because the entire image isn't encoded in every frame.

In DV format, there is no keyframing, or rather, every frame is its own keyframe. The entire image is available in every frame of a DV file; that's one reason DV files are such disk-hoggers. However, there is a big benefit to this richness of detail: you can pop a DV file into Adobe Premiere or Windows Movie Maker and work on it quite easily. To do the same with an MPEG-2 file, on the other hand, requires lots of computation on the computer's part because it must reconstruct frames from keyframe data. This difference means that editing a DV file is faster than editing an equivalent MPEG-2 file.

Although the video fields (30 frames per second, 2 fields per frame, 60 fields per second) captured by the camcorder and output to the DV stream are complete, without any need of keyframe data to reconstruct fields, each field is internally compressed using a technique called discrete cosine transform (DCT). DCT is closely analogous to the compression technique used to make JPEG images small in size, yet highly detailed. This intrafield compression keeps DV files from being even bigger than they otherwise would be, but it comes at a price: a DV stream doesn't have all the detailed resolution that a fatter stream would. Even so, it represents a good tradeoff between file size and image quality.
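To make the idea of DCT compression a little more concrete, here is a minimal Python sketch of a one-dimensional DCT applied to a single row of eight pixel values. It is not the DV codec: real codecs work on two-dimensional blocks and quantize the coefficients rather than simply discarding them. The function names and the sample row are assumptions made up for this illustration.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(samples)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out

def idct_1d(coeffs):
    """Inverse of dct_1d (a DCT-III with matching scaling)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def compress_row(samples, keep):
    """Zero all but the first `keep` coefficients, then reconstruct."""
    coeffs = dct_1d(samples)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct_1d(truncated)

# Hypothetical row of eight luma values from a smooth part of an image.
row = [52, 55, 61, 66, 70, 61, 64, 73]
approx = compress_row(row, keep=4)  # half the coefficients, similar values
```

Because most of the energy in a smooth row ends up in the first few coefficients, throwing away the rest changes the reconstructed values only slightly. That is the sense in which this kind of compression trades a little detail for a lot of space.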

Another tradeoff between file size and image quality in DV format is apparent from the way that brightness and color information (luma and chroma in the parlance of DV) are stored. Rather than storing an RGB bitmap, which is what you might expect from a digital imaging device, DV streams express luma and chroma in YUV format. (A detailed explanation of YUV format can be found in Chapter 10.) The specifics of the YUV format used in consumer-level DV devices tend to favor red hues over blue ones, so DV devices tend to be more sensitive to reds than blues. Additionally, there are encoding differences between PAL-format DV (used in Europe and much of the rest of the world) and NTSC-format DV (used in the Americas and Japan). This encoding difference means that PAL-format DV loses some of its chroma information if converted to NTSC-format DV, and vice versa.
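The following Python sketch shows one common way to turn an RGB pixel into luma and chroma, together with a simple form of chroma subsampling. The weights are the widely used analog YUV coefficients, and the one-chroma-sample-per-four-pixels grouping is only an illustration of the general idea. Neither is taken from this chapter, and neither should be read as the exact arithmetic performed inside a DV codec.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using the classic luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # brightness (luma)
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v

def subsample_chroma_row(chroma_row, factor=4):
    """Keep one averaged chroma value per `factor` horizontal pixels.

    Illustrative only: this models sharing chroma across neighbouring
    pixels, which is where the storage saving comes from.
    """
    result = []
    for start in range(0, len(chroma_row), factor):
        group = chroma_row[start:start + factor]
        result.append(sum(group) / len(group))
    return result

white = rgb_to_yuv(255, 255, 255)   # full luma, essentially zero chroma
shared = subsample_chroma_row([1, 1, 1, 1, 5, 5, 5, 5])
```

Luma is kept for every pixel while chroma is shared between neighbours, which is why a YUV stream can be smaller than the equivalent RGB bitmap. If two systems group their chroma samples differently, converting between them means resampling chroma, and some information is lost along the way.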

Finally, the DV format should not be confused with the AVI file format frequently used as a container for DV data. Although Microsoft Windows operating systems prefer to store DV data in AVI files (other operating systems store them differently), there ultimately isn't any necessary connection between DV format and AVI. AVI format provides a convenient package for a DV-format video stream and a DV-format audio stream, but the specifics of the AVI file format are stripped out of the data when it's streamed to a DV device, such as a camcorder. (There's a wealth of information about the AVI file format in Chapter 14.) If you wanted to, you could write your own capture, transform, and render filters for a different DV-format compatible file type so that you wouldn't need the AVI format at all. That's entirely permissible, but it's extremely unlikely that other packages, such as Premiere, would work with your newly created file type.

Using IEEE 1394, FireWire, iLink, and USB 2.0

Nearly all digital camcorders can be connected directly to a computer through a high-speed network interface. Here's where it gets a bit confusing because the same network interface is known by a multitude of names. In the world of Microsoft and its associates, the network interface is identified as IEEE 1394. This is the network specification approved by the Institute of Electrical and Electronics Engineers, a professional organization that has an internationally recognized standards body. For those familiar with Apple's Macintosh, the interface is known as FireWire, a trademark owned by Apple and, until recently, available only to Apple's licensees. Finally, Sony, who codeveloped FireWire with Apple, calls the interface iLink. Whether it's called IEEE 1394, FireWire, or iLink, it's all exactly the same thing: a high-speed network protocol designed specifically to facilitate communication between digital cameras and computers. (The protocol can be used with other devices, such as external disk drives, but its genesis was in the world of digital camcorders.) For the sake of clarity and because it's the term preferred in the Microsoft universe, it'll be identified hereafter as IEEE 1394.

One of the most useful features of IEEE 1394 is that it configures itself; you don't need to set up any addresses or switches or configure any software to use an IEEE 1394 device with your computer. (You do need to have the appropriate drivers installed on your computer, but those drivers are installed as part of the Windows operating system and should already be on the computer.) From the user's perspective, all that needs to be done is to connect an IEEE 1394 cable from the camcorder to the computer. When the camcorder powers up, the computer assigns it a network address and the computer and camcorder begin to communicate. That communication happens at very high speeds; the IEEE 1394 network runs at 400 megabits per second (Mbps), or roughly 4 times faster than fast Ethernet. That gives IEEE 1394 plenty of bandwidth to deal with the stream of video coming from a camcorder. In 2002, a new specification known as IEEE 1394b bumped the network speed up to at least 800 Mbps and up to 1600 Mbps with the right type of cabling. (The original IEEE 1394 specification was renamed IEEE 1394a.)
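A quick back-of-the-envelope calculation shows what "plenty of bandwidth" means here. The sketch below uses the link speeds quoted above and the 25-Mbps standard DV stream rate given later in this chapter. It ignores protocol overhead, so treat the results as upper bounds rather than measured throughput. The constant and function names are made up for the example.

```python
IEEE_1394A_MBPS = 400     # original IEEE 1394 speed quoted above
FAST_ETHERNET_MBPS = 100  # nominal fast Ethernet speed
DV_STREAM_MBPS = 25       # standard DV stream rate quoted later in the chapter

def headroom(link_mbps, stream_mbps):
    """How many streams of the given rate fit in the link's nominal speed."""
    return link_mbps / stream_mbps

speedup_over_ethernet = IEEE_1394A_MBPS / FAST_ETHERNET_MBPS
dv_streams_on_1394 = headroom(IEEE_1394A_MBPS, DV_STREAM_MBPS)
print(f"IEEE 1394 vs fast Ethernet: {speedup_over_ethernet:.0f}x")
print(f"Nominal DV streams per IEEE 1394 link: {dv_streams_on_1394:.0f}")
```

On paper, a single DV stream uses only a small fraction of the link, which leaves room for device commands and other traffic on the same bus.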

Like a computer connected to a network, an IEEE 1394 camcorder can read data from the network; this data can either be an audio-video stream or commands. Unlike computers, camcorders are custom-purpose objects with a few well-defined functions, such as play, record, pause, rewind, and fast-forward. All these commands can be transmitted to a camera through its IEEE 1394 interface, which gives the computer complete control over the camcorder without any intervention from the user. No buttons need to be pressed on the camcorder to get it to rewind a tape or record a segment of video. When connected, the camcorder becomes a computer peripheral.

Although IEEE 1394 reigns as the standard du jour for digital camcorders, a new protocol, Universal Serial Bus version 2 (USB 2), offers even greater speed (480 Mbps) and will be even more widely available on PCs than IEEE 1394, which isn't standard equipment on many PCs. Today, some camcorders have both USB 1 and IEEE 1394 network connections. The USB connection is most often used to transfer webcam video and still images to the host computer, while the IEEE 1394 connection is used to transfer video streams. USB 1 has a top speed of 12 Mbps, which is not really enough to handle a data stream from a digital camcorder. However, with the introduction of USB 2, we can expect to see a battle between the two standards for the title of unquestioned champion for connectivity to DV camcorders. From the point of view of the DirectShow programmer, it's unlikely that a switch from IEEE 1394 to USB 2 will require a change in a single line of code, but this situation is still unfolding, and a programmer with an eye to the future would be well advised to test any digital video software with both standards, just to be sure. As of this writing, the audio/video (A/V) specifications for USB 2 have not yet been finalized, so no one yet knows how USB 2 will play in the DirectShow environment.

Understanding DV Stream Types

DV describes a number of specific stream formats. The two you're most likely to encounter are known as SDL-DVCR and SD-DVCR. SDL-DVCR is the format used by consumer-level camcorders in their LP mode, and it delivers a stream bandwidth of 12.5 Mbps. Although the Windows driver for IEEE 1394 devices will acquire data from a device in SDL-DVCR mode, none of the supplied codecs will decode it. (There are third-party solutions that you can use if you find you need this capability.)

SD-DVCR is sent by camcorders in SP mode (or doing a live image capture). This format delivers a stream at 25 Mbps. That's the standard data rate for DV streams. Both SD-DVCR and SDL-DVCR are supported on miniDV and Digital8 tapes, and DirectShow can acquire both stream formats, although only SD-DVCR can be decoded by the supplied codecs. The DVCAM tape format also records a 25-Mbps stream, but the timecode information recorded on DVCAM tape is more robust. (Timecode will be explained in the Timecode sidebar later in this chapter.)
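The stream rates above translate directly into disk space. The following sketch converts a rate in megabits per second into megabytes per minute, using decimal megabytes. It counts only the nominal stream rate quoted in the text; a file on disk would also carry whatever audio and container overhead is not included in that figure, so real files can be expected to come out somewhat larger. The function name is an assumption made for this example.

```python
def megabytes_per_minute(mbps):
    """Storage needed for one minute of a stream at `mbps` megabits/second."""
    bits_per_minute = mbps * 1_000_000 * 60
    return bits_per_minute / 8 / 1_000_000   # bits -> bytes -> megabytes

sp_mode = megabytes_per_minute(25)     # SD-DVCR (SP mode)
lp_mode = megabytes_per_minute(12.5)   # SDL-DVCR (LP mode)
print(f"SD-DVCR:  {sp_mode} MB per minute")
print(f"SDL-DVCR: {lp_mode} MB per minute")
```

Running the numbers gives 187.5 MB per minute for SD-DVCR and 93.75 MB per minute for SDL-DVCR, so an hour of standard DV needs well over 10 gigabytes before any overhead is counted.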

For the video professional, HD-DVCR is the high definition version of the DV standard, and it carries a stream bandwidth of 50 Mbps. With this format, you get 1125 lines of screen resolution, whereas SDL-DVCR delivers 480 lines of resolution. HD-DVCR is generally reserved for video productions that will be shown theatrically, after a transfer to film (film has a much higher resolution than SDL-DVCR), or for programming that's being shot for high-definition television (HDTV) broadcast.

Although the HD-DVCR stream can be transmitted over an IEEE 1394 link, that doesn't imply that an HD-DVCR stream can simply be brought into and manipulated by a DirectShow filter graph. The standard capture source filter for DV devices will not work correctly with an HD-DVCR stream, nor will any of the other DV-specific filters. They're designed for SD-DVCR streams. A third-party DirectShow filter would be able to handle HD-DVCR decoding, but you don't get that filter as part of any Microsoft operating system.

Issuing Commands

One important consideration to keep in mind when working with electromechanical devices such as camcorders is that although they can respond to a command immediately, it takes some period of time before the command is actually executed by the device. For example, when a digital camcorder transitions from stop mode to record mode, it takes a number of seconds (generally, no more than three seconds, but the time varies from camcorder to camcorder) before the tape has threaded its way around the various heads and capstans inside the camcorder, and recording can't begin until that process has been completed. This delay will affect your DirectShow programs because issuing a command to a digital camcorder doesn't mean that the device immediately enters the requested state. Even a few seconds is an eternity on a processor that executes a billion instructions per second.

For this reason, it's important to query the state of electromechanical devices after a command has been issued to them so that you can track the device as it enters the requested state or fails to do so. (For example, a miniDV tape could be record-protected, and therefore the camcorder would be unable to enter record mode.) Furthermore, it's possible that a device might say it has entered a particular mode before it actually has. Digital camcorders are notorious for reporting that they've entered record mode before they can actually begin recording a data stream. This behavior isn't an issue for a human being hand-operating a camcorder because a person can tolerate a delay between pushing the record button and seeing REC show up on the camcorder's viewfinder. However, this situation could be disastrous for a DirectShow programmer, who might be writing out just a few seconds of video, all of which could be dropped by the camcorder as it entered record mode.
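The pattern described here (issue a command, then poll until the device confirms the state or a timeout expires) can be sketched in a few lines. The Python below simulates a camcorder rather than talking to one: FakeCamcorder, its methods, and the timing values are all hypothetical. In a DirectShow application the equivalent calls would typically go through the device's external transport interface (IAMExtTransport, with put_Mode and get_Mode), but nothing below is DirectShow code.

```python
import time

class FakeCamcorder:
    """Hypothetical stand-in for a tape transport; not a DirectShow API."""

    def __init__(self, settle_seconds=0.05, write_protected=False):
        self.mode = "stop"
        self._pending = None
        self._ready_at = 0.0
        self._settle = settle_seconds
        self._write_protected = write_protected

    def set_mode(self, mode):
        if mode == "record" and self._write_protected:
            return  # command accepted but never carried out
        self._pending = mode
        self._ready_at = time.monotonic() + self._settle

    def get_mode(self):
        # The mode only changes once the simulated mechanism has settled.
        if self._pending is not None and time.monotonic() >= self._ready_at:
            self.mode = self._pending
            self._pending = None
        return self.mode

def wait_for_mode(device, wanted, timeout=1.0, poll_interval=0.01,
                  safety_margin=0.0):
    """Poll until the device reports `wanted`; return False on timeout.

    `safety_margin` is extra waiting after the device claims success,
    for devices that report a mode before they are really in it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if device.get_mode() == wanted:
            time.sleep(safety_margin)
            return True
        time.sleep(poll_interval)
    return False

camcorder = FakeCamcorder()
camcorder.set_mode("record")
ready = wait_for_mode(camcorder, "record", safety_margin=0.02)
```

The return value lets the caller tell a slow device apart from one that will never reach the requested state, such as a camcorder holding a record-protected tape. The safety margin is a crude answer to devices that report success too early; how long it should be depends on the particular camcorder.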

Although the specifications vary from model to model, in general, the more expensive the camcorder, the more quickly it will respond to electromechanical commands. Even so, it's a good idea to build some play into your DirectShow applications where electromechanical features are concerned. Listen to the device before you begin to use it, and design your programs with the knowledge that sometimes these devices tell fibs.

There are some things you can do to minimize these kinds of issues. Most camcorders have a pause mode, which will cue the tape around the play and record heads and leave it there. (You can't leave a device in pause mode forever, however, because these devices will automatically exit pause mode after a few minutes to prevent damage to those heads.) When playing video from a digital camcorder, a good practice is to put the device into play mode and then immediately issue the pause command. The camcorder will cue the tape but won't move it through the playback mechanism. When you issue the play command again, the playback will begin almost immediately. This same technique can be used for record mode, which generally takes a bit longer to enter than play mode. First enter record mode, then pause, wait a bit, and enter record mode again. Although it won't make for an immediate response from an inherently slow device, this technique speeds things up quite a bit and makes for better, more robust application design.
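Here is a small sketch of that cue-then-start sequence. The transport class simply records the commands it receives so the ordering is easy to see. LoggingTransport, the helper functions, and the 120-second pause limit are assumptions invented for the illustration; check the actual pause timeout of the device you are working with.

```python
class LoggingTransport:
    """Hypothetical transport that records the commands it receives."""

    def __init__(self):
        self.commands = []
        self.mode = "stop"

    def set_mode(self, mode):
        self.commands.append(mode)
        self.mode = mode

def cue(device, mode):
    """Thread the tape for `mode`, then hold it in pause."""
    device.set_mode(mode)
    device.set_mode("pause")

def start(device, mode):
    """Leave pause and begin playing or recording for real."""
    device.set_mode(mode)

def cue_is_stale(cued_at, now, max_pause_seconds=120.0):
    """True if the device has probably dropped out of pause by itself.

    The 120-second default is an assumed value, not a documented one.
    """
    return (now - cued_at) > max_pause_seconds

deck = LoggingTransport()
cue(deck, "play")     # done early, while the application is setting up
start(deck, "play")   # done at the moment playback is actually needed
```

After these calls, `deck.commands` holds play, pause, play, in that order. A real application would also wait for each mode change to be confirmed before issuing the next command, and would cue again if `cue_is_stale` reported that the pause had been held too long.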

Processing Video and Audio Streams

Devices such as DV camcorders present a single stream through the IEEE 1394 interface. This stream is an interleaved mixture of video data and audio data. Although the specifics of this stream structure are complex, in general, the stream will consist of alternating fields of video information followed by a sample of audio data. Television is composed of fields of information, two of which make each frame of the picture. In the USA, television signals are composed of 30 frames per second (fps), transmitted as 60 fields, with each field holding half of the image. The fields are interlaced in alternating lines, as if they were two combs pressed into each other. For DV camcorders, each frame is an image of 720×480 pixels; this 3:2 ratio of pixel dimensions is a little wider than the 4:3 ratio that gives television screens their characteristic rectangular shape. The digital camcorder image is much higher quality than the standard analog television image, which is generally given as 460×350 pixels, but that's an approximation at best. Cheaper television sets overscan, so some of those pixels are lost on the top, bottom, right side, and left side of the screen.
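The "two combs" picture of interlacing is easy to express in code. The sketch below splits a tiny six-line frame into its two fields and weaves them back together. The row labels and function names are made up for the example, and no claim is made about which field a real DV stream transmits first.

```python
def split_fields(frame_rows):
    """Split a frame into two fields of alternating scan lines."""
    even_lines = frame_rows[0::2]   # lines 0, 2, 4, ...
    odd_lines = frame_rows[1::2]    # lines 1, 3, 5, ...
    return even_lines, odd_lines

def weave(even_lines, odd_lines):
    """Interleave two fields back into one full frame."""
    frame = []
    for even_row, odd_row in zip(even_lines, odd_lines):
        frame.append(even_row)
        frame.append(odd_row)
    return frame

frame = ["row0", "row1", "row2", "row3", "row4", "row5"]
field_a, field_b = split_fields(frame)
rebuilt = weave(field_a, field_b)
```

Each field carries half the lines of the picture, so 60 fields per second add up to 30 complete frames per second.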

Consumer DV camcorders generally record live audio in stereo at 32,000 samples per second with 12 bits of resolution per sample, which isn't quite the same fidelity as a compact disc but is more than enough for many applications. However, all camcorders can record audio streams of 48,000 samples per second with 16 bits of resolution per sample. That's better than CD quality. So it's possible, and even easy, to create digital video presentations that are far sharper and better sounding than any TV broadcast. Although they'll never look as good as 35-millimeter movies (which have a resolution of approximately 2000×2000 pixels), they'll easily surpass anything you'll see on a television, unless you already have an HDTV!
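To compare these audio modes, it helps to work out the raw bit rate of each. The sketch below does the arithmetic for uncompressed stereo audio. The two DV modes use the figures from the paragraph above, while the compact disc figures (44,100 samples per second at 16 bits) come from general knowledge of the CD audio format rather than from this chapter. The function name is an assumption made for the example.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=2):
    """Raw bit rate of uncompressed audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

dv_12_bit = pcm_kbps(32_000, 12)     # consumer default described above
dv_16_bit = pcm_kbps(48_000, 16)     # higher-quality DV mode
compact_disc = pcm_kbps(44_100, 16)  # CD audio, for comparison

for label, rate in [("DV 32 kHz / 12-bit", dv_12_bit),
                    ("DV 48 kHz / 16-bit", dv_16_bit),
                    ("CD 44.1 kHz / 16-bit", compact_disc)]:
    print(f"{label}: {rate:.1f} kbps")
```

The 12-bit mode comes out at 768 kbps, the 16-bit mode at 1,536 kbps, and CD audio sits between them at about 1,411 kbps. Even the better DV audio mode is small next to a 25-Mbps video stream.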

Although the recording medium (miniDV, DVCAM, or Digital8) contains both the video and audio tracks, DirectShow will demultiplex these streams into separate components, so DirectShow applications can process video while leaving the audio untouched, or vice versa. In any case, when dealing with DV camcorders, the filter graph will have to work with multiple streams, just as was necessary when working with webcams in Chapter 5. The difference with DV camcorders is that both streams will be produced by the same capture source filter, whereas a webcam generally requires separate capture source filters for the video and audio streams.

An entirely different filter graph must be built to handle recording of a DV stream to a digital camcorder. In that case, the filter graph will take a source (perhaps from an AVI file) and convert it into a multiplexed video and audio DV-format stream. Two DirectShow filters, the DV Video Encoder and DV Mux, will do this conversion for you. Once the combined DV-format stream has been created, it's sent along to the device to be recorded.

Building Filter Graphs

DirectShow includes a number of filters that are specifically designed for processing of DV streams. The most important of these are the Microsoft DV Camera and VCR (MSDV) capture source filter and the DV Splitter, DV Mux, DV Video Encoder, and DV Video Decoder transform filters. Most filter graphs built to handle DV streams will have at least one (and probably several) of these filters.

In the previous chapter, we noted that the enumerated list of video capture filters included a filter specific to my Logitech webcam. If I had a variety of webcams from a variety of manufacturers attached to my system, these too would show up as separate entries on the list. DirectShow works with DV devices a bit differently because Windows already has a full set of drivers used to communicate with IEEE 1394 devices. All this functionality has been gathered into a single DirectShow filter, the MSDV capture source filter. The filter has one input pin, which can receive a multiplexed A/V stream. It's unusual for a capture source filter to have an input pin, but in this case, it makes perfect sense: we will want to be able to write streams to this device; that is, to write to the tape inside the camcorder. The filter also has two output pins; one of them presents a video-only stream to the filter graph, while the other presents a multiplexed A/V stream.

In most cases, you'll want to route the multiplexed A/V stream through your filter graph, which is where the DV Splitter transform filter becomes useful. The DV Splitter takes the multiplexed A/V stream of DV data on its input pin and splits the stream into separate video and audio streams, on two output pins. On the other hand, if you have separate DV video and audio streams and you need to multiplex them (perhaps so that they can be written to a camcorder), you'll want to use the DV Mux transform filter, which takes the video and audio streams on separate input pins and produces a multiplexed stream on its output pin.
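Conceptually, the DV Mux and DV Splitter are inverses of each other. The toy model below captures just that relationship by interleaving two lists of labelled chunks and separating them again. It says nothing about the real DV stream layout or about how the filters are implemented; the function names and the tagged-tuple representation are assumptions made for the illustration.

```python
def mux(video_frames, audio_blocks):
    """Interleave video and audio chunks into one tagged stream."""
    stream = []
    for frame, block in zip(video_frames, audio_blocks):
        stream.append(("video", frame))
        stream.append(("audio", block))
    return stream

def split(stream):
    """Separate a tagged stream back into video and audio lists."""
    video = [payload for kind, payload in stream if kind == "video"]
    audio = [payload for kind, payload in stream if kind == "audio"]
    return video, audio

video_in = ["v0", "v1", "v2"]
audio_in = ["a0", "a1", "a2"]
multiplexed = mux(video_in, audio_in)
video_out, audio_out = split(multiplexed)
```

Splitting the multiplexed stream returns exactly the two lists that went in, which is the property that lets a filter graph process the video on its own and later recombine it with untouched audio.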

If you have a video stream that you want to send to a camcorder, it needs to be in DV format before it can be accepted as an input by the DV Mux filter. To put the stream into DV format, pass the video stream through the DV Video Encoder. (In GraphEdit, it's not listed among the DirectShow filters; it's enumerated separately in the Video Compressors category.) The filter has two pins: the input pin receives a video stream, while the output pin issues a DV-encoded video stream. Intelligent Connect will not add the DV Video Encoder for you automatically. If you need it, you'll need to instantiate it in the filter graph and identify it within RenderStream calls as an intermediate filter.

Finally, the inverse of the DV Video Encoder filter is the DV Video Decoder, which takes a DV-encoded video stream and converts it to another stream format. The exact output format from the DV Video Decoder is negotiated by the DV Video Decoder and the filter downstream from it. There are several possible output stream formats, and the DV Video Decoder will try to match its output to the needs of the downstream filter.
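One simple way to picture this negotiation is as a search for the first format that both sides can agree on. The sketch below is only a conceptual model: the real DirectShow pin connection process is more involved, and the format names used here are illustrative placeholders rather than the decoder's actual list of output types.

```python
def negotiate(offered_formats, accepted_formats):
    """Return the first offered format the downstream side accepts.

    `offered_formats` is in the upstream filter's order of preference.
    Returns None when the two sides have no format in common.
    """
    for fmt in offered_formats:
        if fmt in accepted_formats:
            return fmt
    return None

# Illustrative placeholder names, not the DV Video Decoder's real list.
decoder_offers = ["YUV-packed", "RGB-24", "RGB-16"]
renderer_accepts = {"RGB-24", "RGB-16"}

chosen = negotiate(decoder_offers, renderer_accepts)
no_match = negotiate(decoder_offers, {"grayscale-8"})
```

In this model, the decoder's first choice is skipped because the downstream filter cannot take it, so the connection settles on the next format in the list. If nothing matches, the connection fails.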