Video Playback in DirectShow

We start with an overview of the DirectShow architecture. We'll introduce some terminology that we'll be using extensively throughout the next few chapters, then we'll walk through some basic code to play a video file. After that, we'll explore some of the sophisticated video-mixing features that DirectShow provides out of the box. With minimal coding, you can mix and alpha-blend multiple videos, creating fades, zooms, picture-in-picture effects, fly-in effects, and so forth. These effects look impressive enough in a 2-D window; embedded in a 3-D environment, they look even better. But that's a subject for Chapter 10.

The Five-Minute Introduction to DirectShow

If you're already familiar with DirectShow programming, you can skip this section. Otherwise, please stick with it! Some of what follows may seem a bit dry and abstract, but it's important for understanding everything that comes later.

DirectShow was designed to process streams of data. Typically these are audio and video streams, but the architecture is sufficiently general that a stream can hold any other kind of data, such as text, MIDI, or network packets. In any case, to accomplish a given task, DirectShow divides the task into smaller subtasks. For example, a typical task is to play an AVI file.

This task can be broken down into the following subtasks:

  1. Pull data from the AVI file as a stream of bytes.

  2. Parse the byte stream to extract the audio samples and video frames.

  3. If the audio samples are compressed, send them to an audio decoder. The decoder outputs uncompressed audio.

  4. If the video frames are compressed, send them to a video decoder, which outputs uncompressed video frames.

  5. Send the uncompressed audio to the sound card.

  6. Draw the uncompressed video frames on the screen. This step must be carefully timed to maintain the correct frame rate, and to make sure the video stays in sync with the audio.

For each of these subtasks, DirectShow provides a software component called a filter. Filters are COM objects that expose a defined set of interfaces.

To continue our example, the following filters are used in AVI file playback:

  • Async File Source filter. This filter reads data from the file as a stream of bytes, without parsing the data in any way.

  • AVI Splitter filter. This filter reads the AVI file headers and index, and parses the AVI file structure. It pulls data from the Async File Source filter.

  • Audio decoder filter. Numerous audio decoders exist, and the specific filter that must be used depends on the type of audio compression in the file. The audio decoder receives compressed audio samples from the AVI Splitter and outputs uncompressed PCM audio. If the audio in the file is not compressed to begin with, the audio decoder is not needed.

  • Video decoder filter. Again, there are many video decoders, and the specific decoder filter depends on the type of compression used in the file. The video decoder receives compressed video frames from the AVI Splitter and outputs uncompressed video frames.

  • DirectSound Renderer filter. This filter sends the uncompressed audio samples to the sound card, using DirectSound.

  • Video renderer filter. This filter draws uncompressed video frames onto the screen, using DirectX Graphics. If DirectX is not available, the filter falls back to GDI. For reasons that will be explained shortly, DirectShow provides several distinct video renderer filters, each with its own feature set.

Figure 9.1 shows how these six DirectShow filters would be assembled by an application to play an AVI file. The entire configuration shown in this diagram is called a filter graph, a term that really just means a collection of filters that work together. (When the context is clear, we'll often shorten "filter graph" to just "graph.")

Figure 9.1: DirectShow filter graph for AVI file playback.

As you can see from the diagram, the filters in the graph are connected to each other, and the arrows in the diagram indicate the direction of data flow. For example, by tracing the arrows, you can follow the life cycle of a video frame from unparsed bytes in the file to a rendered image on the screen. Audio samples follow a different route to the sound card. The AVI Splitter has the job of sending the video and audio data down the correct paths.
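To make the idea of "tracing the arrows" concrete, here is a minimal sketch that models the graph in Figure 9.1 as a plain directed graph in portable C++. It is a toy model only: `ToyGraph`, `buildAviPlaybackGraph`, and `tracePath` are hypothetical names invented for this illustration and are not part of the DirectShow API, and the filter names are just labels.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are NOT DirectShow interfaces.
// Each filter name maps to the names of the filters directly downstream of it.
using ToyGraph = std::map<std::string, std::vector<std::string>>;

// Builds the six-filter AVI playback graph described in the text.
ToyGraph buildAviPlaybackGraph() {
    return {
        {"Async File Source", {"AVI Splitter"}},
        {"AVI Splitter", {"Video Decoder", "Audio Decoder"}},
        {"Video Decoder", {"Video Renderer"}},
        {"Audio Decoder", {"DirectSound Renderer"}},
        {"Video Renderer", {}},
        {"DirectSound Renderer", {}},
    };
}

// Follows the arrows from `from` to `to` with a depth-first search.
// Returns the filters visited in order, or an empty vector if no route exists.
// Assumes the graph has no cycles, which holds for this playback graph.
std::vector<std::string> tracePath(const ToyGraph& graph,
                                   const std::string& from,
                                   const std::string& to) {
    if (from == to) return {from};
    auto it = graph.find(from);
    if (it == graph.end()) return {};
    for (const std::string& next : it->second) {
        std::vector<std::string> rest = tracePath(graph, next, to);
        if (!rest.empty()) {
            assert(rest.back() == to);  // a found route always ends at the target
            rest.insert(rest.begin(), from);
            return rest;
        }
    }
    return {};
}
```

Calling `tracePath(graph, "Async File Source", "Video Renderer")` walks the video route through the splitter and the video decoder, while asking for a route from the video decoder to the DirectSound Renderer yields an empty result, because the two branches never rejoin after the AVI Splitter.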

The points where the filters connect are called pins. Pins, which are also COM objects, provide the mechanism for filters to move data through the graph. Each pin has a defined direction, either input or output. Audio and video data always travel from an output pin to an input pin. At any given time, several buffers of data might be traveling across different pin connections. While the video renderer is drawing a frame on the screen, the decoder may be decoding a new frame, and the AVI Splitter may be parsing the file. This overlap helps to ensure that there are no gaps during playback.
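The direction rule for pins can be sketched in a few lines of portable C++. This is a toy model under stated assumptions: real DirectShow pins are COM objects, whereas `ToyPin`, `connectPins`, and `deliver` are hypothetical names made up for this illustration, and a "buffer" here is just a string.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Toy model only: this struct just mirrors the rules stated in the text.
enum class PinDirection { Input, Output };

struct ToyPin {
    ToyPin(std::string n, PinDirection d) : name(std::move(n)), direction(d) {}

    std::string name;
    PinDirection direction;
    ToyPin* peer = nullptr;            // the pin on the other side, if connected
    std::deque<std::string> received;  // buffers that have arrived at an input pin
};

// Connects an output pin to an input pin. Any other pairing is rejected,
// as is a pin that already has a connection.
bool connectPins(ToyPin& out, ToyPin& in) {
    if (out.direction != PinDirection::Output) return false;
    if (in.direction != PinDirection::Input) return false;
    if (out.peer != nullptr || in.peer != nullptr) return false;
    out.peer = &in;
    in.peer = &out;
    return true;
}

// Sends one buffer downstream. Data only travels from output pin to input pin.
bool deliver(ToyPin& out, const std::string& buffer) {
    if (out.direction != PinDirection::Output || out.peer == nullptr) return false;
    assert(out.peer->direction == PinDirection::Input);  // guaranteed by connectPins
    out.peer->received.push_back(buffer);
    return true;
}
```

In this sketch, trying to connect two output pins, or trying to deliver from an input pin, simply fails. Several buffers can sit in the `received` queue at once, which loosely mirrors the way multiple buffers may be in flight across a connection.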

One nice feature of the DirectShow architecture is that filters are implemented as separate modules, so they can be used independently of one another. You've already seen that various decoder filters might be used in the AVI playback graph, depending on the compression format. Similarly, to play some other file type, such as ASF or MP3, you can simply substitute a different parser filter in place of the AVI Splitter. Also, because filters use a set of well-defined COM interfaces to communicate with each other, you can write your own custom filters that do specialized processing tasks.

Media Types and Data Flow

We've described how data moves through the filter graph across pin connections. Because filters modify the data that moves through them, the format of the data can change from one filter to the next. We alluded to this earlier when we described how the AVI Splitter takes an unparsed byte stream, and outputs video frames and audio samples. Filters therefore need a way to describe the format of the data at each pin connection. This is done by using a structure called a media type. The media type can be used to describe audio formats, video formats, or any other kind of data format.

Filters have built-in logic that enables them to negotiate the media type for each pin connection. For example, a video decoder has a list of all the uncompressed video formats that it is capable of outputting. Similarly, a video renderer has a list of all the video formats that it can render to the screen. In both filters, the exact list of supported formats may change depending on various factors, such as the incoming data on the decoder or the user's current display settings. When the decoder connects to the video renderer, the two filters select a format that both support. This process happens automatically when the application connects the filters.
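One simple way to picture this negotiation is as a search for the first format that appears in both lists. The sketch below is an assumption-laden toy, not the algorithm DirectShow actually uses: `ToyMediaType` has only two illustrative string fields, the format strings are placeholders, and `negotiate` and `exampleNegotiation` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a media type. A real media type carries far more detail;
// these two string fields are purely illustrative.
struct ToyMediaType {
    std::string major;  // broad category, e.g. "video" or "audio"
    std::string sub;    // specific format, e.g. "RGB32" or "YUY2"
};

bool operator==(const ToyMediaType& a, const ToyMediaType& b) {
    return a.major == b.major && a.sub == b.sub;
}

// Picks the first format in the upstream filter's preference order that the
// downstream filter also accepts. Returns true and fills `agreed` on success.
bool negotiate(const std::vector<ToyMediaType>& offered,
               const std::vector<ToyMediaType>& accepted,
               ToyMediaType& agreed) {
    for (const ToyMediaType& candidate : offered) {
        for (const ToyMediaType& ok : accepted) {
            if (candidate == ok) {
                agreed = candidate;
                return true;
            }
        }
    }
    return false;
}

// Example: a decoder that prefers YUY2 meets a renderer that lists RGB32 first.
// The decoder's preference order wins, so the agreed format is YUY2.
ToyMediaType exampleNegotiation() {
    std::vector<ToyMediaType> decoderOffers = {{"video", "YUY2"}, {"video", "RGB32"}};
    std::vector<ToyMediaType> rendererAccepts = {{"video", "RGB32"}, {"video", "YUY2"}};
    ToyMediaType agreed;
    bool ok = negotiate(decoderOffers, rendererAccepts, agreed);
    assert(ok);
    (void)ok;  // avoids an unused-variable warning when asserts are disabled
    return agreed;
}
```

If the two lists share no entry, `negotiate` returns false, which corresponds to a connection attempt that fails because the filters have no format in common.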

Filters also manage the flow of data through the graph. The application does not have to push each individual video frame or audio sample from one pin to the next. As you'll see, the application simply gives run and stop commands. Streaming then occurs on worker threads, which are created and managed by the filters. This means an application can block (for example, while waiting for user input) without interrupting playback.
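The division of labor between the application thread and a filter's worker thread can be sketched with standard C++ threads. This is only a toy illustration: `ToyStreamingFilter` and `waitForSamples` are hypothetical names, a "sample" is just a counter increment, and the one-millisecond sleep is an arbitrary stand-in for real work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy model only: a "filter" that streams on a worker thread it owns.
class ToyStreamingFilter {
public:
    ~ToyStreamingFilter() { stop(); }

    // Starts the worker thread and returns immediately.
    void run() {
        if (worker_.joinable()) return;  // already running
        stopRequested_ = false;
        worker_ = std::thread([this] {
            while (!stopRequested_) {
                ++samplesDelivered_;  // stand-in for pushing one sample downstream
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Asks the worker to finish and waits until it has exited.
    void stop() {
        stopRequested_ = true;
        if (worker_.joinable()) worker_.join();
        assert(!worker_.joinable());
    }

    long samplesDelivered() const { return samplesDelivered_; }

private:
    std::atomic<bool> stopRequested_{false};
    std::atomic<long> samplesDelivered_{0};
    std::thread worker_;
};

// Blocks the calling ("application") thread until at least `target` samples
// have been delivered. Streaming continues while the caller waits.
void waitForSamples(const ToyStreamingFilter& filter, long target) {
    while (filter.samplesDelivered() < target) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

The application only calls `run` and `stop`. While it sits inside `waitForSamples`, doing nothing useful, the sample count keeps climbing, because the delivery loop lives on the filter's own thread.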

The Filter Graph Manager

Another important object in DirectShow is the Filter Graph Manager. As its name suggests, its role is to mediate between the filters and the application. It assembles the filter graph (with help from the application), sends run and stop commands to the filters, returns event messages to the application, and performs other housekeeping tasks. It is possible to write a DirectShow application that communicates exclusively with the Filter Graph Manager and never deals directly with filters. More often, an application will interact with both layers, sometimes calling methods on the Filter Graph Manager, and sometimes calling methods directly on filters. Figure 9.2 shows these relationships.

Figure 9.2: Filter Graph Manager.
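The mediating role shown in Figure 9.2 can be sketched as a small class that forwards commands to its filters and queues events for the application. Everything here is a hypothetical toy written for this illustration: `ToyGraphManager`, `ToyFilter`, and the event string are invented names, and the real Filter Graph Manager exposes its functionality through COM interfaces rather than a class like this.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

enum class ToyState { Stopped, Running };

struct ToyFilter {
    std::string name;
    ToyState state;
};

// Toy model only: mediates between the application and a set of filters.
class ToyGraphManager {
public:
    // Adds a filter to the graph; new filters start in the stopped state.
    void addFilter(const std::string& name) {
        filters_.push_back(ToyFilter{name, ToyState::Stopped});
    }

    // Forwards a command to every filter, so the application need not.
    void run() {
        setAll(ToyState::Running);
        assert(allInState(ToyState::Running));
    }
    void stop() { setAll(ToyState::Stopped); }

    // Called on behalf of a filter to report something to the application.
    void postEvent(const std::string& event) { events_.push(event); }

    // Hands the oldest pending event to the application; false if none.
    bool getEvent(std::string& event) {
        if (events_.empty()) return false;
        event = events_.front();
        events_.pop();
        return true;
    }

    // Lets the application reach a filter directly (the "both layers" style).
    // The pointer is invalidated by a later addFilter call.
    ToyFilter* findFilter(const std::string& name) {
        for (std::size_t i = 0; i < filters_.size(); ++i) {
            if (filters_[i].name == name) return &filters_[i];
        }
        return nullptr;
    }

    bool allInState(ToyState state) const {
        for (const ToyFilter& f : filters_) {
            if (f.state != state) return false;
        }
        return true;
    }

private:
    void setAll(ToyState state) {
        for (ToyFilter& f : filters_) f.state = state;
    }

    std::vector<ToyFilter> filters_;
    std::queue<std::string> events_;
};
```

An application written against this sketch could drive playback with `run`, `stop`, and `getEvent` alone, or it could call `findFilter` when it needs to talk to one filter in particular.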