Examining Data Flow Across the Filter Graph

Although many DirectShow programmers won't need to know more about how streams flow across a filter graph than has already been explored, programmers who intend to implement their own DirectShow filters will need a more comprehensive explanation of how stream data travels from filter to filter across the filter graph. Fundamental to an understanding of data flow in the filter graph is the concept of transports. The term transport simply means the mechanism that two filters use to transfer data across their pin connections. Transports are defined by COM interfaces. Both an input pin on one filter and an output pin on another must agree on a transport mechanism before they can be connected; this process allows them to negotiate how they will exchange stream data as it passes from one filter to the other. The pins agree on the transport using QueryInterface: one pin queries the other pin for the interface that corresponds to the desired transport mechanism.
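Here's a minimal sketch of that QueryInterface handshake, assuming an output pin probing a downstream input pin for the push-model transport interface; the function name and abbreviated error handling are illustrative:

#include <dshow.h>

// Illustrative helper: ask the input pin whether it supports the
// push-model (local memory) transport by querying for IMemInputPin.
HRESULT NegotiatePushTransport(IPin *pInputPin, IMemInputPin **ppMemInput)
{
    HRESULT hr = pInputPin->QueryInterface(IID_IMemInputPin,
                                           reinterpret_cast<void**>(ppMemInput));
    if (FAILED(hr))
    {
        // The input pin doesn't expose IMemInputPin; the output pin
        // would have to try another transport or fail the connection.
        *ppMemInput = NULL;
    }
    return hr;
}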

Two principal transport types are used in DirectShow filters: local memory transport and hardware transport. Local memory transport means that the memory is allocated in the computer's main memory or (for video renderers) in a DirectDraw surface or a Direct3D surface, which might be located in video memory. The term hardware transport refers to dedicated memory in hardware, such as video port connections. (A video port is a direct physical connection between a device and a portion of video memory.)

Most DirectShow filters negotiate local memory transport during the pin connection process, storing their stream data in the computer's main memory. Hardware transport is less common and is beyond the scope of this chapter. DirectShow filters using local memory transport allocate buffer storage for the transport during the connection process. This allocation is always managed by the output pin, even if the buffers themselves are controlled by the input pin. The output pin has to inform the input pin of the downstream filter of the particulars of the buffers it has allocated for data transmission. When it's all working smoothly, local transport should look something like Figure 10-1.

Figure 10-1. Local transport sends buffers of stream data through the filter graph

In this diagram, two filters, upstream and downstream, are connected through a pool of available buffers. Data is passed between the pools when a buffer is copied from the upstream pool into the downstream pool and then passed along to the downstream filter. It might seem as though some unnecessary copying is going on. Why can't one of the upstream buffers simply be passed along to the downstream filter? In some cases, such as transform-in-place filters, buffers are passed from filter to filter. In most situations, though, it makes a great deal of sense to keep the buffer pools isolated. Filters can do many different things, both simple and complex; consequently, the time it takes each filter to process a buffer's worth of stream data varies greatly. It could well be that the upstream filter can fill buffers several times faster than the downstream filter can accept them. By keeping the pools separate, the upstream filter can fill all its buffers at top speed and then wait until the downstream filter has emptied one or more of them before it begins operation again. This process leads to a clean execution model for the filter graph.

Local memory transport comes in two flavors, push and pull, defined by the IMemInputPin and IAsyncReader interfaces, respectively. In the push mechanism, using IMemInputPin, the upstream filter manages the flow of data between its output pin and a downstream input pin. When stream data is ready, the output pin calls a method on the downstream filter, pushing the data to it; the downstream filter accepts and processes the data, pushing it along to the next filter downstream, and so on throughout the entire filter graph. The push transport is employed most commonly when the source filter is a live device, such as a microphone or a digital camcorder, that constantly generates a stream of data for the filter graph.
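In code, the push boils down to a single call: the output pin hands a filled sample to the downstream input pin. This sketch assumes the IMemInputPin pointer was obtained during connection, as in the earlier fragment:

// Illustrative push delivery: the upstream output pin hands a filled
// media sample to the downstream input pin via IMemInputPin::Receive.
HRESULT DeliverSample(IMemInputPin *pMemInput, IMediaSample *pSample)
{
    // Receive is called on the downstream filter; the upstream filter
    // "pushes" each sample as soon as its data is ready.
    return pMemInput->Receive(pSample);
}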

The other transport mechanism, pull, reverses the push model. In this case, the downstream filter pulls the data from the output pin of the upstream filter into its own input pin. If the source filter reads data from a file, the next filter in the filter graph is most often some kind of file parser, which converts the file's bits into a stream of video or audio data. In this case, the pull transport mechanism will be employed. The transform filter acting as a file parser makes requests of the upstream filter (basically, file read requests) and pulls that data into its input pin as it's presented on the output pin of the upstream filter.

The pull model uses the IAsyncReader interface. The upstream filter's operations are entirely under the control of the downstream filter. The pull model allows a parser, for example, to read in a stream of AVI video data. Because the parser knows the structure of an AVI file (which we'll cover in Chapter 14), the parser can request specific offsets into the file for specific data segments. And because the parser can read an AVI file into memory many times faster than the data could stream across the filter graph, it can perform asynchronous file I/O, which is more efficient and consumes fewer operating system resources than synchronous I/O. Once past the file parser, the push model dominates the filter graph. The pull model is generally used only between the file source and the parser; after that, the rest of the pin connections in the filter graph use the push model.
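As a sketch of the pull side, here's how a parser's input pin might read a specific range of bytes from the upstream file source through IAsyncReader; the offset, length, and function name are illustrative:

// Illustrative pull request: the downstream parser reads 4096 bytes
// at a given file offset from the upstream source. IAsyncReader also
// offers Request/WaitForNext for overlapped (asynchronous) reads.
HRESULT PullBytes(IAsyncReader *pReader, LONGLONG llOffset, BYTE *pBuffer)
{
    // SyncRead blocks until the requested byte range has been read.
    return pReader->SyncRead(llOffset, 4096, pBuffer);
}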

An important point for the DirectShow application programmer to note is that quite often data can be written to video memory a great deal faster than it can be read back into the computer's main memory. That's a natural result of the evolution of video display cards. Although it's very important for the computer to be able to write data to the display quickly, it's rarely necessary for the computer to read that data back into memory, so these devices have been optimized for writing rather than reading. For this reason, the DirectShow programmer should be aware that certain filters, such as filters that perform in-place transforms on buffers that might be in video memory, can potentially slow the execution of a filter graph significantly. (The crucial point to note is whether the memory is located in display hardware or local memory. Display hardware memory can be much slower to access.) Certain configurations of filter graphs cause streams to be pulled from filters backed by hardware that does not support high-speed read access. In these situations, the entire filter graph might grind to a halt while waiting for slow reads to execute. As a result, transform filters upstream from a video renderer filter should use copy transforms rather than in-place transforms.

When a filter graph is paused, it loads media sample buffers with stream data, which allows the filter graph to start operation immediately on receipt of the run command. (When the filter graph goes directly into run mode without pausing first, it pauses internally as its media sample buffers load.) Without these preloaded buffers, the filter graph would need to wait until the source filter had produced sufficient stream data to pass along to the next filter in the graph before execution could begin.

Finally, it's important to note how the three filter graph states (running, paused, and stopped) and their corresponding messages flow across the filter graph. When the filter graph receives a message to change state, the Filter Graph Manager sends the message to each filter in upstream order, that is, backward from any renderer filters, through any transform filters, and on to the source filter. This ordering is necessary to prevent any samples from being dropped. If a source filter delivered data while the downstream filters were already stopped, the downstream filters would reject the data, so the stop message propagates upstream. When a filter stops, it releases any allocated samples it holds, clearing its store of stream data.

Understanding Media Types

Media types are the key element in the negotiation process that connects an output pin to an input pin. Although this topic was introduced in a cursory way in Chapter 8, where we learned how to assign a media type for a DirectShow Editing Services (DES) track group, we need to explore fully the concept of the media type and its embodiment in the AM_MEDIA_TYPE data structure. Here's the C++ definition of that data structure:

typedef struct _MediaType {
    GUID      majortype;
    GUID      subtype;
    BOOL      bFixedSizeSamples;
    BOOL      bTemporalCompression;
    ULONG     lSampleSize;
    GUID      formattype;
    IUnknown  *pUnk;
    ULONG     cbFormat;
    BYTE      *pbFormat;
} AM_MEDIA_TYPE;

For the purpose of connecting filters together, either manually or with Intelligent Connect, the first two fields in this structure, majortype and subtype, are the most significant. Both fields hold GUID values that correspond to a list of possible values. The majortype field refers to the general class of media, a broad-brush definition that provides a rough approximation of the media type. Table 10-1 shows a list of the possible values you'll see in the majortype field.

Table 10-1. GUID Values for the majortype Field

GUID                        Description
MEDIATYPE_AnalogAudio       Analog audio
MEDIATYPE_AnalogVideo       Analog video
MEDIATYPE_Audio             Audio
MEDIATYPE_AUXLine21Data     Line 21 data; used by closed captions
MEDIATYPE_Interleaved       Interleaved; used by digital video (DV)
MEDIATYPE_Midi              MIDI format
MEDIATYPE_MPEG2_PES         MPEG-2 Packetized Elementary Stream (PES) packets
MEDIATYPE_ScriptCommand     Data is a script command; used by closed captions
MEDIATYPE_Stream            Byte stream with no timestamps
MEDIATYPE_Text              Text
MEDIATYPE_Timecode          Timecode data
MEDIATYPE_Video             Video

There are many different values that might be placed into the subtype field of the AM_MEDIA_TYPE structure. This field further refines the rough definition of the media type from a general class (audio, video, stream, and so on) into a specific format type. Table 10-2 shows just a few of the possible values that could go into the subtype field, which, in this case, are uncompressed RGB types associated with MEDIATYPE_Video.

Table 10-2. Selected GUID Values for the subtype Field

GUID                    Description
MEDIASUBTYPE_RGB1       RGB, 1 bit per pixel (bpp), palettized
MEDIASUBTYPE_RGB4       RGB, 4 bpp, palettized
MEDIASUBTYPE_RGB8       RGB, 8 bpp
MEDIASUBTYPE_RGB555     RGB 555, 16 bpp
MEDIASUBTYPE_RGB565     RGB 565, 16 bpp
MEDIASUBTYPE_RGB24      RGB, 24 bpp
MEDIASUBTYPE_RGB32      RGB, 32 bpp, no alpha channel
MEDIASUBTYPE_ARGB32     RGB, 32 bpp, alpha channel

Each of these subtype GUIDs corresponds to a possible video stream format. There are many other possible video subtypes, as well as many audio subtypes and subtypes for other major types, such as MPEG-2 PES packets.
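A pin typically inspects these fields when deciding whether it can accept a connection. Here's an illustrative check, in the spirit of a transform filter's input-type test, that accepts only uncompressed 24-bit RGB video; the function name is hypothetical:

// Illustrative media type check: accept only RGB24 video described
// by a VIDEOINFOHEADER format block.
HRESULT CheckRGB24Input(const AM_MEDIA_TYPE *pmt)
{
    if (pmt->majortype != MEDIATYPE_Video)
        return VFW_E_TYPE_NOT_ACCEPTED;    // wrong general class of media
    if (pmt->subtype != MEDIASUBTYPE_RGB24)
        return VFW_E_TYPE_NOT_ACCEPTED;    // wrong specific format
    if (pmt->formattype != FORMAT_VideoInfo)
        return VFW_E_TYPE_NOT_ACCEPTED;    // format block isn't VIDEOINFOHEADER
    return S_OK;
}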

The lSampleSize field of the AM_MEDIA_TYPE structure can be used to calculate the overall memory requirements for processing a stream of that media type. This information is used during the connection process in calculations to allocate buffer pools of memory for the output pins on a filter. An audio sample contains an arbitrary length of audio data, while a video sample most often contains one complete frame. Because of stream compression techniques, the amount of memory needed to store a video frame or an audio sample can vary widely, so the value in lSampleSize gives the filter a useful hint when allocating memory.

The nitty-gritty information about a media stream is contained in two fields, formattype and pbFormat. The formattype field contains another GUID, which references one of the media formats understood by DirectShow, as shown in Table 10-3.

Table 10-3. GUID Values for the formattype Field

Format Type                 Format Structure
FORMAT_None or GUID_NULL    None
FORMAT_DvInfo               DVINFO
FORMAT_MPEGVideo            MPEG1VIDEOINFO
FORMAT_MPEG2Video           MPEG2VIDEOINFO
FORMAT_VideoInfo            VIDEOINFOHEADER
FORMAT_VideoInfo2           VIDEOINFOHEADER2
FORMAT_WaveFormatEx         WAVEFORMATEX

The formattype field works hand in hand with pbFormat, which holds a pointer to a dynamically allocated block of memory containing the appropriate format structure. For example, if the formattype field contains the GUID FORMAT_WaveFormatEx (implying WAV or similar audio data), the pbFormat field should hold a pointer to a WAVEFORMATEX data structure. That data structure contains specific information about the WAV stream, including the sample rate, bits per sample, number of channels, and so on. Finally, the cbFormat field defines the size of the format block; that field must be initialized correctly before any call using the AM_MEDIA_TYPE structure is made.
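Putting these fields together, here's a minimal sketch of building an AM_MEDIA_TYPE for 16-bit stereo PCM audio at 44.1 kHz; the function name is hypothetical, and error handling is abbreviated:

#include <dshow.h>
#include <mmreg.h>

// Illustrative construction of a complete AM_MEDIA_TYPE: the format
// block is allocated with CoTaskMemAlloc and described by cbFormat.
HRESULT BuildPcmMediaType(AM_MEDIA_TYPE *pmt)
{
    ZeroMemory(pmt, sizeof(AM_MEDIA_TYPE));
    pmt->majortype  = MEDIATYPE_Audio;
    pmt->subtype    = MEDIASUBTYPE_PCM;
    pmt->formattype = FORMAT_WaveFormatEx;
    pmt->bFixedSizeSamples = TRUE;
    pmt->lSampleSize = 4;    // one 16-bit stereo frame: 2 channels x 2 bytes

    WAVEFORMATEX *pwfx = (WAVEFORMATEX *) CoTaskMemAlloc(sizeof(WAVEFORMATEX));
    if (pwfx == NULL)
        return E_OUTOFMEMORY;
    ZeroMemory(pwfx, sizeof(WAVEFORMATEX));
    pwfx->wFormatTag      = WAVE_FORMAT_PCM;
    pwfx->nChannels       = 2;
    pwfx->nSamplesPerSec  = 44100;
    pwfx->wBitsPerSample  = 16;
    pwfx->nBlockAlign     = (pwfx->nChannels * pwfx->wBitsPerSample) / 8;
    pwfx->nAvgBytesPerSec = pwfx->nSamplesPerSec * pwfx->nBlockAlign;

    pmt->cbFormat = sizeof(WAVEFORMATEX);    // size of the format block
    pmt->pbFormat = (BYTE *) pwfx;           // pointer to the format block
    return S_OK;
}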

Working with Media Samples

Filters pass data from output pin to input pin, from the origin to the end of the data stream. This stream data is passed as a series of media samples. Instead of passing raw pointers to memory buffers through the filter graph (which could get very dangerous very quickly because dereferencing an invalid pointer could lead to a sudden crash), DirectShow encapsulates each media sample inside a COM object that exposes the IMediaSample interface. All DirectShow filters perform their stream data operations on instances of IMediaSample, which has the added advantage of allowing a filter to own a sample, making it less likely that the filter will hand out a sample that's already in use. (Each sample has a reference count to keep track of such things.) The use of the IMediaSample interface also prevents data from being overwritten accidentally.

When a filter receives a media sample, it can make a number of method queries on the IMediaSample interface to get the specifics it will need to operate on the data contained in the sample. The GetPointer method returns a pointer to the buffer of data managed by the media sample. To get the size of the buffer, call the GetSize method. The actual number of bytes of sample data can differ from the size of the sample buffer, so you'll use the GetActualDataLength method to find out how much of the buffer holds sample data. When there's been a change in the media type (see the next section for a full explanation of media types), the new type is returned by the GetMediaType method (otherwise, the method returns NULL). The times the sample should start and finish (given in REFERENCE_TIME units of 100 nanoseconds) are returned by the GetTime method.
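The following sketch pulls those queries together, showing the calls a filter might typically make on an incoming sample; the helper name is hypothetical:

// Illustrative inspection of an incoming media sample.
HRESULT InspectSample(IMediaSample *pSample)
{
    BYTE *pData = NULL;
    HRESULT hr = pSample->GetPointer(&pData);        // start of the buffer
    if (FAILED(hr))
        return hr;

    long cbBuffer = pSample->GetSize();              // total buffer size
    long cbData   = pSample->GetActualDataLength();  // bytes of real data
    if (cbData > cbBuffer)
        return E_UNEXPECTED;                         // should never happen

    REFERENCE_TIME rtStart = 0, rtStop = 0;          // 100-ns units
    hr = pSample->GetTime(&rtStart, &rtStop);        // fails if no
                                                     // timestamp was set
    AM_MEDIA_TYPE *pmt = NULL;
    if (pSample->GetMediaType(&pmt) == S_OK && pmt != NULL)
    {
        // The media type changed mid-stream; adapt to it, then free
        // the copy (DeleteMediaType is a DirectShow base class helper).
        DeleteMediaType(pmt);
    }
    return S_OK;
}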

When there's a change from the media types negotiated during the connection process, the filter needs to redefine the properties of a media sample it presents on its output pin. For example, an audio source filter could change the audio sample rate, which means the output connection would need to signal a change in media type.

Each media sample has a timestamp indicating the start and stop times of the sample, defined in stream time. Stream time begins when the filter graph enters the run state and increases in 100-nanosecond intervals until the filter graph is stopped. These timestamps are important for maintaining synchronization both within a media stream and between media streams. Filters might need to examine the timestamps on several incoming samples to ensure that the appropriate portions of these samples are transformed together. For example, a filter that multiplexes audio and video streams into a single stream must use timestamp information to keep the streams properly synchronized with respect to each other.

Another COM interface, IMemAllocator, is responsible for creating and managing media samples. The allocator is a distinct COM object, with its own interfaces and methods. Although allocators aren't often overridden and implemented on a per-class basis, you'll see that the Sample Grabber in Chapter 11 does implement its own allocator object. DirectShow provides generic allocators and filter classes that use these allocators by default. One method of CTransformFilter, DecideBufferSize (a method that must be overridden in any filter implementation that descends from CTransformFilter), performs a calculation that determines how much memory is required for each media sample. That information is used during the pin connection negotiation process (discussed in the next section) so that media samples can be passed between two pins.
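A typical DecideBufferSize override looks something like the following sketch, modeled on the pattern used in the DirectShow SDK samples; the class name CMyTransform is hypothetical, and a real filter derives the buffer size from its negotiated media type:

// Illustrative DecideBufferSize for a CTransformFilter descendant:
// request one output buffer big enough for a full sample.
HRESULT CMyTransform::DecideBufferSize(IMemAllocator *pAlloc,
                                       ALLOCATOR_PROPERTIES *pProp)
{
    pProp->cBuffers = 1;
    pProp->cbBuffer = m_pInput->CurrentMediaType().GetSampleSize();

    // Ask the allocator for what we want; it reports what we actually got.
    ALLOCATOR_PROPERTIES Actual;
    HRESULT hr = pAlloc->SetProperties(pProp, &Actual);
    if (FAILED(hr))
        return hr;

    // The allocator may adjust the request; make sure it's still enough.
    return (Actual.cbBuffer < pProp->cbBuffer) ? E_FAIL : S_OK;
}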

Connecting Pins Together

When two pins connect, they must agree on three things: a media type, which describes the format of the data; a transport mechanism, which defines how the pins will exchange data; and an allocator, which is an object, created and owned by one of the pins, that creates and manages the memory buffer pool that the pins use to exchange data. We'll now make a detailed examination of these steps.

When a connection is attempted between two filters, the filters must determine whether they can compatibly exchange a data stream. They use the AM_MEDIA_TYPE data structure, making queries to each other, until they establish that both can agree on the same media type for data transfer. Although the Filter Graph Manager is responsible for adding filters to the graph and sending connection messages to pins on respective filters, the actual work of connecting pins together is performed by the pins themselves. This fact is one reason why an IPin object with its own methods is instantiated for every pin on a DirectShow filter.

The connection process begins when the Filter Graph Manager calls the IPin::Connect method on the output pin. This method is passed a pointer to the downstream input IPin object and, optionally, a pointer to an AM_MEDIA_TYPE structure. The output pin can (and usually does) examine the majortype and subtype fields as a quick way to determine whether it should even bother continuing with the connection process. This way, filters don't waste time when they obviously don't deal with compatible media types, such as when a video transform filter tries to connect to an audio stream. The Connect method can fail immediately rather than go through an extensive connection process.

If the AM_MEDIA_TYPE provided to the output pin by the Filter Graph Manager is acceptable to the pin (that is, if there's some chance the filters can be connected together), the output pin immediately calls the input pin's IPin::EnumMediaTypes method. This method returns a list of the input pin's preferred media types. The output pin walks through this list examining each media type returned, and if it sees anything it likes, it specifies a complete media type based on the list of preferred types and then signals the input pin that a connection should be made by invoking its IPin::ReceiveConnection method. If none of the media types is acceptable to the output pin, it proposes media types from its own list. If none of these succeeds, the connection process fails.
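An illustrative walk through the input pin's preferred types might look like this sketch; the acceptance test and function name are placeholders for whatever a real output pin would check:

// Illustrative enumeration of an input pin's preferred media types,
// as an output pin does during connection.
HRESULT FindAcceptableType(IPin *pInputPin)
{
    IEnumMediaTypes *pEnum = NULL;
    HRESULT hr = pInputPin->EnumMediaTypes(&pEnum);
    if (FAILED(hr))
        return hr;

    AM_MEDIA_TYPE *pmt = NULL;
    while (pEnum->Next(1, &pmt, NULL) == S_OK)
    {
        if (pmt->majortype == MEDIATYPE_Video)  // placeholder acceptance test
        {
            // A real output pin would complete the media type here and
            // call IPin::ReceiveConnection on the input pin.
            DeleteMediaType(pmt);               // base class helper
            pEnum->Release();
            return S_OK;
        }
        DeleteMediaType(pmt);
    }
    pEnum->Release();
    return VFW_E_NO_ACCEPTABLE_TYPES;
}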

Once the pins have agreed on a media type, they need to negotiate a transport, that is, whether a push or a pull model will be employed to transfer data between the pins. The most common transport is the push model, which is represented by the IMemInputPin interface on the input pin. If the output pin determines through a query that the input pin supports this interface, the output pin will push the data downstream to the input pin.

Finally, the two pins negotiate the number and size of the memory buffers they will share when streaming begins. The upstream filter will write data into those buffers, and the downstream filter will read that data. Except in the case of hardware transport, the process of allocating and managing buffers is handled by a separate allocator object, which is owned by one of the pins. The pins negotiate which allocator will be used, together with the size and number of buffers that will be needed. The output pin is responsible for selecting the allocator, even when it asks the input pin to propose one of its own; either way, the output pin remains in control of the allocation process.
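From the output pin's side, that negotiation might look like this sketch, a simplified version of what the DirectShow base classes do in CBaseOutputPin::DecideAllocator; the helper name and fallback behavior are illustrative:

// Illustrative allocator negotiation: prefer the input pin's allocator,
// set the buffer properties, and confirm the choice with the input pin.
HRESULT NegotiateAllocator(IMemInputPin *pMemInput,
                           ALLOCATOR_PROPERTIES *pRequest,
                           IMemAllocator **ppAlloc)
{
    HRESULT hr = pMemInput->GetAllocator(ppAlloc);
    if (FAILED(hr))
        return hr;    // a real pin would fall back to its own allocator

    ALLOCATOR_PROPERTIES Actual;
    hr = (*ppAlloc)->SetProperties(pRequest, &Actual);
    if (SUCCEEDED(hr))
    {
        // FALSE means the samples are not read-only downstream.
        hr = pMemInput->NotifyAllocator(*ppAlloc, FALSE);
    }
    return hr;
}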

Most allocations are performed with regular memory allocation operations, but filters that represent hardware devices might create buffers on those devices. Such filters might insist on using their own allocators and reject allocation attempts from upstream output pins. For example, the DirectShow video renderers have their own allocators that create DirectDraw and Direct3D surfaces for display. The Video Renderer filter will use its own allocator, but if forced, it will use GDI (much slower) for rendering. The Video Mixing Renderer, covered in Chapter 9, will insist on using its own allocator. If the allocation negotiation succeeds, the Connect call returns successfully.

Selecting a Base Class for a DirectShow Filter

Transform filters, such as the Sample Grabber, are defined in C++ as individual classes, descendants of one of four different base classes from which they inherit their particular properties. The selection of a base class for your DirectShow filter is driven by the data-processing needs of the filter. If your intrafilter data-processing needs are minimal, there's a base class that will do most of the work for you, but if you need a sophisticated filter, you'll find that another base class will give you the freedom to create a complex filter.

The CTransInPlaceFilter class is a base class designed for filters that pass along the upstream buffers they receive to downstream filters without any need to copy the data to new buffers. They transform the data in place, hence the name of the class. The same buffer that's passed to a filter derived from CTransInPlaceFilter is passed to the next filter downstream. The Grabber Sample covered in Chapter 11, which doesn't do much beyond ensuring that its input pin and output pin accept matching data types, is derived from CTransInPlaceFilter. Although this class doesn't provide a lot of flexibility in filter design, it does do most of the behind-the-scenes work involved in creating and managing the filter.
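An in-place transform reduces to a single overridden method that modifies the buffer it is handed. In this sketch, the class name CMyInPlaceFilter and the byte-inverting operation are purely illustrative:

// Illustrative CTransInPlaceFilter override: modify the sample's
// buffer directly; the same buffer then continues downstream.
HRESULT CMyInPlaceFilter::Transform(IMediaSample *pSample)
{
    BYTE *pData = NULL;
    HRESULT hr = pSample->GetPointer(&pData);
    if (FAILED(hr))
        return hr;

    long cbData = pSample->GetActualDataLength();
    for (long i = 0; i < cbData; i++)
        pData[i] = (BYTE)~pData[i];    // transform the data in place
    return S_OK;
}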

Most commonly, you'll be creating transform filter classes descended from CTransformFilter, such as the YUVGray transform filter covered in this chapter. Filters based on CTransformFilter have separate input and output buffers, and they can manipulate the data in any desired way as the data passes between input and output. When an input buffer arrives, it's transformed by the filter, an operation that copies it into an output buffer, so that it can be sent to the downstream filter. For filters that work with video signals, which are very sensitive to stream delays and synchronization issues, the CVideoTransformFilter class (designed primarily for use with video decoders) offers the same features as the CTransformFilter class, but it will automatically drop frames (that is, drop input samples) if the downstream renderer falls behind. Frame dropping can be very useful for transform filters placed on a preview render path of a filter graph, where video quality is being sacrificed to keep processor power available for an output rendering path.
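By contrast with the in-place case, a CTransformFilter descendant overrides a two-argument Transform that reads from the input sample and writes to a separate output sample. The class name CMyCopyFilter is hypothetical, and housekeeping such as timestamp copying is omitted:

// Illustrative CTransformFilter override: read from the input buffer,
// write the (possibly modified) result into the separate output buffer.
HRESULT CMyCopyFilter::Transform(IMediaSample *pIn, IMediaSample *pOut)
{
    BYTE *pSrc = NULL, *pDst = NULL;
    HRESULT hr = pIn->GetPointer(&pSrc);
    if (FAILED(hr))
        return hr;
    hr = pOut->GetPointer(&pDst);
    if (FAILED(hr))
        return hr;

    long cbData = pIn->GetActualDataLength();
    if (cbData > pOut->GetSize())
        return E_FAIL;                     // output buffer too small

    CopyMemory(pDst, pSrc, cbData);        // the "copy" in a copy transform
    return pOut->SetActualDataLength(cbData);
}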

Finally, for filters that have requirements above and beyond those provided for by these base classes, the CBaseFilter base class provides a shell of basic DirectShow filter functionality. CBaseFilter provides very little support for buffer management, connection negotiation, and so on, so if you do use CBaseFilter to create a DirectShow filter class, you'll be doing most of the work yourself. The upside of the extra effort required is that you can write a transform filter highly optimized for the kinds of tasks expected of it and produce a streamlined design that covers all the important implementation tasks without any extraneous code.


