Buffering DirectShow Media Streams


With the release of Windows XP Service Pack 1, two new filters were added to DirectShow to handle buffered playback of media streams. These filters replace the renderer and capture filters that you'd expect to find in a filter graph. The media stream (or streams, if you consider that audio and video might be on different streams as they pass through a filter graph) passes through the filter graph and is sent to the Stream Buffer Engine Sink filter. This filter takes the stream and writes it to a series of disk files. Another filter, the Stream Buffer Engine Source filter, reads the streams from the files created by the Stream Buffer Engine Sink filter and passes them downstream to a renderer.

Although it might seem that you'd create a single filter graph with separate streams for the Sink and Source filters, DirectShow requires that you create two completely separate filter graphs. Although we haven't explored this form of DirectShow programming, it's entirely safe to create multiple filter graphs within a single application. To create a buffering DirectShow application, create one filter graph that takes the capture stream (which might be coming from a camcorder, TV tuner, webcam, and so on) and writes it out through the Stream Buffer Engine Sink. Then create a second filter graph that uses a Stream Buffer Engine Source as the capture filter, passing those streams (remember, multiple streams can be buffered with the Stream Buffer Engine) along to a renderer. Figure 7-3 will help you understand data flow in a buffering DirectShow application.


Figure 7-3. The two filter graphs required in a buffering DirectShow application

Once the filter graphs have been created and begin executing, streams are written to disk by the Stream Buffer Engine Sink in the first filter graph. At the same time, this data is received by the Stream Buffer Engine Source in the second filter graph. Because these filter graphs are entirely separate, it's possible to pause the second filter graph without affecting the execution of the first filter graph. This means that, for example, a TV broadcast can be paused at the Stream Buffer Engine Source, while the broadcast continues to be written to disk through the Stream Buffer Engine Sink. From the user's perspective, the TV broadcast will pause at the user's command until the filter graph receives the run command, which causes it to resume playback. None of the TV broadcast will be lost, because the Stream Buffer Engine Sink will write the content to disk, where it will stay, safe and secure, until the Stream Buffer Engine Source reads it in. Conversely, if the first graph, containing the Stream Buffer Engine Sink, is paused, playback will pause when the Stream Buffer Engine Source runs out of buffered stream to render.
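The writer-keeps-going-while-the-reader-pauses behavior can be illustrated with a small, self-contained analogy in plain C++. This is strictly a conceptual sketch (the real Stream Buffer Engine spools to disk files, not an in-memory container, and the names here are invented for illustration):

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model of the two-graph arrangement: the "sink" side keeps
// appending samples to a shared buffer while the "source" side can
// pause and resume independently, never losing data.
struct BufferedStream {
    std::deque<std::string> buffer;  // samples written but not yet read
    bool paused = false;

    void write(const std::string& sample) {  // sink side: never blocked
        buffer.push_back(sample);
    }
    bool read(std::string& out) {            // source side: honors pause
        if (paused || buffer.empty()) return false;
        out = buffer.front();
        buffer.pop_front();
        return true;
    }
};
```

Pausing the reader simply stops consumption; everything the writer produces in the meantime is still there, in order, when playback resumes, which is exactly the property the two separate filter graphs give us.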

Now you can see why DirectShow requires two entirely separate filter graphs when using the stream buffering capabilities. Because these filter graphs are independent, control messages sent to one filter graph (pause, run, and stop) don't affect the other in any way. In addition, you can send commands to the Stream Buffer Engine Source that will rewind or fast-forward the stream, which gives you the capability to provide the user with VCR-like controls over the stream. Those commands have no effect on the Stream Buffer Engine Sink filter, which keeps writing its streams to disk, oblivious to any playback commands issued to the Stream Buffer Engine Source filter.

Building a Buffering TV Application

Now that we've explored the theory of the Stream Buffer Engine, we need to put it into practice and roll our own TiVo equivalent. Although using the Stream Buffer Engine might seem complex, its requirements are only a little greater than those for a more prosaic DirectShow application.

Although any stream can be buffered using DirectShow (the Stream Buffer Engine doesn't care what kind of data is being spooled out to disk), the Stream Buffer Engine has some preferences when it comes to video streams. As of the first release of the Stream Buffer Engine, video streams should be passed to the Stream Buffer Engine Sink in one of two formats: either DV Video, which you'll receive from a digital camcorder or digital VCR, or MPEG-2, which is the video encoding format used on DVDs. MPEG-2 is the preferred format for the Stream Buffer Engine, and it is used by the first generation of Media Center PCs because some TV tuner cards perform on-the-fly MPEG-2 encoding of their video signals. One important consideration is that an MPEG-2 video stream must be passed through the MPEG-2 Video Stream Analysis filter before going on to the Stream Buffer Engine Sink. This filter is used to detect frame boundaries and other video characteristics that are important for handling trick play effects such as rewind and fast-forward. You can pass other video streams to the Stream Buffer Engine Sink, but the Source won't be able to render them for you, making them, in essence, useless. Audio streams don't have the same constraints; you can pass any kind of recognizable audio stream or closed-captioning data, and the Stream Buffer Engine will handle it.
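The routing rules just described can be summarized as a small decision helper. This is purely illustrative (the function, the string labels, and the format names are our own shorthand, not part of the DirectShow API), but it captures the preprocessing each stream type needs before reaching the Sink:

```cpp
#include <cassert>
#include <string>

// Returns the filter (if any) a stream must pass through before the
// Stream Buffer Engine Sink, per the rules above. Hypothetical helper;
// the names are shorthand for the filters discussed in the text.
std::string SbePreprocessing(const std::string& streamType,
                             const std::string& format) {
    if (streamType == "audio")
        return "none";                          // any recognizable audio is fine
    if (format == "MPEG-2")
        return "MPEG-2 Video Stream Analysis";  // required before the Sink
    if (format == "DV Video")
        return "none";                          // already a supported format
    return "DV Video Encoder";                  // convert, e.g., a webcam stream
}
```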

All this means that you might need to convert a given video stream to a specific format (either MPEG-2 or DV Video) if it isn't already in one of the two supported formats. For example, if you build a buffering application for a webcam (which would be a little weird, but you could do it), you'll have to convert its video stream from its native format to DV Video before passing it into the Stream Buffer Engine Sink. You do so by routing the stream through a DV Video Encoder filter (a built-in DirectShow filter) before it enters the Stream Buffer Engine Sink.

Although you might think that you could just pass a stream directly from a DV camcorder into the Stream Buffer Engine Sink, given that it's already in a legal format, this isn't actually so. The stream from a DV camcorder is a multiplexed signal of combined video and audio streams. These streams need to be demultiplexed, using the DV Splitter filter, before the separated video and audio streams can be passed along to the Stream Buffer Engine Sink. If you don't do this, you'll see nothing but a black video image when the Source renders the multiplexed stream.

Finally, when dealing with TV tuners, you must remember that the digitized output might not be provided in either MPEG-2 or DV format, which means a conversion is often necessary when putting a broadcast TV signal through the Stream Buffer Engine. (Digital TV signals are already in MPEG-2 format and won't need to be converted at all.) In addition to the video stream, an audio stream also has to be sent to the Stream Buffer Engine Sink. This process is not as straightforward as it might sound because TV tuners do very strange things to the audio portion of a TV broadcast. On my own computer (equipped with an ATI Rage 8500), I had to wire the audio output from the TV tuner card into one of the inputs on my sound card (Line In). If I don't do that, I don't have any audio signal from my TV tuner.

The reasons for such a strange configuration of hardware are murky. It might seem more appropriate to create a DirectShow-compatible audio capture filter for the TV tuner so that the TV audio stream can be directly captured by the filter graph. However, in most cases, you'll need to create an audio capture filter for your computer's sound card and then use that filter as the audio source for the TV signal, passing that stream along to the Stream Buffer Engine Sink.

The preceding are the basic cautions that need to be observed when creating a buffering DirectShow application. Now let's look at the main function of a simple DirectShow program, TVBuff, which takes a live TV signal and provides TiVo-like functionality using the Stream Buffer Engine. Here's how things begin:

// A basic program to buffer playback from a TV tuner using DirectShow.
int main(int argc, char* argv[])
{
    ICaptureGraphBuilder2 *pCaptureGraph = NULL;  // Capture graph builder object
    IGraphBuilder *pGraph = NULL;                 // Graph builder object for sink
    IMediaControl *pControl = NULL;               // Media control interface for sink
    IGraphBuilder *pGraphSource = NULL;           // Graph builder object for source
    IMediaControl *pControlSource = NULL;         // Filter control interface
    IBaseFilter *pVideoInputFilter = NULL;        // Video capture filter
    IBaseFilter *pDVEncoder = NULL;               // DV Encoder filter
    IBaseFilter *pAudioInputFilter = NULL;        // Audio capture filter
    IStreamBufferSink *pBufferSink = NULL;

    // Initialize the COM library.
    HRESULT hr = CoInitialize(NULL);
    if (FAILED(hr))
    {
        // We'll send our error messages to the console.
        printf("ERROR - Could not initialize COM library");
        return hr;
    }

    // Create the capture graph builder and query for interfaces.
    hr = CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL,
                          CLSCTX_INPROC_SERVER, IID_ICaptureGraphBuilder2,
                          (void **)&pCaptureGraph);
    if (FAILED(hr))    // FAILED is a macro that tests the return value
    {
        printf("ERROR - Could not create the capture graph builder.");
        return hr;
    }

    // Create the Filter Graph Manager and query for interfaces.
    hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                          IID_IGraphBuilder, (void **)&pGraph);

    // Now tell the capture graph builder about the Filter Graph Manager.
    hr = pCaptureGraph->SetFiltergraph(pGraph);

    // Using QueryInterface on the graph builder,
    // get the Media Control object.
    hr = pGraph->QueryInterface(IID_IMediaControl, (void **)&pControl);
    if (FAILED(hr))
    {
        printf("ERROR - Could not create the Media Control object.");
        pCaptureGraph->Release();
        pGraph->Release();    // Clean up after ourselves
        CoUninitialize();     // And uninitialize COM
        return hr;
    }

There's nothing here we haven't seen before. This code is all standard DirectShow initialization code that creates a capture graph builder and then instantiates a Filter Graph Manager associated with it. This filter graph will handle the Stream Buffer Engine Sink. That's why it's associated with a capture Filter Graph Manager: this is where the streams will originate. Now we need to build the filter graph and fill it with appropriate capture and transform filters.

    // Now create the video input filter from the TV tuner.
    hr = GetVideoInputFilter(&pVideoInputFilter, L"ATI");
    if (SUCCEEDED(hr))
    {
        hr = pGraph->AddFilter(pVideoInputFilter, L"TV Tuner");
    }

    // Now, let's add a DV Encoder, to get a format the SBE can use.
    hr = AddFilterByCLSID(pGraph, CLSID_DVVideoEnc, L"DV Encoder",
                          &pDVEncoder);

    // Now that the capture sources have been added to the filter graph,
    // we need to add the Stream Buffer Engine Sink filter to the graph.
    CComPtr<IStreamBufferSink> bufferSink;
    hr = bufferSink.CoCreateInstance(CLSID_StreamBufferSink);
    CComQIPtr<IBaseFilter> pSinkF(bufferSink);
    hr = pGraph->AddFilter(pSinkF, L"SBESink");

    // Now connect the video capture stream to the sink.
    hr = pCaptureGraph->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                                     pVideoInputFilter, pDVEncoder, pSinkF);

    // Now we've got to wire the audio crossbar for the TV signal.
    hr = ConfigureTVAudio(pCaptureGraph, pVideoInputFilter);

    // Now we instantiate an audio capture filter,
    // which should be picking up the audio from the TV tuner...
    hr = GetAudioInputFilter(&pAudioInputFilter, L"SoundMAX Digital Audio");
    if (SUCCEEDED(hr))
    {
        hr = pGraph->AddFilter(pAudioInputFilter, L"TV Tuner Audio");
    }

    // And now we add the audio capture to the sink.
    hr = pCaptureGraph->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Audio,
                                     pAudioInputFilter, NULL, pSinkF);

    // And now lock the Sink filter, like we're supposed to.
    hr = bufferSink->QueryInterface(&pBufferSink);
    hr = pBufferSink->LockProfile(NULL);

    // Before we finish, save the filter graph to a file.
    SaveGraphFile(pGraph, L"C:\\MyGraph_Sink.GRF");

We make a call to GetVideoInputFilter (you might remember this function from Chapter 4), which returns a pointer to an instanced filter with a name that matches the string ATI. Note that these hard-coded strings specify hardware that might not be on your system, and this isn't the right way to do things, except by way of example. The appropriate mechanism (enumerating a list of devices and then letting the user select from it) has already been covered in Chapter 6 as part of WinCap, so we don't need to discuss it here. The TV tuner's video capture filter is added to the filter graph, along with a DV Encoder filter. This filter will handle the translation between the format of the digital video stream from the TV tuner and the format expected by the Stream Buffer Engine Sink filter.

Once those filters have been added, we need to instantiate a Stream Buffer Engine Sink filter, which is added to the filter graph. Because we're using the capture graph builder to help us build the filter graph, we can use the RenderStream method to create a render path between the capture filter (that is, the TV tuner) and the Stream Buffer Engine Sink filter, requesting that it pass the stream through the DV Encoder filter on the way to the Sink filter.

Next we deal with the peculiar nature of TV tuner audio (as discussed previously) by making a call to ConfigureTVAudio. This function is a slightly modified version of the method CCaptureGraph::ConfigureTVAudio in WinCap, which was fully covered in Chapter 6. The function connects the audio input and output pins of the crossbar using the Route method in the ConnectAudio function, which is itself taken from WinCap. (At this point, audio might begin to play on some computers with some combinations of TV tuner cards and sound hardware. That's because the audio crossbar routing is sending the signal from the TV tuner to the sound card.)

Now that we have a path for the broadcast audio signal into the filter graph, we call GetAudioInputFilter. (Once again, I've passed it the friendly name of the sound card on my computer, SoundMAX, but your sound card probably has a different friendly name.) That filter is added to the graph, and another call to RenderStream creates a connection between it and the Stream Buffer Engine Sink. Because this is an audio stream, we don't need any intermediate conversion, and the stream is passed directly to the Sink filter.

Now we need to issue a command to the Sink filter through its IStreamBufferSink interface. This call, LockProfile, initializes the file system requirements for stream buffering, creating a stub file, which is basically a pointer to other files where the stream data is being stored. This stub file is needed by the Stream Buffer Engine Source filter because it provides all the information the Source filter will need when it starts reading streams through the Stream Buffer Engine. Although the DirectShow SDK documentation states that LockProfile is called automatically when the Filter Graph Manager executes its Run call, it's better to do it explicitly. You can use an explicit call to LockProfile to specify a file name for the stub file. The Stream Buffer Engine Source, which reads the stub file information and translates it into DirectShow streams, can be located in an entirely separate application, as long as the file name is known to both programs. (Alternately, the processes could be running at different privilege levels within the same application. The security IDs provided for in the IStreamBufferInitialize interface method SetSIDs allow you to control access to the buffer files.)

Finally we make a call to SaveGraphFile so that we can take a snapshot of the filter graph with all capture and Stream Buffer Engine Sink elements, as shown in Figure 7-4.


Figure 7-4. One of the TVBuff filter graphs, which feeds streams into the Stream Buffer Engine Sink filter

The capture filter, labeled TV Tuner, sends its output through an AVI Decompressor filter, converting the TV tuner's output to match the input requirements of the DV Encoder. From the DV Encoder, the video stream passes into the Sink filter, while the SoundMAX card, labeled TV Tuner Audio, sends its output through a Smart Tee filter and then into the Sink filter. We can have as many streams as we need going into the Sink filter. For example, closed-captioning information could be sent to the Sink filter to be decoded after it passes out of the Stream Buffer Engine Source filter in a second filter graph. Now that we've built the first filter graph, we have to move on to the second.

    // OK--now we're going to create an entirely independent filter graph.
    // This will be handling the Stream Buffer Engine Source,
    // which will be passed along to the appropriate renderers.
    hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                          IID_IGraphBuilder, (void **)&pGraphSource);

    // Using QueryInterface on the graph builder,
    // get the Media Control object.
    hr = pGraphSource->QueryInterface(IID_IMediaControl,
                                      (void **)&pControlSource);

    // Now instantiate the Stream Buffer Engine Source
    // and add it to this filter graph.
    CComPtr<IStreamBufferSource> pSource;
    hr = pSource.CoCreateInstance(CLSID_StreamBufferSource);
    CComQIPtr<IBaseFilter> pSourceF(pSource);
    hr = pGraphSource->AddFilter(pSourceF, L"SBESource");
    hr = pSource->SetStreamSink(bufferSink);
    CComQIPtr<IStreamBufferMediaSeeking> pSourceSeeking(pSource);

Once again, we build a filter graph; however, because this filter graph isn't very complicated, we don't need to create a capture graph builder. Instead, we create the standard IGraphBuilder interface. Once that interface has been instantiated and its media control interface has been queried, we move on to instantiate the Stream Buffer Engine Source filter, IStreamBufferSource, and its associated interfaces. The filter is added to the graph, and then a call is made to its SetStreamSink method. This method gets a pointer to the IStreamBufferSink object located in the first filter graph. This call is used to identify the Stream Buffer Engine Sink associated with this Source filter. Once that's done, the Source filter has complete knowledge of the stream types that it can supply to the filter graph.

Finally we get a pointer to the IStreamBufferMediaSeeking interface. This interface will be used a bit further along, when we want to control our position in the stream, giving us the ability to rewind or fast-forward a stream. With all the setup work done in this second filter graph, rendering the streams available through the Stream Buffer Engine Source filter becomes a very simple affair.

    // Now, all we need to do is enumerate the output pins on the source.
    // These should match the streams that have been set up on the sink.
    // Render each output pin.
    CComPtr<IPin> pSrcOut;
    CComPtr<IEnumPins> pPinEnum;
    hr = pSourceF->EnumPins(&pPinEnum);
    while (pPinEnum->Next(1, &pSrcOut, 0) == S_OK)
    {
        hr = pGraphSource->Render(pSrcOut);
        pSrcOut.Release();
    }

We've seen code fragments like this before, in Chapter 4. We use the enumeration features of the IEnumPins object to walk through all the output pins on the Stream Buffer Engine Source filter. Each pin corresponds to a media stream, and as each pin is detected, the filter graph's Render method is called for that pin. In the case of a TV tuner, there are two streams (video and audio) sent into the Sink filter, and there will be two corresponding output pins (again, video and audio) available on the Source filter. Each of these streams is rendered independently; the video output is rendered with a Video Renderer filter, and the audio output is rendered with a Default DirectSound Device filter.

That's all that's required to construct the second filter graph; there are no fancy capture filters to deal with and no stream conversions required. Every output pin from the Stream Buffer Engine Source filter has been rendered. For a final touch, we call SaveGraphFile to preserve a copy of the filter graph in a disk file. Now we can send the Run message to both filter graphs and then go into a loop waiting for keyboard input from the user.

    if (SUCCEEDED(hr))
    {
        // Run the graphs. Both of them.
        hr = pControl->Run();
        hr = pControlSource->Run();
        if (SUCCEEDED(hr))
        {
            // Wait patiently for completion of the recording.
            wprintf(L"ENTER to stop, SPACE to pause, "
                    L"BACKSPACE to rewind, F to fastforward.\n");
            bool done = false;
            bool paused = false;
            while (!done)
            {
                // We wait for keyboard input.
                int ch = _getch();
                switch (ch)
                {
                case 0x0d:    // ENTER
                    done = true;
                    break;
                case 0x20:    // SPACE
                    if (paused)
                    {
                        wprintf(L"Playing...\n");
                        pControlSource->Run();
                        paused = false;
                    }
                    else
                    {
                        wprintf(L"Pausing...\n");
                        pControlSource->Pause();
                        paused = true;
                    }
                    break;
                case 0x08:    // BACKSPACE - Rewind one second, if possible.
                    // First, let's find out how much play we have.
                    // We do this by finding the earliest, latest,
                    // current, and stop positions.
                    // These are in units of 100 nanoseconds. Supposedly.
                    LONGLONG earliest, latest, current, stop, rewind;
                    hr = pSourceSeeking->GetAvailable(&earliest, &latest);
                    hr = pSourceSeeking->GetPositions(&current, &stop);
                    // We would like to rewind 1 second,
                    // or 10000000 units.
                    if ((current - earliest) > 10000000)    // Can we?
                    {
                        rewind = current - 10000000;        // Yes
                    }
                    else
                    {
                        rewind = earliest;    // Back up as far as we can
                    }
                    // If we can, change the current position
                    // without changing the stop position.
                    hr = pSourceSeeking->SetPositions(&rewind,
                             AM_SEEKING_AbsolutePositioning,
                             NULL, AM_SEEKING_NoPositioning);
                    break;
                case 0x46:    // That's F
                case 0x66:    // And f - Fast-forward one second, if possible.
                    // Once again, find out how much play we have.
                    LONGLONG fastforward;
                    hr = pSourceSeeking->GetAvailable(&earliest, &latest);
                    hr = pSourceSeeking->GetPositions(&current, &stop);
                    // We would like to fast-forward 1 second,
                    // or 10000000 units.
                    if ((latest - current) > 10000000)      // Can we?
                    {
                        fastforward = current + 10000000;   // Yes
                    }
                    else
                    {
                        fastforward = latest;    // Just go forward
                    }
                    // If we can, change the current position
                    // without changing the stop position.
                    hr = pSourceSeeking->SetPositions(&fastforward,
                             AM_SEEKING_AbsolutePositioning,
                             NULL, AM_SEEKING_NoPositioning);
                    break;
                default:    // Ignore other keys
                    break;
                }
            }
        }

Here we encounter the real magic of the Stream Buffer Engine. We go into a loop driven by user keypresses (not very elegant, but effective). A tap on the space bar pauses or resumes the playback of the second Stream Buffer Engine Source filter graph, simply by calling the Filter Graph Manager's Pause and Run methods. That's all that's required to add TiVo-like pause/resume functionality! Because the filter graphs are entirely separate, a pause command issued to one filter graph has no effect on the other.

Things get only a little more complex when the user presses the Backspace key (which rewinds the stream one second) or the F key (which fast-forwards it one second). Here we use the IStreamBufferMediaSeeking interface of the Source filter. (This interface is identical to the IMediaSeeking interface used throughout DirectShow to search through a stream. IMediaSeeking is acquired by querying the filter graph, while IStreamBufferMediaSeeking is acquired by querying a specific filter.) The IStreamBufferMediaSeeking interface knows the Source filter's position in the stream, the total duration of the stream, and the stream's start and stop points. We need to know that information before we can change the Source filter's position in the stream. When the Source filter changes its position to an earlier point in the stream, it will appear to the user as though the stream has rewound; if a later point in the stream is selected, it will appear as though the stream has been fast-forwarded.

Two calls need to be made before positional changes in the stream can be effected. These calls, to the IStreamBufferMediaSeeking methods GetAvailable and GetPositions, return values, in 100-nanosecond (100 billionths of a second) intervals, indicating the earliest and latest available positions in the stream, and the current and stop positions, respectively. Why do we need this information? We have to ensure that we don't rewind to a point before the start of the stream or fast-forward past the end of the stream.

Some simple arithmetic ensures that there's enough room between the current position in the stream and some area either before it (in the case of rewind) or after it (for fast-forward). At this point, another call is made, this one to the IStreamBufferMediaSeeking method SetPositions. This method takes four parameters. The first parameter is a value representing the new current stream position (in effect, it resets the stream position), along with a second parameter that indicates whether it is an absolute value (as it is in this case) or a value relative to the Source's current position in the stream. The third and fourth parameters allow you to change the stop position in the stream. We don't want to do this, so we pass NULL and AM_SEEKING_NoPositioning as the values in these fields, which preserves the stream's stop position. Once the call is made, playback immediately proceeds from the new stream position, and the user sees either a 1-second rewind or a 1-second fast-forward. These changes in stream position can be repeated as many times as desired, but only when the filter graph is executing. If the filter graph is paused, the changes will have no effect.
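The clamping arithmetic can be factored into a small standalone function, which makes the rewind and fast-forward cases easy to check in isolation. This is a sketch of the same logic the program inlines above (the helper name and signature are ours, not part of any DirectShow API); positions are in the SBE's 100-nanosecond units, so one second is 10,000,000 units:

```cpp
#include <cassert>

typedef long long LONGLONG;
const LONGLONG ONE_SECOND = 10000000;  // 100-ns units per second

// Compute a new stream position, clamped to the available range.
// delta is the requested jump in 100-ns units: negative to rewind,
// positive to fast-forward. (Illustrative helper, not a DirectShow call.)
LONGLONG ClampSeek(LONGLONG current, LONGLONG earliest, LONGLONG latest,
                   LONGLONG delta)
{
    LONGLONG target = current + delta;
    if (target < earliest) return earliest;  // can't rewind past the start
    if (target > latest)   return latest;    // can't pass the live point
    return target;
}
```

The result of ClampSeek is what would be handed to SetPositions with AM_SEEKING_AbsolutePositioning.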

Finally, after the user hits the Enter key, we stop both filter graphs, clean everything up, and exit.

        // And let's stop the filter graph.
        hr = pControlSource->Stop();
        hr = pControl->Stop();
        wprintf(L"Stopped.\n");    // To the console
    }

    // Now release everything, and clean up.
    pSourceSeeking.Release();
    pSinkF.Release();
    bufferSink.Release();
    pSourceF.Release();
    pSource.Release();
    pDVEncoder->Release();
    pVideoInputFilter->Release();
    pAudioInputFilter->Release();
    pControlSource->Release();
    pGraphSource->Release();
    pControl->Release();
    pGraph->Release();
    pCaptureGraph->Release();
    pBufferSink->Release();
    CoUninitialize();
    return 0;
}

As you've probably already realized, using the Stream Buffer Engine doesn't take a lot of code or a lot of work, but it can provide enormous functionality to your DirectShow applications if you need it. Although TVBuff demonstrates the rudimentary capabilities of the Stream Buffer Engine, a lot more can be done with it than we've covered here. For example, a permanent, high-quality recording of a TV program can be saved to a sink file for later playback, which is very similar to the kind of functionality that TiVo offers.

We haven't touched on the issue of stale content, which is created when a live broadcast is paused and then the channel is changed. Because the new channel's video stream hasn't been buffered, the Stream Buffer Engine needs to clear the buffer of the previous channel's data, or else rewinding the stream could result in a leap back to an earlier channel. (Holes can also be created when a buffer is deleted by the Stream Buffer Engine Sink. The Sink creates a series of five-minute files to hold stream data and then begins deleting them, from oldest to newest, but because of the seeking features of the Stream Buffer Engine, you could be viewing the earliest buffer while a buffer following it has already been deleted.) You'll want to look at the Stream Buffer Engine event codes in the DirectX SDK documentation for more information on how to catch these situations and prevent them from occurring.



Programming Microsoft DirectShow for Digital Video and Television
ISBN: 0735618216
Year: 2002
Authors: Mark D. Pesce