Chapter 11: Customizing Compositors | Fundamentals of Audio and Video Programming for Games (Pro-Developer)

In Chapter 10, you learned how to write an allocator-presenter that renders video from the Video Mixing Renderer (VMR) onto a Direct3D scene of your choice. As you saw, the allocator-presenter is not involved in the video mixing process. Instead, another component, the compositor, mixes the input streams, and the VMR passes the result to the allocator-presenter.

The VMR's default compositor gives you a fair amount of control over the mixing process through the IVMRMixerControl9 interface (see Chapter 9). You can position and resize the input streams relative to each other, set their transparency and z-order, and so forth, but none of the mixing settings goes beyond basic video inside a rectangle. We want circular video, fragmented video, false-color video: in short, weird-looking video. We can get it by writing our own compositors.

Implementing a Compositor

A compositor must implement the IVMRImageCompositor9 interface. We'll give a quick overview of the methods in this interface before we look at some implementations.

InitCompositionDevice

This method is called when the VMR selects a new render target.

    HRESULT InitCompositionDevice(IUnknown* pD3DDevice);

The pD3DDevice parameter is a pointer to the IUnknown interface of the Direct3D device. You can query this pointer for the IDirect3DDevice9 interface. Typically, a compositor will cache this pointer so that it can call methods on the device later. You can also use the InitCompositionDevice method to perform other initializations. For example, you might attach a depth/stencil buffer to the render target, or create any resources that you'll need later.

SetStreamMediaType

This method is called whenever the format changes on an input stream.

    HRESULT SetStreamMediaType(
        DWORD          dwStrmID,
        AM_MEDIA_TYPE* pmt,
        BOOL           fTexture
    );

The dwStrmID parameter identifies the input stream. This number corresponds to the index of the input pin on the VMR that receives the video stream. The first pin is stream 0, the second is stream 1, and so on. During mixing, the VMR sorts the video frames according to their z-order. As a result, the stream identifier does not always correspond to the mixing order. For example, the compositor may be asked to render stream 2 on the bottom, stream 0 in the middle, and stream 1 on the top. The application sets the z-order by calling IVMRMixerControl9::SetZOrder , as described in Chapter 9. The range of stream identifiers is fixed at 0 through 15.

The pmt parameter is a pointer to an AM_MEDIA_TYPE structure. This structure, called a media type, is used in DirectShow to describe any kind of media format. Here, it describes the format of the video stream. The pointer may also be NULL, which clears the media type for that stream. We'll have more to say about media types later in this chapter.

The fTexture flag is TRUE if the decoder surface is a texture, or FALSE otherwise. This information is useful if you want to use textures in your compositor: if the flag is FALSE, you'll need to create a private texture and copy the video surface to that texture.

If your compositor refers to the media type and the texture flag during rendering, copy this information into an array that is indexed by stream identifier. If pmt is NULL, clear the array entry. Be aware that SetStreamMediaType can be called multiple times for each stream, with both NULL and non-NULL values for pmt, seemingly at random. (Of course, it's not really random; it's all part of the DirectShow filter connection process.)
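The following is a minimal sketch of that bookkeeping. To keep it self-contained and portable, it does not include the DirectShow headers: VideoFormatStub, StreamEntry, and StreamTable are hypothetical stand-ins, and a real compositor would store the AM_MEDIA_TYPE itself. Copying an AM_MEDIA_TYPE properly means deep-copying its format block, and the DirectShow base classes provide CopyMediaType and FreeMediaType for that purpose. The sketch also assumes the compositor should copy the format rather than keep the caller's pointer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the parts of a video media type that a
// compositor might need. The real AM_MEDIA_TYPE comes from DirectShow.
struct VideoFormatStub {
    uint32_t width;
    uint32_t height;
};

// Stream identifiers range from 0 through 15.
constexpr uint32_t kMaxStreams = 16;

struct StreamEntry {
    bool valid = false;      // false until set, and again after a NULL pmt
    bool isTexture = false;  // copy of the fTexture flag
    VideoFormatStub format{};
};

class StreamTable {
public:
    // Mirrors the bookkeeping a SetStreamMediaType implementation might do.
    // Returns false for an out-of-range stream ID.
    bool SetStreamMediaType(uint32_t streamId, const VideoFormatStub* pmt, bool fTexture) {
        if (streamId >= kMaxStreams) return false;
        StreamEntry& entry = entries_[streamId];
        if (pmt == nullptr) {
            entry = StreamEntry{};  // NULL media type: clear the entry
        } else {
            entry.valid = true;
            entry.isTexture = fTexture;
            entry.format = *pmt;    // copy the data, don't keep the pointer
        }
        return true;
    }

    const StreamEntry& Get(uint32_t streamId) const { return entries_.at(streamId); }

    // A non-texture decoder surface must be copied into a private texture.
    bool NeedsPrivateTexture(uint32_t streamId) const {
        const StreamEntry& entry = entries_.at(streamId);
        return entry.valid && !entry.isTexture;
    }

private:
    std::array<StreamEntry, kMaxStreams> entries_{};
};
```

Because every call simply overwrites or clears one slot, the table stays consistent no matter how many times the connection process sets and clears a stream's format.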

CompositeImage

This method is used to mix the video frames, so it's where most of the work is done in a compositor.

    HRESULT CompositeImage(
        IUnknown*            pD3DDevice,
        IDirect3DSurface9*   pRenderTarget,
        AM_MEDIA_TYPE*       pmtRenderTarget,
        REFERENCE_TIME       rtStart,
        REFERENCE_TIME       rtEnd,
        D3DCOLOR             dwClrBkGnd,
        VMR9VideoStreamInfo* pVideoStreamInfo,
        UINT                 cStreams
    );

The parameters to this method contain all of the information needed to mix the video streams:

pD3DDevice is a pointer to the device's IUnknown interface.
pRenderTarget is a pointer to the render target where the video frames should be drawn. The compositor does not have to set the render target on the device, because the VMR sets it automatically before calling CompositeImage.
pmtRenderTarget is a pointer to an AM_MEDIA_TYPE structure that describes the format of the render target.
rtStart and rtEnd contain the start and end times of the final composited image. These values are useful if your compositor performs any kind of transition between videos: you can use the start and end times to calculate how far along you are in the transition.
dwClrBkGnd is the background color for the render target surface. The application sets this value by calling IVMRMixerControl9::SetBackgroundClr. If any part of the render target does not contain composited video, or if the video is transparent, that area should be filled with this color.
pVideoStreamInfo is a pointer to an array of VMR9VideoStreamInfo structures that contain additional information about each stream. The size of the array is given in cStreams . The array is sorted by z-order, from back to front. That means you can loop through the array in order, draw each frame, and the resulting image will have the correct z-ordering.
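For a transition, one simple approach is to record when the effect began and convert the composited image's rtStart into a fraction between 0 and 1. This approach is an assumption of this sketch, not something the VMR prescribes. REFERENCE_TIME values count 100-nanosecond units, so one second is 10,000,000. The ReferenceTime alias and the TransitionProgress function are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// REFERENCE_TIME is a signed 64-bit count of 100-nanosecond units.
using ReferenceTime = int64_t;
constexpr ReferenceTime kUnitsPerSecond = 10000000LL;

// Hypothetical helper: fraction of a transition completed at rtStart.
// transitionStart and duration are chosen by the compositor (for example,
// recorded when the effect was triggered). They are not VMR parameters.
double TransitionProgress(ReferenceTime rtStart, ReferenceTime transitionStart,
                          ReferenceTime duration) {
    if (duration <= 0) return 1.0;  // degenerate transition: treat as finished
    double t = static_cast<double>(rtStart - transitionStart) /
               static_cast<double>(duration);
    return std::max(0.0, std::min(1.0, t));  // clamp to [0, 1]
}
```

For a crossfade, you might use the result as the alpha of the incoming stream and one minus the result as the alpha of the outgoing stream.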

Here is the definition of the VMR9VideoStreamInfo structure.

    typedef struct _VMR9VideoStreamInfo {
        IDirect3DSurface9*  pddsVideoSurface;
        DWORD               dwWidth;
        DWORD               dwHeight;
        DWORD               dwStrmID;
        FLOAT               fAlpha;
        VMR9NormalizedRect  rNormal;
        REFERENCE_TIME      rtStart;
        REFERENCE_TIME      rtEnd;
        VMR9_SampleFormat   SampleFormat;
    } VMR9VideoStreamInfo;

This structure contains the following members:

pddsVideoSurface is a pointer to the surface that contains the decoded video frame.
dwWidth and dwHeight are the width and height of the video surface.
dwStrmID is the stream identifier.
fAlpha is the stream's alpha value. The application sets this value by calling IVMRMixerControl9::SetAlpha. You should apply this value to the entire video surface.
rNormal specifies the normalized rectangle where the video should appear. As you may recall from Chapter 9, the render target is defined as having coordinates that range from (0.0, 0.0) to (1.0, 1.0), and anything that falls outside this range is clipped. Each stream is positioned relative to these coordinates. The application sets the normalized rectangle by calling IVMRMixerControl9::SetOutputRect.
rtStart and rtEnd are the start and end times of the frame. These values do not necessarily match the rtStart and rtEnd parameters for the composited image, because the various input streams can have different frame rates.
For interlaced video, the SampleFormat member indicates how a particular frame is interlaced. For progressive frames, the value is VMR9_SampleProgressiveFrame.
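To make the roles of rNormal, fAlpha, the background color, and the back-to-front ordering concrete, here is a CPU-side model of the mixing loop. It is only a sketch. A real compositor does this work on the GPU with Direct3D, typically by drawing textured quads with alpha blending enabled. NormRect, StreamStub, PixelRect, ToPixels, and Composite are hypothetical names, and each stream's video frame is reduced to a single gray level to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins. The real VMR9NormalizedRect also has float
// left/top/right/bottom members.
struct NormRect { float left, top, right, bottom; };

struct StreamStub {
    NormRect rNormal;  // placement in composition space
    float fAlpha;      // 0.0 = transparent, 1.0 = opaque
    uint8_t gray;      // stand-in for the decoded video frame
};

struct PixelRect { int left, top, right, bottom; };  // right/bottom exclusive

// Clip a normalized coordinate to [0, 1] and scale it to pixels.
static int ToPixel(float v, int extent) {
    float clipped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<int>(clipped * extent + 0.5f);
}

PixelRect ToPixels(const NormRect& r, int width, int height) {
    return PixelRect{ToPixel(r.left, width), ToPixel(r.top, height),
                     ToPixel(r.right, width), ToPixel(r.bottom, height)};
}

// Fill with the background color, then blend each stream in array order
// (back to front) with the "over" operator: src * alpha + dst * (1 - alpha).
std::vector<uint8_t> Composite(const std::vector<StreamStub>& streamsBackToFront,
                               int width, int height, uint8_t background) {
    std::vector<uint8_t> target(static_cast<std::size_t>(width) * height, background);
    for (const StreamStub& s : streamsBackToFront) {
        PixelRect p = ToPixels(s.rNormal, width, height);
        for (int y = p.top; y < p.bottom; ++y) {
            for (int x = p.left; x < p.right; ++x) {
                uint8_t& dst = target[static_cast<std::size_t>(y) * width + x];
                float blended = s.gray * s.fAlpha + dst * (1.0f - s.fAlpha);
                dst = static_cast<uint8_t>(blended + 0.5f);
            }
        }
    }
    return target;
}
```

Because the blend is applied in array order, reversing the array changes the result wherever streams overlap. This is why it matters that the VMR hands the compositor an array that is already sorted by z-order.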

Keep in mind that a custom compositor is free to ignore any of the mixing settings defined by the IVMRMixerControl9 interface. For example, if you don't care about the placement of the video streams in composition space, you can ignore the values in rNormal. It all depends on the effect you are trying to achieve. Also, remember that the compositor does not present the final image; that's the responsibility of the allocator-presenter.

TermCompositionDevice

This method is called when the VMR-9 has finished mixing. Use it to free resources, if needed.

    HRESULT TermCompositionDevice(IUnknown* pD3DDevice);

Again, the pD3DDevice parameter is a pointer to the device's IUnknown interface.

A Note about Compositors and Deinterlacing

Custom compositors have one significant drawback: they cannot use hardware-accelerated deinterlacing. The compositor is responsible for deinterlacing any interlaced video content that it receives. The VMR's default compositor performs hardware-accelerated deinterlacing when the graphics card supports it, but there is no way for a user-mode component to use the same functionality. To perform deinterlacing in a compositor, therefore, you would have to implement the deinterlacing routines in software, which is not a trivial task. Without deinterlacing support, it's effectively impossible to write a general-purpose compositor. In the context of game programming, however, you can assume that you control the authored content, so you can side-step the whole problem by not using interlaced video.
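To show what software deinterlacing involves, here is the simplest common technique, often called "bob." It keeps the lines of one field and rebuilds the missing lines by averaging their neighbors. The chapter does not prescribe any particular algorithm. This sketch works on a single 8-bit plane, and BobDeinterlace is a hypothetical name. A usable deinterlacer would also need to handle chroma planes, field order, and motion, which helps explain why the task is not trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Bob" deinterlacing of one 8-bit plane stored row-major (width * height).
// Lines belonging to the kept field are copied unchanged. The other lines
// are interpolated from the kept lines directly above and below.
std::vector<uint8_t> BobDeinterlace(const std::vector<uint8_t>& frame,
                                    int width, int height, bool keepTopField) {
    std::vector<uint8_t> out(frame);
    if (height < 2) return out;                    // nothing to interpolate from
    const int keptParity = keepTopField ? 0 : 1;   // top field = even lines
    auto at = [&](int x, int y) {
        return frame[static_cast<std::size_t>(y) * width + x];
    };
    for (int y = 0; y < height; ++y) {
        if ((y & 1) == keptParity) continue;       // kept line: already copied
        for (int x = 0; x < width; ++x) {
            int value;
            if (y > 0 && y + 1 < height) {
                value = (at(x, y - 1) + at(x, y + 1) + 1) / 2;  // rounded average
            } else if (y > 0) {
                value = at(x, y - 1);              // bottom edge: repeat line above
            } else {
                value = at(x, y + 1);              // top edge: repeat line below
            }
            out[static_cast<std::size_t>(y) * width + x] = static_cast<uint8_t>(value);
        }
    }
    return out;
}
```

Bob throws away half of the vertical detail in every frame. Hardware deinterlacers typically use more elaborate, motion-adaptive methods to avoid this loss.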