Basic Video Mixing | Fundamentals of Audio and Video Programming for Games (Pro-Developer)

The next sample is called BasicMixer, which shows how to mix multiple video streams according to the settings defined by the IVMRMixerControl9 interface. You can use the BasicMixer sample as the starting point for your own compositors.

InitCompositionDevice

We use this method to cache the IDirect3DDevice9 pointer. We also determine the best texture format that is supported by the graphics driver. This information is stored for later use.

    STDMETHODIMP CMixer::InitCompositionDevice(IUnknown* pD3DDevice)
    {
        CLock lock(&m_CritSec);
        HRESULT hr = pD3DDevice->
            QueryInterface(IID_IDirect3DDevice9, (void**)&m_pDevice);
        if (SUCCEEDED(hr))
        {
            hr = FindVideoTextureFormat();
        }
        return hr;
    }

The FindVideoTextureFormat function loops through an array of texture surface formats and calls IDirect3D9::CheckDeviceFormat for each one. The first texture format that is supported is stored in the m_TexFormat member variable.

    HRESULT CMixer::FindVideoTextureFormat()
    {
        CComPtr<IDirect3D9> pD3D;
        D3DDEVICE_CREATION_PARAMETERS params;
        D3DDISPLAYMODE mode;

        // The list of texture formats to test.
        D3DFORMAT formats[] =
        {
            D3DFMT_R8G8B8,
            D3DFMT_X8R8G8B8,
            D3DFMT_R5G6B5,
            D3DFMT_X1R5G5B5
        };

        m_pDevice->GetDirect3D(&pD3D);
        m_pDevice->GetCreationParameters(&params);
        m_pDevice->GetDisplayMode(0, &mode);

        for (int i = 0; i < ARRAY_SIZE(formats); i++)
        {
            HRESULT hr = pD3D->CheckDeviceFormat(params.AdapterOrdinal,
                D3DDEVTYPE_HAL, mode.Format, D3DUSAGE_RENDERTARGET,
                D3DRTYPE_TEXTURE, formats[i]);
            if (SUCCEEDED(hr))
            {
                // Go with this format.
                m_TexFormat = formats[i];
                return S_OK;
            }
        }

        // No formats were found.
        return E_FAIL;
    }

SetStreamMediaType

In this method, we copy the media type to an array that has one entry for each input pin, up to the maximum of 16 pins. Remember that the media type pointer (pmt) can be NULL to clear the media type. We also copy the texture flag to another array. If the texture flag is FALSE, we create a private texture that we'll use for our texturing operations.

    STDMETHODIMP CMixer::SetStreamMediaType(DWORD dwStrmId,
        AM_MEDIA_TYPE *pmt, BOOL fTexture)
    {
        CLock lock(&m_CritSec);
        if (pmt == NULL)
        {
            // Free the media type in the array.
            MyFreeMediaType(m_mt[dwStrmId]);
        }
        else
        {
            // Copy the media type and texture flag into the arrays.
            m_fTexture[dwStrmId] = fTexture;
            MyCopyMediaType(&m_mt[dwStrmId], *pmt);

            // If the surface is not a texture, create a private texture.
            if (!fTexture)
            {
                CreateVideoTexture(pmt);
            }
        }
        return S_OK;
    }

The MyCopyMediaType and MyFreeMediaType functions copy and free the media type structure. These functions are necessary to avoid memory leaks, because the AM_MEDIA_TYPE structure holds a pointer to a secondary structure that is allocated separately. The functions ensure that this secondary structure is allocated and released.
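The helper functions themselves are not listed here, so the following is only a minimal sketch of the deep-copy idea. It assumes a simplified stand-in structure (MediaTypeSketch) and uses malloc and free where real code would use the COM task allocator; all of the names are hypothetical and are not part of the BasicMixer sample.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Simplified stand-in for AM_MEDIA_TYPE (hypothetical). The real structure
// has more fields, and its format block belongs to the COM task allocator.
struct MediaTypeSketch
{
    unsigned long  cbFormat;  // Size of the format block, in bytes.
    unsigned char *pbFormat;  // Separately allocated format block.
};

// Release the format block and reset the fields so that a second call
// on the same structure is harmless.
void FreeMediaTypeSketch(MediaTypeSketch &mt)
{
    std::free(mt.pbFormat);
    mt.pbFormat = nullptr;
    mt.cbFormat = 0;
}

// Deep-copy src into *pDest. The caller is expected to free any block that
// *pDest already owns before calling. Returns false if allocation fails.
bool CopyMediaTypeSketch(MediaTypeSketch *pDest, const MediaTypeSketch &src)
{
    assert(pDest != nullptr);
    pDest->cbFormat = 0;
    pDest->pbFormat = nullptr;
    if (src.cbFormat != 0 && src.pbFormat != nullptr)
    {
        pDest->pbFormat =
            static_cast<unsigned char*>(std::malloc(src.cbFormat));
        if (pDest->pbFormat == nullptr)
        {
            return false;
        }
        std::memcpy(pDest->pbFormat, src.pbFormat, src.cbFormat);
        pDest->cbFormat = src.cbFormat;
    }
    return true;
}
```

A plain structure assignment would copy only the pointer, leaving two structures that both believe they own the same format block. That is the situation these helpers exist to prevent.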

The fTexture parameter specifies whether the decoder surfaces for a given input pin are textures. If the surfaces are not textures, we must create a private texture for that input pin. Setting up the private texture is handled by the CreateVideoTexture function.

    HRESULT CMixer::CreateVideoTexture(const AM_MEDIA_TYPE* pmt)
    {
        DWORD dwWidth, dwHeight;
        GetVideoSurfaceDimensions(pmt, &dwWidth, &dwHeight);

        // Round up for POW2 textures if needed.
        AdjustTextureSize(&dwWidth, &dwHeight);

        // If we already have a texture that's as big as we need,
        // then we're done.
        if ((dwWidth <= m_dwVideoTextureWidth) &&
            (dwHeight <= m_dwVideoTextureHeight))
        {
            return S_OK;
        }

        // Otherwise, we need to (re)allocate the texture.
        m_pVideoTexture.Release();
        HRESULT hr = m_pDevice->CreateTexture(dwWidth, dwHeight, 1,
            D3DUSAGE_RENDERTARGET, m_TexFormat, D3DPOOL_DEFAULT,
            &m_pVideoTexture, NULL);
        if (SUCCEEDED(hr))
        {
            m_dwVideoTextureWidth = dwWidth;
            m_dwVideoTextureHeight = dwHeight;
        }
        return hr;
    }

The GetVideoSurfaceDimensions function returns the video size, based on the media type. If the driver requires a square or power-of-2 texture, the surface dimensions are rounded up accordingly by calling the AdjustTextureSize function. This function is taken directly from the TeeVee sample in Chapter 10. The compositor uses one private texture for all video streams, so after the texture has been created, we do not re-allocate it unless the next stream is larger.
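AdjustTextureSize is not reproduced in this section, so here is a rough sketch of the rounding it describes. The function names and the two Boolean flags are hypothetical; in a Direct3D 9 program the restrictions would normally be read from the texture capability bits in D3DCAPS9 rather than passed in.

```cpp
#include <cassert>

// Round value up to the next power of two. A value that is already a
// power of two is returned unchanged.
unsigned long RoundUpPow2(unsigned long value)
{
    // Texture dimensions are small, so a modest upper bound is enough
    // to rule out overflow in the shift below.
    assert(value > 0 && value <= (1UL << 30));
    unsigned long result = 1;
    while (result < value)
    {
        result <<= 1;
    }
    return result;
}

// Hypothetical stand-in for AdjustTextureSize. The flags model driver
// restrictions: power-of-two dimensions and square-only textures.
void AdjustTextureSizeSketch(unsigned long *pWidth, unsigned long *pHeight,
                             bool requirePow2, bool requireSquare)
{
    assert(pWidth != nullptr && pHeight != nullptr);
    if (requirePow2)
    {
        *pWidth = RoundUpPow2(*pWidth);
        *pHeight = RoundUpPow2(*pHeight);
    }
    if (requireSquare)
    {
        // Use the larger dimension for both sides.
        unsigned long side = (*pWidth > *pHeight) ? *pWidth : *pHeight;
        *pWidth = side;
        *pHeight = side;
    }
}
```

With the power-of-two flag set, a 320 × 240 video surface becomes a 512 × 256 texture; with the square flag also set, it becomes 512 × 512.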

CompositeImage

This is a fairly long function. It can be broken down into the following steps:

Clear the background, using the background color that is specified by the application. As an optimization, you can skip this step if the bottom layer has an alpha value of 1.0 and covers the entire render target.
For every stream, calculate the source and target rectangles. These are determined by the native video size, the video aspect ratio, the size of the render target, and the normalized position of each video stream.
For each stream, get the texture from the video surface or copy the surface to the private texture. Create a set of vertices that define the four corners of the target rectangle, as shown in Figure 11.2. Use the source rectangle as the texture coordinates, and place the application's alpha setting in the diffuse color component. Then draw the primitive.

Figure 11.2: Texture and vertex coordinates in the compositor.
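The target rectangle calculation can be pictured with a small sketch. It covers only one part of the job: scaling a normalized position (0.0 to 1.0 in each direction) to render target pixels. Aspect ratio correction is left out, and the structure and function names are hypothetical stand-ins rather than the sample's own FindDestVertices code.

```cpp
#include <cassert>

// Normalized position of a stream: (0,0) is the upper-left corner of the
// composition space and (1,1) is the lower-right corner.
struct NormalizedRectSketch { float left, top, right, bottom; };

// Rectangle in render target pixels (assumed layout).
struct FloatRectSketch { float left, top, right, bottom; };

// Scale a normalized rectangle to the size of the render target.
FloatRectSketch NormalizedToPixels(const NormalizedRectSketch &rNormal,
                                   float targetWidth, float targetHeight)
{
    assert(targetWidth > 0.0f && targetHeight > 0.0f);
    FloatRectSketch rc;
    rc.left   = rNormal.left   * targetWidth;
    rc.top    = rNormal.top    * targetHeight;
    rc.right  = rNormal.right  * targetWidth;
    rc.bottom = rNormal.bottom * targetHeight;
    return rc;
}
```

For example, a stream positioned at (0.5, 0.5, 1.0, 1.0) on a 640 × 480 render target occupies the lower-right quadrant, from (320, 240) to (640, 480).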

For each pixel, the video surface (that is, the texture) provides the diffuse color. There are two possible sources for alpha. First, there is the alpha value for the entire stream, specified by the application. This information is contained in the VMR9VideoStreamInfo structure. When we composite the image, we'll store this value in the diffuse color component for the vertices. Second, the video itself may contain per-pixel alpha information. If so, the diffuse alpha must be modulated with the texture alpha. This situation is not very common because most video formats do not contain an alpha channel. Here is the implementation of this method.

    #define VERTEX_FORMAT (D3DFVF_XYZRHW | D3DFVF_DIFFUSE | D3DFVF_TEX1)

    struct VERTEX
    {
        float x, y, z, rhw; // Transformed vertices.
        DWORD color;        // Diffuse color (holds the alpha value).
        float tu, tv;       // Texture coordinates for the video texture.
    };

    HRESULT CMixer::CompositeImage(IUnknown* pD3DDevice,
        IDirect3DSurface9* pRenderTarget, AM_MEDIA_TYPE* pmtRenderTarget,
        REFERENCE_TIME rtStart, REFERENCE_TIME rtEnd, D3DCOLOR dwClrBkGnd,
        VMR9VideoStreamInfo* pVideoStreamInfo, UINT cStreams)
    {
        CLock lock(&m_CritSec);

        // Clear the render target surface. (You may be able to skip this call.)
        m_pDevice->Clear(0, NULL, D3DCLEAR_TARGET, dwClrBkGnd, 0, 0);

        // Set the necessary render states and the flexible vertex format.
        SetRenderStates();
        m_pDevice->SetFVF(VERTEX_FORMAT);

        m_pDevice->BeginScene();

        // Loop through all of the streams.
        for (UINT iStream = 0; iStream < cStreams; iStream++)
        {
            const VMR9VideoStreamInfo *pStreamInfo = pVideoStreamInfo + iStream;

            // Place the application's alpha setting into a D3D color value.
            DWORD dwColor = D3DCOLOR_RGBA(0xFF, 0xFF, 0xFF,
                (BYTE)(0xFF * pStreamInfo->fAlpha));

            // Width and height of one texel (video pixel).
            float fTU = 1.0f, fTV = 1.0f;

            // Get the texture.
            CComPtr<IDirect3DTexture9> pTex;
            if (m_fTexture[iStream])
            {
                // This surface is a texture. Ask the surface for the
                // texture that contains it.
                pStreamInfo->pddsVideoSurface->GetContainer(
                    __uuidof(IDirect3DTexture9), (void**)&pTex);

                // Calculate the texel dimensions.
                fTU = 1.0f / pStreamInfo->dwWidth;
                fTV = 1.0f / pStreamInfo->dwHeight;
            }
            else
            {
                // This surface is not a texture. Copy it to our private texture.
                RECT rc = { 0, 0, (LONG)pStreamInfo->dwWidth,
                    (LONG)pStreamInfo->dwHeight };
                pTex = m_pVideoTexture;
                CComPtr<IDirect3DSurface9> pSurf;
                pTex->GetSurfaceLevel(0, &pSurf);
                m_pDevice->StretchRect(pStreamInfo->pddsVideoSurface, &rc, pSurf,
                    &rc, D3DTEXF_NONE);

                // Calculate the texel dimensions.
                fTU = 1.0f / m_dwVideoTextureWidth;
                fTV = 1.0f / m_dwVideoTextureHeight;
            }

            // Find the source rectangle from the video media type.
            RECT rcSource;
            const AM_MEDIA_TYPE *pmtSource = &m_mt[pStreamInfo->dwStrmID];
            GetSourceRectangle(pmtSource, &rcSource);

            // Find the target rectangle from the video media type, the
            // render target media type, and the application settings.
            FloatRect VertexRect;
            FindDestVertices(pmtSource, pmtRenderTarget, pStreamInfo->rNormal,
                &VertexRect);

            // Now fill in the vertices.
            VERTEX vertices[] =
            {
                // Upper-left corner.
                {
                    VertexRect.left, VertexRect.top, 0.5f, 1.0f,
                    dwColor, fTU * rcSource.left, fTV * rcSource.top
                },
                // Upper-right corner.
                {
                    VertexRect.right, VertexRect.top, 0.5f, 1.0f,
                    dwColor, fTU * rcSource.right, fTV * rcSource.top
                },
                // Lower-left corner.
                {
                    VertexRect.left, VertexRect.bottom, 0.5f, 1.0f,
                    dwColor, fTU * rcSource.left, fTV * rcSource.bottom
                },
                // Lower-right corner.
                {
                    VertexRect.right, VertexRect.bottom, 0.5f, 1.0f,
                    dwColor, fTU * rcSource.right, fTV * rcSource.bottom
                }
            };

            // Two triangles that share the upper-left/lower-right diagonal.
            WORD indices[] = { 0, 1, 3, 0, 3, 2 };

            m_pDevice->SetTexture(0, pTex);
            SetPerStreamRenderStates(pmtSource);
            m_pDevice->DrawIndexedPrimitiveUP(D3DPT_TRIANGLELIST, 0, 4, 2,
                (void*)indices, D3DFMT_INDEX16, (void*)vertices, sizeof(VERTEX));
        }

        m_pDevice->EndScene();
        m_pDevice->SetTexture(0, NULL);
        return S_OK;
    }

We start by defining the flexible vertex format. We are using transformed vertex coordinates (D3DFVF_XYZRHW) because the destination rectangle is calculated in terms of screen coordinates. The diffuse color is a convenient place to store the stream alpha. (The RGB part of the diffuse color will not be used.) The texture coordinates define the source rectangle.
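To make the diffuse color trick concrete, the following sketch reproduces the packing in portable C++. PackRGBA follows the same bit layout as the D3DCOLOR_RGBA macro (alpha in the top byte, then red, green, and blue); the function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pack four 8-bit channels using the same layout as D3DCOLOR_RGBA:
// alpha in bits 24-31, red in 16-23, green in 8-15, blue in 0-7.
std::uint32_t PackRGBA(std::uint32_t r, std::uint32_t g,
                       std::uint32_t b, std::uint32_t a)
{
    return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) |
           ((g & 0xFF) << 8) | (b & 0xFF);
}

// Convert a stream alpha in the range 0.0 to 1.0 into a diffuse color.
// The RGB channels are set to white because only the alpha byte is used.
std::uint32_t StreamAlphaToDiffuse(float fAlpha)
{
    assert(fAlpha >= 0.0f && fAlpha <= 1.0f);
    // The cast truncates, so 0.5 maps to 127 (0x7F) rather than 128.
    std::uint8_t alpha = static_cast<std::uint8_t>(0xFF * fAlpha);
    return PackRGBA(0xFF, 0xFF, 0xFF, alpha);
}
```

A fully opaque stream (alpha 1.0) produces 0xFFFFFFFF, a half-transparent stream (alpha 0.5) produces 0x7FFFFFFF, and a fully transparent stream produces 0x00FFFFFF.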

The CompositeImage function starts by clearing the render target, using the background color given in the dwClrBkGnd parameter. (You can optimize the function somewhat by testing whether the background color will be visible.) Next, the function loops through all of the streams in the pVideoStreamInfo array, and draws each frame according to the following algorithm:

For each frame, if the video surface is a texture, call IDirect3DSurface9::GetContainer to get the IDirect3DTexture9 pointer. Otherwise, the video surface must be copied to our private texture. (Now you know why we saved the fTexture flag for each stream in the SetStreamMediaType method.)
Set the fTU and fTV variables to the width and height of one texel. These values are used to scale the texture coordinates. For example, suppose the video frame covers the entire destination rectangle. If the texture surface and the video image are the same size, then the texture coordinates of the lower-right corner are (1.0, 1.0). But suppose the texture is 512 × 256 pixels while the video image is only 320 × 240. In that case, the coordinates must be scaled to (320/512, 240/256), or (0.625, 0.9375). Otherwise, the unused area of the texture will be visible.
Calculate the source and target rectangles by calling GetSourceRectangle and FindDestVertices.
Build the destination rectangle from two triangles with four vertices.
Draw the rectangle. We use the UP ("user pointer") version of DrawIndexedPrimitive because we're not storing the vertices in a vertex buffer. (There wouldn't be much point to doing so, because there are so few vertices and they change every frame.)
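The texel scaling in the second step can be checked with a few lines of arithmetic. This sketch assumes a hypothetical helper, PixelToTexCoord, that converts a pixel position inside the video image to a texture coordinate in the same way that CompositeImage multiplies the source rectangle by fTU and fTV.

```cpp
#include <cassert>

struct TexCoord { float tu, tv; };

// Convert a pixel position in the video image to a texture coordinate,
// given the dimensions of the texture that holds the image.
TexCoord PixelToTexCoord(long x, long y,
                         unsigned long texWidth, unsigned long texHeight)
{
    assert(texWidth > 0 && texHeight > 0);
    const float fTU = 1.0f / texWidth;   // Width of one texel.
    const float fTV = 1.0f / texHeight;  // Height of one texel.
    TexCoord tc = { fTU * x, fTV * y };
    return tc;
}
```

Calling PixelToTexCoord(320, 240, 512, 256) gives (0.625, 0.9375), which matches the example in the text: the lower-right corner of a 320 × 240 image stored in a 512 × 256 texture.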

The SetRenderStates function sets the render states that remain constant for every video stream. We enable alpha blending but disable lighting. The diffuse color for each pixel is taken from the texture with no modulation. The texture-addressing mode in both directions is clamp, which lets us place the video frame inside the destination rectangle without wrapping or mirroring the texture.

    void CMixer::SetRenderStates()
    {
        // Private method, caller should hold the lock.

        // Enable alpha blending and disable lighting.
        m_pDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        m_pDevice->SetRenderState(D3DRS_LIGHTING, FALSE);
        m_pDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
        m_pDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);

        // Get the diffuse color from the texture.
        m_pDevice->SetTextureStageState(0, D3DTSS_COLOROP, D3DTOP_SELECTARG1);
        m_pDevice->SetTextureStageState(0, D3DTSS_COLORARG1, D3DTA_TEXTURE);

        // Clamp the texture addresses.
        m_pDevice->SetSamplerState(0, D3DSAMP_ADDRESSU, D3DTADDRESS_CLAMP);
        m_pDevice->SetSamplerState(0, D3DSAMP_ADDRESSV, D3DTADDRESS_CLAMP);
    }
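The blend factors chosen here (D3DBLEND_SRCALPHA for the source and D3DBLEND_INVSRCALPHA for the destination) amount to a standard weighted average of the incoming video pixel and whatever is already in the render target. As an illustration only, the per-channel arithmetic looks like this, with channel values expressed as floats from 0.0 to 1.0 and BlendChannel being a hypothetical name:

```cpp
#include <cassert>

// Blend one color channel the way the SRCALPHA / INVSRCALPHA pair does:
// result = source * alpha + destination * (1 - alpha).
float BlendChannel(float src, float dst, float srcAlpha)
{
    assert(srcAlpha >= 0.0f && srcAlpha <= 1.0f);
    return src * srcAlpha + dst * (1.0f - srcAlpha);
}
```

With an alpha of 1.0 the video pixel replaces the background completely, with 0.0 the background is left untouched, and with 0.5 a white pixel over a black background comes out mid-gray.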

The SetPerStreamRenderStates method sets the render states that may change for each stream, depending on whether the video format contains an alpha channel. You can check this by using the MEDIASUBTYPE_HASALPHA macro. If the video format has an alpha channel, we enable per-pixel alpha testing and modulate the texture alpha with the diffuse alpha. Otherwise, we disable per-pixel alpha testing and just select the diffuse alpha.

    void CMixer::SetPerStreamRenderStates(const AM_MEDIA_TYPE *pmt)
    {
        if (MEDIASUBTYPE_HASALPHA(*pmt))
        {
            // Modulate the diffuse alpha and the texture alpha.
            m_pDevice->SetTextureStageState(0, D3DTSS_ALPHAOP, D3DTOP_MODULATE);
            m_pDevice->SetTextureStageState(0, D3DTSS_ALPHAARG1, D3DTA_TEXTURE);
            m_pDevice->SetTextureStageState(0, D3DTSS_ALPHAARG2, D3DTA_DIFFUSE);

            // Enable per-pixel alpha testing.
            m_pDevice->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);

            // Optional: Ignore alpha below some minimum value.
            m_pDevice->SetRenderState(D3DRS_ALPHAREF, 0x10);
            m_pDevice->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATER);
        }
        else
        {
            // Select the diffuse alpha. Disable per-pixel alpha testing.
            m_pDevice->SetTextureStageState(0, D3DTSS_ALPHAOP, D3DTOP_SELECTARG1);
            m_pDevice->SetTextureStageState(0, D3DTSS_ALPHAARG1, D3DTA_DIFFUSE);
            m_pDevice->SetRenderState(D3DRS_ALPHATESTENABLE, FALSE);
        }
    }
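The effect of these two branches on a single pixel can be approximated in ordinary C++. The sketch below is a simplified model, not a description of what any particular hardware does: it works on 8-bit alpha values, uses integer division where the hardware may round differently, and the names AlphaResult and ResolveAlpha are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct AlphaResult
{
    std::uint8_t alpha;  // Alpha passed on to the blending stage.
    bool drawn;          // False if the alpha test rejects the pixel.
};

// Model the per-stream alpha setup for one pixel.
AlphaResult ResolveAlpha(bool formatHasAlpha,
                         std::uint8_t textureAlpha, std::uint8_t diffuseAlpha)
{
    AlphaResult result;
    if (formatHasAlpha)
    {
        // D3DTOP_MODULATE: multiply the two alphas as fractions of 255.
        result.alpha =
            static_cast<std::uint8_t>((textureAlpha * diffuseAlpha) / 255);
        // Alpha test: D3DCMP_GREATER against a reference value of 0x10.
        result.drawn = result.alpha > 0x10;
    }
    else
    {
        // D3DTOP_SELECTARG1 with the diffuse alpha; no alpha test.
        result.alpha = diffuseAlpha;
        result.drawn = true;
    }
    return result;
}
```

In this model, an opaque texel (255) in a stream with alpha 128 is drawn with alpha 128, while a nearly transparent texel (8) in a fully opaque stream falls below the reference value and is skipped entirely.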

TermCompositionDevice

Remember to release everything in the TermCompositionDevice method.

    STDMETHODIMP CMixer::TermCompositionDevice(IUnknown* pD3DDevice)
    {
        CLock lock(&m_CritSec);
        m_pDevice.Release();
        m_pVideoTexture.Release();
        for (int i = 0; i < MAX_VMR_STREAMS; i++)
        {
            MyFreeMediaType(m_mt[i]);
        }
        return S_OK;
    }

Keep in mind that your own compositor does not have to replicate the VMR's normal behavior. You can change or omit any of the steps that you don't need. This code is simply a baseline from which to work.