Exploring YUVGray


When you create any DirectShow filter, or any COM object intended to be made available across the entire operating system, the filter or object must be given unique identifiers for its class and interface IDs. (You need an interface ID only if you're exposing a custom interface to the operating system.) These values are formatted as GUIDs and can be generated by the uuidgen.exe utility program, which should be installed on your system. (Alternatively, you can use the Create GUID command on the Tools menu of Visual Studio .NET.) To create the GUID you'll need for the YUVGray filter, enter the following command line:

uuidgen -s >guids.txt

This command generates a GUID and places it in the guids.txt file. The GUID can then be copied from guids.txt and pasted into your source code files wherever a class ID definition is needed. As you'll see shortly, this value appears in the very first lines of the YUVGray C++ file.

The YUVGray project contains two source files: YUVGray.cpp, the C++ source code, and YUVGray.def, the definitions file used to create a dynamic-link library (DLL) from the compiled code. This DLL is then registered with the operating system using the command-line program regsvr32.exe and becomes available to all DirectShow applications. The YUVGray filter needs its own class ID (GUID) defined so that it can be referenced by other COM applications, including DirectShow programs. Some of the first lines in YUVGray.cpp define the filter's class ID:

// Define the filter's CLSID
// {A6512C9F-A47B-45ba-A054-0DB0D4BB87F7}
static const GUID CLSID_YuvGray =
{ 0xa6512c9f, 0xa47b, 0x45ba, { 0xa0, 0x54, 0xd, 0xb0, 0xd4, 0xbb, 0x87, 0xf7 } };

This class ID definition could be included inside your own DirectShow applications, where you could use a function such as AddFilterByCLSID to add the YUVGray filter to your filter graph. There's one class method, CYuvGray::CreateInstance, and two functions, DllRegisterServer and DllUnregisterServer, that are required of any filter DLL. There are a few others, but their implementation is handled in the DirectShow base class library.
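As a rough sketch (not part of the YUVGray sources), adding the filter to a graph from application code looks something like the following, where pGraph is assumed to be an already-created IGraphBuilder pointer; an AddFilterByCLSID helper typically wraps this same pattern:

// Hedged sketch: create the YUVGray filter by its CLSID and add it to an
// existing filter graph. pGraph is assumed to be a valid IGraphBuilder*.
IBaseFilter *pFilter = NULL;
HRESULT hr = CoCreateInstance(CLSID_YuvGray, NULL, CLSCTX_INPROC_SERVER,
                              IID_IBaseFilter, (void**)&pFilter);
if (SUCCEEDED(hr))
{
    hr = pGraph->AddFilter(pFilter, L"YUV Gray");  // name shown in GraphEdit
    pFilter->Release();                            // the graph holds its own reference
}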

The CYuvGray::CreateInstance method is invoked by the DirectShow implementation of the class factory when CoCreateInstance is called with the class ID of the YUVGray filter. There's nothing exciting about this method and these functions; have a peek at the source code if you want to understand how they're implemented. Finally, COM objects must support the IUnknown interface; this interface is supported automatically by the base class library.
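For reference, the conventional form of such a creation method in the DirectShow samples is just a thin wrapper around the constructor; the actual YUVGray source presumably looks very similar:

// Typical class-factory creation method (a sketch, assuming the usual pattern).
CUnknown * WINAPI CYuvGray::CreateInstance(LPUNKNOWN pUnk, HRESULT *pHr)
{
    CYuvGray *pFilter = new CYuvGray(pUnk, pHr);
    if (pFilter == NULL)
    {
        *pHr = E_OUTOFMEMORY;  // report the allocation failure to the caller
    }
    return pFilter;
}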

Creating the Class Definition for YUVGray

The class definition for YUVGray is generic enough that it could easily form the basis for your own transform filters. YUVGray is declared as a descendant of CTransformFilter, which means that the filter has separate input and output buffers. The filter must copy data from an input buffer to an output buffer as part of its normal operation. Here's the complete class definition for YUVGray:

class CYuvGray : public CTransformFilter
{
public:
    // Constructor
    CYuvGray(LPUNKNOWN pUnk, HRESULT *phr) :
        CTransformFilter(NAME("YUV Transform Filter"), pUnk, CLSID_YuvGray)
    {}

    // Overridden CTransformFilter methods
    HRESULT CheckInputType(const CMediaType *mtIn);
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);
    HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
    HRESULT Transform(IMediaSample *pIn, IMediaSample *pOut);

    // Override this so we can grab the video format
    HRESULT SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt);

    // Static object-creation method (for the class factory)
    static CUnknown * WINAPI CreateInstance(LPUNKNOWN pUnk, HRESULT *pHr);

private:
    HRESULT ProcessFrameUYVY(BYTE *pbInput, BYTE *pbOutput, long *pcbByte);
    HRESULT ProcessFrameYUY2(BYTE *pbInput, BYTE *pbOutput, long *pcbByte);

    VIDEOINFOHEADER m_VihIn;   // Holds the current video format (input)
    VIDEOINFOHEADER m_VihOut;  // Holds the current video format (output)
};

The class constructor CYuvGray::CYuvGray does nothing more than invoke the parent constructor for CTransformFilter, passing it the class ID for CYuvGray. Six methods, overridden from CTransformFilter, implement the guts of the YUVGray filter: CheckInputType, CheckTransform, DecideBufferSize, GetMediaType, Transform, and SetMediaType. The class definition also defines the CreateInstance method needed by the COM class factory and two private methods, ProcessFrameUYVY and ProcessFrameYUY2, which are invoked from within the Transform method and handle the actual bit-twiddling on each video frame.

Most of the implementation of CYuvGray is left in the hands of the parent class, CTransformFilter. The YUVGray filter doesn't depart much from the standard model of how a transform filter behaves, so there isn't much to implement. The three major implementation areas are media type negotiation, buffer allocation, and data transformation.

Implementing Media Type Selection in a Transform Filter

Four methods must be implemented by a transform filter so that it can negotiate the media types it will receive on its input pin and provide on its output pin. If these methods are not implemented, the Filter Graph Manager will assume that the filter can receive any media type, something that's not likely ever to be true! Here's the implementation of these media type selection methods:

HRESULT CYuvGray::CheckInputType(const CMediaType *pmt)
{
    if (IsValidUYVY(pmt))
    {
        return S_OK;
    }
    else
    {
        if (IsValidYUY2(pmt))
        {
            return S_OK;
        }
        else
        {
            return VFW_E_TYPE_NOT_ACCEPTED;
        }
    }
}

HRESULT CYuvGray::CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut)
{
    // Make sure the subtypes match
    if (mtIn->subtype != mtOut->subtype)
    {
        return VFW_E_TYPE_NOT_ACCEPTED;
    }
    if (!IsValidUYVY(mtOut))
    {
        if (!IsValidYUY2(mtOut))
        {
            return VFW_E_TYPE_NOT_ACCEPTED;
        }
    }

    BITMAPINFOHEADER *pBmi = HEADER(mtIn);
    BITMAPINFOHEADER *pBmi2 = HEADER(mtOut);
    if ((pBmi->biWidth <= pBmi2->biWidth) &&
        (pBmi->biHeight == abs(pBmi2->biHeight)))
    {
        return S_OK;
    }
    return VFW_E_TYPE_NOT_ACCEPTED;
}

HRESULT CYuvGray::GetMediaType(int iPosition, CMediaType *pMediaType)
{
    // The output pin calls this method only if the input pin is connected.
    ASSERT(m_pInput->IsConnected());

    // There is only one output type that we want, which is the input type.
    if (iPosition < 0)
    {
        return E_INVALIDARG;
    }
    else if (iPosition == 0)
    {
        return m_pInput->ConnectionMediaType(pMediaType);
    }
    return VFW_S_NO_MORE_ITEMS;
}

HRESULT CYuvGray::SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt)
{
    if (direction == PINDIR_INPUT)
    {
        ASSERT(pmt->formattype == FORMAT_VideoInfo);
        VIDEOINFOHEADER *pVih = (VIDEOINFOHEADER*)pmt->pbFormat;

        // WARNING! In general you cannot just copy a VIDEOINFOHEADER
        // struct, because the BITMAPINFOHEADER member may be followed by
        // random amounts of palette entries or color masks. (See VIDEOINFO
        // structure in the DShow SDK docs.) Here it's OK because we just
        // want the information that's in the VIDEOINFOHEADER struct itself.
        CopyMemory(&m_VihIn, pVih, sizeof(VIDEOINFOHEADER));

        DbgLog((LOG_TRACE, 0,
            TEXT("CYuvGray: Input size: bmiWidth = %d, bmiHeight = %d, rcTarget width = %d"),
            m_VihIn.bmiHeader.biWidth, m_VihIn.bmiHeader.biHeight,
            m_VihIn.rcTarget.right));
    }
    else   // Output pin
    {
        ASSERT(direction == PINDIR_OUTPUT);
        ASSERT(pmt->formattype == FORMAT_VideoInfo);
        VIDEOINFOHEADER *pVih = (VIDEOINFOHEADER*)pmt->pbFormat;
        CopyMemory(&m_VihOut, pVih, sizeof(VIDEOINFOHEADER));

        DbgLog((LOG_TRACE, 0,
            TEXT("CYuvGray: Output size: bmiWidth = %d, bmiHeight = %d, rcTarget width = %d"),
            m_VihOut.bmiHeader.biWidth, m_VihOut.bmiHeader.biHeight,
            m_VihOut.rcTarget.right));
    }
    return S_OK;
}

All these methods manipulate an object of the CMediaType class, defined in the DirectShow base classes. This class is a wrapper around the AM_MEDIA_TYPE structure. The first method, CYuvGray::CheckInputType, is passed a pointer to a CMediaType object, which is then passed to two local comparison functions, IsValidUYVY and IsValidYUY2. Both functions examine the majortype and subtype fields of the CMediaType object. If the majortype field is MEDIATYPE_Video and the subtype field is either MEDIASUBTYPE_UYVY or MEDIASUBTYPE_YUY2, CYuvGray::CheckInputType returns S_OK to the caller, indicating that the media type and format block are acceptable and defined properly (and presumably, a pin-to-pin connection will be made to the filter). Any other media type causes the method to return VFW_E_TYPE_NOT_ACCEPTED, indicating that the requested media type is not acceptable to the filter.
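The two helpers aren't reproduced in this section, but based on the description above, IsValidUYVY (and its YUY2 twin) amounts to a handful of field checks; the sketch below is an assumption about their shape, not the book's exact code:

// Sketch of the UYVY validation helper described in the text.
BOOL IsValidUYVY(const CMediaType *pmt)
{
    if (pmt->majortype != MEDIATYPE_Video)        return FALSE;  // must be video
    if (pmt->subtype != MEDIASUBTYPE_UYVY)        return FALSE;  // must be UYVY
    if (pmt->formattype != FORMAT_VideoInfo)      return FALSE;  // VIDEOINFOHEADER format block
    if (pmt->cbFormat < sizeof(VIDEOINFOHEADER))  return FALSE;  // block must be big enough
    if (pmt->pbFormat == NULL)                    return FALSE;
    return TRUE;
}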

The CYuvGray::CheckTransform method ensures that the filter can handle any data format transformation that might be required between its input and output pins. A filter can, and often does, issue an output stream in a different format than it receives on its input pin, and any transformation between stream formats has to be handled entirely within the filter. The CYuvGray::CheckTransform method receives pointers to two CMediaType objects, representing the input and output media types, respectively. The YUVGray filter does not perform any format translation on the stream passing through it. Although the filter can receive a video stream in either UYVY or YUY2 format, the output format is the same as the input format. This method compares the input and output media types and ensures that they're identical and valid. If that's the case, S_OK is returned to the caller. Otherwise, VFW_E_TYPE_NOT_ACCEPTED is returned as an error code.

The CYuvGray::GetMediaType method is used during media type negotiation when an attempt is made to connect the output pin of the filter to another filter. In classes derived from CTransformFilter, the input pin doesn't suggest any types; it's assumed that the upstream output pin has more information about what kind of data is being delivered than a downstream input pin does. However, the output pin will suggest a type only if its associated input pin is connected. Because YUVGray doesn't modify the output format of the stream, the input pin connection establishes the acceptable media type for the filter (the media type is delivered from the upstream output pin), and that value is returned to the caller.

CYuvGray::SetMediaType copies the contents of the format information pointed to by the pbFormat field of the passed CMediaType object. This field points to a data structure with specific information about the media samples, including, in this case, the width and height of the video image. This information will be used in the Transform method. Of the four media type methods overridden in CYuvGray, only SetMediaType is implemented in the parent class. The other three methods (CheckInputType, CheckTransform, and GetMediaType) are virtual and must be implemented in any class derived from CTransformFilter.

These four methods provide all the implementation details for media negotiation within the filter. Small modifications to CYuvGray::CheckInputType will increase (or decrease) the number of media types supported by the filter, which would be very easy to do with the addition of an IsValidXXXX function (where XXXX is a format type) to the source code module, as shown in the sketch below. You can find lots of information on the internals of the video formats supported by DirectShow in the DirectShow SDK documentation and online in the MSDN Library at http://msdn.microsoft.com/library.
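For instance (purely hypothetical, since YUVGray handles only the two packed formats), adding a third format might look like this, with IsValidYV12 being a new helper you would write yourself, and with corresponding changes required in CheckTransform and Transform:

// Hypothetical extension of CheckInputType to accept a third format.
HRESULT CYuvGray::CheckInputType(const CMediaType *pmt)
{
    if (IsValidUYVY(pmt) || IsValidYUY2(pmt) || IsValidYV12(pmt))
    {
        return S_OK;
    }
    return VFW_E_TYPE_NOT_ACCEPTED;
}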

Finally, a change to CYuvGray::CheckTransform will allow you to construct a filter that performs a format translation internally, issuing a different format on its output pin than was presented on its input pin. You'll need to code that format translation yourself, most likely within the CYuvGray::Transform method. Sample code useful for learning how to translate from one video format to another can be found in the MSDN Library.

Selecting Buffer Size in a Transform Filter

The CYuvGray::DecideBufferSize method is used during the output pin connection process. As explained earlier in this chapter, the output pin is responsible for negotiating the allocation of data stream buffers during the pin connection process, even if this allocation is actually done by the input pin of the downstream filter. In the case of a CTransformFilter object, this method is invoked with two parameters: a pointer to an IMemAllocator and a pointer to an ALLOCATOR_PROPERTIES structure, which was delivered to the filter by the downstream pin and which contains the downstream filter's buffer requests (if there are any). That structure has the following definition:

typedef struct _AllocatorProperties {
    long cBuffers;
    long cbBuffer;
    long cbAlign;
    long cbPrefix;
} ALLOCATOR_PROPERTIES;

The field cBuffers specifies the number of buffers to be created by the allocator, and cbBuffer specifies the size of each buffer in bytes. The cbAlign field specifies the byte alignment of each buffer, and cbPrefix allocates a prefix of a specific number of bytes before each buffer, which is useful for buffer header information. Here's the implementation of the method:

HRESULT CYuvGray::DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp)
{
    // Make sure the input pin is connected.
    if (!m_pInput->IsConnected())
    {
        return E_UNEXPECTED;
    }

    // Our strategy here is to use the upstream allocator as the guideline,
    // but also defer to the downstream filter's request
    // when it's compatible with us.

    // First, find the upstream allocator...
    ALLOCATOR_PROPERTIES InputProps;
    IMemAllocator *pAllocInput = 0;
    HRESULT hr = m_pInput->GetAllocator(&pAllocInput);
    if (FAILED(hr))
    {
        return hr;
    }

    // ...now get the properties.
    hr = pAllocInput->GetProperties(&InputProps);
    pAllocInput->Release();
    if (FAILED(hr))
    {
        return hr;
    }

    // Buffer alignment should be non-zero [zero alignment makes no sense!].
    if (pProp->cbAlign == 0)
    {
        pProp->cbAlign = 1;
    }

    // Number of buffers must be non-zero.
    if (pProp->cBuffers == 0)
    {
        pProp->cBuffers = 1;
    }

    // For buffer size, find the maximum of the upstream size and
    // the downstream filter's request.
    pProp->cbBuffer = max(InputProps.cbBuffer, pProp->cbBuffer);

    // Now set the properties on the allocator that was given to us.
    ALLOCATOR_PROPERTIES Actual;
    hr = pAlloc->SetProperties(pProp, &Actual);
    if (FAILED(hr))
    {
        return hr;
    }

    // Even if SetProperties succeeds, the actual properties might be
    // different than what we asked for. We check the result, but we look
    // at only the properties that we care about. The downstream filter
    // will look at them when NotifyAllocator is called.
    if (InputProps.cbBuffer > Actual.cbBuffer)
    {
        return E_FAIL;
    }
    return S_OK;
}

Although a connection can be made on the output pin of a transform filter before a connection is made to that filter's input pin, a filter might decide to renegotiate a pin connection; once an input connection is made, it could force a reconnection on the output pin. For example, the media type for the YUVGray filter isn't established until a connection is made on the filter's input pin, so the output pin can't request a specific media type when connecting to downstream filters. The default behavior of the CTransformFilter class is to reject all connections on the output pin until the input pin has been connected.

You can see this happen in GraphEdit if, for example, you connect a filter graph downstream from a transform filter and then connect to its input pin. On those occasions when the media type can't be renegotiated successfully, GraphEdit will report that the graph is invalid and break the connections downstream from the filter.

The output pin allocates buffers when the pin connects, which shouldn't be allowed if the input pin isn't connected, so the method first calls the CBasePin::IsConnected method on the filter's input pin. If that call fails, E_UNEXPECTED signals an unusual error situation. If the input pin is connected, the method uses the input pin's allocator properties as a guideline when calculating the requirements of the output pin, ensuring that the allocation is adequate on both the input and output sides of the filter. Next the method examines the allocator properties requested by the downstream filter's input pin and does its best to satisfy both its own needs, as defined by the allocation requirements of the filter's input pin, and the requests of the downstream input pin. (The thorough approach is to take the larger of the two allocation calculations and use that as the safest value.)

Once the desired allocator properties have been chosen, the output pin attempts to set these properties on the allocator by calling IMemAllocator::SetProperties. The SetProperties method might succeed even if the allocator cannot match the exact properties that you request. The results are returned in another ALLOCATOR_PROPERTIES structure (the Actual parameter). The filter should always check whether these values meet the filter's minimum requirements. If not, the filter should return a failure code from DecideBufferSize.

The implementation of the DecideBufferSize method will vary depending on the kind of transform being performed within the transform filter. For example, a translation from one format to another could produce a much larger output stream than input stream (or vice versa), and that would have to be dealt with inside the implementation of the CYuvGray class because that's the only place where the specifics of the translation are known. Such a situation would be commonplace in a transform filter that acted as a data compressor or decompressor. The output pin has nearly complete control over allocation of its output data stream buffers, and DecideBufferSize is the method where the class should perform all calculations needed to assess the storage requirements for the output stream.
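As a rough illustration of that point (this is not code from the book), a filter whose output format differs from its input might size its buffers from the negotiated output format rather than from the upstream allocator, using the biSizeImage value of its output VIDEOINFOHEADER. CMyConverter and its members are hypothetical names:

// Sketch of DecideBufferSize for a hypothetical format-converting filter.
HRESULT CMyConverter::DecideBufferSize(IMemAllocator *pAlloc,
                                       ALLOCATOR_PROPERTIES *pProp)
{
    if (!m_pInput->IsConnected())
    {
        return E_UNEXPECTED;
    }

    // Ask for at least one buffer, large enough for a full output frame.
    if (pProp->cBuffers == 0) pProp->cBuffers = 1;
    if (pProp->cbAlign == 0)  pProp->cbAlign = 1;
    pProp->cbBuffer = max(pProp->cbBuffer, (long)m_VihOut.bmiHeader.biSizeImage);

    ALLOCATOR_PROPERTIES Actual;
    HRESULT hr = pAlloc->SetProperties(pProp, &Actual);
    if (FAILED(hr))
    {
        return hr;
    }

    // Fail the connection if the allocator can't hold a whole output frame.
    return (Actual.cbBuffer < (long)m_VihOut.bmiHeader.biSizeImage) ? E_FAIL : S_OK;
}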

Implementing the Transform Method

The CYuvGray::Transform method is the core of the YUVGray transform filter. It's invoked each time the filter receives a sample from the upstream filter. The method receives two pointers, each of which is an IMediaSample interface. These two pointers correspond to buffers containing the input (source) sample, which should already be filled with good data, and the output (destination) sample, which needs to be filled with good data by this method. The time stamps for the input sample have already been copied to the output sample by the CTransformFilter implementation of the Receive method. Here's the implementation of CYuvGray::Transform:

HRESULT CYuvGray::Transform(IMediaSample *pSource, IMediaSample *pDest)
{
    // Look for format changes from the video renderer.
    CMediaType *pmt = 0;
    if (S_OK == pDest->GetMediaType((AM_MEDIA_TYPE**)&pmt) && pmt)
    {
        DbgLog((LOG_TRACE, 0,
            TEXT("CYuvGray: Handling format change from the renderer...")));

        // Notify our own output pin about the new type.
        m_pOutput->SetMediaType(pmt);
        DeleteMediaType(pmt);
    }

    // Get the addresses of the actual buffers.
    BYTE *pBufferIn, *pBufferOut;
    pSource->GetPointer(&pBufferIn);
    pDest->GetPointer(&pBufferOut);

    long cbByte = 0;

    // Process the buffers.
    // Do it slightly differently for different video formats.
    HRESULT hr;
    ASSERT(m_VihOut.bmiHeader.biCompression == FCC('UYVY') ||
           m_VihOut.bmiHeader.biCompression == FCC('YUY2'));
    if (m_VihOut.bmiHeader.biCompression == FCC('UYVY'))
    {
        hr = ProcessFrameUYVY(pBufferIn, pBufferOut, &cbByte);
    }
    else if (m_VihOut.bmiHeader.biCompression == FCC('YUY2'))
    {
        hr = ProcessFrameYUY2(pBufferIn, pBufferOut, &cbByte);
    }
    else
    {
        return E_UNEXPECTED;
    }

    // Set the size of the destination image.
    ASSERT(pDest->GetSize() >= cbByte);
    pDest->SetActualDataLength(cbByte);
    return hr;
}

The method begins by looking at the media type for the downstream connection. The media type might have changed if the renderer downstream from this transform filter has requested a media type change, which can happen for two main reasons when the downstream filter is a video renderer. First, the legacy Video Renderer filter might switch between GDI and DirectDraw. Second, the Video Mixing Renderer (VMR) filter might update the format to reflect the surface stride required by the video driver. Before making a format change, the renderer calls IPin::QueryAccept to verify that the new format is OK. In the CTransformFilter class, this call is translated directly into a call to CheckTransform, which gets handled the usual way. The renderer then indicates the format change by attaching the media type to the next sample. The media type is retrieved by calling GetMediaType on the sample and passed along to the output pin's SetMediaType method.

Next, pointers to both the input and output stream buffers are retrieved, and the media subtype is examined, indirectly, by looking at the biCompression field of the VIDEOINFOHEADER structure. The contents of this structure were copied to local storage for both the input and output pins in the CYuvGray::SetMediaType method. If the video subtype (expressed in FOURCC format, that is, 4 bytes, each representing a single alphanumeric ASCII character) is either UYVY or YUY2, the transform can be performed.

YUV Format

Most programmers are familiar with the RGB color space. An RGB pixel is encoded using values for red, green, and blue. YUV is an alternate way to represent video images. In YUV, the grayscale information is separate from the color information. For each pixel, the grayscale value (called luma, abbreviated Y) is calculated as a weighted average of the gamma-corrected red, green, and blue components. The most common formula in use today is as follows:

Y = 0.299R + 0.587G + 0.114B

The color information (chroma) is represented as two color difference components, U and V, that are calculated from B - Y (blue minus luma) and R - Y (red minus luma). Various scaling factors can be applied to the result, depending on the video standard in use. Technically, YUV refers to one particular scaling factor, used in composite NTSC and PAL video. However, YUV is commonly used as a blanket term for a family of related formats that includes YCbCr, YPbPr, and others.
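To make the relationship concrete, here's a small illustrative snippet (not from the filter) computing luma and the unscaled color-difference values for a single pixel, using the weights quoted above; an actual video standard would then apply its own scaling and offsets to U and V:

// Illustrative conversion of one normalized, gamma-corrected RGB pixel.
double R = 0.50, G = 0.25, B = 0.75;
double Y = 0.299 * R + 0.587 * G + 0.114 * B;  // luma (weighted average)
double U = B - Y;   // blue color difference, before standard-specific scaling
double V = R - Y;   // red color difference, before standard-specific scaling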

The human visual system is more sensitive to changes in brightness than it is to changes in color. Therefore, chroma values can be sampled less frequently than luma values without significantly degrading the perceived quality of the image. All the common consumer DV formats use some form of chroma downsampling. Chroma downsampling is described using the a:b:c notation, as shown in the following table:

Sampling   Description
4:2:2      2:1 horizontal downsampling, no vertical downsampling. Each scan line has two U and V samples for every four Y samples.
4:1:1      4:1 horizontal downsampling, no vertical downsampling. Each scan line has one U and V sample for every four Y samples.
4:2:0      2:1 horizontal downsampling and 2:1 vertical downsampling.

In addition, 4:4:4 means that the chroma is not downsampled; all the original U and V values are preserved. Some professional DV formats use 4:4:4 because there's no loss in color information. DVD video uses 4:2:0 sampling. Consumer DV formats are usually 4:2:0 or 4:1:1. By downsampling the chroma, the effective bit depth of the image is reduced. For example, 4:2:2 uses an average of 16 bits per pixel, rather than the 24 bits per pixel of full 4:4:4 video: every two pixels carry two Y samples plus one U and one V sample, or 4 bytes in all.

UYVY and YUY2 are two different ways of representing a 4:2:2 YUV image in memory. The difference between the two formats is where they put their Y, U, and V values. In UYVY, the values are laid out as follows:

U0 Y0 V0 Y1 U2 Y2 V2 Y3 U4 Y4 V4 Y5

In YUY2, the values are laid out like this:

Y0 U0 Y1 V0 Y2 U2 Y3 V2 Y4 U4 Y5 V4

Each value is an 8-bit sample. As you can see, the same values are present; they re just arranged differently in memory.

The differences between the UYVY and YUY2 formats aren't enormous, but they're enough to guarantee that a piece of code that transforms one of them won't work at all on the other. That means we need two entirely separate routines to handle the transform task, CYuvGray::ProcessFrameUYVY and CYuvGray::ProcessFrameYUY2. Here's the implementation of both methods:

HRESULT CYuvGray::ProcessFrameUYVY(BYTE *pbInput, BYTE *pbOutput, long *pcbByte)
{
    DWORD dwWidth, dwHeight;        // Width and height in pixels (input)
    DWORD dwWidthOut, dwHeightOut;  // Width and height in pixels (output)
    LONG  lStrideIn, lStrideOut;    // Stride in bytes
    BYTE  *pbSource, *pbTarget;     // First byte first row, source & target

    *pcbByte = m_VihOut.bmiHeader.biSizeImage;

    GetVideoInfoParameters(&m_VihIn, pbInput, &dwWidth, &dwHeight,
        &lStrideIn, &pbSource, true);
    GetVideoInfoParameters(&m_VihOut, pbOutput, &dwWidthOut, &dwHeightOut,
        &lStrideOut, &pbTarget, true);

    // Formats should match (except maybe stride).
    ASSERT(dwWidth == dwWidthOut);
    ASSERT(abs(dwHeight) == abs(dwHeightOut));

    // You could optimize this slightly by storing these values when the
    // media type is set, instead of recalculating them for each frame.
    for (DWORD y = 0; y < dwHeight; y++)
    {
        WORD *pwTarget = (WORD*)pbTarget;
        WORD *pwSource = (WORD*)pbSource;

        for (DWORD x = 0; x < dwWidth; x++)
        {
            // Each WORD is a 'UY' or 'VY' block.
            // Set the low byte (chroma) to 0x80
            // and leave the high byte (luma).
            WORD pixel = pwSource[x] & 0xFF00;
            pixel |= 0x0080;
            pwTarget[x] = pixel;
        }

        // Advance the stride on both buffers.
        pbTarget += lStrideOut;
        pbSource += lStrideIn;
    }
    return S_OK;
}

HRESULT CYuvGray::ProcessFrameYUY2(BYTE *pbInput, BYTE *pbOutput, long *pcbByte)
{
    DWORD dwWidth, dwHeight;        // Width and height in pixels (input)
    DWORD dwWidthOut, dwHeightOut;  // Width and height in pixels (output)
    LONG  lStrideIn, lStrideOut;    // Stride in bytes
    BYTE  *pbSource, *pbTarget;     // First byte first row, source & target

    *pcbByte = m_VihOut.bmiHeader.biSizeImage;

    GetVideoInfoParameters(&m_VihIn, pbInput, &dwWidth, &dwHeight,
        &lStrideIn, &pbSource, true);
    GetVideoInfoParameters(&m_VihOut, pbOutput, &dwWidthOut, &dwHeightOut,
        &lStrideOut, &pbTarget, true);

    // Formats should match (except maybe stride).
    ASSERT(dwWidth == dwWidthOut);
    ASSERT(abs(dwHeight) == abs(dwHeightOut));

    // You could optimize this slightly by storing these values when the
    // media type is set, instead of recalculating them for each frame.
    for (DWORD y = 0; y < dwHeight; y++)
    {
        WORD *pwTarget = (WORD*)pbTarget;
        WORD *pwSource = (WORD*)pbSource;

        for (DWORD x = 0; x < dwWidth; x++)
        {
            // Each WORD is a 'YU' or 'YV' block.
            // Set the high byte (chroma) to 0x80
            // and leave the low byte (luma).
            WORD pixel = pwSource[x] & 0x00FF;
            pixel |= 0x8000;
            pwTarget[x] = pixel;
        }

        // Advance the stride on both buffers.
        pbTarget += lStrideOut;
        pbSource += lStrideIn;
    }
    return S_OK;
}

Each function breaks each row of video data into a series of 16-bit chunks. In UYVY format, the upper 8 bits of each chunk hold the luma data, while in YUY2 format, the lower 8 bits of each chunk hold the luma. In both cases, the luma value is preserved and the U and V values are set to 128 (0x80), effectively removing all the color information from the YUV image. But before any video processing takes place, the method has to learn some specifics of the video sample formats for both input and output so that it can correctly process the buffers.

Before any processing takes place, two calls are made to the local GetVideoInfoParameters function.

void GetVideoInfoParameters(
    const VIDEOINFOHEADER *pvih, // Pointer to the format header
    BYTE * const pbData,         // Pointer to first address in buffer
    DWORD *pdwWidth,             // Returns the width in pixels
    DWORD *pdwHeight,            // Returns the height in pixels
    LONG *plStrideInBytes,       // Add this to a row to get the new row down
    BYTE **ppbTop,               // Pointer to first byte in top row of pixels
    bool bYuv
    )
{
    LONG lStride;

    // For 'normal' formats, biWidth is in pixels.
    // Expand to bytes and round up to a multiple of 4.
    if (pvih->bmiHeader.biBitCount != 0 &&
        0 == (7 & pvih->bmiHeader.biBitCount))
    {
        lStride = (pvih->bmiHeader.biWidth *
            (pvih->bmiHeader.biBitCount / 8) + 3) & ~3;
    }
    else  // Otherwise, biWidth is in bytes.
    {
        lStride = pvih->bmiHeader.biWidth;
    }

    // If rcTarget is empty, use the whole image.
    if (IsRectEmpty(&pvih->rcTarget))
    {
        *pdwWidth = (DWORD)pvih->bmiHeader.biWidth;
        *pdwHeight = (DWORD)(abs(pvih->bmiHeader.biHeight));

        if (pvih->bmiHeader.biHeight < 0 || bYuv)  // Top-down bitmap
        {
            *plStrideInBytes = lStride;   // Stride goes "down"
            *ppbTop = pbData;             // Top row is first
        }
        else  // Bottom-up bitmap
        {
            *plStrideInBytes = -lStride;  // Stride goes "up"
            *ppbTop = pbData + lStride * (*pdwHeight - 1);  // Bottom row is first
        }
    }
    else  // rcTarget is NOT empty. Use a sub-rectangle in the image.
    {
        *pdwWidth = (DWORD)(pvih->rcTarget.right - pvih->rcTarget.left);
        *pdwHeight = (DWORD)(pvih->rcTarget.bottom - pvih->rcTarget.top);

        if (pvih->bmiHeader.biHeight < 0 || bYuv)  // Top-down bitmap
        {
            // Same stride as above, but first pixel is modified down
            // and over by the target rectangle.
            *plStrideInBytes = lStride;
            *ppbTop = pbData + lStride * pvih->rcTarget.top +
                (pvih->bmiHeader.biBitCount * pvih->rcTarget.left) / 8;
        }
        else  // Bottom-up bitmap
        {
            *plStrideInBytes = -lStride;
            *ppbTop = pbData +
                lStride * (pvih->bmiHeader.biHeight - pvih->rcTarget.top - 1) +
                (pvih->bmiHeader.biBitCount * pvih->rcTarget.left) / 8;
        }
    }
}

GetVideoInfoParameters returns the width and height of the video image (our sample filter is not prepared to deal with different values for the input and output samples), returns a pointer to the first pixel (upper-left corner) in the video image, and calculates the stride (sometimes called pitch in the Direct3D documentation), that is, the number of bytes from the start of one row of the image to the start of the next. It's important to understand that stride is a different value than image width. Although the image might be stored in 720 bytes per row, the stride for a row might be a figure in the range of 1024 bytes per row, so simply incrementing a pointer by the image width will not take you to the next row of image data. You need to use the stride value as your increment. The stride information is used by both transform routines to ensure that they stay within the bounds of the video image and that they process only video data, leaving alone any other data padding that might be associated with how the video image is stored in computer memory. (For example, RGB video is always aligned on 32-bit boundaries in memory.)
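The rule of thumb can be captured in a couple of lines; this generic helper (not part of the sample) shows how a pixel address is formed from the top-row pointer and the stride rather than the width:

// Address of pixel (x, y), given the top-row pointer and signed stride
// produced by GetVideoInfoParameters. wBitsPerPixel is the format's bit depth.
BYTE* PixelAt(BYTE *pbTop, LONG lStride, DWORD x, DWORD y, WORD wBitsPerPixel)
{
    return pbTop + (LONG)y * lStride + (x * wBitsPerPixel) / 8;
}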

In addition, GetVideoInfoParameters examines the format of the video image to determine whether the image is stored in top-down or bottom-up format. If the image is in top-down format, the first byte of video data is the upper-left corner of the video image; if it's in bottom-up format, the first byte is the lower-left corner. The value returned for stride reflects the orientation, so, given a pointer to a row of pixels, it allows you to write consistent loops regardless of image orientation. Although this information doesn't make a big difference for the YUV filter, if the transform operation involved a manipulation of pixel positions, the information would be absolutely essential. In your own video transform filters, you'll likely want to use the GetVideoInfoParameters function, or something very much like it, during the transform operation.

Top-Down vs. Bottom-Up Images

If you're new to graphics programming, you might expect that a bitmap would be arranged in memory so that the top row of the image appeared at the start of the buffer, followed by the next row, and so forth. However, this is not necessarily the case. In Windows, device-independent bitmaps (DIBs) can be placed in memory in two different orientations, bottom-up and top-down.

In a bottom-up DIB, the image buffer starts with the bottom row of pixels, followed by the next row up, and so forth. The top row of the image is the last row in the buffer. Therefore, the first byte in memory is the lower-left pixel of the image. In GDI, all DIBs are bottom-up. Figure 10-2 shows the physical layout of a bottom-up DIB.


Figure 10-2. The physical layout of a bottom-up DIB

In a top-down DIB, the order of the rows is reversed. The top row of the image is the first row in memory, followed by the next row down. The bottom row of the image is the last row in the buffer. With a top-down DIB, the first byte in memory is the upper-left pixel of the image. DirectDraw uses top-down DIBs. Figure 10-3 shows the physical layout of a top-down DIB.


Figure 10-3. The physical layout of a top-down DIB

For RGB DIBs, the image orientation is indicated by the biHeight member of the BITMAPINFOHEADER structure. If biHeight is positive, the image is bottom-up. If biHeight is negative, the image is top-down.

DIBs in YUV formats are always top-down, and the sign of the biHeight member is ignored. Decoders should offer YUV formats with positive biHeight, but they should also accept YUV formats with negative biHeight and ignore the sign.

Also, any DIB type that uses a FOURCC value in the biCompression member should express its biHeight as a positive number no matter what its orientation is because the FOURCC format itself identifies a compression scheme whose image orientation should be understood by any compatible filter.
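Taken together, the orientation rules boil down to a single test, which is exactly what the (biHeight < 0 || bYuv) check in GetVideoInfoParameters expresses. A minimal restatement:

// True if the first byte in the buffer is the top row of the image.
// YUV (FOURCC) formats are always treated as top-down; RGB formats
// follow the sign of biHeight.
bool IsTopDown(const VIDEOINFOHEADER *pvih, bool bYuv)
{
    return bYuv || (pvih->bmiHeader.biHeight < 0);
}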

With the implementation of the CYuvGray::Transform method complete, the filter is now completely implemented. However, a few definitions still need to be declared for COM and DirectShow.

AMOVIESETUP_FILTER FilterInfo =
{
    &CLSID_YuvGray,    // CLSID
    g_wszName,         // Name
    MERIT_DO_NOT_USE,  // Merit
    0,                 // Number of AMOVIESETUP_PIN structs
    NULL               // Pin registration information
};

CFactoryTemplate g_Templates[1] =
{
    {
        g_wszName,                 // Name
        &CLSID_YuvGray,            // CLSID
        CYuvGray::CreateInstance,  // Method to create an instance of MyComponent
        NULL,                      // Initialization function
        &FilterInfo                // Set-up information (for filters)
    }
};

These two data structures are necessary to make the filter a COM-accessible entity. The AMOVIESETUP_FILTER data structure defines the information that gets written to the registry when the DLL is registered. We don't want this filter to be used by Intelligent Connect; the MERIT_DO_NOT_USE value in AMOVIESETUP_FILTER means that Intelligent Connect won't attempt to use this filter as an intermediate connection between two other filters. Finally, the CFactoryTemplate data structure holds the information that the DirectShow class factory uses when satisfying CoCreateInstance requests.
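The remaining glue is supplied by the base class library. The typical pattern in DirectShow filter samples, which the YUVGray source presumably follows as well, defines the template count expected by the class factory support code and defers registration to AMovieDllRegisterServer2; treat this as a sketch of that convention rather than a listing from the book:

// Number of entries in g_Templates, required by the base class factory code.
int g_cTemplates = sizeof(g_Templates) / sizeof(g_Templates[0]);

// Registration entry points: let the base class library read the
// AMOVIESETUP_FILTER data and write (or remove) the registry entries.
STDAPI DllRegisterServer()
{
    return AMovieDllRegisterServer2(TRUE);
}

STDAPI DllUnregisterServer()
{
    return AMovieDllRegisterServer2(FALSE);
}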

We also have a definitions file, YUVGray.def, which lists the entry points exported by the DLL.

LIBRARY YuvGray.DLL
DESCRIPTION 'YUV Gray Transform'

EXPORTS
    DllGetClassObject       PRIVATE
    DllCanUnloadNow         PRIVATE
    DllRegisterServer       PRIVATE
    DllUnregisterServer     PRIVATE

You might have noticed that the DllGetClassObject and DllCanUnloadNow functions aren't implemented in YUVGray.cpp. They're implemented in the base class library, so we don't have to write them ourselves.

Using the YUVGray Filter

Once you've compiled YUVGray.cpp and registered the DLL using regsvr32.exe, you should be able to find it in the enumerated list of DirectShow filters in GraphEdit, as shown in Figure 10-4.
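Registration itself is a one-line command, run from the directory containing the compiled DLL (the file name follows the LIBRARY statement in the .def file):

regsvr32 YuvGray.dll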


Figure 10-4. The YUV Filter visible in the list of DirectShow filters

On my own system, I built a small filter graph (shown in Figure 10-5) that takes the output from my webcam, which produces a UYVY format video stream, runs it through the YUV Filter, and sends it to the video renderer, so I get black-and-white output from the webcam. That might seem a bit bizarre given that the average webcam is already a low-resolution device, but it actually produces an image with the quality of an ancient television broadcast.


Figure 10-5. Three-element filter graph that uses the YUV filter to produce black and white output from my webcam

If you want to test the YUY2 format, try rendering an MPEG-1 file through your filter graph. The DirectShow MPEG-1 decoder prefers the YUY2 format to UYVY, and the YUV Filter will process a YUY2 format video stream if it's placed downstream from the MPEG-1 decoder.


