The Cacophony Project | Fundamentals of Audio and Video Programming for Games (Pro-Developer)

Using Microsoft Visual Studio .NET, load the Cacophony project and use the Solution Explorer window to list the source files. With this project, we've used the programming concept of dividing the code into layers:

The UI layer (Cacophony.cpp) deals with the user interface and flow of control.
The second layer (DSound2DLayer.cpp) contains all the methods that use the CSound and CSoundManager objects created by the program.
The third layer (Extended_dsutil.cpp) defines the CSound and CSoundManager classes, and provides the extended framework that communicates with the DirectSound SDK.

If the objective were performance, this would be at least one layer too many. However, since the objective is education, we use this structure to remove the need for many global variables and to make the use of the objects more obvious. If you were to write this code simply for your own use, you would probably merge DSound2DLayer.cpp into Cacophony.cpp and do away with many of the intermediary functions.

The following sections explain the main concepts and design of Cacophony.cpp. Rather than describe every line of code, the discussion focuses on the lower levels, where the code modifies the framework and calls the DirectSound SDK. You should also find the code comments useful.

The Cacophony.cpp and Cacophony.h Files

To begin, open or print out the Cacophony.cpp and Cacophony.h files. The main features of 2-D sound are that mono or stereo sounds can be played, and that parameters such as volume, panning position, and frequency can be altered from their original recording. The panning position is perhaps the most exciting, providing the best option for mimicking movement.

One obvious problem with panning a sound is that it could be incompatible with a sound recorded in stereo. Although DirectSound will not fail if you try to combine these features, the panning will tend to dominate the stereo recording. Given that the movement of sound (or rather, sound effects that appear to originate from moving objects) is the most important effect that you are trying to achieve, both in 2-D and 3-D, most sounds should be recorded in or converted to mono, preferably 16-bit mono.

In any event, the Cacophony tool handles both mono and stereo sounds. If you open the Cacophony.h file, and find the cSoundEffect class, you will see that the data members of this class match the UI.

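    class cSoundEffect
    {
    private:
        char       filename[MAX_PATH];     // full path of the wave file
        int        setting[max_settings];  // start and end values from the dialog box
        int        status;                 // dud, loaded, playing, or stopped
        DWORD      tickLength;             // length of the sample, in ticks
        DWORD      ticksElapsed;           // ticks since the sound started playing
        int        iSound;                 // index of the sound object
        int        iBuffer;                // index of the buffer within that sound
        // ... (methods not shown)
    };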

For each sound entered in the UI, one object (in an array of 10 objects) of the cSoundEffect class is populated with data. The filename member contains the full path of the wave file. The setting array contains the various start and end values set using the Add or Edit Sound dialog box. Note that there are nine settings, each indexed by a #define, as shown in the following code.

    #define        timed_start           0
    #define        start_time            1
    #define        start_distance        2
    #define        start_position        3
    #define        start_frequency       4
    #define        end_distance          5
    #define        end_position          6
    #define        end_frequency         7
    #define        on_completion         8

The status variable contains one of the four status settings (dud, loaded, playing, and stopped).

The tickLength variable is the length of the sound sample in ticks, where each tick is one eighth of a second (therefore, a 2.5-second sound sample has a length of 20 ticks). The ticksElapsed variable contains a count of ticks since the sound started playing. These two variables are essential when calculating the volume, panning, or frequency change during the playing of the sound.
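As a rough illustration of how the two tick counters can drive a gradual change, the following sketch converts a playing time to ticks and interpolates a setting between its start and end values. The helper names secondsToTicks and interpolateSetting are hypothetical, and the linear interpolation is an assumption; the Cacophony source may calculate the change differently.

```cpp
// Hypothetical helpers for illustration; not taken from the Cacophony source.
const unsigned long kTicksPerSecond = 8;   // one tick is one eighth of a second

// Convert a playing time in seconds to the nearest whole number of ticks.
unsigned long secondsToTicks(float seconds)
{
    if (seconds <= 0.0f)
        return 0;
    return static_cast<unsigned long>(seconds * kTicksPerSecond + 0.5f);
}

// Linearly interpolate a setting (distance, pan, or frequency) between its
// start and end values, based on how many ticks have elapsed (assumed model).
long interpolateSetting(long startValue, long endValue,
                        unsigned long ticksElapsed, unsigned long tickLength)
{
    if (tickLength == 0 || ticksElapsed >= tickLength)
        return endValue;
    long span = endValue - startValue;
    return startValue + span * static_cast<long>(ticksElapsed)
                             / static_cast<long>(tickLength);
}
```

With these helpers, a 2.5-second sample is 20 ticks long, and a pan that runs from -50 to 50 has reached -25 after 5 of those ticks.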

The iSound and iBuffer variables are indexes to the sound and the sound buffer respectively, and are used during both the analysis and playing stages. The sound index is an index into an array of cOneSound objects. A cOneSound object simply holds a repeat of the file name for convenience, the number of buffers required for it, and a Boolean flag indicating whether the wave file loaded correctly or not. The data members of this class are shown in the following code.

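    class cOneSound
    {
    private:
        char       filename[MAX_PATH];   // repeat of the wave file name
        int        nBuffers;             // number of buffers required
        bool       loaded;               // true if the wave file loaded correctly
        // ... (methods not shown)
    };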

The reasoning behind having this class is that your cacophony may contain multiple instances of the same wave file, and you want to be able to combine them into one sound object with multiple buffers. This is a more efficient system than to have multiple sound objects, each containing one buffer with the identical sound loaded.
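The grouping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code from the sample: the OneSoundSketch and EffectSlot types and the assignBuffers function are hypothetical, and file names are compared exactly (a real Windows tool would probably compare paths without regard to case).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the data held by cOneSound and cSoundEffect.
struct OneSoundSketch { std::string filename; int nBuffers; };
struct EffectSlot     { int iSound; int iBuffer; };

// For each effect's wave file, find (or create) the matching sound entry and
// hand out the next free buffer index within that sound.
std::vector<EffectSlot> assignBuffers(const std::vector<std::string>& effectFiles,
                                      std::vector<OneSoundSketch>& sounds)
{
    std::vector<EffectSlot> slots;
    for (const std::string& file : effectFiles)
    {
        std::size_t iSound = 0;
        while (iSound < sounds.size() && sounds[iSound].filename != file)
            ++iSound;
        if (iSound == sounds.size())
            sounds.push_back({file, 0});       // first use of this wave file
        int iBuffer = sounds[iSound].nBuffers++;
        slots.push_back({static_cast<int>(iSound), iBuffer});
    }
    return slots;
}
```

Three effects that use a.wav, b.wav, and a.wav again would therefore need only two sound objects, the first with two buffers.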

Both the cSoundEffect and cOneSound classes contain a range of methods to set and get the properties. The methods are so trivial that we defined them all in the header file, rather than give declarations in the header and definitions in a .cpp file. This follows a coding style similar to what is usually used with C# programming, and does save a lot of repetition of method prototypes. However, it is just a style issue; defining and declaring the methods in separate files is much more common in C++ programming.

The only methods with any calculation going on concern the reading and writing of each sound to the cacophony file (the methods readEffect and writeEffect in the cSoundEffect class). The logic here is that sounds are only saved to the cacophony file if they are somewhere in the AVBook directory. This is simply so that the whole project can be moved, perhaps posted on a network for other developers, and have the cacophony files still work.

Now it's time to examine how these data classes are handled. When you open up the Cacophony.cpp file, you'll see that it is divided into two main sections, following the declarations and the usual WinMain function. The first section contains the functions supporting the Add or Edit Sound dialog box, and the second contains the functions supporting the main Cacophony dialog box.

When the SoundDlgProc callback function is called, it means that the user has opened the Add or Edit Sound dialog box, where the user can change the settings for a sound. These settings are all recorded to a temporary set of variables, so the changes can be ignored if the user exits the dialog box without clicking OK. If the user does click OK, the cSoundEffect object, indexed by the g_cSound variable, is updated.

When the user clicks Play on the main dialog box, the initCacophony function calls the analyzeSoundEffects function to try to make sense of the data that was entered. Then, initCacophony starts the timing to set the cacophony in motion. We are not going to spend any more time on the UI code; you can easily step through what's going on with a debugger. The next section will focus on the functions in the DSound2DLayer.cpp file.

The DSound2DLayer.cpp and Extended_dsutil.cpp Files

When you open up the DSound2DLayer.cpp file, you'll see that it contains all the functions that manipulate the following two object types.

    CSoundManager* g_pSoundManager = NULL;
    CSound*        g_pSound[max_sounds];

Note that these are just pointers; the objects are actually created in the functions initSound and loadSound respectively. Only one sound manager object is needed, but you need one sound object for every wave file that you intend to load. Remember, of course, that one sound object can contain multiple buffers of the same sound, so that even if the Cacophony dialog box is full, the number of sound objects required can be less than or equal to ten.

The functions initDirectSound, closeDirectSound and stopSound in this file are identical to those in the High5 sample explained in the previous chapter. The first new function to examine is the loadSound function.

    bool loadSound(int index, char filename[], int nBuffers)
    {
        if (soundWorking)
        {
            // Request volume, panning, and frequency control for the buffers.
            DWORD    bufferFlags = DSBCAPS_CTRLVOLUME | DSBCAPS_CTRLPAN |
                                   DSBCAPS_CTRLFREQUENCY;

            // Delete any running sound.
            stopSound(index);

            // Free any previous sound, and make a new one.
            SAFE_DELETE( g_pSound[index] );

            // Load the wave file into a DirectSound buffer.
            if (FAILED(g_pSoundManager->
                Create( &g_pSound[index], filename, bufferFlags,
                        GUID_NULL, nBuffers )))
                return false;
            return true;
        }
        else
            return false;
    }

The setting of the bufferFlags variable has been altered in this function (from the 0 used in the High5 sample) to enable volume, panning and frequency changes. Although the setting of the flags is straightforward, this is often forgotten in DirectSound programming, leading to hours of frustration. The only other change is that the g_pSound pointer requires an index, as it is now an array of ten pointers, rather than the single one used in the High5 sample.
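Because a forgotten flag is such a common mistake, the way the flags combine is worth a small sketch. The constants below are local stand-ins so that the example compiles without the DirectSound headers; their values are assumed to mirror the DSBCAPS_CTRLFREQUENCY, DSBCAPS_CTRLPAN, and DSBCAPS_CTRLVOLUME definitions in dsound.h. The makeCacophonyFlags and hasControl helpers are hypothetical.

```cpp
#include <cstdint>

// Stand-ins for the DSBCAPS_* control flags (values assumed to match dsound.h).
const std::uint32_t kCtrlFrequency = 0x00000020;
const std::uint32_t kCtrlPan       = 0x00000040;
const std::uint32_t kCtrlVolume    = 0x00000080;

// Combine the three control flags with bitwise OR, as loadSound does.
std::uint32_t makeCacophonyFlags()
{
    return kCtrlVolume | kCtrlPan | kCtrlFrequency;
}

// Return true if every bit of the requested control is present in the flags.
bool hasControl(std::uint32_t bufferFlags, std::uint32_t control)
{
    return (bufferFlags & control) == control;
}
```

A check such as hasControl(flags, kCtrlPan) in a debug build is one possible way to catch a missing flag before any attempt is made to pan the buffer.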

The playSoundBuffer function is used to initiate the playing of one sound buffer. It takes as parameters an index into the g_pSound array (iSound), an index into the array of buffers that the sound object might have (iBuffer), and the initial volume, panning position, and frequency.

    bool playSoundBuffer(int iSound, int iBuffer, int nDistance, int nPan,
                         int nFreq)
    {
        DWORD    dwFlags = 0L;

        // Convert the UI values into DirectSound units.
        // Note: g_pSound[iSound] is dereferenced here, before the NULL check below.
        long     actualVolume    = calculateVolumeFromDistance( (float) nDistance );
        long     actualPosition  = getRelativeValue( (long) nPan, 100L,
                    (long) DSBPAN_LEFT, (long) DSBPAN_RIGHT );
        DWORD    actualFrequency = (DWORD) getRelativeValue( (long) nFreq, 100L,
                    (long) 0, (long) g_pSound[iSound] -> GetRecordedFrequency() );

        if (soundWorking AND g_pSound[iSound] != NULL)
        {
            if (FAILED(g_pSound[iSound] -> PlayBuffer( iBuffer, 0, dwFlags,
                    actualVolume, actualPosition, actualFrequency )))
                return false;
            else
                return true;
        }
        else
            return false;
    }

The first thing the function does is translate the volume into units understood by DirectSound. The DirectSound SDK has definitions for DSBVOLUME_MAX and DSBVOLUME_MIN, currently set at 0 and -10,000 respectively. The units are hundredths of a decibel, so the maximum volume means that there is zero attenuation of the recorded sound, and the minimum volume means that there is 100 dB of attenuation (effectively silencing the sound). The utility function calculateVolumeFromDistance takes the distance factor and returns the appropriate attenuation. This function is defined in the Extended_dsutil.cpp file.

    long calculateVolumeFromDistance(float distanceRatio)
    {
        // Distances should always be positive and greater than the original.
        if (distanceRatio <= 1.0f)
            return DSBVOLUME_MAX;

        double hundredthsOfDeciBels = 100 * 20 * log10(distanceRatio);
        return DSBVOLUME_MAX - (long) hundredthsOfDeciBels;
    }

Although the UI only allows integer distance factors, clearly any float value is acceptable input (for example, a value of 2.5 means that the object sounds as if it is two and a half times further away from the listener than the original recording). The actual math used to calculate the attenuation in decibels is a well-known equation in the audio business:

Attenuation (dB) = 20 * log10 (distance ratio)

The underlying logic is that if an object doubles its distance from the listener, its sound volume will drop by 6.02 dB. Because DirectSound works in hundredths of a decibel, we then multiply this value by 100 to get the answer.
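To check the arithmetic without the DirectSound headers, the following portable sketch repeats the calculation with local stand-ins for DSBVOLUME_MAX and DSBVOLUME_MIN. The function name attenuationFromDistance is hypothetical, and the final clamp to the minimum volume is an addition for illustration; the function shown earlier does not clamp.

```cpp
#include <cmath>

const long kVolumeMax = 0;        // stand-in for DSBVOLUME_MAX
const long kVolumeMin = -10000;   // stand-in for DSBVOLUME_MIN

// Return the volume, in hundredths of a decibel, for a given distance ratio.
long attenuationFromDistance(float distanceRatio)
{
    if (distanceRatio <= 1.0f)
        return kVolumeMax;        // no closer than the original recording

    double hundredths = 100.0 * 20.0 * std::log10(static_cast<double>(distanceRatio));
    long volume = kVolumeMax - static_cast<long>(hundredths);

    // Added for illustration: never go below the quietest supported volume.
    return (volume < kVolumeMin) ? kVolumeMin : volume;
}
```

Doubling the distance gives 20 * log10(2), or about 6.02 dB, so attenuationFromDistance(2.0f) returns -602. Doubling it again gives about 12.04 dB, or -1204.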

In the playSoundBuffer function, the next calculation provides the correct panning position. The getRelativeValue function converts a number within one range into the corresponding number within a second range. DirectSound provides a range from DSBPAN_LEFT to DSBPAN_RIGHT, which are -10,000 and 10,000 respectively, far more granularity than you will ever need. The Cacophony sample uses -50 to 50. The getRelativeValue function is used to convert the tool's units into DirectSound units.

The frequency calculation is very similar, except that you must first retrieve the actual recorded frequency, using the GetRecordedFrequency function. GetRecordedFrequency is a method that was added to the CSound class in the dsutil.cpp file provided with the DirectSound SDK, along with the PlayBuffer method. To view these methods, look up the Extended_dsutil.cpp file, which is provided in the Common sub-directory of the main AVBook directory.
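The body of getRelativeValue is not shown in this chapter, so the following is only a sketch of the general technique: a linear mapping from one integer range to another. The function name mapRange and its five-parameter signature are hypothetical and differ from the four-parameter call used in playSoundBuffer.

```cpp
// Map value from the range [inMin, inMax] onto the range [outMin, outMax].
// Hypothetical helper; not the getRelativeValue function from the sample.
long mapRange(long value, long inMin, long inMax, long outMin, long outMax)
{
    if (inMax == inMin)
        return outMin;                      // avoid division by zero

    // Clamp the input so that the result stays inside the output range.
    if (value < inMin) value = inMin;
    if (value > inMax) value = inMax;

    // Use a wider type for the product, because long may be only 32 bits.
    long long scaled = static_cast<long long>(value - inMin) * (outMax - outMin);
    return outMin + static_cast<long>(scaled / (inMax - inMin));
}
```

For example, a tool pan value of 25 on a scale of -50 to 50 maps to 5,000 on the DirectSound scale of -10,000 to 10,000, and a frequency setting of 50 on a scale of 0 to 100 maps to half of a 44,100 Hz recorded frequency.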

The GetRecordedFrequency function is shown in the following code.

    DWORD CSound::GetRecordedFrequency ()
    {
        if( m_apDSBuffer == NULL )
            return 0;

        WAVEFORMATEX* pwfx = m_pWaveFile -> GetFormat();
        return pwfx -> nSamplesPerSec;
    }

First, note that frequencies are recorded in double-word ( DWORD ) variables. The CWaveFile class, also provided in the utility code, has the GetFormat method, which simply returns a pointer to the WAVEFORMATEX structure associated with the wave file. Using this pointer, you have access to the members of this structure, from which you can conveniently return the sampling rate (which is the frequency). The WAVEFORMATEX structure will be discussed in more detail later in this chapter.

The other addition to the CSound class, PlayBuffer, is one of the core functions for 2-D sound.

    HRESULT CSound::PlayBuffer(int iB, DWORD dwPriority, DWORD dwFlags,
                               long Volume, long Position, DWORD Frequency)
    {
        HRESULT hr;
        BOOL    bRestored;

        if( m_apDSBuffer == NULL )
            return CO_E_NOTINITIALIZED;

        LPDIRECTSOUNDBUFFER pDSB = m_apDSBuffer[ iB ];
        if( pDSB == NULL )
            return DXTRACE_ERR( TEXT("PlayBuffer"), E_FAIL );

        // Restore the buffer if it was lost.
        if( FAILED( hr = RestoreBuffer( pDSB, &bRestored ) ) )
            return DXTRACE_ERR( TEXT("RestoreBuffer"), hr );

        if( bRestored )
        {
            // The buffer was restored, so you need to fill it with new data.
            if( FAILED( hr = FillBufferWithSound( pDSB, FALSE ) ) )
                return DXTRACE_ERR( TEXT("FillBufferWithSound"), hr );

            // Make DirectSound do pre-processing on sound effects.
            Reset();
        }

        // Rewind the buffer and apply the requested settings before playing.
        hr = pDSB -> SetCurrentPosition (0);
        hr = pDSB -> SetVolume(Volume);
        hr = pDSB -> SetPan(Position);
        if (Frequency != NO_FREQUENCY_CHANGE)
            hr = pDSB -> SetFrequency(Frequency);

        hr = pDSB -> Play( 0, dwPriority, dwFlags );
        return hr;
    }

The first portion of code concerns restoring buffers. For example, if another application running DirectSound has stolen your sound buffer, this code restores it.

The next section of code shows that the playing position is set to the start of the sound buffer, along with the required volume, panning position and frequency. Note that we provided a setting for the frequency parameter that requests that no change be made.

The dwPriority and dwFlags parameters are passed through to the Play method in the DirectSound SDK. The priority setting only applies if the sound buffer was created using the DSBCAPS_LOCDEFER flag; since that was not done, this parameter must be set to 0. See the DirectSound SDK documentation for more information about this flag. The only dwFlags setting that might apply is the looping flag, DSBPLAY_LOOPING.

Now, return to the DSound2DLayer.cpp file. Similar to the function that plays a single buffer, there is a function that tests whether a buffer has stopped playing or not (testSoundBufferStopped). Again, this function calls through to a method, IsSoundBufferPlaying, in the CSound class.

    BOOL CSound::IsSoundBufferPlaying(int iB)
    {
        BOOL bIsPlaying = FALSE;

        if( m_apDSBuffer == NULL )
            return FALSE;

        if( m_apDSBuffer[iB] )
        {
            DWORD dwStatus = 0;
            m_apDSBuffer[iB] -> GetStatus( &dwStatus );
            bIsPlaying = ( ( dwStatus & DSBSTATUS_PLAYING ) != 0 );
        }
        return bIsPlaying;
    }

The DirectSound SDK provides the method GetStatus, which returns a DWORD with any of a number of bit flags set. The flag to test for here is DSBSTATUS_PLAYING, although the flag DSBSTATUS_LOOPING could also be set if this was a looping sound. The only other flag that could be set in this case is DSBSTATUS_BUFFERLOST. If the sound buffer was created with the DSBCAPS_LOCDEFER flag set (which was not done), then there are a few other flags that can be returned by GetStatus. Again, refer to the DirectSound SDK documentation for a discussion of these additional flags.
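The three status flags mentioned here can be decoded as in the following sketch. The constants are local stand-ins whose values are assumed to mirror DSBSTATUS_PLAYING, DSBSTATUS_BUFFERLOST, and DSBSTATUS_LOOPING in dsound.h, and the describeStatus helper is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Stand-ins for the DSBSTATUS_* bit flags (values assumed to match dsound.h).
const std::uint32_t kStatusPlaying    = 0x00000001;
const std::uint32_t kStatusBufferLost = 0x00000002;
const std::uint32_t kStatusLooping    = 0x00000004;

// Turn a status word, as filled in by GetStatus, into a short description.
std::string describeStatus(std::uint32_t status)
{
    if (status & kStatusBufferLost)
        return "buffer lost";
    if (status & kStatusPlaying)
        return (status & kStatusLooping) ? "playing (looping)" : "playing";
    return "stopped";
}
```

Because the flags are individual bits, each one is tested with a bitwise AND rather than compared for equality, which is the same approach that IsSoundBufferPlaying takes.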

Back in the DSound2DLayer.cpp file, the next function to look at is getPlayingTimeInSeconds. In order to pan a sound evenly from its start to its end point, it is essential to know the length of the sound. This function calls through to the GetPlayingTime method that was added to the Extended_dsutil.cpp file.

    float CSound::GetPlayingTime ()
    {
        if( m_apDSBuffer == NULL )
            return 0;

        WAVEFORMATEX* pwfx = m_pWaveFile -> GetFormat();
        return (float) m_dwDSBufferSize / (float) pwfx -> nAvgBytesPerSec;
    }

Similar to the GetRecordedFrequency method, this method gets a pointer to the WAVEFORMATEX structure and retrieves the average number of bytes per second. The CSound object has already recorded the buffer size in bytes, so you simply divide this size by the bytes-per-second value and, presto, you have a reasonable estimate for the playing time of the file.

Back for the last time in DSound2DLayer.cpp, you'll see the three functions changeBufferVolume, changeBufferPanning, and changeBufferFrequency. All of these functions copy a line from playSoundBuffer to calculate the actual values from the numbers given through the UI, and then they call methods that we added to the CSound class to set these values. The three methods added to the CSound class, setBufferVolume, setBufferPanPosition, and setBufferFrequency, are very similar. As an example, the setBufferPanPosition method is shown in the following code.

    HRESULT CSound::setBufferPanPosition(int iB, long Position)
    {
        HRESULT hr;

        if( m_apDSBuffer == NULL )
            return CO_E_NOTINITIALIZED;

        hr = m_apDSBuffer[ iB ] -> SetPan(Position);
        return hr;
    }

Not much here requires explanation, except perhaps the HRESULT, which contains an error code if the call fails. If the call returns S_OK, then all is well; otherwise, there is a whole range of potential errors. The most likely error that you will receive is DSERR_CONTROLUNAVAIL, which means that you did not specify the right flags for the buffer. In the setBufferPanPosition method, if you did not specify the DSBCAPS_CTRLPAN flag when creating the buffer, you will get the DSERR_CONTROLUNAVAIL error, rather than the desired panning effect. The correct flags should be set in the loadSound function, which was described earlier in this chapter.
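The way such a result might be interpreted can be sketched without the Windows headers. Everything below is a stand-in: HResultSketch represents HRESULT, failed mimics the FAILED macro (an HRESULT indicates failure when it is negative), and the numeric value given for DSERR_CONTROLUNAVAIL is an assumption that should be checked against dsound.h. The explainSetPanResult helper is hypothetical.

```cpp
#include <cstdint>
#include <string>

typedef std::int32_t HResultSketch;         // stand-in for HRESULT

const HResultSketch kOk = 0;                // S_OK
// Assumed value of DSERR_CONTROLUNAVAIL; verify against dsound.h.
const HResultSketch kControlUnavailable =
    static_cast<HResultSketch>(0x8878001EU);

// Mimic the FAILED macro: failure codes have the top bit set.
bool failed(HResultSketch hr)
{
    return hr < 0;
}

// Translate the result of a pan request into a message for the developer.
std::string explainSetPanResult(HResultSketch hr)
{
    if (!failed(hr))
        return "pan position set";
    if (hr == kControlUnavailable)
        return "buffer was created without the DSBCAPS_CTRLPAN flag";
    return "other DirectSound error";
}
```

Reporting the specific cause in this way can save time, because the fix for a missing control flag lies in the code that creates the buffer, not in the code that sets the pan position.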

The WAVEFORMATEX Structure

For discussion purposes, the WAVEFORMATEX structure is a member of, and is filled in by, a CWaveFile object. This structure is part of the old but still thriving Microsoft Windows Multimedia SDK, and is declared not in Dsound.h but in mmreg.h. You may also have come across the WAVEFORMATEXTENSIBLE structure, used for multichannel sound where the number of channels exceeds two, but even then a WAVEFORMATEX structure forms the bulk of its data members. You may also come across WAVEFORMAT and PCMWAVEFORMAT in Microsoft header files; you can safely ignore these two.

The WAVEFORMATEX structure holds the essential recording data for the wave file.

    typedef struct {
        WORD  wFormatTag;
        WORD  nChannels;
        DWORD nSamplesPerSec;
        DWORD nAvgBytesPerSec;
        WORD  nBlockAlign;
        WORD  wBitsPerSample;
        WORD  cbSize;
    } WAVEFORMATEX;

The wFormatTag member can take a range of values defining the audio format, but DirectSound only supports one, WAVE_FORMAT_PCM. This means that the audio is recorded as pulse-code modulation (PCM) data in integer format. The similar WAVE_FORMAT_IEEE_FLOAT format (pulse-code modulation in floating-point format) is not supported by DirectSound.

Always the easiest member to describe, nChannels should be 1 for mono data and 2 for stereo. As you want to apply special effects to your sounds, usually this value is set to 1.

The nSamplesPerSec member holds the sample rate, often referred to as the frequency, stored in samples per second (hertz). The usual values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. Note that the storage unit is in hertz rather than kilohertz, so these values are stored as 8000, 11025, 22050 and 44100 respectively.

The nAvgBytesPerSec member is the required average data-transfer rate, in bytes per second. This should be equal to the product of nSamplesPerSec and nBlockAlign. At first, the definition of nBlockAlign sounds complicated, but it is actually quite simple. It holds the block alignment, in bytes, and must be equal to the product of nChannels and wBitsPerSample divided by 8 (bits per byte). So, for example, if you have a mono wave file recorded in 16-bit data (the most common format), this translates to a block alignment of (1 x 16 / 8), which equals 2. All that this means is that programs reading the data must treat a two-byte block as an indivisible chunk, because you would never want to divide one of the 16-bit audio samples in half. Perhaps the word block adds a bit of confusion here. Other examples are that 8-bit mono data will give a block alignment of 1, and 16-bit stereo data a block alignment of 4.
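The two derived members can be calculated as in the following sketch. PcmFormatSketch is a hypothetical, portable stand-in that copies only the relevant member names from WAVEFORMATEX, and makePcmFormat is a hypothetical helper; neither comes from the sample code.

```cpp
#include <cstdint>

// Portable stand-in for the PCM-related members of WAVEFORMATEX.
struct PcmFormatSketch
{
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
};

// Fill in a format description, deriving the block alignment and data rate.
PcmFormatSketch makePcmFormat(std::uint16_t channels,
                              std::uint32_t samplesPerSec,
                              std::uint16_t bitsPerSample)
{
    PcmFormatSketch format;
    format.nChannels       = channels;
    format.nSamplesPerSec  = samplesPerSec;
    format.wBitsPerSample  = bitsPerSample;
    // Bytes in one sample across all channels.
    format.nBlockAlign     = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // Bytes consumed per second of playback.
    format.nAvgBytesPerSec = samplesPerSec * format.nBlockAlign;
    return format;
}
```

For 16-bit mono data at 44,100 Hz, this gives a block alignment of 2 and a data rate of 88,200 bytes per second; 16-bit stereo at the same sample rate gives 4 and 176,400.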

The wBitsPerSample member is either 8 or 16; no other value is supported by DirectSound. The cbSize member is ignored.

Extracting Useful Information from WAVEFORMATEX

In addition to the simple reading of the data members when they are required (for example, in the GetRecordedFrequency method described earlier), the members can be combined to provide additional information. Note the following line from the GetPlayingTime method.

 return (float) m_dwDSBufferSize / (float) pwfx -> nAvgBytesPerSec;

In this statement, the m_dwDSBufferSize data member is held in a CSound object, but it is set after the associated CWaveFile object reads the wave file. The CWaveFile class has a GetSize method that returns the size in bytes of the file that was just read. The previous code only provides an estimate of the playing time of the file, although an estimate is good enough for most purposes.