The Tao of Segment


Every sound effect or phrase of music that DirectX Audio plays starts as a Segment. So far, we've only played Segments one at a time. But that won't do in a complex application environment. In interactive media, things happen spontaneously, so you can't have a prerecorded (read: canned) audio or music score. The soundtrack must be assembled in real time as the experience takes place. The obvious first step toward such a solution is to have many little sound and music snippets and trigger them as the user goes along. Indeed, that is why Segments are called "Segments"; the intention is that each represents just a snippet, or segment, of the entire sound effect or music score.

Figure 10-1: Multiple overlapping Segments combine to create the complete score.

But, that's not enough. In any interactive sound scenario, there are requirements for reactivity, context, variability, and structure for both music and sound effects. Reactivity means that the sounds or music must be capable of responding to immediate actions and changes in the story. Context means that new sounds or musical phrases take the rest of the audio score into consideration, so they reinforce each other rather than clash. Variability means that sounds and music don't sound exactly the same every time, which would quickly become unnatural. Structure means that sounds, and especially music, have inherent structure at multiple levels of the timeline.

Let's look at sound effects first. By definition, sound effects are reactive, driven by the unfolding story. When something happens, be it a car crashing or a monster belching, the game immediately plays the appropriate sound. But even then, greater sophistication is highly desirable. The human ear is remarkably quick to identify a perfectly repeated sound, and after the fourth belch it starts to sound as if the monster swallowed a tape recorder instead of a deliciously seasoned and sautéed princess. So, variation in the sound is important. Context is important too. If the monster is sad, it would help to reinforce the emotion in the sound, perhaps by bending the belch down in pitch.

Music is significantly more challenging. In order to be effective at reinforcing the user's experience, a nonlinear score should track the flow of the media that it supports. Yet, music has multiple layers of structure, from song arrangement at the highest level down to phrasing, harmony, and melodic form. Randomly playing notes in response to events in the media doesn't cut it. But neither does it work to drop a lengthy prerecorded score into an interactive application; such a score absentmindedly wanders along with a storyline of its own, often running at cross-purposes to the actual story. It's as if the score were composed by Mr. Magoo. Remember the cartoon character who could barely see and would "blindly" misinterpret situations and forge ahead on the wrong assumption to comic effect? Do you really want your score composed by that guy?

Unfortunately, that is what a large percentage of scores for games and other interactive applications seem like. Not surprisingly, gamers typically turn the music off after a number of runs. Apparently, the comedy of the ill-fitting score is lost on them!

(Pardon me for editorializing a bit here)

What does this all mean? The Segment architecture must be relatively sophisticated in order to support not just playing sounds and music snippets in reaction to events in the application (i.e., gameplay) but also playing them in such a way that they still seem to have structure, with variation and respect for context. While we are at it, let's add another requirement: the Segment mechanism must support different sound- and music-producing technologies, from MIDI and DLS to wave playback, from style-based playback to straight sequencing.

The DirectX Audio Segment architecture was designed to solve all of these requirements. The Segment, which is manipulated via the IDirectMusicSegment8 interface, is capable of handling a wide range of sound-producing technologies, from individual waves to Style playback. Multiple Segments can be scheduled for playback in ways that allow their sounds and music to merge, overlap, align, and even control each other.

Anatomy of a Segment

A Segment is little more than a simple structure that defines a set of playback parameters, such as duration and loop points, for the music or sounds that it will play. But a Segment can contain any number of Tracks, each representing a different technology. By technology, I mean Tracks that play Styles, Tracks that set the tempo, and much more. It is the choice of Track types and the data placed within them that defines a Segment.

If you are familiar with creating Segments in DirectMusic Producer, you know that a Segment is constructed by choosing from a wide palette of Tracks and adding them to the Segment. You know that some Track types generate sound (Sequence, Style, Wave, etc.), while others silently provide control information (Chord, Parameter, Time Signature, etc.), and yet others actively control playback (Segment Trigger, Script). There are currently about a dozen Track types included with DirectMusic.

I like to think of a Segment as a little box that manages a list of Tracks, each represented by a long box indicating its list of actions over time.

The Segment in Figure 10-2 has a Wave Track that triggers one or more waves to play across the duration of the Segment. It also has a Time Signature Track, which provides timing information. It has a Band Track, which determines which DLS instruments to play and their initial volume and pan, and it has a Sequence Track, which plays a sequence of MIDI notes. So, this Segment plays a sequence of waves as well as MIDI notes, and it provides time signature information that the Performance can use to control playback of this Segment. In a while, we will learn that this can also control the playback of other Segments.

Figure 10-2: Segment with four Tracks.

You can see how a Segment is ultimately defined by the collection of Tracks that it contains, since they completely define its behavior.

Each Track in a Segment is actually a COM object, represented by the IDirectMusicTrack8 and IPersistStream interfaces. The IPersistStream interface is used for loading the Track, while the IDirectMusicTrack8 interface is used for performing it. In a later chapter, we discuss how to create your own Track type.
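To make that relationship a little more concrete, here is a minimal sketch (assuming pSegment is an IDirectMusicSegment8 pointer for a Segment already loaded via the Loader, with error handling omitted) that retrieves a Tempo Track from a Segment and queries it for IDirectMusicTrack8. The Tempo Track class ID is used purely as an example; any of the standard Track class IDs would work the same way.

// Sketch: fetch the first Tempo Track in the Segment and get its
// IDirectMusicTrack8 interface.
IDirectMusicTrack  *pTrack  = NULL;
IDirectMusicTrack8 *pTrack8 = NULL;

// 0xFFFFFFFF = search all Track groups; 0 = first matching Track.
if (SUCCEEDED(pSegment->GetTrack(CLSID_DirectMusicTempoTrack,
                                 0xFFFFFFFF, 0, &pTrack)))
{
    pTrack->QueryInterface(IID_IDirectMusicTrack8, (void**)&pTrack8);
    // ... work with pTrack8 here ...
    if (pTrack8) pTrack8->Release();
    pTrack->Release();
}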

How a Segment Loads

Tracks are created at the time a Segment is loaded from file. The Segment file format stores each Track as a class ID for the Track type followed by the data. When the Segment is loaded, it reads its small set of data and then reads the list of Tracks, each represented by a class ID and associated data chunk. For each Track in the file, it calls CoCreateInstance() to create the Track object. It then calls the Track's IPersistStream::Load() method and passes it the Track's data chunk in the form of the IStream at the current file position (see Chapter 9 to better understand file I/O and the Loader). The Track reads its data and passes control back to the Segment, which continues to the next Track, assembling a list of created and loaded Tracks as it goes.

Note

The Track types do not show up as cached object types in the Loader. Because they are always embedded within the Segment, there is no need for the Loader's linking and caching mechanisms.
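From the application's point of view, all of this Track creation happens behind a single call to the Loader. As a quick reminder of Chapter 9, here is a hedged sketch; the file name is purely hypothetical, and error handling and search directory setup are trimmed.

// Sketch: load a Segment from disk with the Loader. pLoader is assumed to
// be a valid IDirectMusicLoader8*, pPerformance a valid performance.
IDirectMusicSegment8 *pSegment = NULL;
HRESULT hr = pLoader->LoadObjectFromFile(
    CLSID_DirectMusicSegment,      // class of object to create
    IID_IDirectMusicSegment8,      // interface we want back
    L"Explosion.sgt",              // Segment file authored in Producer (hypothetical)
    (void**)&pSegment);

if (SUCCEEDED(hr))
{
    // Download the Segment's band and wave data to the synth before playing.
    pSegment->Download(pPerformance);
}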

How a Segment Plays

Since playback of a Segment is really playback of the Tracks within the Segment, we look to the Tracks themselves to understand how a Segment performs. There are two ways that Tracks perform: by generating playback events (mostly MIDI messages) for the DirectMusic synth and by delivering control parameters that are read by other Tracks. Examples of message-generating Tracks include Sequence, Style, Band, Wave, and Pattern. Examples of control-generating Tracks include Groove Level, Chord/Key, Mute, Tempo, and Time Signature.

Performance Channels

At this point, it's probably a good idea to introduce the concept of Performance channels, or pchannels. Since everything that the Performance plays goes through a MIDI-style synthesizer, a lot of MIDI conventions are used. One is the concept of a MIDI channel.

In the MIDI world, each synthesizer can have up to 16 channels. Each channel represents a musical instrument. Up to 16 different instruments can be played at the same time by assigning each to one of the 16 channels. Every performance MIDI message includes the address of the channel. So, the program change MIDI message tells the channel which instrument to play. The MIDI Note On and Note Off messages trigger playing notes using that instrument. More than one note can play at a time on one channel. Additional MIDI messages adjust the volume, pan, pitch bend, and many other parameters for all notes playing on the addressed channel.

There are some powerful things that you can do with this. Because each message carries its channel address, you can inject messages into the stream that modify the performance. For example, you can adjust the volume or reverb of a MIDI performance by sending down the appropriate MIDI control change messages without touching the notes themselves.
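As a sketch of what such an injected message might look like in code, you can allocate a DMUS_MIDI_PMSG and hand it to the Performance. The pchannel and volume values here are arbitrary, and error handling is omitted.

// Sketch: inject a MIDI control change (CC 7, channel volume) on pchannel 2.
DMUS_MIDI_PMSG *pMsg = NULL;
if (SUCCEEDED(pPerformance->AllocPMsg(sizeof(DMUS_MIDI_PMSG),
                                      (DMUS_PMSG**)&pMsg)))
{
    ZeroMemory(pMsg, sizeof(DMUS_MIDI_PMSG));
    pMsg->dwSize = sizeof(DMUS_MIDI_PMSG);

    REFERENCE_TIME rtNow = 0;
    pPerformance->GetTime(&rtNow, NULL);      // current clock time

    pMsg->dwPChannel = 2;                     // channel being addressed
    pMsg->dwType     = DMUS_PMSGT_MIDI;       // raw MIDI short message
    pMsg->dwFlags    = DMUS_PMSGF_REFTIME;    // rtTime field is valid
    pMsg->rtTime     = rtNow;                 // deliver right away
    pMsg->bStatus    = 0xB0;                  // MIDI control change status
    pMsg->bByte1     = 7;                     // controller 7 = channel volume
    pMsg->bByte2     = 64;                    // new volume value (0-127)

    pPerformance->SendPMsg((DMUS_PMSG*)pMsg); // Performance now owns the pmsg
}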

This is all marvelous, but 16 MIDI channels is not enough. There's an historical reason that MIDI is limited to 16 channels. When the MIDI specification was created back around 1982, it was purely a hardware protocol designed to connect electronic musical instruments together via a standard cable. To maximize bandwidth, the number of channels was limited to four bits, or 16. At the time, it was rare to find a musical device that could play more than one instrument at a time, let alone 16, so it was reasonable. Twenty years later, a 16-instrument limit is horribly inadequate.

DirectMusic's Core Layer addresses this by introducing the concept of channel groups. Each channel group is equivalent to one set of 16 MIDI channels. In turn, the Performance supports unlimited channels, called pchannels, and manages the mapping from an AudioPath pchannel (each AudioPath has its own virtual channel space) to a unique channel group + MIDI channel on the synth. What this means is that every Track in every Segment that plays on the same AudioPath is sending commands to the synth on the same shared set of pchannels. So, if a Band Track in a Segment sets piano on pchannel 2, a separate Sequence Track with notes on pchannel 2 will play piano. Likewise, one Segment with a Band Track can control the instruments of a completely separate Segment, as long as they both play on the same AudioPath.
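If you are ever curious where a particular pchannel actually lands on the synth, the Performance can report the mapping. A minimal sketch, assuming pPerformance is a valid IDirectMusicPerformance8 pointer:

// Sketch: ask the Performance where pchannel 2 is routed.
IDirectMusicPort *pPort = NULL;
DWORD dwGroup    = 0;   // channel group on the port
DWORD dwMChannel = 0;   // MIDI channel (0-15) within that group

if (SUCCEEDED(pPerformance->PChannelInfo(2, &pPort, &dwGroup, &dwMChannel)))
{
    // pchannel 2 maps to MIDI channel dwMChannel in channel group dwGroup.
    pPort->Release();
}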

In DirectMusic, everything that can make a sound is addressed via pchannels. This means waves as well as the standard MIDI notes. In all cases, the sound-generating Tracks create messages (called pmsgs), which are time stamped and routed via pchannels, ultimately to the synth where they make sound.

The good news is that all this is really set up in the content by the audio designer working with DirectMusic Producer. There's really nothing the programmer needs to do, but it's good to understand what is going on.

Inter-Track Communication

While some Track types generate messages on pchannels to send to the synth, others provide control information. One example of control information is the time signature, which is used to identify the next beat and measure. Another is the groove level, which controls the intensity level of Style playback Tracks.

As it turns out, control information is always provided by one Track to be used by another. Groove Level, Chord, and Mute Tracks, for example, are used to determine the playback behavior of a Style Track. At first glance, it might make sense to just put the groove, chord, and mute information directly in the Style Track to keep things simple. But it turns out that by sharing this information with multiple consumer Tracks, some really powerful things can be accomplished. For example, multiple Segments can play at the same time and follow the same chord progression. More on how this works in a bit.

So, we have a mechanism for cross-track communication. It's done via an API for requesting control information, and it's called GetParam(). GetParam() methods are provided on the IDirectMusicPerformance, IDirectMusicSegment, and IDirectMusicTrack interfaces. When a Track needs to know a particular parameter, it calls GetParam() on the Performance, which in turn identifies the appropriate Segment and calls GetParam() on it. That, in turn, calls GetParam() on the appropriate Track. Each command request is identified by a unique GUID and associated data structure in which the control data is returned.
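Application code can tap the same mechanism directly. Here is a hedged sketch that asks the Performance for the current time signature; the GUID, structure, and constants shown are the standard DirectMusic ones, and error handling is omitted.

// Sketch: query the current time signature through GetParam().
DMUS_TIMESIGNATURE timeSig;
MUSIC_TIME mtNow  = 0;
MUSIC_TIME mtNext = 0;

pPerformance->GetTime(NULL, &mtNow);   // current time in MUSIC_TIME units
HRESULT hr = pPerformance->GetParam(
    GUID_TimeSignature,   // which parameter we want
    0xFFFFFFFF,           // search all Track groups
    DMUS_SEG_ANYTRACK,     // any Track that can supply it
    mtNow,                // at what time
    &mtNext,              // when the parameter next changes
    &timeSig);            // returned data

if (SUCCEEDED(hr))
{
    // timeSig.bBeatsPerMeasure and timeSig.bBeat describe the current meter.
}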

The beauty of the GetParam() mechanism is that it works not only across Tracks within one Segment but also across Tracks in multiple Segments. This is very important because it allows control to be shared in sophisticated ways that make the playing of multiple Segments at once seem more like a carefully orchestrated performance. For example, a Segment with a Pattern Track that is designed to transpose on top of the current chord and key is played as a musical embellishment. It uses chord and key information generated by a Chord Track in a separate Style playback Segment. So, the embellishment seems to lock into the harmonic structure of the Style Segment, and it all sounds as if it were carefully written to work together.

Primary, Secondary, and Controlling Segments

So, how does the Performance determine which Segment to read the control information from? As it turns out, there are three levels of playback priority for a Segment, set at the time the Segment is played. They are primary, secondary, and controlling.

Primary Segment

A primary Segment is intended primarily for music. Only one primary Segment may be playing at a time, and it is considered the core of the music performance. In addition to the music that it plays, the primary Segment provides the underlying Tempo, Time Signature, and, if applicable, Chord and Groove Level Tracks. By doing so, it establishes these parameters for all music in all Segments. So, any Playback Tracks within that same Segment retrieve their control information from the neighboring Control Tracks as they play. No two primary Segments can play at the same time because that would cause the Control Tracks to conflict. If a primary Segment is playing and a new one is scheduled to play, the currently playing Segment stops exactly at the moment that the new Segment starts.

Controlling Segment

A controlling Segment has the unique ability to override the primary Segment's Control Tracks without interrupting its playback. For example, a controlling Segment with an alternate chord progression can be played over a primary Segment, and all music shifts to the replacement key and chord until the controlling Segment is stopped. Suddenly, the tonal character of the music can shift in a different emotional direction. Figure 10-3 demonstrates how a controlling Segment with an alternate chord progression overrides the chords in two subsequent primary Segments.

Figure 10-3: Underlying chord progression assembled from primary and controlling Segments.

Likewise, Tempo, Muting, and Groove Level can be overridden. This is a very powerful way to alter the music in real time, perhaps in response to stimuli from the host application (i.e., in response to gameplay). Like the primary Segment, the controlling Segment is primarily for use with music, not sound effects.

Secondary Segment

A secondary Segment also has sound-producing Tracks, but it is devoid of Control Tracks. Importantly, any number of secondary Segments can be played at the same time. This is important for playing sound effects as well as music. With sound effects, the usage is simple. Each secondary Segment contributes additional sounds on top of the others playing in the same AudioPath. With music, it can be a little more sophisticated. Because the secondary Segments retrieve their control information from the primary and controlling Segments, they automatically track any of the musical parameters that are defined by the controlling Tracks. This is very powerful. In particular, the fact that they can track the chord and time signature means that musical embellishments can play spontaneously and yet blend in rhythmically and harmonically.
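In code, the priority level is simply a matter of flags passed at play time. Here is a minimal sketch using PlaySegmentEx(); the Segment pointers are assumed to be already loaded and downloaded, error handling is omitted, and the boundary flags are just one reasonable choice.

// Primary Segment: the core of the music. No special flag is needed;
// aligning to a measure boundary keeps the transition musical.
pPerformance->PlaySegmentEx(pMusicSegment, NULL, NULL,
    DMUS_SEGF_MEASURE, 0, NULL, NULL, NULL);

// Secondary Segment: an extra layer (sound effect or embellishment) that
// plays on top of whatever else is already sounding.
pPerformance->PlaySegmentEx(pEffectSegment, NULL, NULL,
    DMUS_SEGF_SECONDARY, 0, NULL, NULL, NULL);

// Controlling Segment: a secondary Segment whose Control Tracks (chords,
// groove, etc.) override the primary Segment's until it stops.
pPerformance->PlaySegmentEx(pChordSegment, NULL, NULL,
    DMUS_SEGF_SECONDARY | DMUS_SEGF_CONTROL | DMUS_SEGF_MEASURE,
    0, NULL, NULL, NULL);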

There is the special case of a self-controlling Segment. It is possible to create a Segment in DirectMusic Producer that will only listen to its own Control Tracks, regardless of whether it is played as a primary, secondary, or controlling Segment. This is useful if you want to create a Segment that genuinely plays on its own and ignores the Control Tracks of other Segments.

Whew! Enough talk — let's do something!



