Sampled Audio

Sampled audio is a series of digital samples extracted from analog signals, as illustrated by Figure 7-3. Each sample represents the amplitude (loudness) of the signal at a given moment.

The quality of the digital result depends on two factors: time resolution (the sampling rate), measured in Hertz (Hz), and amplitude resolution (quantization), the number of bits representing each sample. For example, a CD track is typically sampled at 44.1 kHz (44,100 samples per second), and each sample uses 16 bits to encode a possible 65,536 amplitudes.

Descriptions of sampled audio often talk about frames (e.g., frame size, frame rate). For most audio formats, a frame is the number of bytes required to represent a single sample. For example, a sample in 8-bit mono pulse code modulation (PCM) format requires a one-byte frame, a 16-bit mono PCM sample requires a two-byte frame, and 16-bit stereo PCM needs a four-byte frame: 2 bytes each for the left and right channels.

Figure 7-3. From analog to digital audio

As the sample rate and quantization increase, so do the memory requirements. For instance, a three-second stereo CD track, using 16-bit PCM, requires 44,100 x 4 x 3 bytes of space, or about 517 KB. The "4" in the calculation is the frame size: each stereo 16-bit sample occupies four bytes.
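
This arithmetic is easy to package up. The following helper method is a sketch of my own (it isn't part of the book's examples) that computes the storage needed for uncompressed PCM audio from the sample rate, sample size, channel count, and duration:

     // storage needed for uncompressed PCM audio (a sketch, not book code)
     public static long pcmSizeInBytes(float sampleRate, int sampleSizeInBits,
                                       int channels, double seconds)
     {
       int frameSize = (sampleSizeInBits/8) * channels;   // bytes per frame
       return (long) (sampleRate * frameSize * seconds);
     }

     // e.g., pcmSizeInBytes(44100, 16, 2, 3) returns 529,200 bytes (~517 KB)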

The higher the sample rate and quantization, the better the sound quality when the digital stream is converted back to an analog signal suitable for speakers or headphones. Figure 7-4 shows that the smoothness and detail of the signal depends on the number of samples and their amplitude accuracy.

Figure 7-4. From digital to analog audio


Figure 7-5 shows the conversion of a digital stream for the same sine wave but encoded at a higher sample rate. The resulting audio is closer to the original than the one shown in Figure 7-4.

Sampled audio can be played with the Clip or SourceDataLine classes.

A Clip object holds sampled audio small enough to be loaded completely into memory during execution; therefore, a Clip is similar to AudioClip.

Figure 7-5. Conversion of a digital stream with a higher sample rate


"Small enough" usually means less than 2 MB.


A SourceDataLine is a buffered stream that permits chunks of the audio to be delivered to the mixer in stages over time without requiring the entire thing to be in memory at once. The buffered streaming in SourceDataLine shouldn't be confused with the video and audio streaming offered by JMF. The difference is that JMF supports time-based protocols, such as RTP, which permits the audio software and hardware to manage the network latency and bandwidth issues when data chunks are transferred to it over a network. I'll say a little more about JMF at the end of this chapter.

Streaming in Java Sound does not have timing capabilities, making it difficult to maintain a constant flow of data through a SourceDataLine if the data are coming from the network; clicks and hisses can be heard as the system plays the sound. However, if SourceDataLine obtains its data from a local file, such problems are unlikely to occur.

The Mixer

Clip and SourceDataLine are subinterfaces of the Line interface; lines are the piping that allows digital audio to be moved around the audio system, for instance, from a microphone to the mixer and from the mixer to the speakers (see Figure 7-6).

Figure 7-6 is a stylized view of a mixer, intended to help explain the various classes and coding techniques for sampled audio.

Inputs to a mixer may include data read in as a Clip object, streamed in from a device or the network, or generated by a program. Output can include audio written to a file, sent to a device, transmitted over the network, or delivered as streamed output to a program.

The mixer, represented by the Mixer class, may be a hardware audio device (e.g., the sound card) or software interfaced to the sound card. A mixer can accept audio streams coming from several source lines and pass them on to target lines, perhaps mixing the streams together in the process and applying audio effects like volume adjustment or panning.

Figure 7-6. Audio I/O to/from the mixer

The capabilities of Java Sound's default mixer have changed in the transition from J2SE 1.4.2 to J2SE 5.0. In J2SE 1.4.2 or earlier, the default mixer was the Java Sound Audio Engine, which had playback capabilities but could not capture sound; that was handled by another mixer. In J2SE 5.0, the Direct Audio Device is the default and supports playback and recording.
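
If you want to see which mixers are installed on a particular machine, AudioSystem can report them. This fragment is a small utility sketch of my own, not one of the book's examples:

     // list the mixers installed on this machine
     Mixer.Info[] mixerInfo = AudioSystem.getMixerInfo( );
     for (Mixer.Info mi : mixerInfo)
       System.out.println(mi.getName( ) + ": " + mi.getDescription( ));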

Clip, SourceDataLine, and TargetDataLine are part of the Line hierarchy shown in Figure 7-7.

Figure 7-7. Part of the Line hierarchy


DataLine adds media features to Line, including the ability to determine the current read/write position, to start/stop/pause/resume the sound, and to retrieve status details.
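
For example, given some DataLine called line that has already been obtained and opened, the standard API offers calls like these (a fragment, not code from this chapter's examples):

     line.start( );                          // let data flow along the line
     long pos  = line.getFramePosition( );   // current position, in frames
     boolean b = line.isRunning( );          // one of the status details
     line.stop( );                           // pause the flow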

The SourceDataLine adds methods for buffering data for playback by the mixer. The name of the class is a little confusing: "source" refers to a source of data for the mixer. From the programmer's point of view, data is written out to a SourceDataLine to send it to a mixer.

The TargetDataLine is a streaming line in the same way as SourceDataLine. "Target" refers to the destination of the data sent out by the mixer. For instance, an application might use a TargetDataLine to receive captured data gathered by the mixer from a microphone or CD drive. A TargetDataLine is a source of audio for the application.
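
As a sketch of how capture looks in code (with a guessed-at audio format and the error handling omitted), an application might pull a block of audio from the mixer like this:

     AudioFormat fmt = new AudioFormat(22050.0f, 16, 1, true, true);
                          // 22.05 kHz, 16-bit, mono, signed, big endian
     DataLine.Info info = new DataLine.Info(TargetDataLine.class, fmt);
     TargetDataLine tdl = (TargetDataLine) AudioSystem.getLine(info);
     tdl.open(fmt);
     tdl.start( );
     byte[] buf = new byte[tdl.getBufferSize( )];
     int numRead = tdl.read(buf, 0, buf.length);  // captured bytes from the mixer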

A Clip is preloaded rather than streamed, so its duration is known before playback. This permits it to offer methods for adjusting the starting position and looping.
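
For instance, once a clip has been opened, its playback can be repositioned and looped with the standard Clip methods (a fragment):

     clip.setFramePosition(0);            // rewind to the start
     clip.setLoopPoints(0, -1);           // loop over the entire clip
     clip.loop(Clip.LOOP_CONTINUOUSLY);   // keep repeating until stop( )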

A LineListener can be attached to any line to monitor LineEvents, which are issued when the audio is opened, closed, started, or stopped. The "stopped" event can be utilized by application code to react to a sound's termination.

Figure 7-6 shows that lines are linked to the mixer through ports. A Port object typically allows access to sound card features dealing with I/O. For example, an input port may be able to access the analog-to-digital converter. An output port may permit access to the digital-to-analog converter used by the speakers or headphones. A change to a port will affect all the lines connected to it. The Port class was not implemented prior to J2SE 5.0.
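
In code, a port is obtained through AudioSystem in much the same way as other lines. For example, this fragment (J2SE 5.0 or later, error handling omitted) opens the microphone's input port:

     if (AudioSystem.isLineSupported(Port.Info.MICROPHONE)) {
       Port micPort = (Port) AudioSystem.getLine(Port.Info.MICROPHONE);
       micPort.open( );
     }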

The box marked "Controls" inside the mixer in Figure 7-6 allows audio effects to be applied to incoming clips or SourceDataLines. The effects may include volume control, panning between speakers, muting, and sample rate control, though the exact selection depends on the mixer. Chapter 9 has an example where mixer controls are applied to a clip.
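
As a taste of what Chapter 9 covers, a gain control can be requested from a clip's line and adjusted directly. This fragment assumes the mixer supports a MASTER_GAIN control:

     FloatControl gainControl =
        (FloatControl) clip.getControl(FloatControl.Type.MASTER_GAIN);
     gainControl.setValue(-10.0f);   // reduce the volume by 10 decibels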

Another form of audio manipulation is to modify the sample data before it is passed through a SourceDataLine to the mixer. For example, volume control is a matter of amplitude adjustment, coded by bit manipulation. Chapter 9 has a volume control example.
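
Chapter 9's example works on the bytes themselves; the following method is only my own sketch of the idea, assuming 16-bit signed little-endian PCM data:

     // scale each 16-bit little-endian sample in buf by gain (a sketch)
     public static void scaleVolume(byte[] buf, int len, double gain)
     {
       for (int i = 0; i+1 < len; i += 2) {
         // combine low and high bytes into a signed 16-bit sample
         short sample = (short) ((buf[i] & 0xff) | (buf[i+1] << 8));
         int scaled = (int) (sample * gain);
         // clamp to the legal 16-bit range
         if (scaled > 32767) scaled = 32767;
         else if (scaled < -32768) scaled = -32768;
         buf[i]   = (byte) (scaled & 0xff);          // low byte
         buf[i+1] = (byte) ((scaled >> 8) & 0xff);   // high byte
       }
     }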

Playing a Clip

PlayClip.java (in SoundExamps/SoundPlayer/) loads an audio file specified on the command line as a clip and plays it once.

The main( ) method creates a PlayClip object and exits afterward:

     public static void main(String[] args)
     {
       if (args.length != 1) {
         System.out.println("Usage: java PlayClip <clip file>");
         System.exit(0);
       }
       new PlayClip(args[0]);
       System.exit(0);    // required in J2SE 1.4.2 or earlier
     }

The call to exit( ) must be present in J2SE 1.4.2 or earlier (it's unnecessary if you're using J2SE 5.0). The problem is that the sound engine doesn't terminate all of its threads when it finishes, which prevents the JVM from terminating without an exit( ) call.

The PlayClip class implements the LineListener interface to detect when the clip has finished. The LineListener update( ) method is described below.

     public class PlayClip implements LineListener
     { ... }   // PlayClip must implement update( )

The PlayClip( ) constructor loads and plays the clip.

     public PlayClip(String fnm)
     {
       df = new DecimalFormat("0.#");  // 1 dp
       loadClip(SOUND_DIR + fnm);
       play( );

       // wait for the sound to finish playing; guess at 10 mins!
       System.out.println("Waiting");
       try {
         Thread.sleep(600000);   // 10 mins in ms
       }
       catch (InterruptedException e)
       { System.out.println("Sleep Interrupted"); }
     }

The PlayClip constructor has a problem: it shouldn't return until the sound has finished playing. However, play( ) starts the sound playing and returns immediately, so the code must wait in some way. I make it sleep for 10 minutes. This doesn't mean PlayClip hangs around for 10 minutes after it has finished playing a one-second clip. The LineListener update( ) method will allow PlayClip to exit as soon as the clip has ended.

loadClip( ) is the heart of PlayClip and illustrates the low-level nature of Java Sound. The length of its code is due to AudioSystem's lack of direct support for ULAW and ALAW formatted data. ULAW and ALAW are compression-based codings that affect the meaning of the bits in a sample. By default, only linear encodings (such as PCM) are understood.

The playing of a ULAW or ALAW file is dealt with by converting its data into PCM format as it's read into the Clip object. If I ignore this conversion code and other error-handling, then loadClip( ) carries out six tasks:

     // 1. Access the audio file as a stream
     AudioInputStream stream = AudioSystem.getAudioInputStream(
                                     getClass( ).getResource(fnm) );

     // 2. Get the audio format for the data in the stream
     AudioFormat format = stream.getFormat( );

     // 3. Gather information for line creation
     DataLine.Info info = new DataLine.Info(Clip.class, format);

     // 4. Create an empty clip using that line information
     Clip clip = (Clip) AudioSystem.getLine(info);

     // 5. Start monitoring the clip's line events
     clip.addLineListener(this);

     // 6. Open the audio stream as a clip; now it's ready to play
     clip.open(stream);
     stream.close( );   // I've done with the input stream

The monitoring of the clip's line events, which include when it is opened, started, stopped, and closed, is usually necessary to react to the end of a clip.

In task 1, AudioInputStream can take its input from a file, input stream, or URL, so it is a versatile way of obtaining audio input. The complete method is shown here:

     private void loadClip(String fnm)
     {
       try {
         AudioInputStream stream = AudioSystem.getAudioInputStream(
                             getClass( ).getResource(fnm) );
         AudioFormat format = stream.getFormat( );

         // convert ULAW/ALAW formats to PCM format
         if ( (format.getEncoding( ) == AudioFormat.Encoding.ULAW) ||
              (format.getEncoding( ) == AudioFormat.Encoding.ALAW) ) {
           AudioFormat newFormat =
              new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                            format.getSampleRate( ),
                            format.getSampleSizeInBits( )*2,
                            format.getChannels( ),
                            format.getFrameSize( )*2,
                            format.getFrameRate( ), true);  // big endian
           // update stream and format details
           stream = AudioSystem.getAudioInputStream(newFormat, stream);
           System.out.println("Converted Audio format: " + newFormat);
           format = newFormat;
         }

         DataLine.Info info = new DataLine.Info(Clip.class, format);

         // make sure the sound system supports this data line
         if (!AudioSystem.isLineSupported(info)) {
           System.out.println("Unsupported Clip File: " + fnm);
           System.exit(0);
         }

         clip = (Clip) AudioSystem.getLine(info);
         clip.addLineListener(this);

         clip.open(stream);
         stream.close( );   // I've done with the input stream

         // duration (in secs) of the clip
         double duration = clip.getMicrosecondLength( )/1000000.0;
         System.out.println("Duration: " + df.format(duration) + " secs");
       } // end of try block

       catch (UnsupportedAudioFileException audioException) {
         System.out.println("Unsupported audio file: " + fnm);
         System.exit(0);
       }
       catch (LineUnavailableException noLineException) {
         System.out.println("No audio line available for : " + fnm);
         System.exit(0);
       }
       catch (IOException ioException) {
         System.out.println("Could not read: " + fnm);
         System.exit(0);
       }
       catch (Exception e) {
         System.out.println("Problem with " + fnm);
         System.exit(0);
       }
     } // end of loadClip( )

PCM creation uses the AudioFormat constructor:

     public AudioFormat(AudioFormat.Encoding encoding,
                float sampleRate, int sampleSizeInBits,
                int channels, int frameSize,
                float frameRate, boolean bigEndian);

loadClip( ) uses the constructor:

     AudioFormat newFormat =
       new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                format.getSampleRate( ), format.getSampleSizeInBits( )*2,
                format.getChannels( ), format.getFrameSize( )*2,
                format.getFrameRate( ), true);  // big endian

ALAW and ULAW use an 8-bit byte to represent each sample, but after this has been decompressed the data requires 14 bits. Consequently, the PCM encoding must use 16 bits (2 bytes) per sample. This explains why the sampleSizeInBits and frameSize arguments are double the values obtained from the file's original audio format details.

Once the sample size goes beyond a single byte, the ordering of the multiple bytes must be considered. Big endian specifies a high-to-low byte ordering, while little endian is low-to-high. This is relevant if I later want to extract the sample's amplitude as a short or integer, since the multiple bytes must be combined correctly. The channels argument refers to the use of mono (one channel) or stereo (two channels).
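
For example, if the two bytes of a 16-bit sample have been read into b[0] and b[1], the way they combine depends on the stream's byte order (a small sketch, not PlayClip code):

     short bigEndian    = (short) ((b[0] << 8) | (b[1] & 0xff));  // high byte first
     short littleEndian = (short) ((b[1] << 8) | (b[0] & 0xff));  // low byte first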

The audio encoding is PCM_SIGNED, which allows a range of amplitudes that includes negatives. For 16-bit data, the range is -2^15 to 2^15 - 1 (-32,768 to 32,767). The alternative is PCM_UNSIGNED, which only offers positive values, 0 to 2^16 - 1 (65,535).

PlayClip's play( ) method is trivial:

     private void play( )
     {
       if (clip != null)
         clip.start( );   // start playing
     }

This starts the clip playing and returns immediately. PlayClip then sleeps, for as much as 10 minutes, while the clip plays, though most clips will finish after a few seconds. When the clip stops, its line issues a STOP event to the registered LineListener, triggering a call to update( ):

     public void update(LineEvent lineEvent)
     // called when the clip's line detects open, close, start, stop events
     {
       // has the clip reached its end?
       if (lineEvent.getType( ) == LineEvent.Type.STOP) {
         System.out.println("Exiting...");
         clip.stop( );
         lineEvent.getLine( ).close( );
         System.exit(0);
       }
     }

The calls to stop( ) and close( ) aren't strictly necessary, but they ensure that the audio system resources are in the correct state before termination.


Short Sound Bug in J2SE 5.0

PlayClip.java works perfectly in J2SE 1.4.2 but fails when given short sound files in J2SE 5.0. For example, dog.wav is 0.5 seconds long, and PlayClip is silent for 0.5 seconds when asked to play it:

     java PlayClip dog.wav

However, if the requested sound clip is longer than 1 second, PlayClip will work as expected.

I have registered this bug with Sun at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5085008. I encourage you to vote for its fixing.

There's a similar bug reported at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5070730. Vote for that one, too.


A rather hacky solution is to force the sound to loop several times until its total playing time exceeds one second. An outline of that solution can be found on the previous bug report web page and is implemented in PlayClipBF.java in SoundExamps/SoundPlayer/ (BF for "bug fix"), which is almost identical to PlayClip.java, except in two places.

A loop counter is calculated, based on the clip's duration:

     double duration = clip.getMicrosecondLength( )/1000000.0;
     loopCount = (int) (1.0 / duration);

This code is added to loadClip( ), and loopCount is defined as a global integer. In play( ), the clip is not started with a call to start( ) but made to loop loopCount times:

     // clip.start( );   // start looping, not playing (in play( ))
     clip.loop(loopCount);

In my future code, I'll assume that any sound files are longer than one second, so I won't fix things through looping. However, I will add a duration test and a warning message. For example, loadClip( ) in PlayClip is modified to call checkDuration( ):

     void checkDuration( )
     {
       double duration = clip.getMicrosecondLength( )/1000000.0;
       if (duration <= 1.0) {
         System.out.println("WARNING. Duration <= 1 sec : " +
                            df.format(duration) + " secs");
         System.out.println("         The clip may not play in J2SE 1.5 -- make it longer");
       }
       else
         System.out.println("Duration: " + df.format(duration) + " secs");
     }

Playing a Buffered Sample

As Figure 7-6 suggests, a program can pass audio data to the mixer by sending discrete packets (stored in byte arrays) along a SourceDataLine. The main reason for using this approach is to handle large audio files that cannot be loaded into a Clip.

BufferedPlayer.java does the same task as PlayClip.java, which is to play an audio file supplied on the command line. The differences are only apparent inside the code. One cosmetic change is that the program is written as a series of static methods called from main( ). This is just a matter of taste; the code could be "objectified" to look similar to PlayClip.java; it's shown using the static approach here:

     // globals
     private static AudioInputStream stream;
     private static AudioFormat format = null;
     private static SourceDataLine line = null;

     public static void main(String[] args)
     {
       if (args.length != 1) {
         System.out.println("Usage: java BufferedPlayer <clip file>");
         System.exit(0);
       }

       createInput("Sounds/" + args[0]);
       createOutput( );

       int numBytes = (int)(stream.getFrameLength( ) *
                                      format.getFrameSize( ));
            // use getFrameLength( ) from the stream, since the format
            // version may return -1 (WAV file formats always return -1)
       System.out.println("Size in bytes: " + numBytes);

       checkDuration( );
       play( );
       System.exit(0);   // necessary in J2SE 1.4.2 and earlier
     }

BufferedPlayer.java can be found in the SoundExamps/SoundPlayer/ directory.


createInput( ) is similar to PlayClip's loadClip( ) method but a little simpler. If I ignore the PCM conversion code for ULAW and ALAW formatted data, and other error handling, it does two tasks:

     // access the audio file as a stream
     stream = AudioSystem.getAudioInputStream( new File(fnm) );

     // get the audio format for the data in the stream
     format = stream.getFormat( );

createOutput( ) creates the SourceDataLine going to the mixer:

     private static void createOutput( )
     {
       try {
         // gather information for line creation
         DataLine.Info info =
               new DataLine.Info(SourceDataLine.class, format);
         if (!AudioSystem.isLineSupported(info)) {
           System.out.println("Line does not support: " + format);
           System.exit(0);
         }
         // get a line of the required format
         line = (SourceDataLine) AudioSystem.getLine(info);
         line.open(format);
       }
       catch (Exception e)
       {  System.out.println( e.getMessage( ));
          System.exit(0);
       }
     }  // end of createOutput( )

createOutput( ) collects line information and then creates a SourceDataLine based on that information.

checkDuration( ) calculates a duration using the audio stream's attributes and prints a warning if the sound file is one second long or less. This warning is the same as the one issued by checkDuration( ) in PlayClip. However, PlayClip's code obtains the duration using:

     double duration = clip.getMicrosecondLength( )/1000000.0;

getMicrosecondLength( ) isn't available to an AudioInputStream object, so the time in BufferedPlayer is calculated with:

     double duration = ((stream.getFrameLength( )*1000)/
                          stream.getFormat( ).getFrameRate( ))/1000.0;

play( ) repeatedly reads a chunk of bytes from the AudioInputStream and writes them to the SourceDataLine until the stream is empty. As a result, BufferedPlayer only requires memory large enough for the byte array buffer, not the entire audio file:

     private static void play( )
     {
       int numRead = 0;
       byte[] buffer = new byte[line.getBufferSize( )];

       line.start( );

       // read and play chunks of the audio
       try {
         int offset;
         while ((numRead = stream.read(buffer, 0, buffer.length)) >= 0) {
           offset = 0;
           while (offset < numRead)
             offset += line.write(buffer, offset, numRead-offset);
         }
       }
       catch (IOException e)
       {  System.out.println( e.getMessage( )); }

       // wait until all data is played, then close the line
       line.drain( );
       line.stop( );
       line.close( );
     }

The size of the buffer is determined by asking the SourceDataLine via getBufferSize( ). Alternatively, I could calculate a size myself.
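
If I did want to pick a size myself, a reasonable approach is to hold some fixed amount of playing time, rounded to whole frames. This fragment is a sketch of one alternative, not code from BufferedPlayer:

     // buffer roughly half a second of audio, in whole frames
     int numFrames = (int) (format.getFrameRate( ) / 2);
     byte[] buffer = new byte[numFrames * format.getFrameSize( )];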

After the loop finishes, drain( ) causes the program to wait until all the data in the line has been passed to the mixer. Then it's safe for the line to be stopped and closed and for the program to terminate.


