12.2.1 AU File Format

Audio (AU) files have a header and a content section. The header contains fields that describe the encoding format, as well as comments, titles, or any other text annotations, as listed in Table 12.7. The content section contains the audio data, encoded as specified in the header and interpreted according to Table 12.7.

12.2.2 Musical Instrument Digital Interface (MIDI)

MIDI is a standard interface enabling the interoperability of synthesizers, sequencers, home computers, and rhythm machines [MIDI]. Each MIDI-equipped instrument usually contains a receiver and a transmitter, although some instruments contain only one of the two. The receiver receives messages in MIDI format and executes MIDI commands. It consists of an optoisolator, a Universal Asynchronous Receiver/Transmitter (UART), and other hardware needed to perform the intended functions. The transmitter originates messages in MIDI format and transmits them by way of a UART and line driver.

A MIDI (.mid) file contains two kinds of elements: a single header chunk and multiple track chunks. A track may be thought of in the same way as a track on a multi-track tape deck. A single track may be assigned to each voice, each staff, or each instrument.

Table 12.7. The Header of an Audio (.AU) File
Table 12.8. Common Fixed Point Sample Rate Codes
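The AU header described in Section 12.2.1 can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: it assumes the widely documented layout of six big-endian 32-bit fields (magic number ".snd", data offset, data size, encoding code, sample rate, channel count) followed by the text annotation. The function name parse_au_header and the dictionary keys are illustrative only.

```python
import struct

AU_MAGIC = 0x2E736E64  # the four ASCII bytes ".snd"

def parse_au_header(data: bytes) -> dict:
    """Decode the fixed part of an AU header (assumed layout: six big-endian uint32)."""
    if len(data) < 24:
        raise ValueError("too short to hold an AU header")
    magic, data_offset, data_size, encoding, sample_rate, channels = struct.unpack(
        ">6I", data[:24]
    )
    if magic != AU_MAGIC:
        raise ValueError("not an AU file: bad magic number")
    # Any bytes between the fixed fields and the audio data are the annotation text.
    annotation = data[24:data_offset].rstrip(b"\x00").decode("ascii", errors="replace")
    return {
        "data_offset": data_offset,
        "data_size": data_size,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "channels": channels,
        "annotation": annotation,
    }
```

The encoding and sample rate codes returned here would then be looked up in Tables 12.7 and 12.8.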
12.2.2.1 Scalable Polyphony

As explained earlier, the MIDI transmitter sends messages to a MIDI receiver. Scalable Polyphony MIDI (SP-MIDI) is the result of a new MIDI message that composers can use to indicate how MIDI data should be performed by devices with different polyphony. For example, a composition written for GM2's 32-note polyphony could also be made to play on GM1 and GM Lite devices by eliminating certain instrument parts, as chosen by the composer. SP-MIDI was conceived as a solution for 3rd Generation (3G) mobile applications and systems, as an alternative to General MIDI Lite (which requires a fixed 16-note polyphony); today it is common to hear phones play tunes. For mobile applications, SP-MIDI provides flexibility to both the consumer and the manufacturer. For example, lower-cost phones may be offered that have only 8-note polyphony, versus higher-priced models that have 32-note polyphony, yet the same content plays on either phone.

12.2.3 MP3

This section does not present decoding specifications; readers are referred to the MPEG standard for the full specification [MP3]. This section is informative: it describes MPEG versions 1, 2, and 2.5, Layers I, II, and III, and the MP3 tag (MP3v1 and MP3v1.1).

An MPEG-1 Layer 3 audio file (MP3) uses a compressed audio format. It is assembled from a sequence of independent frames. Each frame has its own header and audio information. Two critical parameters are bit rate and sampling frequency. The bit rate specifies how many kilobits per second of music are stored and need to be processed. This parameter is independent of the sampling rate, which specifies how many times per second the PCM stream represented by this file samples the sound waves. There is no file header; therefore, one can cut portions of the MPEG file and still play it correctly.

To extract metadata and other information about the file, it is usually sufficient to find the first frame, read its header, and ignore the other frames (i.e., assume that the same metadata applies to them).

12.2.3.1 Frame Header

Each frame has a 32-bit header; Table 12.9 specifies the meaning of each bit. A value that is reserved, invalid, bad, or not allowed indicates an invalid header. The first 11 bits of a frame header are always set; they are called the frame sync. It is therefore possible to locate candidate frames by searching the file for all occurrences of eleven consecutive set bits. This means locating every byte with the value 0xFF (255 decimal) that is followed by a byte whose three most significant bits are set, that is, a byte b for which (b & 0xE0) == 0xE0. A 16-bit CRC may follow the header; after the CRC comes the audio data.

Table 12.9. MP3 Frame Header Format Specifications
Table 12.10. Bitrate Index Table (values are in kbps).
Table 12.11. Sampling Frequency Index Table
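The frame sync search described in Section 12.2.3.1 can be sketched as follows. This is only an illustration: the function name find_frame_syncs is hypothetical, and the offsets it returns are candidates, since the same bit pattern can occur by chance inside audio data and should be checked against the remaining header fields of Table 12.9.

```python
from typing import List

def find_frame_syncs(data: bytes) -> List[int]:
    """Return every offset where a 0xFF byte is followed by a byte with its top 3 bits set."""
    offsets = []
    for i in range(len(data) - 1):
        # 0xFF supplies the first 8 sync bits; the mask 0xE0 tests the next 3.
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            offsets.append(i)
    return offsets
```

A caller would typically take the first candidate whose header decodes to legal values and treat it as the first frame.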
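To calculate the size of a frame, one needs to examine the bit_rate, sample_rate, and padding_bit of the frame header. The length is computed as follows:

frame_size = floor(144 × bit_rate / sample_rate) + padding_bit

where bit_rate is in bits per second, sample_rate is in Hz, and frame_size is in bytes. This form applies to MPEG-1 Layer III frames.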
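As a sketch of this calculation, the code below decodes the three fields from a 4-byte header and applies the commonly documented MPEG-1 Layer III relation, 144 × bit_rate / sample_rate truncated to an integer, plus padding_bit. The lookup dictionaries cover only the MPEG-1 Layer III rows and are an assumption standing in for Tables 12.10 and 12.11; the function names are illustrative.

```python
import struct

# Assumed MPEG-1 Layer III values; "free" (0) and "bad" (15) bit rate indexes are omitted.
BITRATES_KBPS = {1: 32, 2: 40, 3: 48, 4: 56, 5: 64, 6: 80, 7: 96,
                 8: 112, 9: 128, 10: 160, 11: 192, 12: 224, 13: 256, 14: 320}
SAMPLE_RATES_HZ = {0: 44100, 1: 48000, 2: 32000}  # index 3 is reserved

def frame_size(bit_rate: int, sample_rate: int, padding_bit: int) -> int:
    """Frame length in bytes; bit_rate in bits per second, sample_rate in Hz."""
    return (144 * bit_rate) // sample_rate + padding_bit

def frame_size_from_header(header: bytes) -> int:
    """Decode bit rate, sample rate, and padding from a 32-bit MPEG-1 Layer III header."""
    (word,) = struct.unpack(">I", header[:4])
    if word >> 21 != 0x7FF:
        raise ValueError("missing frame sync")
    version_id = (word >> 19) & 0x3
    layer = (word >> 17) & 0x3
    if version_id != 0b11 or layer != 0b01:
        raise ValueError("this sketch handles MPEG-1 Layer III only")
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0x3
    padding_bit = (word >> 9) & 0x1
    if bitrate_index not in BITRATES_KBPS or rate_index not in SAMPLE_RATES_HZ:
        raise ValueError("reserved or invalid index: invalid header")
    return frame_size(BITRATES_KBPS[bitrate_index] * 1000,
                      SAMPLE_RATES_HZ[rate_index], padding_bit)
```

Integer division is used so that the result is truncated rather than rounded, which matters when frame sizes are used to step from one frame to the next.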
For example, for a bit_rate of 128000, a sample_rate of 44100, and a padding_bit of 0, the frame size is 417 bytes.

12.2.3.2 Audio Tag

The MP3v1 audio tag provides non-audio information about the MPEG audio file. It contains information about the artist, title, album, publishing year, comments, and genre. The tag is exactly 128 bytes long and is located at the very end of the audio data. It can be extracted by reading the last 128 bytes of the MPEG audio file. Table 12.12 presents the format of the audio tag; fields are normally padded with null characters (ASCII 0), but some applications pad them with spaces (ASCII 32). Genre codes are specified in Table 12.13.

Table 12.12. MP3v1 Audio Tag (Last 128 Bytes of File)
Table 12.13. MP3 Genre Codes
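Reading the audio tag can be sketched as below. The field offsets are an assumption based on the commonly documented layout of this tag (often called ID3v1): a 3-byte "TAG" marker, then title, artist, and album of 30 bytes each, a 4-byte year, a 30-byte comment, and a 1-byte genre code. The function name parse_audio_tag is hypothetical.

```python
from typing import Optional

def parse_audio_tag(data: bytes) -> Optional[dict]:
    """Decode the 128-byte tag at the end of the file, or return None if it is absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Cut at the first null, then strip trailing spaces, to handle both padding styles.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric code, to be looked up in Table 12.13
    }
```

Returning None rather than raising lets a caller treat an untagged file as a normal case.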
12.2.4 Resource Interchange File Format (RIFF)

A RIFF file is composed of an 8-byte header (Table 12.14) followed by a set of chunks (Table 12.15) or lists (Table 12.16). RIFF is a clone of the IFF format invented by Electronic Arts (EA) in 1984. IFF was created for Deluxe Paint on the Commodore Amiga, became the standard on that platform, and was maintained by Commodore until its demise. EA also ported Deluxe Paint to the PC platform and brought IFF with it. One key characteristic common to both IFF and RIFF is the use of four-character codes (FourCC). RIFF is so close to IFF that many IFF parsers can parse RIFF files.

Table 12.14. RIFF Header
Table 12.15. RIFF Data Chunk
Table 12.16. RIFF List Chunk
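The header-plus-chunks structure of Section 12.2.4 lends itself to a simple walk over the file. The sketch below assumes the usual RIFF conventions: a "RIFF" FourCC and a little-endian 32-bit size, a 4-byte form type, and then chunks that each carry a FourCC, a 32-bit size, and a payload padded to an even length. The function name parse_riff is illustrative, and LIST chunks are returned as opaque payloads rather than expanded.

```python
import struct
from typing import List, Tuple

def parse_riff(data: bytes) -> Tuple[bytes, List[Tuple[bytes, bytes]]]:
    """Return the form type and a list of (chunk id, payload) pairs."""
    riff_id, riff_size = struct.unpack("<4sI", data[:8])
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    form_type = data[8:12]
    end = min(8 + riff_size, len(data))  # never read past the actual buffer
    chunks = []
    pos = 12
    while pos + 8 <= end:
        chunk_id, chunk_size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id, data[pos + 8:pos + 8 + chunk_size]))
        # Chunks are word aligned: skip one pad byte after an odd-sized payload.
        pos += 8 + chunk_size + (chunk_size & 1)
    return form_type, chunks
```

Because unknown chunk ids are simply collected and skipped by size, the same loop serves the WAVE and CDA forms discussed in the following sections.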
12.2.5 Compact Disk Audio (CDA) Tracks

Compact Disk Audio (CDA) track files are generally RIFF resources. The RIFF id of a .CDA file is "CDDA" (43h, 44h, 44h, 41h). Such files contain only one data block, called "fmt " (66h, 6dh, 74h, 20h). In the current version of the .CDA format, this block is 24 bytes long (see Table 12.17).

Table 12.17. CDA Track Format
Time is represented in two formats: HSG and Red-Book. HSG can be calculated as follows:

HSG = (minutes × 60 + seconds) × 75 + frames
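A conversion between the two time formats can be sketched as follows, assuming the compact disc rate of 75 frames per second, so that an HSG value is a plain frame count. The function names are hypothetical.

```python
from typing import Tuple

FRAMES_PER_SECOND = 75  # CD audio frames per second (assumed here)

def msf_to_hsg(minutes: int, seconds: int, frames: int) -> int:
    """Collapse Red-Book minutes/seconds/frames into a single HSG frame count."""
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames

def hsg_to_msf(hsg: int) -> Tuple[int, int, int]:
    """Split an HSG frame count back into (minutes, seconds, frames)."""
    total_seconds, frames = divmod(hsg, FRAMES_PER_SECOND)
    minutes, seconds = divmod(total_seconds, 60)
    return minutes, seconds, frames
```

For instance, a position of 3 minutes, 25 seconds, and 40 frames corresponds to (3 × 60 + 25) × 75 + 40 = 15415 in HSG form.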
Red-Book is much easier to use than HSG, because it contains minutes, seconds, and frames in unmodified form.

12.2.6 WAVE

WAVE (WAV) files are generally RIFF resources with a RIFF id of "WAVE". The format chunk always occurs before the data chunk, and both of these chunks are mandatory (Table 12.18). As with all RIFF forms, decoders should expect and ignore chunks of unknown format.

Table 12.18. WAV File Components
12.2.6.1 WAVE Format Chunk

The WAVE format chunk (fmt-ck) specifies the format of the data. It is composed of common and format-specific fields. The list of parameters specified by the latter depends on the WAVE format category. Playback software should allow for (and ignore) any unknown format-specific parameters that occur at the end of this field.

Table 12.19. Common Fields of the WAV Format Chunk
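Splitting the format chunk into its common and format-specific parts can be sketched as below. The layout is an assumption based on the usual WAVE definition: five little-endian common fields (format tag, channel count, samples per second, average bytes per second, block alignment) occupying 14 bytes, with everything after them treated as format-specific. The function name and dictionary keys are illustrative and may not match the names used in Table 12.19.

```python
import struct

def parse_fmt_chunk(payload: bytes) -> dict:
    """Decode the common fields and keep the format-specific bytes unparsed."""
    if len(payload) < 14:
        raise ValueError("format chunk too short for the common fields")
    format_tag, channels, samples_per_sec, avg_bytes_per_sec, block_align = struct.unpack(
        "<HHIIH", payload[:14]
    )
    return {
        "format_tag": format_tag,            # wFormatTag, see Table 12.20
        "channels": channels,
        "samples_per_sec": samples_per_sec,
        "avg_bytes_per_sec": avg_bytes_per_sec,
        "block_align": block_align,
        # Left as raw bytes so that unknown trailing parameters are tolerated.
        "format_specific": payload[14:],
    }
```

Keeping the trailing bytes opaque follows the advice above that playback software should tolerate unknown format-specific parameters.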
12.2.6.2 WAVE Format Categories

The format category of a WAVE file is specified by the value of the wFormatTag field of the format chunk. The representation of the data and the content of the format-specific fields of the format chunk depend on the format category (see Table 12.20).

Table 12.20. WAV File Format Categories
12.2.6.3 Microsoft WAVE_FORMAT_PCM

When the wFormatTag field of the format chunk (fmt-ck) is set to WAVE_FORMAT_PCM, the waveform data consists of samples represented in PCM format (see Table 12.21). For PCM waveform data, the format-specific fields include wBitsPerSample only. In a single-channel WAVE file, samples are stored consecutively. For stereo WAVE files, channel 0 represents the left channel, and channel 1 represents the right channel. The speaker position mapping for more than two channels is currently undefined. In multiple-channel WAVE files, samples are interleaved.

Table 12.21. WAVE_FORMAT_PCM Structure
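The interleaving of multi-channel samples can be illustrated with a short sketch. It assumes 16-bit signed little-endian PCM samples, which is only one of the sample sizes that wBitsPerSample can describe; the function name deinterleave_pcm16 is hypothetical.

```python
import struct
from typing import List

def deinterleave_pcm16(data: bytes, channels: int) -> List[List[int]]:
    """Split interleaved 16-bit PCM into one list of samples per channel."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    # Sample k of channel c sits at index k * channels + c in the stream.
    return [list(samples[ch::channels]) for ch in range(channels)]
```

For a stereo file, the first returned list holds the left channel (channel 0) and the second holds the right channel (channel 1).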
Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size. The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero. For example, if the sample size (recorded in wBitsPerSample) is 12 bits, then each sample is stored in a two-byte integer, and the least significant four bits of the first (least significant) byte are set to zero.

The waveform data may contain, among other chunks, a slnt chunk representing silence. In 16-bit PCM data, if the last sample value played before the silence section is 10000 and data is still output to the Digital-to-Analog Converter (DAC), the output should maintain the value 10000. If a zero value is used instead, a click may be heard at the start and end of the silence section. If play begins at a silence section, a zero value might be used because no other information is available. A click might then be created if the data following the silent section starts with a nonzero value.

FACT Chunk

The fact chunk (fact-ck) stores the file size and is required for all compressed audio formats. The chunk is not required for PCM files using the data chunk format.

Cue-Points Chunk

The cue-points chunk (cue-ck) identifies a series of positions in the waveform data stream (see Table 12.22).

Table 12.22. Fields of the Cue-Points Chunk
Playlist Chunk

The playlist chunk (playlist-ck) specifies a play order for a series of cue-points. Its fields are specified in Table 12.23.

Table 12.23. Fields of the Playlist Chunk

Associated Data Chunk

The associated data list (assoc-data-list) provides the ability to attach information, such as labels, to sections of the waveform data stream.

Label and Note Information

The labl and note chunks have similar fields. The labl chunk contains a label, or title, to associate with a cue-point. The note chunk contains comment text for a cue-point (see Table 12.24).

Table 12.24. Label and Note Information Fields

Text with Data Length Information

The ltxt chunk contains text that is associated with a data segment of a specific length (see Table 12.25).

Table 12.25. Text with Data Length Information Fields

Embedded File Information

The file chunk contains information described in other file formats, such as an RDIB file or an ASCII text file (see Table 12.26).

Table 12.26. Embedded File Information Fields