12.2 Audio Formats


12.2.1 AU File Format

Audio (AU) files have a header and a content section. The header contains fields that describe the encoding format, as well as comments, titles, or any other text annotations, as listed in Table 12.7. The content section contains the audio data encoded according to the encoding specified and interpreted according to Table 12.7.

12.2.2 Musical Instrument Digital Interface (MIDI)

MIDI is a standard interface enabling the interoperability of synthesizers, sequencers, home computers, and rhythm machines [MIDI]. Each MIDI-equipped instrument usually contains a receiver and a transmitter, although some instruments contain only one of the two. The receiver receives messages in MIDI format and executes MIDI commands. It consists of an optoisolator, a Universal Asynchronous Receiver/Transmitter (UART), and other hardware needed to perform the intended functions. The transmitter originates messages in MIDI format and transmits them by way of a UART and a line driver.

A MIDI (.mid) file contains two things: a single header chunk and multiple track chunks. A track may be thought of in the same way as a track on a multi-track tape deck. A single track may be assigned to each voice, each staff, or each instrument.

Table 12.7. The Header of an Audio (.AU) File

| Field Name | Description |
|---|---|
| magic_number | The 4-byte value 0x2e736e64 (ASCII ".snd") |
| data_offset | A 4-byte field specifying the offset, in octets, to the data part. The minimum valid number is 24 (decimal). |
| data_size | A 4-byte field specifying the size, in octets, of the data part. If unknown, the value 0xffffffff should be used. |
| encoding | A 4-byte field whose values are interpreted as listed below this table. |
| sample_rate | A 4-byte field specifying the number of samples per second (see Table 12.8) |
| channels | A 4-byte field specifying the number of interleaved channels |
| comment | 24 bytes of text |
| sample_data | The encoded audio content of this file. |

Values of the encoding field:

| Value | Encoding |
|---|---|
| 1 | 8-bit ISDN u-law |
| 2 | 8-bit linear PCM [REF-PCM] |
| 3 | 16-bit linear PCM |
| 4 | 24-bit linear PCM |
| 5 | 32-bit linear PCM |
| 6 | 32-bit IEEE floating point |
| 7 | 64-bit IEEE floating point |
| 8 | Fragmented sample data |
| 9 | Nested encoding |
| 10 | DSP encoding |
| 11 | 8-bit fixed point samples |
| 12 | 16-bit fixed point samples |
| 13 | 24-bit fixed point samples |
| 14 | 32-bit fixed point samples |
| 15 | Unknown |
| 16 | Non-audio display data |
| 17 | u-law Squelch |
| 18 | 16-bit linear with emphasis |
| 19 | 16-bit linear with compression |
| 20 | A combination of 18 and 19 |
| 21 | DSP encoding |
| 22 | DSP encoding |
| 23 | CCITT G.721 4-bit ADPCM |
| 24 | CCITT G.722 ADPCM |
| 25 | CCITT G.723 3-bit ADPCM |
| 26 | CCITT G.723 5-bit ADPCM |
| 27 | 8-bit ISDN A-law |
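The fixed part of the AU header in Table 12.7 can be read with a few lines of Python. This is a minimal sketch, not a complete reader: it assumes big-endian byte order (the table does not state one), it treats everything between byte 24 and data_offset as the comment text, and it returns sample_rate as the raw 32-bit value without interpreting it. The function name parse_au_header is illustrative.

```python
import struct

AU_MAGIC = 0x2E736E64  # ASCII ".snd"

def parse_au_header(data: bytes) -> dict:
    """Decode the six 4-byte header fields listed in Table 12.7."""
    if len(data) < 24:
        raise ValueError("an AU header needs at least 24 bytes")
    # Assumption: fields are stored big-endian.
    magic, data_offset, data_size, encoding, sample_rate, channels = \
        struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("missing .snd magic number")
    if data_offset < 24:
        raise ValueError("data_offset is below the minimum of 24")
    return {
        "data_offset": data_offset,
        # 0xffffffff marks an unknown size.
        "data_size": None if data_size == 0xFFFFFFFF else data_size,
        "encoding": encoding,
        "sample_rate": sample_rate,  # raw value, not interpreted here
        "channels": channels,
        "comment": data[24:data_offset],
    }

# Synthetic header: 16-bit linear PCM (encoding 3), one channel.
au_bytes = struct.pack(">6I", AU_MAGIC, 32, 0xFFFFFFFF, 3, 8000, 1) + b"demo\x00\x00\x00\x00"
au_info = parse_au_header(au_bytes)
print(au_info)
```

A real reader would go on to seek to data_offset and decode the samples according to the encoding value.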

Table 12.8. Common Fixed Point Sample Rate Codes

| Code | Rate |
|---|---|
| 0xAC440000 | 44100.00000 Hz |
| 0x56EE8BA3 | 22254.54545 Hz |
| 0x56220000 | 22050.00000 Hz |
| 0x2B7745D1 | 11127.27273 Hz |
| 0x2B110000 | 11025.00000 Hz |
| 0x1CFA2E8B | 7418.18180 Hz |
| 0x15BBA2E8 | 5563.63630 Hz |
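The codes in Table 12.8 are consistent with an unsigned 16.16 fixed-point reading, in which the upper 16 bits hold the integer part of the rate and the lower 16 bits the fraction. That interpretation is inferred from the table values rather than stated in the text, so the following Python sketch should be read under that assumption; both function names are illustrative.

```python
def rate_from_code(code: int) -> float:
    """Interpret a 32-bit code as unsigned 16.16 fixed point, in Hz."""
    return code / 65536.0

def code_from_rate(rate_hz: float) -> int:
    """Inverse conversion, rounded to the nearest representable code."""
    return round(rate_hz * 65536)

for code in (0xAC440000, 0x56220000, 0x2B110000):
    print(hex(code), rate_from_code(code))
```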

12.2.2.1 Scalable Polyphony

As explained earlier, the MIDI transmitter sends messages to a MIDI receiver. Scalable Polyphony MIDI (SP-MIDI) is the result of a new MIDI message that composers can use to indicate how MIDI data should be performed by devices with different polyphony. For example, a composition written for GM2's 32-note polyphony could also be made to play on GM1 and GM Lite devices by eliminating certain instrument parts, as chosen by the composer.

Scalable Polyphony MIDI was conceived as a solution for 3rd Generation (3G) mobile applications and systems, as an alternative to General MIDI Lite (which requires a fixed 16-note polyphony); today it is common to hear phones play tunes. For mobile applications, SP-MIDI provides flexibility to both the consumer and the manufacturer. For example, lower-cost phones may be offered with only 8-note polyphony, alongside higher-priced models with 32-note polyphony, yet the same content plays on either phone.

12.2.3 MP3

This section does not present decoding specifications; readers are referred to the MPEG standard for the full specification [MP3]. It is an informative description of MPEG versions 1, 2, and 2.5, of Layers I, II, and III, and of the MP3 tag (MP3v1 and MP3v1.1).

An MPEG-1 Layer 3 audio file, MP3, is a compressed audio format. It is assembled from a sequence of independent frames. Each frame has its own header and audio information. Two critical parameters are the bit rate and the sampling frequency. The bit rate specifies how many kilobits per second of music are stored and need to be processed. This parameter is independent of the sampling rate, which specifies how many times per second the PCM stream represented by this file samples the sound waves.

There is no file header; therefore, one can cut portions out of an MPEG file and still play it correctly. To extract metadata and other information about the file, it is usually sufficient to find the first frame, read its header, and ignore the other frames (i.e., assume they share the same metadata).

12.2.3.1 Frame Header

Each frame has a 32-bit header; Table 12.9 specifies the meaning of each bit. A value that is reserved, invalid, bad, or not allowed indicates an invalid header. The first 11 bits of a frame header are always set; they are called the frame sync. It is therefore possible to locate candidate frames by searching the file for all occurrences of eleven consecutive set bits. This means locating every byte with the value 0xFF (255 decimal) that is followed by a byte whose 3 most significant bits are set, that is, a byte that equals 0xE0 when masked with 0xE0. A 16-bit CRC may follow the header; after the CRC comes the audio data.
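The search described above can be sketched in Python as follows. The function name is illustrative, and the result is only a list of candidates: the same bit pattern can occur by chance inside audio data, so a real parser would also validate the rest of each header.

```python
def find_sync_candidates(data: bytes) -> list:
    """Return offsets where 0xFF is followed by a byte whose top 3 bits are set."""
    offsets = []
    pos = data.find(b"\xff")
    while pos != -1 and pos + 1 < len(data):
        if (data[pos + 1] & 0xE0) == 0xE0:
            offsets.append(pos)
        pos = data.find(b"\xff", pos + 1)
    return offsets

# 0xFF 0xFB and 0xFF 0xE3 match; 0xFF 0x10 does not.
sample = b"\x00\x01\xff\xfb\x90\x00junk\xff\x10\xff\xe3"
print(find_sync_candidates(sample))
```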

Table 12.9. MP3 Frame Header Format Specifications

| Field Name | Bits | Description |
|---|---|---|
| frame_sync | 31-21 | Synchronization bits marking the beginning of the header. |
| MPEG_audio_version | 20-19 | 00: MPEG Version 2.5; 01: Reserved; 10: MPEG Version 2; 11: MPEG Version 1 |
| layer_description | 18-17 | 00: Reserved; 01: Layer III; 10: Layer II; 11: Layer I |
| protection_bit | 16 | 0: Protected by CRC (a 16-bit CRC follows the header); 1: Not protected |
| bit_rate_index | 15-12 | Interpreted according to Table 12.10; all values are in kbps. In that table, V1 is MPEG Version 1; V2 is MPEG Version 2 and Version 2.5; L1, L2, and L3 are Layers I, II, and III; "free" means free format (a bitrate not listed in the table); "bad" means that this is not an allowed value. |
| sampling_rate | 11-10 | The sampling frequency according to Table 12.11. |
| padding_bit | 9 | 0: Frame is not padded; 1: Frame is padded with one extra slot (one byte for Layers II and III) |
| reserved_bit | 8 | Reserved; its value should be ignored. |
| intensity_stereo | 7 | Set if intensity stereo is on. |
| MS_stereo | 6 | Set if MS stereo is on. |
| stereo_mode_extension | 5-4 | The stereo mode used for recording this data. |
| copyright_bit | 3 | 0: Audio is not copyrighted; 1: Audio is copyrighted |
| original_media_bit | 2 | 0: Copy of original media; 1: Original media |
| emphasis | 1-0 | 00: None; 01: 50/15 ms; 10: Reserved; 11: CCITT J.17 |
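The bit positions in Table 12.9 translate directly into shifts and masks. The Python sketch below assumes the four header bytes are read most significant byte first, so that the frame sync lands in bits 31 through 21. It extracts only some of the fields and leaves the stereo-related bits untouched. The function name and dictionary keys are illustrative.

```python
import struct

def decode_frame_header(raw: bytes) -> dict:
    """Split a 4-byte frame header into the raw bit fields of Table 12.9."""
    (word,) = struct.unpack(">I", raw[:4])  # assumption: big-endian bit numbering
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync not found")
    return {
        "version_bits": (word >> 19) & 0x3,
        "layer_bits": (word >> 17) & 0x3,
        "protection_bit": (word >> 16) & 0x1,
        "bit_rate_index": (word >> 12) & 0xF,
        "sampling_rate_index": (word >> 10) & 0x3,
        "padding_bit": (word >> 9) & 0x1,
        "copyright_bit": (word >> 3) & 0x1,
        "original_media_bit": (word >> 2) & 0x1,
        "emphasis": word & 0x3,
    }

# 0xFFFB9000: Version 1 (11), Layer III (01), no CRC, bitrate index 1001.
fields = decode_frame_header(b"\xff\xfb\x90\x00")
print(fields)
```

The index values still have to be looked up in the bitrate and sampling frequency tables before they can be used in calculations.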

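Table 12.10. Bitrate Index Table (values are in kbps)

| bits | V1,L1 | V1,L2 | V1,L3 | V2,L1 | V2,L2 | V2,L3 |
|---|---|---|---|---|---|---|
| 0000 | free | free | free | free | free | free |
| 0001 | 32 | 32 | 32 | 32 | 32 | 8 |
| 0010 | 64 | 48 | 40 | 64 | 48 | 16 |
| 0011 | 96 | 56 | 48 | 96 | 56 | 24 |
| 0100 | 128 | 64 | 56 | 128 | 64 | 32 |
| 0101 | 160 | 80 | 64 | 160 | 80 | 64 |
| 0110 | 192 | 96 | 80 | 192 | 96 | 80 |
| 0111 | 224 | 112 | 96 | 224 | 112 | 56 |
| 1000 | 256 | 128 | 112 | 256 | 128 | 64 |
| 1001 | 288 | 160 | 128 | 288 | 160 | 128 |
| 1010 | 320 | 192 | 160 | 320 | 192 | 160 |
| 1011 | 352 | 224 | 192 | 352 | 224 | 112 |
| 1100 | 384 | 256 | 224 | 384 | 256 | 128 |
| 1101 | 416 | 320 | 256 | 416 | 320 | 256 |
| 1110 | 448 | 384 | 320 | 448 | 384 | 320 |
| 1111 | bad | bad | bad | bad | bad | bad |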

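Table 12.11. Sampling Frequency Index Table (values are in Hz)

| bits | MPEG1 | MPEG2 | MPEG2.5 |
|---|---|---|---|
| 00 | 44100 | 22050 | 11025 |
| 01 | 48000 | 24000 | 12000 |
| 10 | 32000 | 16000 | 8000 |
| 11 | Reserved | Reserved | Reserved |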

To calculate the size of a frame, one needs to examine the bit_rate, sample_rate, and padding_bit of the frame header. For Layers II and III, the length in bytes is computed as follows, with the division truncated to an integer:

FrameSize = 144 x BitRate / SampleRate + Padding

For example, for a bit_rate of 128000, a sample_rate of 44100, and a padding_bit of 0, the frame size is 417 bytes.
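The frame size formula can be checked with a short Python sketch. It assumes the bit rate is given in bits per second, the sampling rate in Hz, and that the division truncates; the function names and the two lookup lists (copied from the V1,L3 bitrate column and the MPEG1 sampling frequency column) are illustrative helpers.

```python
# kbps for MPEG Version 1, Layer III; index 0 is "free", index 15 is "bad".
V1_L3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
# Hz for MPEG Version 1; index 3 is reserved.
MPEG1_RATES_HZ = [44100, 48000, 32000, None]

def frame_size(bit_rate: int, sample_rate: int, padding: int) -> int:
    """FrameSize = 144 x BitRate / SampleRate + Padding, in bytes."""
    return 144 * bit_rate // sample_rate + padding

def frame_size_from_indices(bit_rate_index: int, sampling_rate_index: int, padding: int) -> int:
    """Same calculation, starting from the raw header indices (V1, L3 only)."""
    kbps = V1_L3_KBPS[bit_rate_index]
    rate = MPEG1_RATES_HZ[sampling_rate_index]
    if kbps is None or rate is None:
        raise ValueError("free, bad, or reserved index")
    return frame_size(kbps * 1000, rate, padding)

print(frame_size(128000, 44100, 0))
```

Adding the frame size to the offset of the current header gives the expected position of the next frame sync, which is a useful consistency check when scanning a file.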

12.2.3.2 Audio Tag

The MP3v1 audio tag provides non-audio information about the MPEG audio file: artist, title, album, publishing year, comments, and genre. The tag is exactly 128 bytes long and is located at the very end of the file, so it can be extracted by reading the last 128 bytes of the MPEG audio file. Table 12.12 presents the format of the audio tag; fields are normally padded with null characters (ASCII 0), although some encoders pad them with spaces (ASCII 32). Genre codes are specified in Table 12.13.

Table 12.12. MP3v1 Audio Tag (Last 128 Bytes of File)

| Field Name | Bytes | Description |
|---|---|---|
| Genre | 127 | The last byte in the file is the genre code, interpreted as in Table 12.13. |
| Comment | 97-126 | Generic text comment about the content. |
| Year | 93-96 | The year this album was released. |
| Album | 63-92 | The title of the album. |
| Artist | 33-62 | The name of the artist(s). |
| Title | 3-32 | The title of the track. |
| Tag | 0-2 | Tag identification. Must contain 'TAG' if the tag exists and is correct. |
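Reading the tag amounts to slicing the last 128 bytes of the file at fixed offsets. The Python sketch below is hedged in three ways: it assumes the album field occupies bytes 63 through 92 (the range between artist and year), it assumes Latin-1 text, and it strips both null and space padding. The function name parse_audio_tag is illustrative.

```python
def parse_audio_tag(data: bytes):
    """Return the tag fields as a dict, or None when no tag is present."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[0:3] != b"TAG":
        return None

    def text(first: int, last: int) -> str:
        # Cut at the first NUL, then drop the space padding some writers use.
        raw = tag[first:last + 1].split(b"\x00", 1)[0]
        return raw.rstrip(b" ").decode("latin-1")  # assumption: Latin-1 text

    return {
        "title": text(3, 32),
        "artist": text(33, 62),
        "album": text(63, 92),  # assumed position, see the note above
        "year": text(93, 96),
        "comment": text(97, 126),
        "genre": tag[127],
    }

# Synthetic file: some audio bytes followed by a 128-byte tag.
demo_file = (b"audio-bytes" + b"TAG"
             + b"Song".ljust(30, b"\x00") + b"Band".ljust(30, b" ")
             + b"Record".ljust(30, b"\x00") + b"2003"
             + b"demo".ljust(30, b"\x00") + bytes([17]))
tag_info = parse_audio_tag(demo_file)
print(tag_info)
```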

Table 12.13. MP3 Genre Codes

| # | Description | # | Description | # | Description | # | Description |
|---|---|---|---|---|---|---|---|
| 0 | Blues | 20 | Alternative | 40 | AlternRock | 60 | Top 40 |
| 1 | Classic Rock | 21 | Ska | 41 | Bass | 61 | Christian Rap |
| 2 | Country | 22 | Death Metal | 42 | Soul | 62 | Pop/Funk |
| 3 | Dance | 23 | Pranks | 43 | Punk | 63 | Jungle |
| 4 | Disco | 24 | Soundtrack | 44 | Space | 64 | Native American |
| 5 | Funk | 25 | Euro-Techno | 45 | Meditative | 65 | Cabaret |
| 6 | Grunge | 26 | Ambient | 46 | Instrumental Pop | 66 | New Wave |
| 7 | Hip-Hop | 27 | Trip-Hop | 47 | Instrumental Rock | 67 | Psychedelic |
| 8 | Jazz | 28 | Vocal | 48 | Ethnic | 68 | Rave |
| 9 | Metal | 29 | Jazz+Funk | 49 | Gothic | 69 | Showtunes |
| 10 | NewAge | 30 | Fusion | 50 | Darkwave | 70 | Trailer |
| 11 | Oldies | 31 | Trance | 51 | Techno-Industrial | 71 | Lo-Fi |
| 12 | Other | 32 | Classical | 52 | Electronic | 72 | Tribal |
| 13 | Pop | 33 | Instrumental | 53 | Pop-Folk | 73 | Acid Punk |
| 14 | R&B | 34 | Acid | 54 | Eurodance | 74 | Acid Jazz |
| 15 | Rap | 35 | House | 55 | Dream | 75 | Polka |
| 16 | Reggae | 36 | Game | 56 | Southern Rock | 76 | Retro |
| 17 | Rock | 37 | Sound Clip | 57 | Comedy | 77 | Musical |
| 18 | Techno | 38 | Gospel | 58 | Cult | 78 | Rock & Roll |
| 19 | Industrial | 39 | Noise | 59 | Gangsta | 79 | Hard Rock |

12.2.4 Resource Interchange File Format (RIFF)

A RIFF file is composed of an 8-byte header (Table 12.14) followed by a set of chunks (Table 12.15) or lists (Table 12.16). RIFF is a clone of the IFF format invented by Electronic Arts (EA) in 1984. IFF was created for Deluxe Paint on the Commodore Amiga, became the standard on that platform, and was maintained by Commodore until its demise. EA also ported Deluxe Paint to the PC platform and brought IFF with it. One key characteristic common to both IFF and RIFF is the use of 4-character codes (FourCC). RIFF is so close to IFF that many IFF parsers can parse RIFF files.

Table 12.14. RIFF Header

| Field Name | Size | Description |
|---|---|---|
| ID | 4 bytes | A human-readable 4-character sequence specifying the file subtype; for the pure RIFF type, this is set to "RIFF". |
| length | 4 bytes | Length following this header, namely, file_length - 8. |

Table 12.15. RIFF Data Chunk

| Field Name | Size | Description |
|---|---|---|
| chunk_type | 4 bytes | A 4-character sequence specifying the chunk type. There is at most one fmt chunk; all others are data chunks. |
| length | 4 bytes | Length of this chunk. |
| data_byte | 'length' bytes | The data within this chunk. |
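The header and chunk layouts above lend themselves to a simple walk over the file. The following Python sketch relies on three details of common RIFF practice that are not stated in the tables and should be treated as assumptions here: lengths are little-endian, a 4-byte form type (such as "WAVE") follows the 8-byte header, and a chunk with an odd length is followed by one pad byte. The helper names are illustrative.

```python
import struct

def iter_chunks(data: bytes, start: int = 12):
    """Yield (chunk_type, body) pairs, starting after header and form type."""
    pos = start
    while pos + 8 <= len(data):
        chunk_type, length = struct.unpack_from("<4sI", data, pos)
        yield chunk_type, data[pos + 8 : pos + 8 + length]
        pos += 8 + length + (length & 1)  # assumption: pad to an even offset

def build_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one chunk for the demonstration below."""
    pad = b"\x00" if len(body) % 2 else b""
    return chunk_type + struct.pack("<I", len(body)) + body + pad

# Synthetic file: form type plus two small chunks.
payload = b"WAVE" + build_chunk(b"data", b"\x01\x02\x03") + build_chunk(b"note", b"hi")
riff = b"RIFF" + struct.pack("<I", len(payload)) + payload

riff_id, riff_length = struct.unpack_from("<4sI", riff, 0)
chunks = list(iter_chunks(riff))
print(riff_id, riff_length, chunks)
```

Skipping a chunk by its length, without understanding its contents, is what allows decoders to ignore chunk types they do not know.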

Table 12.16. RIFF List Chunk

| Field Name | Size | Description |
|---|---|---|
| element_count | 4 bytes | The number of elements in this list. |
| list_ID | 4 bytes | A sequence of 4 characters such as 'rec ' or 'movi'. |
| data_byte | 'length' bytes | The data within this list. |

12.2.5 Compact Disk Audio (CDA) Tracks

Compact Disk Audio (CDA) track files are generally RIFF resources. The RIFF id of a .CDA file is "CDDA" (43h, 44h, 44h, 41h). Such a file contains only one data block, called "fmt " (66h, 6dh, 74h, 20h). In the current version of the .CDA format, this block is 24 bytes long (see Table 12.17).

Table 12.17. CDA Track Format

| Field Name | Length | Description |
|---|---|---|
| format_version | 2 bytes | CDA file version; if not equal to 1, this table may be out of date. |
| track_count | 2 bytes | Number of the track. |
| serial_number | 4 bytes | CD disc serial number. |
| start_time_HSG | 4 bytes | Beginning of the track in HSG format. |
| duration_HSG | 4 bytes | Length of the track in HSG format. |
| start_time_RB | 4 bytes | Beginning of the track in Red-Book format. |
| duration_RB | 4 bytes | Length of the track in Red-Book format. |

Time is represented in two formats: HSG and Red-Book. HSG can be calculated as follows:

time = minute x 4500 + second x 75 + frame

Red-Book is much easier to use than HSG, because it contains minutes, seconds, and frames in unmodified form.
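The HSG formula and its inverse can be written out as a small Python sketch. The constants follow from the formula itself: 75 frames per second and 4500 (60 x 75) frames per minute. The function names are illustrative.

```python
def msf_to_hsg(minute: int, second: int, frame: int) -> int:
    """time = minute x 4500 + second x 75 + frame."""
    return minute * 4500 + second * 75 + frame

def hsg_to_msf(hsg: int) -> tuple:
    """Recover (minute, second, frame) from an HSG time value."""
    minute, rest = divmod(hsg, 4500)
    second, frame = divmod(rest, 75)
    return minute, second, frame

print(msf_to_hsg(3, 25, 10))
print(hsg_to_msf(15385))
```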

12.2.6 WAVE

WAVE (WAV) files are generally RIFF resources with a RIFF id of "WAVE". The format chunk always occurs before the data chunk, and both of these chunks are mandatory (Table 12.18). As with all RIFF forms, decoders should expect and ignore chunks of unknown format.

Table 12.18. WAV File Components

| Chunk Name | Description |
|---|---|
| RIFF-ID | "WAVE" |
| fmt-ck | Format chunk |
| fact-ck | Fact chunk |
| cue-ck | Cue-points |
| playlist-ck | Playlist |
| assoc-data-list | Associated data list |
| wave-data | Wave-specific data chunk |

12.2.6.1 WAVE Format Chunk

The WAVE format chunk fmt-ck specifies the format of the data. It is composed of common fields (Table 12.19) and format-specific fields. The list of parameters specified by the latter depends on the WAVE format category. Playback software should allow for (and ignore) any unknown format-specific fields that occur at the end of this chunk.

Table 12.19. Common Fields of the WAV Format Chunk

| Field Name | Description |
|---|---|
| wFormatTag | A number indicating the WAVE format category of the file. The content of the format-specific fields portion of the fmt chunk depends on this value. |
| wChannels | The number of channels represented in the waveform data, such as 1 for mono or 2 for stereo. |
| dwSamplesPerSec | The sampling rate (in samples per second) at which each channel is played. |
| dwAvgBytesPerSec | The average number of bytes per second at which the waveform data is transferred. Playback software can estimate the buffer size using this value. |
| wBlockAlign | The block alignment (in bytes) of the waveform data. Playback software needs to process a multiple of wBlockAlign bytes of data at a time, so the value of wBlockAlign can be used for buffer alignment. |
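The common fields can be unpacked from the body of a fmt chunk in one step. The Python sketch below assumes little-endian storage and takes the field widths from the name prefixes (w for a 16-bit word, dw for a 32-bit double word); neither detail is spelled out in the table. The names WaveFormat and parse_fmt_common are illustrative.

```python
import struct
from collections import namedtuple

WaveFormat = namedtuple(
    "WaveFormat",
    "wFormatTag wChannels dwSamplesPerSec dwAvgBytesPerSec wBlockAlign",
)

def parse_fmt_common(body: bytes) -> WaveFormat:
    """Decode the five common fields; format-specific fields are left alone."""
    return WaveFormat(*struct.unpack_from("<HHIIH", body, 0))

# Synthetic PCM fmt body: stereo, 44100 Hz, 16 bits per sample.
fmt_body = struct.pack("<HHIIHH", 0x0001, 2, 44100, 176400, 4, 16)
fmt = parse_fmt_common(fmt_body)
print(fmt)
```

Any bytes after the first 14 belong to the format-specific fields and depend on wFormatTag.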

12.2.6.2 WAVE Format Categories

The format category of a WAVE file is specified by the value of the wFormatTag field of the format chunk. The representation of data and the content of the format-specific fields of the format chunk depend on the format category (see Table 12.20).

Table 12.20. WAV File Format Categories

| Format Name | Description |
|---|---|
| WAVE_FORMAT_PCM (0x0001) | Microsoft Pulse Code Modulation (PCM) |
| IBM_FORMAT_MULAW (0x0101) | IBM mu-law format |
| IBM_FORMAT_ALAW (0x0102) | IBM a-law format |
| IBM_FORMAT_ADPCM (0x0103) | IBM AVC Adaptive Differential PCM format |

12.2.6.3 Microsoft WAVE_FORMAT_PCM

When the wFormatTag field of the fmt-ck RIFF format chunk is set to WAVE_FORMAT_PCM, the waveform data consists of samples represented in PCM format (see Table 12.21). For PCM waveform data, the format-specific fields include wBitsPerSample only.

In a single channel WAVE file, samples are stored consecutively. For stereo WAVE files, channel 0 represents the left channel, and channel 1 represents the right channel. The speaker position mapping for more than two channels is currently undefined. In multiple-channel WAVE files, samples are interleaved.

Table 12.21. WAVE_FORMAT_PCM Structure

| Field | Description |
|---|---|
| wBitsPerSample | The number of bits of data used to represent each sample of each channel. If there are multiple channels, the sample size is the same for each channel. |
| dwAvgBytesPerSec | For PCM data, the dwAvgBytesPerSec field of the format chunk should be equal to the following formula, rounded up to the next whole number: wChannels x dwSamplesPerSec x (wBitsPerSample/8). |
| wBlockAlign | The wBlockAlign field should be equal to the following formula, rounded up to the next whole number: wChannels x (wBitsPerSample/8). |
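The two PCM formulas can be evaluated with integer arithmetic so that the rounding up is exact. This Python sketch reads the rate term of the first formula as the sampling rate in samples per second (dwSamplesPerSec), which is an interpretation of the table rather than a quotation; the function names are illustrative.

```python
def pcm_avg_bytes_per_sec(channels: int, samples_per_sec: int, bits_per_sample: int) -> int:
    """channels x samples_per_sec x (bits_per_sample / 8), rounded up."""
    return (channels * samples_per_sec * bits_per_sample + 7) // 8

def pcm_block_align(channels: int, bits_per_sample: int) -> int:
    """channels x (bits_per_sample / 8), rounded up."""
    return (channels * bits_per_sample + 7) // 8

# CD-quality stereo: 2 channels, 44100 Hz, 16 bits per sample.
print(pcm_avg_bytes_per_sec(2, 44100, 16), pcm_block_align(2, 16))
```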

Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size. The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero. For example, if the sample size (recorded in wBitsPerSample) is 12 bits, then each sample is stored in a two-byte integer, and the least significant four bits of the first (least significant) byte are set to zero.
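The container rule can be illustrated with a short Python sketch that left-aligns a sample in the smallest whole number of bytes and writes the least significant byte first. The function names are illustrative, and the unpacking helper returns the unsigned bit pattern only; sign handling is left out.

```python
def pack_sample(value: int, bits: int) -> bytes:
    """Store a sample in the top bits of the smallest byte-sized container."""
    container_bytes = (bits + 7) // 8
    shift = container_bytes * 8 - bits      # number of zero bits at the bottom
    mask = (1 << (container_bytes * 8)) - 1
    word = (value << shift) & mask          # mask keeps negative values in range
    return word.to_bytes(container_bytes, "little")

def unpack_sample_bits(raw: bytes, bits: int) -> int:
    """Recover the unsigned sample bit pattern from its container."""
    shift = len(raw) * 8 - bits
    return int.from_bytes(raw, "little") >> shift

# A 12-bit sample 0xABC becomes the 16-bit word 0xABC0, stored as C0 AB.
packed = pack_sample(0xABC, 12)
print(packed.hex(), hex(unpack_sample_bits(packed, 12)))
```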

The waveform data contains, among other chunks, a slnt chunk representing silence. In 16-bit PCM data, if the last sample value played before the silence section is 10000 and data is still output to the Digital to Analog Converter (DAC), the output should maintain the 10000 value. If a zero value is used instead, a click may be heard at the start and end of the silence section. If play begins at a silence section, a zero value might be used because no other information is available. A click might then be created if the data following the silent section starts with a nonzero value.

FACT Chunk

The fact-ck fact chunk stores the file size and is required for all compressed audio formats. The chunk is not required for PCM files using the data chunk format.

Cue-Points Chunk

The cue-ck cue-points chunk identifies a series of positions in the waveform data stream (see Table 12.22).

Table 12.22. Fields of the Cue-Points Chunk

| Field | Description |
|---|---|
| dwName | The cue-point name. Each cue-point record has a unique dwName field. |
| dwPosition | The sequential sample number of this cue-point in the play order. |
| fccChunk | The name of the chunk containing the cue-point. |
| dwChunkStart | File position of the 'data' chunk relative to the start of the data section of the 'wavl' LIST chunk (a byte offset). |
| dwBlockStart | File position of the cue-point relative to the start of the data section of the 'wavl' LIST chunk (a byte offset). |
| dwSampleOffset | The sample offset of the cue-point relative to the start of the block. |

Playlist Chunk

The playlist-ck playlist chunk specifies a play order for a series of cue-points. The playlist-ck fields are specified in Table 12.23.

Table 12.23. Fields of the Playlist Chunk

| Field | Description |
|---|---|
| dwName | The cue-point name matching a name listed in the cue-ck cue-point table. |
| dwLength | Specifies the length of the section in samples. |
| dwLoops | Specifies the number of times to play the section. |

Associated Data Chunk

The assoc-data-list associated data list provides the ability to attach information like labels to sections of the waveform data stream.

Label and Note Information

The labl and note chunks have similar fields. The labl chunk contains a label, or title, to associate with a cue-point. The note chunk contains comment text for a cue-point (see Table 12.24).

Table 12.24. Label and Note Information Fields

| Field | Description |
|---|---|
| dwName | The cue-point name matching a name listed in the cue-ck cue-point table. |
| data | A NULL-terminated string containing a text label (for the labl chunk) or comment text (for the note chunk). |

Text with Data Length Information

The ltxt chunk contains text that is associated with a data segment of specific length (see Table 12.25).

Table 12.25. Text with Data Length Information Fields

| Field | Description |
|---|---|
| dwName | A cue-point name matching a name listed in the cue-ck cue-point table. |
| dwSampleLength | The number of samples in the segment of waveform data. |
| dwPurpose | The FourCC code giving the type or purpose of the text (e.g., capt for closed caption). |
| wCountry | The country code for the text. |
| wLanguage, wDialect | The language and dialect codes for the text. |
| wCodePage | The code page for the text. |

Embedded File Information

The file chunk contains information described in other file formats such as an RDIB file or an ASCII text file (see Table 12.26).

Table 12.26. Embedded File Information Fields

| Field | Description |
|---|---|
| dwName | The cue-point name matching a name listed in the cue-ck cue-point table. |
| dwMedType | The file type contained in the fileData field. If the fileData section contains a RIFF form, the dwMedType field is the same as the RIFF form type. |
| fileData | Contains the media file data. |



ITV Handbook: Technologies and Standards
ISBN: 0131003127
EAN: 2147483647
Year: 2003
Pages: 170
