Hack 3.8. Construct Your MP3s

Use command-line tools to encode your MP3 files as well as to build customized MP3 files on the fly.

The MPEG-1 Audio Layer 3 format (MP3 for short) is a lossy compression format for audio. This means the sound going in will have elements removed to pack it into a smaller output size. The process of building an MP3 file from an input sound file is called encoding. The software that does this encoding is called an encoder.

MP3 first became popular around 1995. The first encoders did a poor job, so if you wanted an MP3 that was indistinguishable from a CD, you needed a bit rate of 256 kilobits per second (kbps). (The bit rate is the amount of data stored in the file for each second of audio. The more bits, the better the sound.) Modern encoders, such as the latest version of the free LAME encoder (http://lame.sf.net/), produce CD-quality sound at a bit rate of 192 kbps.

Deciding what to remove from a recording to create the compressed file is fairly tricky. Modern encoders use a psychoacoustic model of our hearing to decide what to give the most bits to in the recording. Most of the time this will be the dominant frequencies in the segment being encoded. However, encoders are free to use whatever mechanism they think will produce the best sound for the given bit rate.

It follows that for less complex signals, such as a single human voice, we can spend fewer bits and still get quality sound. This is why spoken-word podcasts encoded at 64 kbps or even 32 kbps still sound reasonably good. If you have a more complex sound, such as that of a music show, you should probably encode at 128 kbps.
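To make the trade-off concrete, here is a rough sketch of how bit rate translates into file size. It assumes a constant bit rate, treats 1 kbps as 1,000 bits per second, and ignores ID3 tags and other overhead. The function name `mp3_size_bytes` and the 30-minute running time are my own choices for illustration.

```python
def mp3_size_bytes(bitrate_kbps: int, seconds: float) -> int:
    """Approximate size of a constant-bit-rate MP3, ignoring ID3 tags."""
    bits = bitrate_kbps * 1000 * seconds   # kbps means 1,000 bits per second
    return int(bits // 8)                  # 8 bits per byte


if __name__ == "__main__":
    half_hour = 30 * 60  # hypothetical show length, in seconds
    for rate in (32, 64, 128):
        megabytes = mp3_size_bytes(rate, half_hour) / 1_000_000
        print(f"{rate:>3} kbps: about {megabytes:.1f} MB for a 30-minute show")
```

Doubling the bit rate doubles the download, which is why a spoken-word show usually gains little from being encoded at music-grade rates.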

3.8.1. Variable Bit Rate Encoding

Another option is to use variable bit rate encoding, or VBR. An MP3 file is organized into frames of data. Each frame has a header that indicates the number of channels, the sampling rate, and the bit rate. The bit rates can be altered from frame to frame. This is called variable bit rate, and it allows for a podcast to vary between complex musical segments and simpler spoken-word segments while achieving optimal compression overall.
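The frame header is only 4 bytes long, so it is easy to inspect. The following is a minimal sketch of a decoder, assuming MPEG-1 Layer III frames only and skipping free-format streams. The function name `parse_frame_header` and the dictionary keys are my own; the bit positions and lookup tables follow the MPEG audio header layout.

```python
# Lookup tables for MPEG-1 Layer III only (the format this hack calls MP3).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header: bytes) -> dict:
    """Decode the 4-byte header that starts an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                       # 11 sync bits, all set
        raise ValueError("missing frame sync")
    if (word >> 19) & 0b11 != 0b11 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(word >> 12) & 0b1111]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value; not handled here")
    padding = (word >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        # 1,152 samples per frame / 8 bits per byte = 144
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For example, `parse_frame_header(bytes.fromhex("fffb9064"))` describes a 128 kbps, 44,100 Hz, joint stereo frame of 417 bytes, which matches the frame size that mpginfo reports later in this hack. In a VBR file, the bit rate field simply changes from one frame to the next.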

The downside of variable bit rate encoding is that it's not guaranteed to be compatible with all players. I suggest you try VBR compression to see if it makes a big difference in file size, and if it does, test the result with a couple of modern clients to make sure it plays properly.

3.8.2. ID3 Tags

Unlike records and CDs, the MP3 file format includes some information identifying the file's content. The first version of this identification standard was called ID3 v1.0 [Hack #40] and it contained enough room to put in a song title, an artist name, an album name, a year, a short comment, and a genre identifier.

Each field was given a set number of characters. Song name, artist, album, and comment were each allowed a maximum of 30 characters. Year was given 4 characters, and the genre ID 1 byte. These are ASCII characters, so each is a single byte. Unicode support did not arrive until Version 2. Version 1.1 of the standard cannibalized 2 bytes from the comment field to provide space for a track number.

The specification hardcoded 126 different genres, from common ones such as Blues, Funk, Rock, and Reggae to the more obscure Booty Bass, Primus, and Porn Groove. Podcasts tend to fit conveniently into genre #12, or Other.
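Because the layout is fixed, a Version 1 tag can be read with a few lines of code. The sketch below assumes the usual placement of the tag in the last 128 bytes of the file, prefixed by the marker `TAG`, and decodes text as Latin-1 (a superset of ASCII). The function names `parse_id3v1` and `read_id3v1` are hypothetical, not part of any library.

```python
import struct

# "TAG" marker, title, artist, album, year, comment, genre ID = 128 bytes
ID3V1_LAYOUT = struct.Struct("=3s30s30s30s4s30sB")


def _text(raw: bytes) -> str:
    """Decode a fixed-width field, dropping NUL or space padding."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(tag: bytes) -> dict:
    """Parse a 128-byte ID3 v1.0 or v1.1 block."""
    if len(tag) != ID3V1_LAYOUT.size or not tag.startswith(b"TAG"):
        raise ValueError("not an ID3v1 tag")
    _, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(tag)
    info = {"title": _text(title), "artist": _text(artist),
            "album": _text(album), "year": _text(year),
            "comment": _text(comment), "genre_id": genre}
    # v1.1: a zero byte at comment[28] followed by a nonzero track number
    if comment[28] == 0 and comment[29] != 0:
        info["track"] = comment[29]
    return info


def read_id3v1(path: str) -> dict:
    """Read the tag from the last 128 bytes of a file."""
    with open(path, "rb") as handle:
        handle.seek(-ID3V1_LAYOUT.size, 2)   # 2 = seek relative to end of file
        return parse_id3v1(handle.read())
```

A podcast tagged as described above would come back with `genre_id` equal to 12.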

The second version of the ID3 standard bears little resemblance to the first version. Instead of a single, small structure with a few fields, this version features a much more flexible tagged format. The format can store Unicode text for all the fields, which allows for internationalization. It can also store media such as pictures, large blocks of text such as original lyrics, and handy things for DJs such as beats per minute (BPM).

You can find more information on all of these standards at the ID3 site (http://id3.org/).

3.8.3. Encoding with LAME

LAME, free software that includes encoding functionality, is available on the LAME site (http://lame.sf.net/). It's an outstanding encoder that you can plug into your programs and use as a DLL or a library. You can also use it directly from the command line.

To encode a file from the command line you need to give it the name of the input file and the name of the MP3 file to create:

 % lame mysound.wav mysound.mp3
 LAME version 3.96.1 (http://lame.sourceforge.net/)
 Using polyphase lowpass filter, transition band: 17249 Hz - 17782 Hz
 Encoding hello.wav to hello.mp3
 Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=3
  Frame     |CPU time estim |REAL time/estim | play/CPU | ETA
  20/23(87%)|  0:00/   0:00 |   0:01/    0:01|   5.2245x| 0:00
 average: 128.0 kbps         MS: 23 (100.0%)
 Writing LAME Tag…done
 ReplayGain: -14.6dB

Here the mysound.wav file is being encoded as MP3 in the new mysound.mp3 file. The encoder is using the 128 kbps bit rate (the default).

To change the bit rate of the encoding, add the -b option:

 % lame -b 32 mysound.wav mysound.mp3

That command line encodes the same sound as before, but at the significantly poorer 32 kbps bit rate.

Bit rate is the main quality setting in MP3. The larger the bit rate setting, the better the quality. But LAME also provides another quality switch, -q, for setting the Q value:

 % lame -q 0 mysound.wav mysound.mp3

Setting the Q value to 0 requests the best-quality compression, which is also the slowest. The other end of the range is 9, which is the worst possible quality.

MP3 encoders look for the dominant frequencies in the sound and concentrate on those. If your signal has noise, the encoder might key on that noise and spend the bits where you don't want them. Another, more common case is that a voice recording carries a lot of sonic energy above and below the audible spectrum, which is not worth encoding. Doug Kaye of IT Conversations recommends using low-pass and high-pass filters to optimize compression, with the low-pass filter set to 10 kHz and the high-pass filter set to 80 Hz. This allows the encoder to concentrate on the important sound.

Thankfully, LAME has low- and high-pass filters baked right in:

 % lame --lowpass 10 --highpass 0.08 mysound.wav mysound.mp3

Both the low- and high-pass filters have settings in kilohertz. So, we set the low-pass filter to 10 kHz and the high-pass filter to 0.08 kHz (or 80 Hz).

Another handy feature of LAME is its ability to set ID3 tags:

 % lame --tt "My Title" --ta "Me" --tl "My Album" mysound.wav mysound.mp3

In this case, we are setting the title to My Title, the name of the artist to Me, and the album name to My Album. LAME handles both versions of the ID3 tag format.
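If you encode from a script rather than by hand, it helps to assemble the command line in one place. The following sketch builds an argument list from the switches covered so far (-b, -q, --lowpass, --highpass, --tt, --ta, and --tl). The function names `lame_command` and `encode` and their parameters are my own invention, and `encode` assumes the lame binary is installed and on your PATH.

```python
import subprocess
from typing import List, Optional


def lame_command(src: str, dst: str, *, bitrate_kbps: Optional[int] = None,
                 quality: Optional[int] = None,
                 lowpass_hz: Optional[float] = None,
                 highpass_hz: Optional[float] = None,
                 title: Optional[str] = None, artist: Optional[str] = None,
                 album: Optional[str] = None) -> List[str]:
    """Build an argument list using only the LAME switches shown in this hack."""
    args = ["lame"]
    if bitrate_kbps is not None:
        args += ["-b", str(bitrate_kbps)]
    if quality is not None:
        if not 0 <= quality <= 9:
            raise ValueError("the Q value runs from 0 (best) to 9 (worst)")
        args += ["-q", str(quality)]
    # LAME takes filter frequencies in kHz, so convert from Hz here.
    if lowpass_hz is not None:
        args += ["--lowpass", f"{lowpass_hz / 1000:g}"]
    if highpass_hz is not None:
        args += ["--highpass", f"{highpass_hz / 1000:g}"]
    for switch, value in (("--tt", title), ("--ta", artist), ("--tl", album)):
        if value is not None:
            args += [switch, value]
    return args + [src, dst]


def encode(src: str, dst: str, **options) -> None:
    """Run LAME; assumes the lame binary is installed and on the PATH."""
    subprocess.run(lame_command(src, dst, **options), check=True)
```

Passing the arguments as a list, rather than as one shell string, means titles that contain spaces or quotes need no extra escaping.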

LAME has a lot more features than I covered here. For a complete list, use the --longhelp command-line argument:

 % lame --longhelp

You can change the sampling frequency and the number of channels. You also can apply gain factors, do variable bit rate encoding, and a whole lot more.

3.8.4. Combine Without Re-Encoding

Two other command-line tools will aid you in your podcasting. mpgtx is a set of MP3 file tools that allows you to cut and join MP3 files without re-encoding, and SoX is a command-line tool that does signal processing on files.

3.8.4.1 mpgtx.

Re-encoding MP3 files is a bad idea. MP3 is a lossy format, so each time you take an MP3 file, decode it, and then re-encode it, you lose quality. But encoding from an original file can take a while, a long while. So, what do you do when you want to be able to create several versions of a podcast, each with different ad placements [Hack #47]? Or suppose you provide a service whereby you take several self-contained segments and string them together into one MP3 for the listener. Is there a way to do that on the fly with MP3?

Yes, you can split and join MP3 files in any combination of ways you like, without re-encoding. Several commercial software packages do it. But there is also the open source mpgtx (http://mpgtx.sf.net/) package. mpgtx is available for Windows, Mac OS X, and Linux.

To install mpgtx on Linux or Macintosh, first download the mpgtx gzipped tarball from the site. Then unpack it with the tar -xzvf command (or just let OS X do it for you). After that, run ./configure, followed by sudo make install to make and install the program.

On Windows, just download the binaries and install mpgtx wherever you keep your command-line utilities.

Once mpgtx is installed, you will have a host of new MP3 file commands at your fingertips. Three primary utilities are available: mpginfo, which gives you information about an individual MP3 file; mpgjoin, which joins multiple MP3 files into a single file; and tagmp3, which gives you control over the MP3 file's ID3 tags.

Here is the mpginfo command's report on the mysound.mp3 file I made earlier in the hack with LAME:

 % mpginfo mysound.mp3
 mysound.mp3
   Audio : Mpeg 1 layer 3
   Estimated Duration: 00.63s
   128 kbps 44100 Hz
   Frame size: 417 bytes
   Joint Stereo: (Intensity stereo off, M/S stereo on)
   No emphasis, original
   ID3 v1.1 tag
 ----------------
 title : My Title
 artist: Me
 album : My Album
 track : 0
 ----------------

I'll use three other MP3 files (top.mp3, interview.mp3, and bottom.mp3) to demonstrate how to build one MP3 from the sum of three MP3s. The top.mp3 file is the introduction to the show, the interview.mp3 file is the interview in the middle, and the bottom.mp3 file is the credits and outro [Hack #63]:

 % mpgjoin top.mp3 interview.mp3 bottom.mp3 -o total.mp3
 Now processing top.mp3 1/3 … 100.00%
 Now processing interview.mp3 2/3 … 100.00%
 Now processing bottom.mp3 3/3 … 100.00%

This built a file called total.mp3 from the sum of the three files. There are some limitations, but as long as the files are compatible in terms of sampling rate and bit rate, it will join them into the single, large file.
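Before joining files in an automated pipeline, you may want to confirm that they match. The sketch below pulls the bit rate and sampling rate out of mpginfo's text report and compares them across files. It assumes the report contains a line in the `128 kbps 44100 Hz` form shown earlier, and it checks only the two properties named above. The function names `rates_from_report` and `joinable` are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Matches the "128 kbps 44100 Hz" line in the mpginfo reports shown in this hack.
RATE_LINE = re.compile(r"(\d+)\s*kbps\s+(\d+)\s*Hz")


def rates_from_report(report: str) -> Tuple[int, int]:
    """Return (bit rate in kbps, sampling rate in Hz) from mpginfo text."""
    match = RATE_LINE.search(report)
    if match is None:
        raise ValueError("no 'kbps ... Hz' line found in report")
    return int(match.group(1)), int(match.group(2))


def joinable(reports: Iterable[str]) -> bool:
    """True when every report shows the same bit rate and sampling rate."""
    return len({rates_from_report(text) for text in reports}) == 1
```

You could capture each report by running mpginfo on a file and collecting its standard output, then pass the resulting strings to `joinable` before calling mpgjoin.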

The third utility that is valuable to podcasters is the tagmp3 program, which can set the ID3 tags in an MP3 file from the command line:

 % tagmp3 set "%A:Me %a:My Album %t:My Song" total.mp3
 Setting total.mp3 tag
 % mpginfo total.mp3
 total.mp3
   Audio : Mpeg 1 layer 3
   Estimated Duration: 02.98s
   64 kbps 44100 Hz
   Frame size: 208 bytes
   Mono, No emphasis, original
   ID3 v1.0 tag
 ----------------
 title : My Song
 artist: Me
 album : My Album
 ----------------

With the first command, I set the artist, album name, and song name on the total.mp3 file. Then I used the mpginfo application to make sure the ID3 tags were set properly.

The format items for the tagmp3 set commands are shown in Table 3-9.

Table 3-9. The tagmp3 format syntax
Format item	ID3 tag field
`%A`	The artist name
`%a`	The album name
`%t`	The title of the song
`%T`	The number of the track
`%y`	The year
`%g`	The genre ID
`%c`	The comment
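If your tag values come from a database or a feed, you can generate the format string rather than typing it. This is a small sketch that maps descriptive field names (my own choice) to the format items in the table and produces a string in the same `%A:Me %a:My Album` style used above. It does no escaping, so it assumes the values themselves contain no `%` characters.

```python
# Format items from the tagmp3 table, keyed by descriptive names of my own.
FORMAT_ITEMS = {
    "artist": "%A", "album": "%a", "title": "%t", "track": "%T",
    "year": "%y", "genre": "%g", "comment": "%c",
}


def tagmp3_format(**fields) -> str:
    """Build the string passed to `tagmp3 set`, in the order fields are given."""
    parts = []
    for name, value in fields.items():
        if name not in FORMAT_ITEMS:
            raise KeyError(f"unknown tag field: {name}")
        parts.append(f"{FORMAT_ITEMS[name]}:{value}")
    return " ".join(parts)
```

Calling `tagmp3_format(artist="Me", album="My Album", title="My Song")` reproduces the format string from the earlier example.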

The mpgtx package contains three other utilities as well: mpgcat, which writes the joined MP3 file to standard output; mpgsplit, which extracts portions of an MP3 file into another MP3 file; and mpgdemux, which splits a single MPEG file into multiple files, one for each of its component pieces.

3.8.4.2 SoX.

SoX is an open source application that is available on Windows, Mac OS X, and Linux. It has two primary functions: it converts file formats, and it can apply a variety of effects to the sound it's converting.

To install SoX, start by downloading the source from the SoX site (http://sox.sf.net/). Then use ./configure and sudo make install to build the SoX program and install it. On Windows, just download the binary and install it wherever you like.

To convert a sound file, simply specify the input filename and output filename as command-line arguments:

 % sox mysound.wav mysound.mp3

In this case, the mysound.wav file is converted to mysound.mp3. There are far too many input and output audio formats to list here, but this certainly covers all the standard file formats.

Next, you can apply a series of filters to augment the conversion:

 % sox mysound.wav mysound.mp3 vol 0.5 pitch 300ms

This applies the vol effect, which adjusts the gain, in this case reducing the volume of the signal by 50%. Then it applies a pitch shift that raises the signal to make it sound chirpy.
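To see what a gain factor of 0.5 means at the sample level, here is a conceptual sketch that scales signed 16-bit samples and clips anything that would overflow. It illustrates the idea only and is not SoX's implementation; the function name `apply_gain` is my own. Halving the amplitude this way corresponds to a drop of about 6 dB.

```python
from array import array


def apply_gain(samples: array, factor: float) -> array:
    """Scale signed 16-bit samples by `factor`, clipping to the valid range."""
    scaled = array("h")
    for sample in samples:
        value = int(round(sample * factor))
        # Keep the result inside the 16-bit range to avoid wraparound.
        scaled.append(max(-32768, min(32767, value)))
    return scaled


if __name__ == "__main__":
    quiet = apply_gain(array("h", [1000, -2000, 30000]), 0.5)
    print(list(quiet))
```

Factors above 1.0 make the signal louder, and the clipping step is where distortion creeps in if you push the gain too far.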

SoX supports a wide variety of effects: band pass filter, chorus, echo, fade, high pass, low pass, pan, phaser, pitch shifting, reverb, reverse, silence, stretch, and volume adjustments [Hack #58], as well as editing functions such as trimming away portions of unwanted signals.

I strongly recommend that if you are going to do a series of SoX operations, you use lossless formats such as .aif or .wav to do the operations. This will ensure that you don't lose data on each conversion.

3.8.5. Other Encoding Formats

Several alternative audio formats are worth knowing about. Here is a list of the lossless formats that will preserve audio, with no drop in sound quality:

WAV: The Microsoft sound storage format, normally used without compression. It stores samples in little-endian byte order, which matches Intel processors.
PCM: Short for pulse-code modulation, this is the uncompressed encoding that is used on CDs.
FLAC: The Free Lossless Audio Codec is a file format that provides between 30% and 70% compression, without loss of sound quality.
AIFF: This is the Audio Interchange File Format, which is commonly used on Macintoshes.

There are a few more lossy formats as well:

AAC: The Advanced Audio Coding file format is the next-generation MP3 file format, with better compression as well as support for more channels, higher sampling rates, and better sound performance in the higher frequencies. Apple is a strong proponent of this standard. The songs from the iTunes music store are AAC files.
AC-3: The Dolby compression standard used in Dolby Digital systems, this supports multiple channels to enable surround sound as well as variable bit rate encoding.
MP2: This is the predecessor to MP3 and is a higher-quality lossy format commonly used by the broadcast industry.
Ogg Vorbis: An open source lossy codec and file format that is gaining in popularity, this was developed in response to a licensing threat to MP3. Audacity supports Ogg Vorbis import and export without the addition of a codec.
WMA: The Windows version of MP3 for its Windows Media Player and some portable devices, this is a direct competitor to AAC. Both of these formats support digital rights management (DRM).

Because of its immense popularity, MP3 is supported on all operating systems, often by multiple players. But it's worth knowing about these other formats so that you can recognize them when you see them.

3.8.6. See Also

"Choose the Right Audio Tools" [Hack #50]