Hack 3.8. Construct Your MP3s
Use command-line tools to encode your MP3 files as well as to build customized MP3 files on the fly.

The MPEG-1 Audio Layer 3 format (MP3 for short) is a lossy compression format for audio. This means that elements of the incoming sound are removed to pack it into a smaller output file. The process of building an MP3 file from an input sound file is called encoding, and the software that does it is called an encoder.

MP3 first became popular around 1995. The first encoders did a poor job, so if you wanted an MP3 that was indistinguishable from a CD, you needed a bit rate of 256 kilobits per second (kbps). (The bit rate is the amount of data stored per second of audio in the file. The more bits, the better the sound.) Modern encoders, such as the latest version of the free LAME encoder (http://lame.sf.net/), produce CD-quality sound at a bit rate of 192 kbps.

Deciding what to remove from a recording to create the compressed file is fairly tricky. Modern encoders use a psychoacoustic model of human hearing to decide which parts of the recording get the most bits. Most of the time these will be the dominant frequencies in the segment being encoded. However, encoders are free to use whatever mechanism they think will produce the best sound for the given bit rate.

It follows that for less complex signals, such as a single human voice, we can spend fewer bits and still get good-quality sound. This is why spoken-word podcasts encoded at 64 kbps or even 32 kbps still sound reasonably good. If you have a more complex sound, such as that of a music show, you should probably encode at 128 kbps.

3.8.1. Variable Bit Rate Encoding

Another option is to use variable bit rate encoding, or VBR. An MP3 file is organized into frames of data. Each frame has a header that indicates the number of channels, the sampling rate, and the bit rate. The bit rate can be altered from frame to frame.
This is called variable bit rate, and it allows a podcast to vary between complex musical segments and simpler spoken-word segments while achieving optimal compression overall. The downside of variable bit rate encoding is that it's not guaranteed to be compatible with all players. I suggest you try VBR compression to see if it makes a big difference, and if it does, test it with a couple of modern clients to make sure it works.

3.8.2. ID3 Tags

Unlike records and CDs, the MP3 file format includes some information identifying the file's content. The first version of this identification standard was called ID3 v1.0 [Hack #40], and it contained enough room for a song title, an artist name, an album name, a year, a short comment, and a genre identifier. Each field was given a set number of characters. Song name, artist, album, and comment were allowed a maximum of 30 characters each. Year was given 4 characters, and the genre ID 1 byte. These are ASCII characters, so each is a single byte. Unicode support did not arrive until Version 2.0.

Version 1.1 of the standard cannibalized 2 bytes from the comment field to provide space for a track number. The specification hardcoded 126 different genres, from common ones such as Blues, Funk, Rock, and Reggae to more obscure ones such as Booty Bass, Primus, and Porn Groove. Podcasts tend to fit conveniently into genre #12, or Other.

The second version of the ID3 standard bears little resemblance to the first version. Instead of a single, small structure with a few fields, this version features a much more flexible tagged format. The format can store Unicode text for all the fields, which allows for internationalization. It can also store media such as pictures, large blocks of text such as original lyrics, and handy things for DJs such as beats per minute (BPM). You can find more information on all of these standards at the ID3 site (http://id3.org/).

3.8.3. Encoding with LAME

LAME, free software that includes encoding functionality, is available on the LAME site (http://lame.sf.net/). It's an outstanding encoder that you can plug into your programs and use as a DLL or a library. You can also use it directly from the command line.

To encode a file from the command line, give it the name of the input file and the name of the MP3 file to create:

    % lame mysound.wav mysound.mp3
    LAME version 3.96.1 (http://lame.sourceforge.net/)
    Using polyphase lowpass filter, transition band: 17249 Hz - 17782 Hz
    Encoding mysound.wav to mysound.mp3
    Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=3
        Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
        20/23   (87%)|    0:00/    0:00|    0:01/    0:01|   5.2245x|    0:00
    average: 128.0 kbps   MS: 23 (100.0%)
    Writing LAME Tag…done
    ReplayGain: -14.6dB

Here the mysound.wav file is being encoded as MP3 in the new mysound.mp3 file. The encoder is using the 128 kbps bit rate (the default). To change the bit rate of the encoding, add the -b option:

    % lame -b 32 mysound.wav mysound.mp3

That command line encodes the same sound as before, but at the significantly lower 32 kbps bit rate.

Bit rate is the main quality setting in MP3. The larger the bit rate setting, the better the quality. But LAME also provides another quality switch, -q, for setting the Q value:

    % lame -q 0 mysound.wav mysound.mp3

Setting the Q value to 0 requests the best-quality compression. The other end of the settings range is 9, which is the worst possible quality.

MP3 encoders look for the dominant frequencies in the sound and concentrate on those. If your signal has noise, the encoder might key on that noise and spend the bits where you don't want them. Another, more common case is that your voice carries a lot of sonic energy above and below the audible spectrum, which is not really worth encoding.
Doug Kaye of IT Conversations recommends using low-pass and high-pass filters to optimize compression, with the low-pass filter set to 10 kHz and the high-pass filter set to 80 Hz. This allows the encoder to concentrate on the important sound. Thankfully, LAME has low- and high-pass filters baked right in:

    % lame --lowpass 10 --highpass 0.08 mysound.wav mysound.mp3

Both the low- and high-pass filters take their settings in kilohertz. So, we set the low-pass filter to 10 kHz and the high-pass filter to 0.08 kHz (or 80 Hz).

Another handy feature of LAME is its ability to set ID3 tags:

    % lame --tt "My Title" --ta "Me" --tl "My Album" mysound.wav mysound.mp3

In this case, we are setting the title to My Title, the name of the artist to Me, and the album name to My Album. LAME handles both versions of the ID3 tag format.

LAME has a lot more features than I've covered here. For a complete list, use the --longhelp command-line argument:

    % lame --longhelp

You can change the sampling frequency and the number of channels. You also can apply gain factors, do variable bit rate encoding, and a whole lot more.

3.8.4. Combine Without Re-Encoding

Two other command-line tools will aid you in your podcasting. mpgtx is a set of MP3 file tools that allows you to cut and join MP3 files without re-encoding, and SoX is a command-line tool that does signal processing on files.

3.8.4.1 mpgtx

Re-encoding MP3 files is a bad idea. MP3 is a lossy format, so each time you take an MP3 file, decode it, and then re-encode it, you lose quality. But encoding from an original file can take a while, a long while. So, what do you do when you want to be able to create several versions of a podcast, each with different ad placements [Hack #47]? Or suppose you provide a service whereby you take several self-contained segments and string them together into one MP3 for the listener. Is there a way to do that on the fly with MP3?

Yes, you can split and join MP3 files in any combination of ways you like, without re-encoding.
Several commercial software packages do it. But there is also the open source mpgtx (http://mpgtx.sf.net/) package. mpgtx is available for Windows, Mac OS X, and Linux.

To install mpgtx on Linux or Macintosh, first download the mpgtx gzipped tarball from the site. Then unpack it with the tar -xzvf command (or just let OS X do it for you). After that, run ./configure, followed by sudo make install to make and install the program. On Windows, just download the binaries and install mpgtx wherever you keep your command-line utilities.

Once mpgtx is installed, you will have a host of new MP3 file commands at your fingertips. Three primary utilities are available: mpginfo, which gives you information about an individual MP3 file; mpgjoin, which joins multiple MP3 files into a single file; and tagmp3, which gives you control over the MP3 file's ID3 tags.

Here is the mpginfo command's report on the mysound.mp3 file I made earlier in the hack with LAME:

    % mpginfo mysound.mp3
    mysound.mp3
      Audio : Mpeg 1 layer 3
      Estimated Duration: 00.63s
      128 kbps 44100 Hz
      Frame size: 417 bytes
      Joint Stereo: (Intensity stereo off, M/S stereo on)
      No emphasis, original
      ID3 v1.1 tag
      ----------------
      title : My Title
      artist: Me
      album : My Album
      track : 0
      ----------------

I'll use three other MP3 files (top.mp3, interview.mp3, and bottom.mp3) to demonstrate how to build one MP3 from the sum of three MP3s. The top.mp3 file is the introduction to the show, the interview.mp3 file is the interview in the middle, and the bottom.mp3 file is the credits and outro [Hack #63]. The files are listed in the order in which they should play:

    % mpgjoin top.mp3 interview.mp3 bottom.mp3 -o total.mp3
    Now processing top.mp3 1/3 … 100.00%
    Now processing interview.mp3 2/3 … 100.00%
    Now processing bottom.mp3 3/3 … 100.00%

This built a file called total.mp3 from the sum of the three files. There are some limitations, but as long as the files are compatible in terms of sampling rate and bit rate, mpgjoin will join them into a single, large file.
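The compatibility requirement just mentioned can be checked before you try a join. What follows is only a minimal sketch, not part of mpgtx: it assumes constant bit rate MPEG-1 Layer III files, looks only at the first frame header it finds, and compares only the two properties named above (sampling rate and bit rate). The function names parse_frame_header, first_frame_info, and joinable are hypothetical names chosen for this illustration.

```python
import struct

# MPEG-1 Layer III lookup tables (bit rate index 0 = "free", 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header: bytes):
    """Decode a 4-byte MPEG-1 Layer III frame header, or return None."""
    if len(header) < 4:
        return None
    (word,) = struct.unpack(">I", header[:4])
    if (word >> 21) & 0x7FF != 0x7FF:   # 11-bit frame sync, all ones
        return None
    if (word >> 19) & 0x3 != 0x3:       # version bits: 11 = MPEG-1
        return None
    if (word >> 17) & 0x3 != 0x1:       # layer bits: 01 = Layer III
        return None
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        return None
    padding = (word >> 9) & 0x1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        # Layer III frame length in bytes, rounded down, plus the padding byte.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }


def first_frame_info(data: bytes):
    """Return header info for the first frame found in raw MP3 bytes."""
    start = 0
    # Skip a leading ID3 v2 tag: "ID3", 2 version bytes, 1 flag byte,
    # then a 4-byte size that uses only the low 7 bits of each byte.
    if data[:3] == b"ID3" and len(data) >= 10:
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
    for i in range(start, len(data) - 3):
        info = parse_frame_header(data[i:i + 4])
        if info is not None:
            return info
    return None


def joinable(infos):
    """True if every file shares the same sampling rate and bit rate."""
    keys = {(i["sample_rate_hz"], i["bitrate_kbps"]) for i in infos}
    return len(keys) == 1


# A 128 kbps, 44.1 kHz, joint stereo header like the one in mysound.mp3.
info = parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x64]))
print(info["bitrate_kbps"], info["sample_rate_hz"], info["frame_bytes"])
# prints: 128 44100 417
```

The computed frame length agrees with the 417-byte frame size that mpginfo reported for mysound.mp3. To use the sketch on real files, read each one in binary mode, pass the bytes to first_frame_info, and hand the results to joinable. A VBR file would need every frame inspected, which this sketch does not attempt.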
The third utility that is valuable to podcasters is the tagmp3 program, which can set the ID3 tags in an MP3 file from the command line:

    % tagmp3 set "%A:Me %a:My Album %t:My Song" total.mp3
    Setting total.mp3 tag
    % mpginfo total.mp3
    total.mp3
      Audio : Mpeg 1 layer 3
      Estimated Duration: 02.98s
      64 kbps 44100 Hz
      Frame size: 208 bytes
      Mono, No emphasis, original
      ID3 v1.0 tag
      ----------------
      title : My Song
      artist: Me
      album : My Album
      ----------------

With the first command, I set the artist, album name, and song name on the total.mp3 file. Then I used the mpginfo application to make sure the ID3 tags were set properly. The format items for the tagmp3 set command are shown in Table 3-9.
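If you want to double-check the tags without mpginfo, the fixed ID3 v1 layout described in Section 3.8.2 is simple enough to read by hand. The following is a minimal sketch under a few assumptions: the tag occupies the last 128 bytes of the file and starts with the marker TAG, text fields are decoded as Latin-1, and a version 1.1 track number is recognized by a zero byte followed by a nonzero byte at the end of the comment field. The function name read_id3v1 is hypothetical and is not part of mpgtx.

```python
import struct

# "TAG" marker, then title, artist, album (30 bytes each), year (4 bytes),
# comment (30 bytes), and a 1-byte genre ID: 128 bytes in total.
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"


def read_id3v1(data: bytes):
    """Parse an ID3 v1.0 or v1.1 tag from the end of raw MP3 bytes."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_LAYOUT, data[-128:]
    )

    def text(raw: bytes) -> str:
        # Fields are padded with zero bytes (or spaces) up to their fixed size.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    tag = {
        "version": "1.0",
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "genre": genre,
    }
    # Version 1.1 takes the last 2 comment bytes: a zero, then the track number.
    if comment[28] == 0 and comment[29] != 0:
        tag["version"] = "1.1"
        tag["track"] = comment[29]
        comment = comment[:28]
    tag["comment"] = text(comment)
    return tag
```

To try it, read the file in binary mode and pass the bytes in, for example read_id3v1(open("total.mp3", "rb").read()). The function returns None when no ID3 v1 tag is present. The genre field is returned as the raw number, so a podcast tagged as Other would show 12.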
The mpgtx package contains three other utilities as well: mpgcat, which writes the joined MP3 file to standard output; mpgsplit, which extracts portions of an MP3 file into another MP3 file; and mpgdemux, which splits a single MPEG file into multiple files, one for each of its component pieces.

3.8.4.2 SoX

SoX is an open source application that is available on Windows, Mac OS X, and Linux. It has two primary functions: it converts file formats, and it can apply a variety of effects to the sound it's converting.

To install SoX, start by downloading the source from the SoX site (http://sox.sf.net/). Then use ./configure and sudo make install to build the SoX program and install it. On Windows, just download the binary and install it wherever you like.

To convert a sound file, simply specify the input filename and output filename as command-line arguments:

    % sox mysound.wav mysound.mp3

In this case, the mysound.wav file is converted to mysound.mp3. There are far too many input and output audio formats to list here, but SoX certainly covers all the standard file formats.

Next, you can apply a series of filters to augment the conversion:

    % sox mysound.wav mysound.mp3 vol 0.5 pitch 300ms

This applies the "vol" filter, which tweaks the gain, in this case reducing the volume of the signal by 50%. Then it uses a pitch shift to raise the pitch of the signal and make it sound chirpy.

SoX supports a wide variety of effects: band pass filter, chorus, echo, fade, high pass, low pass, pan, phaser, pitch shifting, reverb, reverse, silence, stretch, and volume adjustments [Hack #58], as well as editing functions such as trimming away unwanted portions of a signal.

If you are going to do a series of SoX operations, I strongly recommend that you do them in a lossless format such as .aif or .wav. This ensures that you don't lose data on each conversion.

3.8.5. Other Encoding Formats

Several alternative audio formats are worth knowing about.
Here is a list of the lossless formats that will preserve audio, with no drop in sound quality:
There are a few more lossy formats as well:
Because of its immense popularity, MP3 is supported on all operating systems, often by multiple players. But it's worth knowing about these formats so that you can recognize them when you see them.

3.8.6. See Also