You have your style guide; now you're ready to begin formatting your text. The first step is to convert the audio into a text file. You can tackle this problem in a number of ways.
First let's deal with manual conversion. Here a transcriber listens to the audio and enters all speech and other audio information into a file in a word processor. Then the transcriber starts breaking the file into individual captions according to standards used by the closed-captioning program. For example, MAGpie has very specific requirements for text input.
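As a rough illustration of the segmentation step, the following Python sketch wraps a transcript into lines and then groups those lines into captions. The function name `segment_captions` and the limits of 32 characters per line and 2 lines per caption are assumptions chosen for illustration, not MAGpie requirements; substitute the values from your own style guide.

```python
import textwrap

def segment_captions(transcript, max_chars=32, max_lines=2):
    """Split a transcript into captions.

    Each caption is a list of at most max_lines lines, and each line
    holds at most max_chars characters. Both limits are assumed values.
    """
    # Break the transcript at whitespace into lines of limited width.
    lines = textwrap.wrap(transcript, width=max_chars)
    # Group consecutive lines into captions of max_lines lines each.
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]

sample = ("You have your style guide; now you're ready to begin "
          "formatting your text.")
for number, caption in enumerate(segment_captions(sample), start=1):
    print(f"Caption {number}:")
    for line in caption:
        print(f"  {line}")
```

A sketch like this only handles line length. Decisions such as where to break for sense, or how to mark sound effects, still need a human transcriber working from the style guide.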
Manual conversion can be very time-consuming. Some sources estimate that television programs and movies rich in nonspeech audio content, such as sound effects, background music, and drama, can take up to 20 hours of conversion for each hour of audio. Most corporate or academic training materials should take far less time, since most of the text is simply transcribed speech. If you have a script that was largely followed, it provides a good starting point.
A quick Google search for "closed captioning services" revealed a number of service bureaus with prices starting at around $6 per minute, or less than $200 for a 30-minute production. While not cheap, this is a fraction of most production costs, especially if you had to rent equipment or a soundstage, or pay actors or other personnel.
If you go the service bureau route, be sure to ask the following questions when obtaining a quote:
What digital and analog captioning formats does the bureau support (Windows Media, QuickTime, Real, Line 21, DVD)?
What level of accuracy will the service bureau guarantee?
What style of captions will the service bureau produce (roll-up, paint-on, pop-on)?
How will the service bureau segment the text (characters per line and lines per caption)?
Will charges include complete audio transcription (sound effects, intonations) or just speech?
Which style guide or other direction will the service bureau use to segment the text and to caption information such as dates and the like?
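To compare quotes on a common footing, the per-minute arithmetic mentioned earlier can be sketched in a few lines of Python. The function name `captioning_cost` and the flat per-minute rate are assumptions for illustration; actual bureaus may add setup fees or charge more for complete audio transcription.

```python
def captioning_cost(minutes, rate_per_minute=6.00):
    """Estimate a service bureau charge in dollars.

    Assumes a flat per-minute rate with no setup or rush fees.
    """
    return minutes * rate_per_minute

# A 30-minute production at the $6-per-minute starting price.
print(f"${captioning_cost(30):,.2f}")  # $180.00
```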
Automatic speech recognition works best when trained to the voice and speech pattern of one speaker. For this reason, plugging in the audio feed from a training video or lecture involving random individuals will almost always produce poor-quality results.
To avoid this problem, many universities (including Gallaudet's Television and Media Production Service) have adopted "shadowing" or "voice writing" where a person trained on the software repeats every word spoken in the video into the voice recognition system. Typically, these individuals work in a quiet environment or use a mask to minimize outside interference. These systems are not 100 percent accurate, but they do provide a first draft that can accelerate the transcriber's work.
Computer Prompting and Captioning Company (CPC), a prominent closed-captioning vendor, sells several systems that include IBM ViaVoice speech recognition software. You can also read about Gallaudet's experience with shadowing online.
Converting Broadcast Closed Captions
Most current television programs include closed captioning, which you can capture and reuse in streaming content. Grabbing the closed-captioned text itself can be inexpensive, as most All-In-Wonder products from ATI include this feature. Of course, from there, you have to reformat the text as necessary for your ultimate use.
To streamline this process, in fall 2004 the National Center for Accessible Media (NCAM), the research arm of the Media Access Group, announced CaptionKeeper, which captures broadcast captions (also called Line 21 captions) and converts them to Real or Windows Media captions ready for live streaming or archiving.
The fee to license CaptionKeeper is $1,000 for corporations and $500 for academic institutions. Third-party hardware is required to capture the Line 21 captions, which will probably add about $800 to the total system price. Check NCAM's Web site for more details. Note that while the Media Access Group plans to provide details about these products on its Web site, nothing was posted at press time.