Hack 45. Convert a Closed Caption File to a Script


If you've closed captioned your project, you can use a small amount of Perl code to extract a script for others to read.

In the business of video entertainment, if you sell a project to someone, the buyer will probably request a script with timing information, known in television as an As Broadcast Script. However, with a documentary or reality-style program, you might not have a script to hand over.

Creating an As Broadcast Script is a time-consuming and tedious process. It involves listening to a section of audio and typing exactly what is heard. Often, a transcriber has to listen to a section of audio three or more times in order to accurately capture what is being said.

However, a closed caption file is regularly created in order to fulfill government requirements. With this hack, you can use the information in that closed caption file to create a script after the fact.

4.8.1. Getting Motivated

Even if you've never programmed a computer before, you might someday find yourself motivated enough to type a few lines of Perl, a computer programming language. If faced with a choice between transcribing for hours on end, tediously listening to audio over and over, and spending 5 minutes typing in a program and another 20 minutes formatting its results, I'll bet that 9 times out of 10 you'll choose the program. In fact, after you've used the program once, you might become addicted to it.

The following Perl program was written for a television production company that had 140 hours of video that needed As Broadcast Scripts within two weeks. A rough estimate is that it takes 5 hours for a transcriber to complete one hour of video. Therefore, there were roughly 700 hours of work (140 x 5) to be completed.


4.8.2. Looking at a Closed Caption File

The process of closed captioning involves inserting codes onto line 21 of a video signal. If that doesn't make sense, think about how NTSC DV is 720 x 480, which means each frame is 720 pixels wide by 480 lines high. The video signal is built up line by line, and closed caption data is carried on line 21 of that signal.

NTSC broadcast video contains 525 horizontal scan lines in total. Line 21 falls in the vertical blanking interval, outside the visible picture.


Just like most computer documents, a closed caption file can be opened in a regular text editor. The results, however, aren't pretty. Here's an excerpt from an actual closed caption file (.tds file extension):

 ù01000013 úFû14,û14)û13tû17"û11.Coming up:û14tû17"Whoa, wait. ù01000112 úFû14 û14.û14tû17!û11.That's got meû14,û14/ ù01000209 úFû14 û14rû17#a little nervous.û14,û14/ ù01000316 úFû14 û14RThese guys are not likeû14rû17!my friends back home.û14,û14/ 

Looking at the contents, you can see where there are sentences being spoken. In this example, you can clearly see the words Coming up. The rest of it is messy.

But if you look closely, you might discover that there is a pattern: ù(some number), then a sentence, then another ù(some number), and so on. The number following the ù is eight characters long. Coincidentally, so is a timecode: two digits each for hours, minutes, seconds, and frames.

So, you can read ù01005108 as 01:00:51;08.
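That reading can be sketched as a small Perl subroutine. This is only an illustration: the name format_timecode is hypothetical, and the semicolon before the frame count simply follows the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the hack's script): turn the eight
# digits that follow the marker into HH:MM:SS;FF.
# Returns undef when the input is not exactly eight digits.
sub format_timecode {
    my ($digits) = @_;
    return unless defined $digits;
    return unless $digits =~ /\A([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})\z/;
    return sprintf("%s:%s:%s;%s", $1, $2, $3, $4);
}

print format_timecode("01005108"), "\n";   # prints 01:00:51;08
```

The digits are treated as text rather than numbers, so leading zeros are preserved.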

4.8.3. The Code

Perl works well with text. It can be intimidating to look at and difficult to read, but it can perform wonderful tasks and save a lot of time. If you are using Mac OS X or Linux, Perl is most likely already installed on your computer. If you are using Windows, you can download Perl from ActiveState (http://www.activestate.com/Products/ActivePerl/; free).

The following is a Perl script to reformat .tds closed caption files:

 #!/usr/bin/perl
 # for Caption Center (.tds) files
 while (<>) {
     # remove everything before and including BeginData
     $_ =~ s/.*BeginData//g;
     # remove all of the >>
     $_ =~ s/>> //g;
     # remove all of the úF (byte 0x9C followed by F)
     $_ =~ s/\x9cF//g;
     # replace each û (byte 0x9E) and the 3 characters after it with a space
     $_ =~ s/\x9e(...)/ /g;
     # reformat the timecodes, i.e. ù01005108 to 01:00:51;08
     # (\r is a classic Mac line ending; use \n for Unix-style line endings)
     $_ =~ s/\x9d([0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9]) /\r$1:$2:$3;$4\t\t/g;
     # replace all of the places where there is a tab-tab-return with a single tab
     $_ =~ s/\t\t\r/\t/g;
     # print everything back out
     print $_;
 }

That's it. That's the entire application. Fin. Done. Out.

Save the file to your computer and name it CCConverter.pl or whatever you want; it's your application after all.
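To see what the individual substitutions do, it can help to trace a single caption through them. The following is a minimal sketch, not part of the hack itself: trace_caption and visible are hypothetical names, the sample string is the first caption from the excerpt rebuilt from the byte values the script matches (0x9D, 0x9C, 0x9E), and only three of the script's substitutions are applied.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around three of the script's substitutions.
# Returns the caption as it looks after each step.
sub trace_caption {
    my ($line) = @_;
    my @steps;

    $line =~ s/\x9cF//g;       # drop the 0x9C marker and the F that follows it
    push @steps, $line;

    $line =~ s/\x9e.../ /g;    # each 0x9E code plus its 3 characters becomes one space
    push @steps, $line;

    # the eight digits after 0x9D become HH:MM:SS;FF, preceded by a return
    $line =~ s/\x9d([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2}) /\r$1:$2:$3;$4\t\t/g;
    push @steps, $line;

    return @steps;
}

# Hypothetical helper: show non-printable bytes as \xNN so the steps are readable.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/ge;
    return $s;
}

# First caption of the excerpt, written out byte by byte.
my $sample = "\x9d01000013 \x9cF\x9e14,\x9e14)\x9e13t\x9e17\"\x9e11.Coming up:"
           . "\x9e14t\x9e17\"Whoa, wait. ";

print visible($_), "\n" for trace_caption($sample);
```

Each control code is replaced by a space rather than removed, so runs of spaces remain in the result and may need tidying when you format the script.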

4.8.4. Running the Hack

Once you've written your Perl application, all you have to do is run your closed caption files through it. To do so, enter the following on the command line:

 perl CCConverter.pl < MyScriptCC.tds > MyScript.txt

To run a command line program on:

  • Mac OS X: open the Terminal application, located in the Utilities folder.

  • Windows: open the Run… item from the Start menu, then type cmd (or command on older versions of Windows).

When running the Perl application, it will be easiest to do so from the same directory where the application resides.


You'll want to enter the name of your closed caption file in place of MyScriptCC.tds and the name you want for the resulting file in place of MyScript.txt. The < tells the command line which file to feed into the Perl program, and the > tells it which file to write the program's output to.
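If you would rather not rely on the < and > redirection, the same reading and writing can be done inside Perl. The following is a minimal sketch under stated assumptions: convert_file is a hypothetical name, the filter shown is a placeholder that applies only one of the script's substitutions, and the file names come from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $in_path line by line, pass each line
# through the $filter code reference, and write the result to $out_path.
sub convert_file {
    my ($in_path, $out_path, $filter) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    binmode($in);     # treat the caption file as raw bytes
    binmode($out);
    while (my $line = <$in>) {
        print {$out} $filter->($line);
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Assumed usage: perl ThisSketch.pl MyScriptCC.tds MyScript.txt
if (@ARGV == 2) {
    # Placeholder filter: strips only the ">> " markers.
    my $filter = sub { my ($l) = @_; $l =~ s/>> //g; return $l; };
    convert_file($ARGV[0], $ARGV[1], $filter);
}
```

Passing the filter in as a code reference keeps the file handling separate from the substitutions, so the full set from the script could be dropped in without touching the rest.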

4.8.5. The Results

Here is the original excerpt, as converted by the Perl program:

01:00:00;13 Coming up:  Whoa, wait.
01:00:01;12 That's got me
01:00:02;09 a little nervous.
01:00:03;16 These guys are not like  my friends back home.

So, with the three or more hours you save per script, you should find plenty of time to explore more hacks, spend time with your family, or maybe even get outdoors and breathe some fresh air.



    Digital Video Hacks
    Digital Video Hacks: Tips & Tools for Shooting, Editing, and Sharing (OReillys Hacks Series)
    ISBN: 0596009461
    EAN: 2147483647
    Year: 2005
    Pages: 158
    Authors: Joshua Paul
