The phrasing on the right violates both rules; there's poor phrasing in Captions 1 and 2 (breaking up a title). If you read both versions out loud, you'll instantly see that the first column reads much more naturally. "If it sounds like good phrasing, it probably is good phrasing" is the general approach, but for some truly definitive rules, go to page 10 of the Captioned Media Program document I referenced earlier.
Column 2 (right) violates another rule of segmentation: a period should always end a caption (though not all captions have to end with periods). Specifically, in Caption 2, where the first sentence ends with "Magazine," the next sentence should start a new caption, as it does on the left.
Here's what we've learned from this section (with one additional point):
Segment multiple lines within a caption into logical phrases.
Segment multiple lines of captions into logical phrases.
The end of a sentence ends a caption line.
Start a new caption each time the speaker changes.
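The two mechanical rules in this list (a sentence end closes a caption, and a speaker change starts a new one) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the (speaker, text) input format are assumptions, the sentence splitter is naive and would wrongly break after abbreviations such as "Mr.", and breaking lines into logical phrases still needs a human ear.

```python
import re

def split_into_captions(turns):
    """Split (speaker, text) turns into (speaker, sentence) captions.

    Applies only two rules: the end of a sentence ends a caption,
    and a change of speaker starts a new caption.
    """
    captions = []
    for speaker, text in turns:
        # Naive split: break on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                captions.append((speaker, sentence))
    return captions

if __name__ == "__main__":
    turns = [
        ("KEN", "Joining us here is Jan Ozer. Welcome!"),
        ("JAN", "Thanks for having me."),
    ]
    for speaker, sentence in split_into_captions(turns):
        print(speaker, "|", sentence)
```

Each turn produces at least one caption of its own, so two speakers never share a caption.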
Now that you have your text divided into captions, it's time to decide how to present the text.
Step 4: Choose Your Font and Case
Typically, when it comes to print or static (onscreen) text, fonts with serifs, such as Times New Roman, are more readable than sans serif fonts, and words are more recognizable, since most books and magazines use fonts with serifs. The Media Access Group recommends using the Roman font, and Times New Roman is the most similar font installed on most computers.
However, some research indicates that sans serif fonts work better for closed captions than fonts with a serif (there's more information on this at
). According to Gallaudet officials, in their experience sans serif fonts are more readable. The Captioned Media Program appears to share this view, as it chose Helvetica Medium, a sans serif font, as its standard. All in all then, sans serif fonts are probably the best choice.
As discussed in Chapter 5, text with mixed capitals and lowercase lettering is easier to read than all uppercase text, and is therefore the recommended practice for streaming media and DVDs. If you think that recommendation differs from most television captions, you're correct, and here's why.
Most closed-caption decoders on TV sets can't display the below-the-line segments of letters such as j, g, q, and y (also called descenders). Instead, they display the entire letter above the line, producing a distracting appearance that decreases legibility. That's why television uses all caps. Streaming technologies and DVDs don't have these limitations, so you're free to use the more readable mixed-case lettering.
Step 5: Choose Your Font Size
Font sizes vary by captioning program, making it difficult to recommend a specific font size. In general, larger fonts are more readable, but if your font is too large, your caption will wrap to the next line or extend outside the viewing area.
There are also stylistic elements to consider. For example, PBS programs tend to use very small but elegant captions that torture my 40-something eyes (for example, see
). Those shown on the Web site of the National Center for Accessible Media (part of the Media Access Group) are much larger and much more readable (see, for example,
My recommendation is to prioritize readability over style. In deciding what font size to use, preview your options within your target player. MAGpie was a fantastic tool, but the appearance of the font size within its preview window didn't always accurately represent what ultimately appeared in the player.
Step 6: Define Text Placement and Speaker Identification
Here we'll discuss where to place your captions within (or underneath) the screen, and how and when you announce your speaker. As you would suspect, placement of text, in certain situations, can provide strong clues as to who is speaking. For this reason, in Figure 11.1, the text is positioned on the right, underneath the interviewer, Ken Santucci.
This leads us to Rule Number One in caption placement: if there are two consistently placed speakers, place captions on their respective sides of the screen. Note that under television rules, both captions, irrespective of placement, would be left-justified. However, since many streaming formats can't display left-justified text on the right side of the screen, you should right-justify text placed on the right, and left-justify text placed on the left.
If there's only one speaker, place the caption in the center of the screen and center-justify the text. In addition, if the speaker is off-screen, include the name or identification of the speaker, place the caption in the center of the screen, and center-justify the text.
You should also identify the speaker whenever the viewer has no clear visual clues as to their identity. For example, if the video starts up and an off-screen narrator begins to speak, you should identify the speaker as narrator. If your interview has a J-cut, where the audio from the second video starts playing while the first video remains onscreen (see Chapter 5), you should identify the speaker.
If there are multiple on-screen speakers in a fast-paced discussion, consider identifying the speaker in all captions. Alternatively, since most speakers talk for longer than one or two captions, consider identifying the speaker only when the speaker changes.
As speaker identification is not spoken information, it's typically set off from the main captioning in some way. For example, in Figure 11.1, the speaker identification is positioned on its own line, in all caps, placed in brackets, and set off with a colon, which is the practice of the Media Access Group. The Captioned Media Program uses italics or brackets, with no colon, and also positions the title on its own line.
In contrast, PBS, in its closed-captioned streaming video, uses all caps offset with a colon, on the same line as the first line in the caption, which looks like this:
KEN: Joining us here
is Mr. Jan Ozer.
Gallaudet's recommendation is prescient in this regard. "If the character cannot be identified by name, then a descriptor should be provided," he states. "An acceptable format for explicit identification is the character's name or descriptor in upper/lower case, surrounded by parentheses, above the caption and left-justified with the caption. Other formats are probably uncontroversial." Basically, pick one approach, and apply it consistently.
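To make "pick one approach, and apply it consistently" concrete, here is a minimal Python sketch that applies one of the identification styles described above to a caption. The function name and the style labels ("pbs", "cmp", "mag") are hypothetical, and the exact position of the colon in the Media Access Group style is an assumption, since the text doesn't spell it out.

```python
def identify_speaker(name, lines, style="pbs"):
    """Return caption lines with a speaker identification added.

    style="pbs": all caps plus a colon, on the same line as the
                 first caption line.
    style="cmp": name in brackets, no colon, on its own line.
    style="mag": all caps in brackets with a colon, on its own line
                 (colon placement inside the brackets is an assumption).
    """
    if not lines:
        raise ValueError("a caption needs at least one line of text")
    if style == "pbs":
        return [f"{name.upper()}: {lines[0]}"] + list(lines[1:])
    if style == "cmp":
        return [f"[{name}]"] + list(lines)
    if style == "mag":
        return [f"[{name.upper()}:]"] + list(lines)
    raise ValueError(f"unknown style: {style}")

if __name__ == "__main__":
    caption = ["Joining us here", "is Mr. Jan Ozer."]
    print("\n".join(identify_speaker("Ken", caption, style="pbs")))
```

Routing every caption through one function like this is a simple way to keep the chosen format consistent across a whole program.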
Let's break these rules down for easy scanning:
If there are two consistently positioned speakers: Place captions on their respective sides of the screen, justifying to their respective sides.
If there is only one speaker: Place captions in the center of the screen and center-justify them.
If the speaker is off-screen: Place captions in the center of the screen and center-justify them. Some producers identify off-screen speakers with italics.
Clearly identify new speakers whenever speaker identification is not obvious to the viewer: This can occur with off-screen narration, during J-cuts, or when there are many speakers on screen. Format your speaker identification to distinguish it from spoken text.
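The placement rules above reduce to a small decision function. The sketch below is one possible encoding, not a standard API: the function name, its parameters, and the choice to fall back to centered captions for situations the rules don't cover (such as three or more speakers) are all assumptions.

```python
def caption_placement(speaker_count, on_screen=True, speaker_side=None):
    """Return a (position, justification) pair for a caption.

    speaker_side is "left" or "right" when the speaker is consistently
    positioned on one side of the screen, otherwise None.
    """
    # Off-screen speakers and single speakers get centered captions.
    if not on_screen or speaker_count == 1:
        return ("center", "center")
    # Two consistently placed speakers: caption goes on the speaker's
    # side and is justified to that same side.
    if speaker_count == 2 and speaker_side in ("left", "right"):
        return (speaker_side, speaker_side)
    # Assumption: anything the rules don't cover falls back to center.
    return ("center", "center")

if __name__ == "__main__":
    print(caption_placement(2, speaker_side="right"))
    print(caption_placement(1))
```

Because not every player supports all three alignments, a real workflow would also need to check the result against what the target player can display.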
There's one real-world caveat to these rules: not all players and/or closed-captioning tools can create or implement left-justified, right-justified, and centered captions. For example, because of alignment problems when playing closed-captioned streams in Windows Media Player, the Media Access Group modified MAGpie to produce only left-placed captions. In addition, RealPlayer can only display left- and center-aligned captions (though, of course, you could right-justify the text using space or tab commands). In fact, the only streaming player that properly implemented our speaker-placement strategy was QuickTime.
Positioning within a DVD stream was a little more straightforward, and should be possible in most authoring programs. Still, before selecting a caption-positioning strategy, test to ensure that all development tools and/or players are compatible with the strategy.
Step 7: Define Rules for Noises and Other Points of Emphasis
As we've discussed already, closed captions must describe a broad range of audio events to enhance the viewer's understanding of the video. As with speaker identifications, these audio events need to be visually different from the spoken information.
The Media Access Group recommends showing sound-effect captions parenthetically, in lowercase italics (but don't italicize the parentheses), typically presented as a standalone caption. In the context of our interview footage, which was shot during the hustle and bustle of a trade show, captions included the one shown in Figure 11.2, displayed as the video is fading in from black at the start. This lets the viewer know that we're shooting in a noisy environment. You should identify both the source of the noise and the noise itself.
Figure 11.2. Captioning audio events.
You can use these same indicators to describe the intonations that flavor the speech. In the interview, Ken and I were swapping stories, and he recalled a joint presentation where the equipment setup went less than smoothly. I laughed, and commented, "What a mess that was!" This would be captioned as shown in Figure 11.3. It's also appropriate to caption emotion (e.g., frown, deep in thought) even if there is no accompanying sound.
Figure 11.3. Captioning the speaker's intonation.
The styles shown in Figures 11.2 and 11.3 are from the Media Access Group. The Captioned Media Program recommends brackets instead of parentheses, and places on-screen noises and intonations in normal case, and off-screen noises in italics.
It's acceptable to use onomatopoeia, or text strings that sound like the noise being described, though Gallaudet University found that most consumers preferred both a text description and onomatopoeia.
These would appear as follows:
(putt drops into cup)
In addition to noises and sound effects, consider identifying other information that's apparent in the audio but not in the text description. This would include accents (e.g., French accent), audience reaction (laughing, loud boos), and the pace of speech (slow drawl).
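The two sound-effect conventions described in this step can be compared side by side with a short Python sketch. Everything structural here is an assumption: the function name, the style labels, and especially the use of `<i>` tags to stand in for italics, since the actual italic markup depends on your caption format. For the Captioned Media Program style, leaving the brackets outside the italics is also a guess.

```python
def format_sound(description, style="mag", off_screen=False):
    """Format a sound-effect caption in one of two house styles.

    style="mag": lowercase italics inside non-italic parentheses.
    style="cmp": brackets; normal text for on-screen sounds,
                 italics for off-screen sounds.
    <i>...</i> is placeholder markup for italics (an assumption).
    """
    if style == "mag":
        return f"(<i>{description.lower()}</i>)"
    if style == "cmp":
        text = f"<i>{description}</i>" if off_screen else description
        return f"[{text}]"
    raise ValueError(f"unknown style: {style}")

if __name__ == "__main__":
    print(format_sound("Putt drops into cup"))
    print(format_sound("crowd murmuring", style="cmp", off_screen=True))
```

Whichever style you choose, generating these captions from one helper keeps them visually distinct from spoken text in the same way every time.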
Step 8: Choose Your Music Treatment
Music often sets the mood of the video, so when background music is present, it should be indicated. Television sets use a special musical note character to identify music playing, or when someone is singing, but the character is not recognized by all streaming media players. If it's not available to you, use the word music in italics surrounded by either parentheses (Media Access Group) or brackets (Captioned Media Program).
If the music has no lyrics, be as descriptive as possible (soothing music, disco music) and identify the name and the composer if known. Caption the lyrics if they are being sung, starting and ending with the special music character.
Step 9: Editing the Text
The goal with captions is to present them with the actual spoken word, but some people talk faster than others can read. In these instances, it's accepted practice to edit the text to achieve a certain reading speed.
In this regard, the Captioned Media Program guide provides some very interesting statistics about reading rates along with very clear guidelines. Specifically, the guide states that most elementary or secondary students can read at 120 words per minute (wpm), and adults up to 160 wpm. For Captioned Media Program videos, the guide requires that "no caption should remain onscreen less than 2 seconds or exceed 235 wpm."
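Those two limits are easy to check mechanically once you know how long each caption stays on screen. The following Python sketch is illustrative only: the function names are hypothetical, words are counted by splitting on whitespace (a simplification), and the 2-second and 235-wpm defaults come from the guideline quoted above.

```python
def caption_wpm(text, seconds):
    """Reading rate, in words per minute, needed to read a caption."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    words = len(text.split())  # simple whitespace word count
    return words * 60.0 / seconds

def check_caption(text, seconds, min_seconds=2.0, max_wpm=235.0):
    """Return a list of guideline problems (empty if the caption passes)."""
    problems = []
    if seconds < min_seconds:
        problems.append("on screen for less than the minimum duration")
    if caption_wpm(text, seconds) > max_wpm:
        problems.append("exceeds the maximum reading rate")
    return problems

if __name__ == "__main__":
    text = "Joining us here is Mr. Jan Ozer."
    print(caption_wpm(text, 2.0))    # 7 words in 2 seconds
    print(check_caption(text, 1.5))
```

A caption that fails either check is a candidate for the text editing described next, or for a longer display time if the video's pacing allows it.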
When editing the text, the Media Access Group advises that caption producers "try to maintain precisely the original meaning and flavor of the language as well as the personality of the speaker. Avoid editing only a single word from a sentence as this doesn't really extend reading time. Similarly, avoid substituting one longer word for two shorter words (even write a shorter word for a longer word) or simply making a contraction from two words (e.g., contracting 'should not' to 'shouldn't')."
Note that virtually all style guides recommend against modifying for correct English (substituting "isn't" for "ain't," or "you all" for "y'all"). Finally, if you find yourself having to shorten major sections of speech to meet your desired wpm rate, page 14 of the Captioned Media Program style guide offers some great guidelines.
Step 10: Other Issues
The first nine steps covered the main issues, but there are many additional standards to address. Two of the most common include:
Numbers: Generally spell out one through ten and use numerals for higher numbers, except when they start a sentence (Media Access Group).
Acronyms: Display as normal (IEEE rather than eye-triple-e).
For others, such as fractions, dates, dollar amounts, and more, consult the Captioned Media Program style guide.