Enhance User Experience with Audio Description


Maximum Accessibility: Making Your Web Site More Usable for Everyone
By John M. Slatin, Sharron Rush
Chapter 13.  Enhancing Accessibility through Multimedia

Synchronized audio description, a spoken description of the activities presented through video, is another essential element in making video accessible. In this case it serves people who are unable to see the video, whether because they are blind or for some other reason. The audio description is recorded on a separate track, and the individual descriptions are usually placed in natural pauses in the soundtrack. They describe key visual elements of the presentation: the scene, the characters, their body language, and any other visual elements that may be important to understanding the video but are not available from the soundtrack alone. Like captions, audio descriptions must be synchronized with the video stream they describe.

Like closed captioning and other accessibility techniques, audio description existed before the Web was born. But it's not all that old, either. Audio description of live theater events began in 1981 with the pioneering work of Margaret and Cody Pfanstiehl of the Metropolitan Washington Ear, along with Arena Stage in Washington, DC. Audio description is now available in several states as well as the District of Columbia. In 1987, the Boston PBS affiliate, WGBH, which had pioneered closed captioning in the early 1970s, launched its Descriptive Video Service (DVS). In 1989, Congress for the first time appropriated funds for described television, as authorized by the Education of the Handicapped Act. By 1997, WGBH had established a policy that all national programming would be described. [4] The National Academy of Television Arts and Sciences awarded Margaret Pfanstiehl an Emmy for "leadership and persistence" in pioneering accessible TV for people who are visually impaired, and the DVS has received many national and international awards as well.

[4] "DVS Milestones." Accessed January 19, 2002, at http://main.wgbh.org/wgbh/access/dvs/dvsmilestones.html.

Why Describe?

Why bother describing video content? After all, it seems logical to assume that the 6.5 million people in the U.S. who are blind or severely visually impaired don't watch much video. But that's not so. In 1997, the American Foundation for the Blind (AFB) published "Who's Watching? A Profile of the Blind and Visually Impaired Audience for Television and Video" [Packer and Kirchner 1997]. The study found that 99 percent of blind and visually impaired respondents owned television sets, and 83 percent owned VCRs; these figures are virtually the same as those for the general population. The study also found that more blind and visually impaired people were cable subscribers. Respondents watched a mean of 24 hours of television per week. In these and most other respects, the AFB researchers found, the statistics about people who are blind and visually impaired were very similar to those for the U.S. population as a whole. The AFB report also indicates that 96 percent of persons with no usable vision state that description is very important to their enjoyment of television or videos. More than 75 percent of all respondents claim that the benefits of description include enhancing the overall viewing experience as well as the learning and social experiences of television and video [Packer and Kirchner 1997].

Audio description doesn't just make TV and video more fun for people who have trouble seeing. Audio description, like other accessibility techniques we have presented, has wider benefits as well. An earlier study conducted by the AFB with support from the National Science Foundation concluded that "description has positive impacts in psychological, social, and cognitive domains on blind and severely visually impaired individuals." [5]

[5] "DVS Milestones." Accessed January 19, 2002, at http://main.wgbh.org/wgbh/access/dvs/dvsmilestones.html.

Differences between Audio Descriptions and Closed Captions

There are some important differences between audio descriptions and closed captions. First, of course, they serve different needs. Closed captions help people who can't hear the soundtrack or may have difficulty understanding it. Audio descriptions help people who can't see the visual material or have trouble understanding it. This means that making your video material accessible isn't a matter of choosing between closed captioning and audio description: WCAG 1.0 and Section 508 require both techniques.

That brings us to a second difference. As the Caption Center guidelines discussed earlier make clear, closed captions are verbatim or nearly verbatim transcripts of exactly what's spoken in the video soundtrack (including interruptions) and by whom. Captions should also include notations for other significant sound elements: music, laughter, applause, gunshots, cars backfiring, the murmur of voices in the background in a cocktail party scene, and so on. Audio descriptions, on the other hand, can't possibly include everything going on in the scene. Audio descriptions usually last only a couple of seconds, because they have to fit into the pauses in dialog and other spaces where they won't interfere with the actual soundtrack. SMIL version 2.0, published in February 2001, contains features that let developers "expand" those pauses, "freezing" the action and dropping in longer descriptions; but these features aren't yet well supported, and we won't address them here. Under the current circumstances, the describer has to select what to describe.
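Because each description has to fit into a pause, it can help to estimate roughly how many words a given pause can hold before writing the description. The Python sketch below assumes a steady speaking rate of 160 words per minute. That rate is an illustrative assumption, not a published standard, and real describers adjust their pace to the material. The function names `word_budget` and `fits_in_pause` are hypothetical.

```python
import math

# Assumed describer speaking rate in words per minute (illustrative only).
ASSUMED_WPM = 160


def word_budget(pause_seconds: float, wpm: int = ASSUMED_WPM) -> int:
    """Return the whole number of words that fit in a pause at the given rate."""
    if pause_seconds <= 0:
        return 0
    return math.floor(pause_seconds * wpm / 60)


def fits_in_pause(description: str, pause_seconds: float,
                  wpm: int = ASSUMED_WPM) -> bool:
    """True if the description's word count is within the pause's word budget."""
    return len(description.split()) <= word_budget(pause_seconds, wpm)


if __name__ == "__main__":
    # Two descriptions quoted from the TX2K example later in this chapter.
    checks = [("TX2K login screen.", 2.0),
              ("Boys and girls in a computer classroom.", 2.0),
              ("Boys and girls in a computer classroom.", 3.0)]
    for text, pause in checks:
        print(f"{pause:.1f}s pause, {word_budget(pause)} words max, "
              f"fits: {fits_in_pause(text, pause)} ({text})")
```

At the assumed rate, a 2-second pause holds about five words. That is enough for "TX2K login screen." but not for the seven-word classroom description, which needs roughly 3 seconds.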

A number of organizations offer training in audio description. VSA Arts of Texas (formerly Austin Access Arts), for example, conducts workshops at conferences throughout Texas and across the Southwest and offers an excellent series of videotapes and accompanying workbooks for those who want to learn the techniques of describing a wide variety of performances and other events. As of this writing (June 2002), however, there are no standards for audio description comparable to the WCAG. That may change very soon: the first conference of a new organization, Audio Description International, was held in March 2002. The agenda included consideration of the questions involved in developing official international standards for audio description, as well as training and certification programs. The proceedings from the conference are available online at http://www.adinternational.org/conference/2002/.

Deciding What to Describe

In the meantime, though, you have decisions to make about those video clips you plan to include on your site. How do you decide what to describe? It helps to keep in mind that audio description is aimed at people who can hear the soundtrack but can't see what's happening on the screen. So you should focus on describing only critical details that can't be deduced from the soundtrack.

Note, too, that we keep using words related to description. This isn't an accident: an audio description should be as neutral as possible with respect to what it describes. Instead of expressing your own value judgments or ideas about why things are happening on the screen, your job as describer is to present the scene in such a way that people who can't see it can form their own conclusions about what's going on, just as people who can see the video will draw their own conclusions.

Joe Clark published a proposed list of five principles (he calls them "techniques," but they're not) for audio description in advance of his presentation at the first Audio Description International conference in March 2002. These principles are somewhat controversial. They are considerably less detailed, for example, than those published by the United Kingdom's Independent Television Commission in 2000 [6]. Even so, they may be useful for people just getting started, so we offer them here.

[6] The Independent Television Commission's "Guidance on Standards for Audio Description" is available in Microsoft Word format at http://www.itc.org.uk/divisions/eng_div/subtitle/Audio_Description.doc. This document includes a good deal of information that is quite specific to British television.

  1. Describe what you observe.

  2. Describers and narrators serve the audience and the production, not themselves.

  3. If time limits force you to be selective, first describe what is essential to know, such as actions and details that would confuse or mislead the audience if omitted.

  4. Whenever possible, describe actions and details that add to the understanding of personal appearance, setting, atmosphere, and mise-en-scène [loosely translated, the mise-en-scène is the setting, or the placement of people and objects in the scene, as on a movie set].

  5. Since it is more important to make a production understandable than to preserve every detail of the original soundtrack, it is permissible to describe over dialog and other audio when necessary. [Clark 2001]

Now that we have at least a basic understanding of audio description, let's move on to an example of audio description in action.

An Example of Audio Description: The TX2K Video

In 2001, the Institute for Technology and Learning (ITAL) commissioned a short video about an educational Web project called TX2K: The Texas 2000 Living Museum. [7] The K-12 students who participate in the TX2K project research and create multimedia "exhibits" about the past, present, and future of their own communities. Students publish their completed exhibits in the TX2K museum gallery for viewing by their peers and anyone else who might be interested (Figure 13-6).

[7] The TX2K video was produced by Linda Litowski and Henry Miller of L&M Pro Video in Austin, Texas.

Figure 13-6. Screen shot of the TX2K Gallery, where student exhibits may be viewed. In the center of this virtual room's "rear wall," the words "Museum Gallery" are superimposed on an image of the Texas flag above the Texas map. Accessed January 19, 2002, at http://www.ital.utexas.edu. Used with permission.


The two-and-a-half-minute video was designed for inclusion on a CD-ROM that would be distributed to schools across Texas. The CD provided a comprehensive overview of the project. It explained the activities and resources included in TX2K, outlined the state curriculum requirements addressed by the project, and described the accessibility features that had earned TX2K first place for Extraordinary Web Design in a national competition sponsored by Project EASI (Equal Access to Software and Information). The video itself was intended as an introduction to the CD. It was also meant to demonstrate accessible video. Both closed captions and audio descriptions are available. (Since we covered closed captioning extensively in our discussion of the ATSTAR video, we concentrate on audio description here.)

The Video

Scenes showing intensely engaged children and teachers in classrooms and computer labs were intercut with screen shots from the TX2K site and "talking head" shots of educators explaining why they liked the project. One outdoor scene was shot at an abandoned rural cemetery where children were researching a yellow-fever epidemic that devastated the area shortly after the Civil War.

The Descriptions

In the soundtrack for most of the classroom scenes, the voice of a teacher is heard above the buzz of children's voices and background music. In one such scene, a teacher, pointing to a spot on the Web page that a fourth-grader had found, tells the child about an important figure in Texas history who is named on the page. When the scene opens, though, the soundtrack contains only background music: there is no way for someone who cannot see the screen to tell what is on the screen. This is where the describer comes in, with a quick remark to set the scene: "Boys and girls in a computer classroom." At another point, the camera cuts away to a screen shot; the audio describer notes simply, "TX2K login screen." Still later, the scene shifts to the abandoned cemetery where two children are looking at an old tombstone. The soundtrack offers only background music, and again the audio description kicks in to tell us where we are and what's happening. "In an historic cemetery, a girl kneels to trace the letters on an old tombstone," the describer says. "Another girl uses a digital camera to photograph another tombstone."

The Describer

The describer was Connie McMillan, an experienced volunteer trained in audio description by VSA Arts of Texas. McMillan accompanied ITAL staff to the L&M studio on a rainy February day. The group of us watched the video several times and discussed the essential points the video was meant to convey and the emotional charge we wanted to create. Then we went through the video again, this time pausing the video where descriptions were needed; McMillan tried out different variations until everyone was satisfied. This actually turned out to be the next-to-last version: producer Linda Litowski had transcribed it, and when she read back the individual segments, we found ourselves making some slight changes along the way. Then McMillan read each item while Emmy Award-winning soundman Henry Miller captured her voice.

If the process we've described seems cumbersome, consider this: the best way to reduce the time needed for creating effective audio descriptions is to incorporate them into the scripting process for the video. Review the shot list and the captions to determine where descriptions are necessary; then write a description script and record someone reading the descriptions. Then, at the appropriate point in the production process, synchronize the captions and audio descriptions with the main video element, using MAGpie or another tool designed for the purpose. This works best, of course, when you have full control over the video script; but it can also be adapted when it's necessary to produce descriptions after the fact, when you're "repairing" older video that lacks captions and descriptions.
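Once captions exist, the first step in that workflow, finding where descriptions can go, can be partly automated. Any stretch between caption cues that is long enough is a candidate pause. The Python sketch below assumes the caption cues are available as (start, end) times in seconds and uses a 2-second minimum gap. Both the data format and the threshold are assumptions for illustration, and `find_description_gaps` is a hypothetical helper, not a MAGpie feature. Because captions also note significant sounds, an uncaptioned gap may still contain background music, so a person still has to review each candidate and decide what, if anything, to describe there.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (start_seconds, end_seconds) of one caption


def find_description_gaps(cues: List[Cue], video_length: float,
                          min_gap: float = 2.0) -> List[Cue]:
    """Return (start, end) spans with no caption that last at least min_gap seconds.

    Unsorted or overlapping cues are handled by sorting them and tracking
    the latest caption end time seen so far.
    """
    gaps: List[Cue] = []
    cursor = 0.0  # latest end time of any caption processed so far
    for start, end in sorted(cues):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing gap between the last caption and the end of the video.
    if video_length - cursor >= min_gap:
        gaps.append((cursor, video_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical cue times for a short clip; not taken from the TX2K captions.
    cues = [(3.5, 6.0), (6.2, 9.8), (14.0, 17.5), (17.0, 21.0)]
    for start, end in find_description_gaps(cues, video_length=25.0):
        print(f"candidate pause {start:.1f}-{end:.1f}s ({end - start:.1f}s)")
```

With the sample cues, the sketch reports three candidate pauses: 0.0 to 3.5 seconds, 9.8 to 14.0 seconds, and 21.0 to 25.0 seconds. The 0.2-second break between the first two captions is too short to use.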

The SMIL File

As with the ATSTAR video discussed earlier, the SMIL file handles the synchronization of the video, closed captions, and audio descriptions. The code below gives an example.

<smil>
  <head>
    <meta name="title" content="The Texas 2000 Living Museum"/>
    <layout system-captions="on">
      <root-layout background-color="black" height="315" width="357"/>
      <region id="videoregion" background-color="black"
          top="5" left="5" height="240" width="352"/>
      <region id="textregion" background-color="#000000"
          top="255" left="5" height="60" width="352"/>
    </layout>
  </head>
  <body>
    <par>
      <!-- VIDEO: the first copy plays when captions are off, the second when they are on -->
      <video src="open_mv.mpg" system-captions="off"/>
      <video src="open_mv.mpg" region="videoregion" system-captions="on"/>
      <!-- CAPTIONS -->
      <textstream src="open_cc.avi" region="textregion"
          system-language="en" system-overdub-or-caption="caption"
          system-captions="on" title="TX2K Museum Captions"
          alt="TX2K Museum Captions"/>
      <!-- AUDIO DESCRIPTION: a direct child of <par>, so it plays alongside the captions -->
      <audio src="open_ad5.mpg" system-language="en" system-captions="on"/>
    </par>
  </body>
</smil>

The SMIL file is organized very much like an HTML file. Note, however, that SMIL is an XML application, so it follows XML conventions: element and attribute names are case-sensitive, and SMIL 1.0, which this file uses, defines its names in lowercase letters. The SMIL file begins by opening a <smil> element and ends by closing it (</smil>). The document includes <head> and <body> elements, just as HTML documents do. As in many HTML documents, the <head> element includes a <meta> element, which contains information about the document itself. The unfamiliar element is <layout>. With this element, the code above defines a root layout that sets the overall size of the presentation. It also defines two "regions" within that layout: videoregion for the video itself and textregion for the captions.

The <body> element includes a <par> element. Nested inside the <par> element are the elements to be played in parallel: the video stream, the caption track, and the audio description track.

The first two elements are <video> elements that point to the same video file. Each has a src attribute, which identifies the actual filename, and a system-captions attribute, which the player compares with the user's caption setting. The first copy plays when captions are turned off. The second copy plays when captions are turned on, and it also has a region attribute that places it in the video region.

Following the <video> elements are the <textstream> element, which identifies the captions, and the <audio> element, which identifies the audio description file. Each of these elements has src and system-captions attributes. Because both system-captions attributes are set to "on," the captions and the audio description play together with the video when the user has turned captions on. SMIL 1.0 has no separate test attribute for audio description, so this file ties both tracks to the captions setting. The <textstream> element also has a region attribute, which tells the player to display the captions in the text region defined in the <layout> element. It also has a title attribute (this is optional, but can be helpful for search engines and other purposes) and an alt attribute, which supplies the text equivalent required for nontext elements.
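In this file, each region attribute on a media element has to match the id of a region defined in <layout>. A mismatch is easy to introduce and hard to spot by eye. The Python sketch below is a hypothetical helper, not part of MAGpie or any SMIL player. It uses the standard library's `xml.etree.ElementTree` parser to list the media elements whose region attribute names an undefined region. It assumes an unnamespaced SMIL 1.0 file like the one above, and `SAMPLE` is an invented fragment with one deliberate error.

```python
import xml.etree.ElementTree as ET
from typing import List

# SMIL 1.0 media object elements that may carry a region attribute.
MEDIA_TAGS = {"video", "audio", "textstream", "img", "text", "animation", "ref"}


def undefined_region_refs(smil_text: str) -> List[str]:
    """Return 'tag -> region' for media elements naming a region with no matching id."""
    root = ET.fromstring(smil_text)
    defined = {r.get("id") for r in root.iter("region") if r.get("id")}
    problems = []
    for elem in root.iter():
        region = elem.get("region")
        if elem.tag in MEDIA_TAGS and region is not None and region not in defined:
            problems.append(f"{elem.tag} -> {region}")
    return problems


# Hypothetical fragment: the caption region is missing its id attribute.
SAMPLE = """<smil><head><layout>
  <region id="videoregion" top="5" left="5" height="240" width="352"/>
  <region top="255" left="5" height="60" width="352"/>
</layout></head>
<body><par>
  <video src="open_mv.mpg" region="videoregion"/>
  <textstream src="open_cc.avi" region="textregion"/>
</par></body></smil>"""

if __name__ == "__main__":
    print(undefined_region_refs(SAMPLE))  # ['textstream -> textregion']
```

A check like this only confirms that the region names line up. It cannot tell whether the captions and descriptions are well timed or well written.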

When to Use Audio Description

Some videos may not require audio description. For example, if you're in the fortunate position of creating or commissioning the video specifically for your current project, it may be possible to script it in such a way that it doesn't need to be described. The ATSTAR project includes a video enactment of the process of assessing the AT needs of a student who has been having trouble performing a specific classroom task. In the video, teachers and counselors meet to discuss the situation and try to find a solution.

Is an audio description necessary? Not in this case, because the "action" is almost entirely dialog-based. The video was purposely scripted so that the characters announce themselves and set the scene in their dialog. Audio descriptions would be redundant here because ATSTAR users who do not see the action will receive all the essential information from the soundtrack while closed captions deliver the content to users who do not hear the dialog. A transcript of the exchange provides text that can be accessed by people using Braille displays and others who don't have video players installed.

In most cases, though, you'll be working with existing video material, and you'll have to determine whether it requires description or not. The decision depends on the information being conveyed. For example, suppose the video content is action-oriented, such as a car race or a sporting event, or the meaning of the scene depends partly on the features of the setting where it takes place. In either case, the soundtrack alone will not convey enough information to provide an equivalent alternative to what is shown on the screen. Or perhaps a dynamic graph is being generated while members of a group discuss employment trends. In cases like these, audio description is absolutely essential, and a complete text transcript would include both the captions and the audio description.


    ISBN: 0201774224
    Year: 2002
    Pages: 128
