12.3 Video Formats


12.3.1 MPEG

The MPEG file format is complex and has multiple parts that are partly described in previous chapters and fully specified by the MPEG specification; this section is an overview that does not go into specific technical details. Essentially, an MPEG file is a bounded MPEG-compliant bitstream consisting of a sequence of MPEG packets.

The file format has two forms: the transport stream and the program stream. Each is optimized for a different set of applications. Both the transport stream and program stream enable synchronizing the decoding and presentation of the video and audio information, and ensure that data buffers in the decoders do not overflow or underflow. Information is coded using time stamps concerning the decoding and presentation of coded audio and visual data and time stamps concerning the delivery of the data stream itself. Both bit stream definitions are packet-oriented multiplexes.

The basic encoding and multiplexing approach for single video and audio elementary streams is illustrated in Figure 12.3. The video and audio data is encoded either in MPEG-2 formats (i.e., ISO/IEC 13818-2,3) or in MPEG-4 format (i.e., ISO/IEC 14496) [MPEG4] to produce compressed elementary bit streams. The resulting bit streams are packetized to produce PES packets. Information needed to use PES packets independently of either transport streams or program streams may be added when PES packets are formed. This information is not needed and need not be added when PES packets are assembled into transport streams or program streams. The MPEG systems standard covers the processes to the right of the vertical dashed line.

Figure 12.3. Simplified overview of MPEG system coding, ISO/IEC 13818 part 1.
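
To make the PES packetization concrete, the following sketch (in Python) parses the fixed part of a PES packet header and decodes a PTS field as laid out in ISO/IEC 13818-1. It is a minimal illustration, not a complete parser, and assumes the buffer starts at a PES packet boundary.

    import struct

    def parse_pes_header(data: bytes):
        """Parse the fixed 6-byte start of a PES packet:
        3-byte start-code prefix 0x000001, stream_id, PES_packet_length."""
        if data[0:3] != b"\x00\x00\x01":
            raise ValueError("missing PES start-code prefix")
        stream_id = data[3]
        pes_packet_length = struct.unpack(">H", data[4:6])[0]
        return stream_id, pes_packet_length

    def parse_pts(field: bytes) -> int:
        """Decode a 33-bit PTS/DTS value from its 5-byte encoding (90 kHz units)."""
        return (((field[0] >> 1) & 0x07) << 30) | (field[1] << 22) | \
               ((field[2] >> 1) << 15) | (field[3] << 7) | (field[4] >> 1)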

Program and transport streams are designed for different applications and their definitions do not strictly follow a layered model. It is possible and reasonable to convert from one to the other; however, one is not a subset or superset of the other. In particular, extracting the contents of a program from a transport stream and creating a valid program stream is possible and is accomplished through the common interchange format of PES packets, but not all of the fields needed in a program stream are contained within the transport stream; some are derived. The transport stream may be used to span a range of layers in a layered model, and is designed for efficiency and ease of implementation in high bandwidth applications.

12.3.1.1 Transport Stream

The transport stream is designed for environments in which significant errors may occur, resulting in bit-value errors or loss of packets (e.g., iTV broadcast). The transport stream combines one or more programs with one or more independent time bases into a single stream. PES packets made up of elementary streams that form a program share a common time base. The transport stream is designed for use in environments where errors are likely, such as storage or transmission in lossy or noisy media. Transport stream packets are 188 bytes in length. It is possible to construct transport streams containing one or more programs from elementary coded data streams, from program streams, or from other transport streams that may contain one or more programs.
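
As an illustration of the fixed 188-byte packet structure, the sketch below parses the 4-byte transport stream packet header (sync byte, PID, adaptation field control, continuity counter). It assumes the buffer is already aligned on a packet boundary; a simple demultiplexer would read a file 188 bytes at a time and group payloads by PID.

    def parse_ts_header(packet: bytes):
        """Parse the 4-byte header of a 188-byte transport stream packet."""
        if len(packet) != 188 or packet[0] != 0x47:          # 0x47 is the sync byte
            raise ValueError("not a valid transport stream packet")
        payload_unit_start = bool(packet[1] & 0x40)          # a PES packet starts here
        pid = ((packet[1] & 0x1F) << 8) | packet[2]          # 13-bit packet identifier
        adaptation_field_control = (packet[3] >> 4) & 0x03
        continuity_counter = packet[3] & 0x0F
        return pid, payload_unit_start, adaptation_field_control, continuity_counter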

The transport stream is designed to perform the following operations with minimal effort:

  • Extract the coded elementary stream data from one program within the transport stream, decode it and present the decoded results; this operation is called decoding.

  • Extract the transport stream packets of one or more programs from one or more transport streams and produce as output a different transport stream; this operation is called re-multiplexing.

  • Extract the contents of one program from the transport stream and produce as output a program stream containing that one program; this operation is called de-multiplexing.

  • Take a program stream, convert it into a transport stream to carry it over a lossy environment, and then recover a valid, and in certain cases, identical program stream.

  • Cut the file at a random location and still be able to perform all of these operations.

Operations performed by the transport stream decoder either apply to the entire transport stream (multiplex-wide operations) or to individual elementary streams (stream-specific operations). The MPEG-2 transport system layer is divided into two sub-layers, one for multiplex-wide operations (the transport packet layer) and one for stream-specific operations (the PES packet layer) [MPEG2].

One possible architecture of an audio/video transport stream decoder is depicted in Figure 12.4 to illustrate the function of a decoder. Transport stream system decoder functions, such as decoder timing control, can also be distributed among elementary stream decoders. Likewise, errors detected by audio and video decoders may be propagated in various ways.

Figure 12.4. Simplified transport stream decoding example.

It is possible and reasonable to convert between different types and configurations of ISO/IEC 13818 streams; such operations are typically referred to as re-multiplexing, which essentially performs de-multiplexing followed by multiplexing. As an example, re-multiplexing can convert a transport stream containing multiple programs into a transport stream containing a single program. Similarly, conversion from a transport stream into a program stream is performed by de-multiplexing, extraction of the PES packets, and their subsequent encapsulation within a program stream. There are specific fields defined in the transport stream and program stream syntaxes that need to be manipulated to achieve such conversions: for example, the re-multiplexing operation includes the processing of Program Clock Reference (PCR) time stamps to account for the transformation.

12.3.1.2 Program Stream

The program stream, commonly used in systems that control the flow of data (e.g., DVD and CD players), results from combining one or more streams of PES packets, which have a common time base, into a single stream. The program stream approach can also be used to encode multiple audio and video elementary streams into multiple program streams, all of which have a common time base. As in the single program stream case, all elementary streams can be decoded with synchronization.

Program streams are constructed in two layers: a system layer and a compression layer. The input stream to the program stream decoder has a system layer wrapped about a compression layer. Input streams to the video and audio decoders have only the compression layer. Operations performed by the program stream decoder either apply to the entire program stream (multiplex-wide operations) or to individual elementary streams (stream-specific operations), namely the PES packet layer.

The program stream is designed for use in relatively error-free environments (e.g., DVD and/or CD players) and is suitable for applications that may involve software processing of system information, such as interactive multimedia applications (e.g., arcade games). Program stream packets may be of variable and relatively great length.

A typical program stream decoder, shown in Figure 12.5, is composed of system, video, and audio decoders. The multiplexed coded representation of one or more audio and/or video streams is assumed to be stored on a DSM, or network, in some medium-specific format. As for transport streams, system decoder functions, including decoder timing control, might equally well be distributed among elementary stream decoders and the media-specific decoder.

Figure 12.5. Example program stream decoder.

The depicted decoder accepts as input an ISO/IEC 13818 program stream and relies on a program stream decoder to extract timing information from the stream. The program stream decoder demultiplexes the stream, and the elementary streams so produced serve as inputs to Video and Audio decoders, whose outputs are decoded video and audio signals. Included in the design, but not shown in the figure, is the flow of timing information among the program stream decoder, the video and audio decoders, and the medium specific decoder; this timing information (i.e., PTS and DTS) is used to synchronize the video and audio decoders.

12.3.1.3 Packetized Elementary Stream (PES)

Transport and program streams are each logically constructed from PES packets. PES packets are used to convert between transport streams and program streams; in some cases the PES packets need not be modified when performing such conversions. PES packets may be much larger than the size of a transport stream packet.

A continuous sequence of PES packets of one elementary stream with one stream ID may be used to construct a PES stream. When PES packets are used to form a PES stream, they include Elementary Stream Clock Reference (ESCR) fields and Elementary Stream Rate (ES_Rate) fields. Time stamps encoded in PES packet headers apply to presentation times of compression layer constructs. However, PES packet payloads need not start at compression layer start codes: for example, a video packet may start at any byte in the video stream.

PES streams do not contain some necessary system information that is contained in program streams and transport streams. Examples include the information in the Pack Header, System Header, Program Stream Map, Program Stream Directory, Program Map Table, and elements of the transport stream packet syntax.

12.3.1.4 Conversion between Transport Stream and Program Stream

It is possible and reasonable to convert between transport streams and program streams by means of PES packets; this results from the way the transport stream and program stream are specified. PES packets may, with some constraints, be mapped directly from the payload of one multiplexed bit stream into the payload of another multiplexed bit stream. It is possible to identify the correct order of PES packets in a program to assist with this, but the buffer model specifications differ between the two formats. Most information needed for conversion (e.g., the relationship between elementary streams) is available in tables and headers in both streams.

12.3.1.5 Video Format

An MPEG video file contains a video elementary stream encapsulated as PES packets, each typically carried in more than 300 transport stream packets. The structure of PES packets is complex and requires multilevel decoding.

The PES packet contains a sequence of still pictures, or frames, which, without compression, would require far too high a data rate. To reduce the size of the digital picture by half, sampling of alternate lines was introduced; each such sample is called a field. Because only one field from every frame is used, these sampled fields form a progressively scanned video sequence. In MPEG-2, the term picture refers to either a frame or a field; therefore, a coded representation of a picture may be reconstructed as a frame or a field.

To achieve additional compression, each field is divided into an array of macroblocks, each 16 x 16 pixels in size, where each macroblock comprises four 8 x 8 blocks of Y (luminance) information and one 8 x 8 block each of U and V (color) information. The color information therefore has half the horizontal and vertical resolution of the luminance information. The Y, U, and V information in each macroblock is compressed using DCT encoding and motion compensation.
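
The sketch below (using NumPy, with illustrative array names) shows the 4:2:0 block structure just described: one macroblock yields four 8 x 8 luminance blocks and one 8 x 8 block each of U and V covering the same area at half resolution.

    import numpy as np

    def macroblock_blocks(y: np.ndarray, u: np.ndarray, v: np.ndarray):
        """Split one 4:2:0 macroblock into its six 8x8 blocks.

        y is the 16x16 luminance area; u and v are the co-located 8x8
        chrominance areas (half resolution in each direction)."""
        assert y.shape == (16, 16) and u.shape == (8, 8) and v.shape == (8, 8)
        y_blocks = [y[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
        return y_blocks + [u, v]    # four Y blocks, one U block, one V block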

In frame pictures, each macroblock can be predicted (using motion compensation) on a frame or field basis. Frame-based prediction uses one motion vector per direction (forward or backward) to describe the motion relative to the reference frame. In contrast, field-based prediction uses two motion vectors, one from an even field and the other from an odd field. Therefore, there can be up to four vectors per macroblock (two per direction, for the forward and backward directions). In field pictures, the prediction is always field-based, but the prediction may be relative to either an even or an odd reference field. This compression scheme gives rise to three types of frames: I-frames, P-frames, and B-frames.

Some standards (e.g., DVB MHP) require that MPEG file decoders are capable of fully decoding MPEG-2 I-Frames, but are not always expected to decode P-frames and B-frames. MHP also requires support for an MPEG-2 video drip feed mode that only requires handling I-frames and P-frames but not B-frames.

12.3.1.6 Time Model

ISO/IEC 13818 Systems, Video and Audio all have a timing model in which the end-to-end delay from the signal input to an encoder to the signal output from a decoder is a constant. This delay is the sum of encoding, encoder buffering, multiplexing, communication or storage, demultiplexing, decoder buffering, decoding, and presentation. As part of this timing model all video pictures and audio samples are presented exactly once, unless specifically coded to the contrary, and the inter-picture interval and audio sample rate are the same at the decoder as at the encoder. The system stream coding contains timing information that can be used to implement systems that embody constant end-to-end delay. It is possible to implement decoders that do not follow this model exactly; however, in such cases it is the decoder's responsibility to perform in an acceptable manner.

Headers within program packets are designed to facilitate multiplex-wide operations. Packet headers specify intended times at which each byte is to enter the program stream decoder from the data source, and this target arrival schedule serves as a reference for clock correction and buffer management.

Similarly, transport streams are composed of transport stream packets with headers containing information that specifies the PCR times. However, the specific PTS and DTS are carried in PES packets that are common to both transport streams and program streams.

All timing is defined in terms of a common system time clock (STC). In the program stream this clock may have an exactly specified ratio to the video or audio sample clocks, or it may have an operating frequency that differs slightly from the exact ratio while still providing precise end-to-end timing and clock recovery.

In the transport stream the system clock frequency is constrained to have the exactly specified ratio to the audio and video sample clocks at all times; the effect of this constraint is to simplify sample rate recovery in decoders.

12.3.1.7 Synchronization of Elementary Streams

Synchronization among multiple elementary streams is effected with PTS in the program and transport bit streams. Time stamps are generally in units of 90 kHz, but the SCR, the PCR, and the optional ESCR have extensions with a resolution of 27 MHz. Decoding of N elementary streams is synchronized by adjusting the decoding of streams to a common master time base rather than by adjusting the decoding of one stream to match that of another. The master time base may be one of the N decoders' clocks, the DSM or channel clock, or it may be some external clock.
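
As a worked example of these clock units, the following sketch converts a 33-bit PTS (in 90 kHz units) and a PCR, carried as a 90 kHz base plus a 9-bit extension counting 27 MHz cycles, into seconds.

    SYSTEM_CLOCK_HZ = 27_000_000     # 27 MHz system clock
    TIME_STAMP_HZ = 90_000           # 90 kHz PTS/DTS units

    def pts_to_seconds(pts: int) -> float:
        """Convert a 33-bit PTS or DTS value to seconds."""
        return pts / TIME_STAMP_HZ

    def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
        """Combine the 90 kHz PCR base with its 27 MHz extension (0..299)."""
        return (pcr_base * 300 + pcr_ext) / SYSTEM_CLOCK_HZ

    print(pts_to_seconds(900_000))   # a PTS of 900,000 is 10.0 seconds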

Because PTS apply to the decoding of individual elementary streams, they reside in the PES packet layer of both the transport streams and program streams. End-to-end synchronization occurs when encoders save time-stamps at capture time, when the time stamps propagate with associated coded data to decoders, and when decoders use those time-stamps to schedule presentations.

Synchronization of a decoding system with a data source is achieved through the use of the SCR in the program stream and by the equivalent PCR in the transport stream. The SCR and PCR are time stamps encoding the timing of the bit stream itself in terms of the same time base as is used for the audio and video PTS values from the same program.

Although each program has one PCR time base associated with it, it is possible that multiple programs share a common set of PCRs. Typically, for transport streams that contain multiple programs, each program has its own time base, and the time bases of different programs within such a stream may differ. Because each program may have its own time base, there are separate PCR fields for each program in a transport stream containing multiple programs. It is also possible for some or all programs in a multi-program transport stream to share a single PCR.

12.3.1.8 System Reference Decoder

Part 1 of ISO/IEC 13818 employs a System Target Decoder (STD), one for transport streams referred to as the Transport STD (T-STD) and one for program streams referred to as the Program STD (P-STD), to provide a formalism for timing and buffering relationships. Because the STD is parameterized in terms of ISO/IEC 13818 fields (e.g., buffer sizes), each ISO/IEC 13818 stream leads to its own parameterization of the STD. It is up to encoders to ensure that the bit streams they produce play at normal speed on the corresponding STDs. A physical decoder may assume that a stream plays properly on its STD; the physical decoder compensates for ways in which its design differs from that of the STD.

12.3.2 QuickTime MOV

A QuickTime file stores the description of the media separately from the media data [QT]. The description, or metadata, is called the movie, and it contains information such as the number of tracks, video compression format, timing information and index of the media data. The media data may be stored in the same file as the QuickTime movie, in a separate file, or in several files.

12.3.2.1 Atoms

The basic data unit in a QuickTime file is the atom; there are simple atoms and QT atoms. Each atom contains size and type information along with its data. The size field indicates the number of bytes in the atom, including the size and type fields. The type field specifies the type of data stored in the atom and, by implication, the format of that data. Both the size and type fields are 32-bit integers.

Atoms are recursive in nature, namely, one atom may contain one or more other atoms of varying type. For example, a movie atom contains one track atom for each track in the movie. The track atoms, in turn, contain one or more media atoms each, along with other atoms that define other track and movie characteristics. This hierarchical structure of atoms is referred to as a containment hierarchy.

Figure 12.6 shows the layout of a sample QuickTime atom. Each atom carries its own size and type information as well as its data. Atoms within container atoms do not have to be in any particular order, with the exception of handler description atoms. Handler description atoms come before their data. For example, a media handler description atom comes before a media information atom, and a data handler description atom comes before a data information atom.

Figure 12.6. Atom Structure.

Atoms consist of a header, followed by atom data. An atom header consists of an atom size and a type. The atom size is a 32-bit integer that indicates the size of the atom, including both the atom header and the atom's contents. If the atom is a leaf atom, then this field contains the size of the single atom. The size of a container atom includes all of its contained atoms. The atom type is given by a 32-bit integer.

The format of the data stored within a given atom cannot be determined based only on the type field of that atom; that is, an atom's use is determined by its context. A given atom type may have different usages when stored within atoms of different types. This means that all QuickTime file readers must take into consideration not only the atom type but also the atom's containment hierarchy.
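
A minimal atom walker is sketched below: it reads the 32-bit size and four-character type of each atom and descends into a few well-known container types. A complete reader would, as noted above, use the full containment hierarchy and would also handle the extended-size conventions found in newer files.

    import struct

    def walk_atoms(data: bytes, offset: int = 0, end: int = None, depth: int = 0):
        """Recursively print the atoms in a QuickTime buffer."""
        containers = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"udta"}
        end = len(data) if end is None else end
        while offset + 8 <= end:
            size, atom_type = struct.unpack(">I4s", data[offset:offset + 8])
            if size < 8:        # 0 and 1 have special meanings; stop at this level
                break
            print("  " * depth + atom_type.decode("latin-1"), size)
            if atom_type in containers:
                walk_atoms(data, offset + 8, offset + size, depth + 1)
            offset += size

For example, walk_atoms(open("example.mov", "rb").read()) would list the moov atom and its track, media, and sample table atoms (the file name is hypothetical).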

12.3.2.2 QT Atoms

QT atoms provide a more general-purpose storage format and remove some of the ambiguities that arise when using simple atoms. In particular, with simple atoms there is no way to know whether an atom is a leaf node, whether it contains other atoms, or both, without specific knowledge about the atom. Using QT atoms, a given node is either a leaf node or a container node; there is no ambiguity. Furthermore, QT atoms allow multiple atoms of a given type to be specified through identification numbers. Although QT atoms are a more powerful data structure, they require more overhead in the file.

The QuickTime file format uses both atoms and QT atoms. In general, newer parts of the QuickTime file format use QT atoms, and older parts use simple atoms. When defining new QuickTime structures, QT atoms should be used whenever practical.

Figure 12.7 depicts the layout of a QT atom. Each QT atom starts with a QT atom container header, followed by the root atom. The root atom's type is determined by the QT atom's type. The root atom contains any other atoms that are part of the structure. Each container atom starts with a QT atom header followed by the atom's contents. The contents are either child atoms or data, but never both. If an atom contains children it also contains all of its children's data and their descendants. The root atom is always present and never has any siblings.

Figure 12.7. QT Atoms.

12.3.2.3 QuickTime File Format

A QuickTime file is simply a collection of atoms. QuickTime does not impose any rules about the order of these atoms. A few of these types are considered basic atom types and form the structure within which the other atoms are stored (see Table 12.27).

Table 12.27. Atom Types

free, skip
Both free and skip atoms designate unused space in the movie data file. These atoms consist of an atom header (atom size and type fields), followed by bytes of free space. When reading a QuickTime movie, an application may safely skip these atoms. When writing or updating a movie, the space associated with these atom types may be reused.

mdat
The movie data atom, which usually can be interpreted only by using the movie resource. It consists of an atom header (atom size and type fields), followed by the movie's media data.

pnot
The preview atom. It contains information that allows finding the preview image associated with a QuickTime movie. The preview image, or poster, is a representative image suitable for display to the user in, say, file-open dialogs.

moov
The movie atom. It is a container for the information that describes a movie's data. This information, or metadata, is stored in a number of different types of atoms. As such, the movie atom is essentially a container of other atoms. At the highest level, movie atoms contain track atoms, which in turn contain media atoms. At the lowest level are the leaf atoms, which contain the actual data, usually in the form of a table or a data stream.

QuickTime file names typically have an extension of ".mov". On the Macintosh platform, QuickTime files have a file type of 'moov'. On the Macintosh, the movie atom may be stored as a Macintosh resource using the Resource Manager; the resource has a type of 'moov'. All media data is stored in the data fork.

Although QuickTime imposes no strict order on a movie's atoms, it is often convenient if the movie atom appears near the front of the file. For example, an application that plays a movie over a network would not necessarily have access to the entire movie at all times. If the movie atom is stored at the beginning of the file, the application can use the metadata to understand the movie's content as it is acquired over the network.

12.3.2.4 Movie Atom

The movie atom, of type 'moov', specifies the content of the movie. It contains other types of atoms, including one leaf atom, the movie header ('mvhd'), and several container atoms (see Figure 12.8): a clipping atom ('clip'), one or more track atoms ('trak'), a color table atom ('ctab'), and user data ('udta').

Figure 12.8. A sample one-track video movie.

12.3.2.5 Sample Table

QuickTime stores media data in samples. A sample is a single element in a sequence of time-ordered data. Samples are stored in the media, and they may have varying durations. Samples are stored in a series of chunks in a media. Each of these chunks is a collection of data samples that allows optimized data access. A chunk may contain one or more samples, and chunks may contain varying numbers of samples, each of which may have a different size (see Figure 12.9).

Figure 12.9. Organization of Sample Data Chunks.

The sample table atom acts as a storehouse of information about the samples and contains a number of different types of atoms. The various atoms contain information that allows the media handler to parse the samples in the proper order.

The sample table has an atom type of 'stbl'. It contains the sample description atom, the time-to-sample atom, the sample-to-chunk atom, the sync sample atom, the sample size atom, the chunk offset atom, and the shadow sync atom.

When QuickTime displays a movie or track, it tells the appropriate media handler to access the media data for a particular time. The media handler correctly interprets the data stream to retrieve the requested data. In the case of video media, the media handler traverses several atoms to find the location and size of a sample for a given media time. The media handler performs the following steps (a code sketch of this lookup appears after the list):

  1. Determines the time in the media time coordinate system.

  2. Examines the time-to-sample atom to determine the sample number that contains the data for the specified time.

  3. Scans the sample-to-chunk atom to discover which chunk contains the sample in question.

  4. Extracts the offset to the chunk from the chunk offset atom.

  5. Finds the offset within the chunk and the sample's size by using the sample size atom.
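
The following sketch illustrates steps 1 through 5, assuming the relevant sample table atoms have already been parsed into plain Python lists (the assumed layouts are noted in the comments); it shows the order of the lookups, not a production implementation.

    def locate_sample(media_time, stts, stsc, stco, stsz):
        """Find the file offset and size of the sample covering media_time.

        stts - list of (sample_count, sample_duration) time-to-sample entries
        stsc - list of (first_chunk, samples_per_chunk) entries (chunks are 1-based)
        stco - list of chunk file offsets
        stsz - list of per-sample sizes
        """
        # Step 2: time-to-sample atom -> sample number (0-based here)
        sample, elapsed = 0, 0
        for count, duration in stts:
            if media_time < elapsed + count * duration:
                sample += (media_time - elapsed) // duration
                break
            elapsed += count * duration
            sample += count

        # Step 3: sample-to-chunk atom -> chunk holding the sample
        chunk, samples_before = 0, 0
        for i, (first_chunk, per_chunk) in enumerate(stsc):
            last_chunk = stsc[i + 1][0] - 1 if i + 1 < len(stsc) else len(stco)
            run = (last_chunk - first_chunk + 1) * per_chunk
            if sample < samples_before + run:
                chunk = (first_chunk - 1) + (sample - samples_before) // per_chunk
                samples_before += (chunk - (first_chunk - 1)) * per_chunk
                break
            samples_before += run

        # Steps 4-5: chunk offset atom plus sizes of earlier samples in the chunk
        offset = stco[chunk]
        for s in range(samples_before, sample):
            offset += stsz[s]
        return offset, stsz[sample]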

Finding a frame (or a bookmark) for a specified time in a movie is slightly more complicated than finding a sample for a specified time. The media handler uses the sync sample atom and the time-to-sample atom together to find a key frame as follows:

  1. Examines the time-to-sample atom to determine the sample number that contains the data for the specified time.

  2. Scans the sync sample atom to find the key frame that precedes the sample selected in step 1.

  3. Scans the sample-to-chunk atom to discover which chunk contains the key frame.

  4. Extracts the offset to the chunk from the chunk offset atom.

  5. Finds the offset within the chunk and the sample's size by using the sample size atom.

12.3.2.6 Media Atoms

Media atoms define a track's movie data. The media atom specifies the media handler component that is to interpret the media data; it also specifies data references. It has an atom type of 'mdia'. It may contain other atoms, such as a media header ('mdhd'), a handler reference ('hdlr'), media information ('minf'), and user data ('udta'). The only required atom in a media atom is the media header atom. A time code media atom is used to store time code data in QuickTime movies. The time code sample description contains information that defines how to interpret time code media data. The time code could be displayed, and therefore a media information atom is used to control that display.

12.3.2.7 User Data Atoms

User data atoms, of type 'udta', allow defining and storing of arbitrary data associated with a QuickTime object, such as a movie, track, or media. This atom provides a simple way to extend what is stored in a QuickTime movie. The list of user-data entry types is given in Table 12.28. All user data list entries whose type begins with the '©' character (character code 169) are international text. These list entries contain a list of text strings with associated language codes, allowing each user data text item to have translations for different languages.

Table 12.28. The List of User Data Entry Types

Type    Description

©cpy    Copyright statement.
©day    Date the movie content was created.
©dir    Name of movie's director.
©ed1-9  Edit dates and descriptions.
©fmt    Indication of movie format (computer-generated, digitized, and so on).
©inf    Information about the movie.
©prd    Name of movie's producer.
©prf    Names of performers.
©req    Special hardware and software requirements.
©src    Credits for those who provided movie source content.
©wrt    Name of movie's writer.
WLOC    Default window location for movie; two 16-bit values, {x, y}.
name    Name of object.
LOOP    Long integer indicating looping style: 0 for none, 1 for looping, 2 for palindrome looping.
SelO    Play selection only; byte indicating that only the selected area of the movie should be played.
AllF    Play all frames; byte indicating that all frames should be played, regardless of timing.

12.3.2.8 QuickTime FourCC Media Types

QuickTime uses atoms of different types to store different types of media data: video media for video data, sound media for audio data, and so on. Table 12.29 lists supported image compression formats.

Table 12.29. FourCC codes for Content Types Supported by QuickTime

Compression type    Description

cvid                Cinepak
jpeg                JPEG
raw                 Uncompressed RGB
YUV2                Uncompressed YUV422
smc                 Graphics
rle                 Animation
rpza                Apple Video
kpcd                Kodak Photo CD
qdgx                QuickDraw GX
mpeg                MPEG still image
mjpa                Motion-JPEG (Format A)
mjpb                Motion-JPEG (Format B)

12.3.3 AVI

The Audio Video Interleaved (AVI) file format is a RIFF format whose RIFF form type is "AVI " (notice the trailing space) [AVI]. It contains an hdrl list (a list rather than a chunk), which contains avih and strl chunks. The audio and video chunks in an AVI file do not contain time stamps or frame counts; the data is ordered sequentially in time as it appears in the AVI file. A player application should display the video frames at the frame rate indicated in the headers and play the audio at the audio sample rate indicated in the headers. Usually, the streams are all assumed to start at time zero because there are no explicit time stamps in the AVI file.

An AVI file can contain zero or one video stream and zero, one, or many audio streams. An AVI file with one video stream and one audio stream contains two strl lists within the hdrl list: one for the video stream, holding a video stream header chunk ('strh') and a video stream format chunk ('strf'), and a corresponding strl list describing the audio stream.
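
The sketch below walks the RIFF chunk hierarchy of an AVI buffer, printing LIST types (such as hdrl and strl) and chunk identifiers. It assumes a well-formed file and is intended only to illustrate the structure described above.

    import struct

    def walk_riff(data: bytes, offset: int = 0, end: int = None, depth: int = 0):
        """Print the chunk tree of a RIFF/AVI buffer (sizes are little-endian)."""
        end = len(data) if end is None else end
        while offset + 8 <= end:
            ckid, size = struct.unpack("<4sI", data[offset:offset + 8])
            if ckid in (b"RIFF", b"LIST"):
                list_type = data[offset + 8:offset + 12]
                print("  " * depth + ckid.decode("latin-1"), list_type.decode("latin-1"))
                walk_riff(data, offset + 12, offset + 8 + size, depth + 1)
            else:
                print("  " * depth + ckid.decode("latin-1"), size, "bytes")
            offset += 8 + size + (size & 1)    # chunk data is padded to an even size

Running it over a typical AVI file, for example walk_riff(open("example.avi", "rb").read()) with a hypothetical file name, shows the top-level RIFF 'AVI ' form, the hdrl list with its avih and strl contents, and the list of interleaved data chunks.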

In principle, a video chunk contains a single frame of video. By design, each video chunk should be interleaved with an audio chunk containing the audio associated with that video frame, so the data consists of pairs of video and audio chunks. These pairs may be encapsulated in a 'REC ' list. Not all AVI files obey this simple scheme; unfortunately, one can find AVI files with all of the video followed by all of the audio.

The sound data is typically 8- or 16-bit PCM, stereo or mono, sampled at 11, 22, or 44.1 kHz. Traditionally, the sound has been uncompressed Windows PCM. With the advent of the Internet and its severe bandwidth limitations, there has been increasing use of audio codecs. The wFormatTag field in the audio 'strf' (stream format) chunk identifies the audio format and codec.

12.3.3.1 Open Digital Media (OpenDML) AVI File Format Extensions

The lack of time stamps is a weakness of the original AVI file format, and the Open Digital Media (OpenDML) AVI Extensions add new chunks with time stamps. AVI's successor format, Microsoft's Advanced (formerly Active) Streaming Format (ASF), is MPEG-based and thus has all the synchronization capabilities needed; it is intended to replace the AVI format. The OpenDML AVI File Format Extensions extend AVI to support a variety of features required for the motion JPEG AVI files used for professional video authoring, editing, and production. These include support for fields (as opposed to frames only), file sizes larger than 1 GB, time codes, and new list and chunk types, as well as many other features.

OpenDML appears to have been spearheaded by Matrox to improve AVI for professional video authoring and editing. On October 2, 1997, the OpenDML AVI File Format Extensions Version 1.02 specification document (dated February 28, 1996) was available from Matrox Electronic Systems, Ltd. The OpenDML effort seems to have been pushed to one side with the advent of ActiveMovie, NetShow, ASF files, and other Microsoft initiatives.

12.3.3.2 Image Formats

There are various formats for representing images within an AVI file. Microsoft Windows represents bitmapped images internally and in files as Device Dependent Bitmaps (DDBs), Device Independent Bitmaps (DIBs), and DIB sections. Prior to Windows 3.0, Windows relied on Device Dependent Bitmaps for bitmapped images.

Names of pixel layouts supported by AVI decoders are abbreviated using the FourCC (Four Character Code) scheme defined by Microsoft as part of Video for Windows (see Table 12.26). The code 0x00000000 specifies the hexadecimal value of four characters, each of which has a value of 0. A Four Character Code of AAAA has the hexadecimal value 0x41414141, where 0x41 is the ASCII code for the character 'A'. AVI files contain the FourCC for the video compressor in the video stream header. For example, the FourCC CVID identifies the Cinepak (formerly Compact Video) video compressor. Codes such as YUV2 identify layouts of pixels in YUV space (as opposed to RGB); these codes are used in interfacing with graphics cards. For example, the S3 ViRGE/VX chip supports the YUV2 pixel layout. YUV2 is popular because it refers to the 4:2:2 format used in CCIR-601 (D1) digital video. Today, many video capture and editing products use the non-standard FourCC 0x00000000 for uncompressed AVI video.
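
A small sketch of this packing follows; it mirrors the Windows MAKEFOURCC convention, in which the first character occupies the least significant byte of the 32-bit value (the 'AAAA' example above gives the same result under either byte order).

    def make_fourcc(code: str) -> int:
        """Pack four characters into a 32-bit FourCC value
        (first character in the least significant byte)."""
        c0, c1, c2, c3 = (ord(c) for c in code)
        return c0 | (c1 << 8) | (c2 << 16) | (c3 << 24)

    print(hex(make_fourcc("AAAA")))    # 0x41414141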

Table 12.30. Uncompressed Video Capture Formats

Code    Description

MRLE    Microsoft Run Length Encoding
IV31    Indeo 3.1/3.2
IV32    Indeo 3.1/3.2
CVID    Cinepak (Radius)
ULTI    Ultimotion (IBM)
MJPG    Motion JPEG (Microsoft, Paradigm Matrix, and other video capture companies)
IJPG    Intergraph JPEG
CYUV    Creative YUV
YVU9    Intel Indeo Raw YUV9
XMPG    Editable (I-frames only) MPEG (Xing)
MPGI    Editable MPEG (Sigma Designs)
VIXL    miro Video XL
MVI1    Motion Pixels
SPIG    Radius Spigot
PGVV    Radius Video Vision
TMOT    Duck TrueMotion S
DMB1    Custom format used by Matrox Rainbow Runner, related to Motion JPEG
IV41    Indeo Interactive (Indeo 4.1 from Intel)
IV50    Indeo 5.x, including 5.0, 5.06, and 5.10
UCOD    ClearVideo (Iterated Systems)
VDOW    VDOWave (VDONet)
SFMC    Surface Fitting Method (CrystalNet)
QPEG    Q-Team Dr. Knabe's QPEG video compressor
H261    H.261
M261    Microsoft H.261
VIVO    Vivo H.263
M263    Microsoft H.263
I263    Intel "I.263" H.263
MPG4    Microsoft MPEG-4

12.3.4 Motion JPEG

Motion JPEG (M-JPEG), FourCC code MJPG, is a variant of the ISO JPEG specification for use with digital video streams. Most PC video capture and editing systems capture video to AVI files using Motion JPEG video compression. Instead of compressing an entire image into a single bitstream, M-JPEG compresses each video field separately and assembles the resulting JPEG bitstreams consecutively into a single frame. No frame differencing or motion estimation is used to compress the images, which makes frame-accurate editing possible without any loss of image quality during editing. Usually, once the video has been edited, it is compressed using Cinepak or another codec for distribution. The two flavors of M-JPEG currently in use are format A, which supports markers, and format B, which does not.

The situation with Motion JPEG standards is complicated because for a long time there was no industry standard for Motion JPEG: Microsoft has its own Motion JPEG codec and JPEG DIB format, and the OpenDML AVI File Format Extensions define yet another variant.


