12.1 Graphics


12.1.1 GIF

The Graphics Interchange Format (GIF) is a common format for achieving low-cost basic animation [GIF]. Its features include transparency, compression, interlacing, and palettes of 2 up to 256 colors. Content authors need not rely on a return channel to achieve the animation. The animation is repeatable and reusable: with a single download, the same animation can be placed on a page multiple times. The animations are compact, usually only a few kilobytes long.

With all their advantages, GIFs have limitations. Photographs, which utilize a large number of colors, are better compressed by JPEG. Animations play either once or continuously, with no option in between. Some browsers do not fully support GIF89a: they may ignore timing delays or user input, and may not support the restore-to-previous-image disposal control. Plain text blocks are often not supported either. An animation can be slowed down or interrupted by other images being downloaded and by other animations playing. Finally, GIF does not support DRM.

The GIF file format is composed of blocks and extensions. Blocks can be classified into control, graphic-rendering, and special purpose blocks. Control blocks, which include the header, the logical screen descriptor, the graphic control extension, and the trailer, control how the graphic data is handled. Graphic-rendering blocks, such as the image descriptor and the plain text extension, contain data used to render a graphic. The logical screen descriptor and the global color table affect all the images in a single file. Each control block affects only the single image block that immediately follows it. Many special purpose blocks, such as the comment extension and the application extension, are rarely used by GIF decoders.

Table 12.1. The GIF89a File Format

  Block Name                       Description
  -------------------------------  ----------------------------------------------
  Header block                     "GIF87a" or "GIF89a".
  Logical Screen Descriptor block  May include an optional Global Color Table.
  Optional Extension block         Typically used by Netscape.
  Graphic Control blocks           One such block precedes each image.
  Image Descriptors                One per image; may include an optional Local
                                   Color Table and the Image Data.
  Trailer                          Marks the end of the series of images.

12.1.1.1 Header

The header block is a small 6-byte block containing the version of the file format (either GIF87a or GIF89a). GIF image decoders use this information to determine the version of the file format.
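
A minimal sketch of this version check in C (the input file name is hypothetical):

    #include <stdio.h>
    #include <string.h>

    /* Read the 6-byte GIF header and report the file format version. */
    int main(void)
    {
        unsigned char hdr[6];
        FILE *f = fopen("example.gif", "rb");   /* hypothetical input file */
        if (f == NULL)
            return 1;
        if (fread(hdr, 1, 6, f) != 6) {
            fclose(f);
            return 1;
        }
        if (memcmp(hdr, "GIF87a", 6) == 0)
            printf("GIF87a\n");
        else if (memcmp(hdr, "GIF89a", 6) == 0)
            printf("GIF89a\n");
        else
            printf("not a GIF\n");
        fclose(f);
        return 0;
    }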

12.1.1.2 Logical Screen Descriptor (LSD)

The LSD is the second block in a file. It defines the logical screen, a pixel array whose dimensions define the size of the image and are sufficient to derive its aspect ratio. This information is used by Web-browsers to compute the layout of the image within a Web-page. The logical screen area should be sufficient to display all the individual frames. If an image in the GIF file is larger than the logical screen or, by its positioning, extends beyond the screen, the portion that is off-screen is not displayed.

The LSD selects one of the colors in the Global Color Table (GCT) to be the background color of the screen; this color selection is ignored by some browsers. GIF transparency was a side-effect introduced as a result of a bug in Netscape, which set the background of a GIF to be transparent whenever the GCT had transparency turned on for any color. This bug allowed background GIFs to fill in the logical screen background regardless of how small the image was.

Every color displayed in a GIF file comes from a palette. The GCT contains a global palette of common colors for all the images in its file to work from. This palette can use 1 to 8 bits per entry index, specifying 2, 4, 8, 16, 32, 64, 128, or 256 colors. The global palette is applied to all images in a GIF file. Nevertheless, an individual image may have a local palette that affects that image only. However, because each image references one palette, at most 256 colors per image are possible.
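
As an illustration, the GCT flag and size are packed into one byte of the LSD; a sketch of decoding it (the function name is ours):

    /* Decode the Global Color Table size from the packed byte of the
       Logical Screen Descriptor: bit 7 is the GCT-present flag, and
       bits 0-2 hold N, where the table has 2^(N+1) entries of 3 bytes
       (RGB) each. */
    unsigned gct_entries(unsigned char packed)
    {
        if (!(packed & 0x80))                  /* no Global Color Table */
            return 0;
        return 1u << ((packed & 0x07) + 1);    /* 2, 4, 8, ..., 256 */
    }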

12.1.1.3 Application Extensions

Application extensions allow for blocks of data to be inserted in the GIF for specific programs to act on. Application extensions can be used for arbitrary purposes and are typically used as proprietary extensions of the GIF format. As an example, proprietary decoders might use that block to specify special effects or instructions on how to handle the image data.

Netscape Navigator v2.0 uses an Application Extension Block that tells Navigator to loop the entire GIF file. The Netscape extension block appears immediately following the GCT of the LSD. The 19-byte block is described in Table 12.2.

Table 12.2. Netscape's Application Extension Block

  Field Name             No. of Bits  Description
  ---------------------  -----------  ----------------------------------------
  extensionCode          8            The GIF Extension code; for Netscape,
                                      33 (hex 0x21) was used.
  extensionLabel         8            For Netscape, 255 (hex 0xFF) was used.
  applicationNameLength  8            For Netscape, 11 (hex 0x0B) was used.
  applicationName        *            For Netscape, bytes 4 to 14 spell
                                      "NETSCAPE2.0".
  dataLength             8            The length of the data sub-block,
                                      typically set to 3.
  reserved               8            This field is set to 1 (hex 0x01).
  loopIterations         16           A 16-bit unsigned integer (in lo-hi
                                      byte order) indicating the number of
                                      iterations the loop should be executed.
  terminator             8            This field is set to 0 (hex 0x00).
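
Laid out as bytes, and with a loop count of zero (which Navigator treats as "loop forever"), the block of Table 12.2 looks as follows:

    /* The 19-byte Netscape looping block from Table 12.2. The two
       loopIterations bytes are stored low byte first. */
    unsigned char netscape_loop[19] = {
        0x21,                                        /* extensionCode        */
        0xFF,                                        /* extensionLabel       */
        0x0B,                                        /* applicationNameLength*/
        'N','E','T','S','C','A','P','E','2','.','0',/* applicationName      */
        0x03,                                        /* dataLength           */
        0x01,                                        /* reserved             */
        0x00, 0x00,                                  /* loopIterations, lo-hi*/
        0x00                                         /* terminator           */
    };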

12.1.1.4 Control Blocks

A control block specifies the optional display aspects of the image immediately following it. It specifies whether the image has a local palette, whether one of the colors in this image is transparent, whether a delay should be used before the display of the image, and disposal options specifying what should be done with the image after it has been displayed.

When a delay requirement is specified, user input and timing may be used. If a timed delay is set, the browser should wait the specified number of hundredths of a second before removing the image and proceeding to the next one. If user input is requested, the browser should wait until the user strikes a key or clicks a mouse. What is considered user input is determined by the program displaying the GIF. If both options are elected, the browser should allow the image to remain displayed until the delay expires or a user input occurs, whichever comes first.

Before proceeding to the next image, the removal options are performed. These options include doing nothing (leaving the image onscreen as is), replacing the image with a previous image, and replacing the image with the logical screen background color.

12.1.1.5 Image Block

An image block contains the image data. It is possible to mix images having different numbers of colors and different color palettes in a single GIF. The image block contains the image size in pixels, the position of the image on the logical screen, an indication of whether it is interlaced, and an optional local palette for this image. The image's size in pixels must be consistent with the image data or unpredictable behavior may occur.

The top and left positions of an image place it within the defined logical screen. A small bitmap of an object can thus be placed anywhere within the screen area, rather than creating an entire image with the object positioned within it.

Interlacing is a way of storing and displaying the image data. Interlacing stores the rows of the image in the order shown in Table 12.3:

Table 12.3. Interlacing Example Table

  Pass    Description
  ------  -----------------------------------
  Pass 1  Every 8th row, starting with row 0.
  Pass 2  Every 8th row, starting with row 4.
  Pass 3  Every 4th row, starting with row 2.
  Pass 4  Every 2nd row, starting with row 1.
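
A short sketch that generates this storage order (the function name is ours):

    #include <stdio.h>

    /* Print the order in which the rows of an interlaced GIF are stored,
       following Table 12.3. */
    void gif_interlace_order(int height)
    {
        static const int start[4] = { 0, 4, 2, 1 };
        static const int step[4]  = { 8, 8, 4, 2 };
        for (int pass = 0; pass < 4; pass++)
            for (int row = start[pass]; row < height; row += step[pass])
                printf("pass %d: row %d\n", pass + 1, row);
    }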

12.1.1.6 Comment Block

The GIF file may contain comments. These can be markers for long animation sequences or statements of ownership. These comments do not appear on the screen.

12.1.1.7 Text Blocks

In addition to images, GIF files may be used to render text on screen. Unfortunately, many programs do not recognize text blocks.

12.1.1.8 Trailer

The Trailer is a single byte, 0x3B, indicating the end of the GIF file.

12.1.2 JPEG

The Joint Photographic Experts Group (JPEG) is a standardized image compression mechanism [JPEG]. It enables bitstreams representing photographic information to be exchanged across a wide variety of platforms and applications. JPEG features compression, standard color space for one or three components (YCbCr), and markers to specify extensions. It is designed for compressing either full-color or grayscale images of natural, real-world scenes. It exploits known limitations of the human eye, most notably the fact that small color changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for compressing images that are viewed by humans and may not be appropriate for machine analysis.

JPEG compression is lossy, meaning that the decompressed image is not identical to the source image (JPEG achieves much greater compression than is possible with lossless methods). The degree of data loss can be controlled by adjusting compression parameters. Further, decoders can trade off decoding speed against image quality by using fast but inaccurate approximations to the required calculations; it is possible to achieve remarkable performance in this way.

The JPEG File Interchange Format (JFIF) was designed to support streaming data scenarios, as opposed to situations in which the data is stored in a file on the local disk. This support is achieved through the progressive and hierarchical modes described later in this section.

12.1.2.1 Comparing JPEG and GIF

In general, JPEG is superior to GIF for storing full-color or grayscale images of realistic scenes; that means scanned photographs, continuous-tone artwork, and similar material. Any smooth variation in color, such as that which occurs in highlighted or shaded areas, is represented more faithfully in less space by JPEG.

In contrast, GIF does significantly better on images with only a few distinct colors, such as line drawings and simple cartoons. Not only is GIF lossless for such images, but it often compresses them more than JPEG can. For example, large areas of pixels that are all exactly the same color (e.g., image borders or frames) are compressed much better by GIF.

Computer-drawn images, such as ray-traced scenes, usually fall between photographs and cartoons in terms of complexity. The more complex and subtly rendered the image, the more likely that JPEG performs well on it. Icons that use only a few colors are handled better by GIF.

JPEG does poorly on plain black-and-white (two level) images. At least 16 gray levels are needed to allow JPEG to produce useful images. Further, JPEG does not work well with very sharp edges. A row of pure-black pixels adjacent to a row of pure-white pixels, borders and overlaid text, for example, would not look good. Sharp edges tend to come out blurred unless the compression parameters are set to very high quality (with correspondingly little compression). The blurriness is particularly objectionable with text that is only a few pixels high.

12.1.2.2 Huffman Coding Background

According to information theory, the entropy of a data object is determined by the probability with which each symbol in that data object occurs, namely E = -Σ p·log2(p), summed over all symbols. Consider data made up of all the letters of the alphabet (i.e., no numbers). To achieve some compression, instead of using 8 bits for each letter, it is possible to use a varying number of bits, starting from 1 bit. The most frequently occurring character uses a 1-bit code (a single bit set to 1), the second most frequent character uses a 2-bit code, and so on. Therefore, the first step in the coding is to determine the frequency with which each letter occurs. This frequency can be approximated by computing the probability with which each letter occurs. For example, capital letters are less likely than lowercase ones, but an uppercase 'S' is more common than a lowercase 'z'.

The brute force method of transmission would require 6 bits per symbol, because there are 26 lowercase and 26 uppercase letters, a total of 52 symbols. Each symbol would have its very own bit pattern, with 12 patterns left over. This would be wasteful even when transmitting 64 different symbols. Since lowercase is more common than uppercase, it would be better to use 5 bits: 26 of the 32 entries determine which lowercase letter to use, and the remaining 6 announce an uppercase letter. Immediately following one of these 6 patterns is a 3-bit code indicating which uppercase letter is being used. The transmission of a lowercase letter thus requires 5 bits, and the transmission of an uppercase letter requires 8 bits. Lowercase letters are many times more common than uppercase letters; therefore, the single bit that is saved each time a lowercase letter is transmitted more than makes up for the fact that 8 bits are required for an uppercase letter.

As a further optimization, it is possible to send only one bit for the letter 'e' and 6 bits for all the other letters: one bit to indicate 'not e' and 5 more bits to indicate which of the other symbols is meant. That way, 5 bits are saved every time an 'e' is used, and a single bit is wasted every time a different letter is sent. Bits are saved if 'e' occurs more than 5 times as often as each other letter in a message.

The end result of this is Huffman coding: each character is counted, and a variable number of bits is assigned to its representation. Huffman, Lempel-Ziv, and related compression methods all work more or less like this.

One needs to consider the trade-off between the better compression achievable with large symbol sets and the complexity of that compression. For example, the data 'aeaeaeaeaeaeae' would compress to one bit per letter, namely '01010101010101'. Grouping the data two letters at a time would compress it more efficiently as '1111111' (1 = 'ae'); however, the extended symbol set of letter pairs contains 52 x 52 = 2704 different symbols.

The efficiency of a coding method is the ratio of the entropy of the source data to the average number of bits per symbol actually used to transmit that data object. Let N be the number of symbols in a data object. If one counts all the letters in the data object and places the counts in an array named counts, the following computes the entropy E = -Σ p·log2(p):

    for (i = 'a'; i <= 'z'; i++) H = H - counts[i]/N * log2(counts[i]/N);

where N is the number of symbols in the message and the value of H starts at 0. That is, the entropy equals the sum of the negative logarithm (base 2) of the probability of each symbol times the probability of that symbol, where p = counts[i]/N. H is the entropy and equals the minimum number of bits per symbol needed to encode the data object. The base of 2 stems from the use of 0-1 bits to represent the data.

The effect of the symbol-set size can be illustrated as follows. Using one character at a time, the string 'abaeabaeab' compresses to '101100101100101' (1='a', 01='b', 00='e'). The entropy of that message is -(0.5·log2(0.5) + 0.3·log2(0.3) + 0.2·log2(0.2)) = 1.485 bits per symbol. The efficiency of the encoded string is E = 1.485/1.5, or 99%. On the other hand, at two characters per symbol the entropy drops to about 1 bit per symbol, and near-100% efficiency is possible with only 5 bits for the entire message.
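
Putting the pieces together, a small self-contained C program that reproduces the numbers above:

    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    /* Compute the entropy of the example string 'abaeabaeab' and the
       efficiency of the 15-bit encoding discussed in the text. */
    int main(void)
    {
        const char *msg = "abaeabaeab";
        size_t N = strlen(msg);
        double counts[26] = { 0 };
        for (size_t i = 0; i < N; i++)
            counts[msg[i] - 'a'] += 1.0;

        double H = 0.0;
        for (int i = 0; i < 26; i++)
            if (counts[i] > 0)
                H -= counts[i] / N * log2(counts[i] / N);

        /* With codes 1='a', 01='b', 00='e', the message uses 15 bits for
           10 symbols, i.e., 1.5 bits per symbol. */
        printf("entropy: %.3f bits/symbol, efficiency: %.1f%%\n",
               H, 100.0 * H / 1.5);
        return 0;
    }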

12.1.2.3 Background on Lossy Compression

The entropy of a data object tells us the absolute best compression that can be done for that data. To do any better than that, there is a need to discard information. For images, the decision about which information to discard relies on certain psychophysical aspects of human vision.

The JPEG process takes several steps. The first step is to break up the image into planes. Generally, it is a good idea to separate the brightness information from the color information by converting the image to YUV (or LAB), but this is not required. The next step is to break each plane into 8x8 or 16x16 chunks. These chunks are converted from position and value information into frequency information. For example, an alternating black and white pattern would be recorded as a 50% duty cycle rather than as 10101010101.... The overall color of the chunk is recorded, then the relative values of the frequency components are recorded. Then, continuous values representing frequency information are mapped to the closest of a few available values. For example, the value 5.4 would map to 5, and the value 5.6 would map to 6. Information is lost because 5.4 - 5 is 0.4 and 5.6 - 6 is -0.4: the image no longer can have a frequency component with the value of 5.4 or 5.6.
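
A minimal sketch of that quantization round trip (the step size q stands in for an entry of a quantization table; the function name is ours):

    #include <math.h>

    /* Divide a frequency coefficient by the quantizer step, round to the
       nearest integer, and multiply back; the fractional part is lost. */
    double quantize_roundtrip(double coef, double q)
    {
        double quantized = round(coef / q);   /* e.g., 5.4 -> 5, 5.6 -> 6 when q = 1 */
        return quantized * q;                 /* reconstructed value */
    }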

Noise is always present in the regenerated picture. Because the eye is relatively forgiving of high-frequency noise, the encoded image may not look any different to the eye. Typically, reproduction processes (e.g., printing presses, NTSC broadcast) have much larger errors.

The broadcast standard for NTSC allocates about 4.2 MHz for the luminance Y but allocates only about 1 MHz for the color I&Q channels, namely less than a quarter of the bandwidth. This implies that, on the average, there is only enough color information for every fourth pixel. When JPEG loses information in frequencies above the 50% duty cycle, the resulting image loss is not detectable by the end receiver. Further, the I&Q channels are only allowed to be within a certain range of values when broadcast. Any loss of signal beyond that range due to JPEG is no different than that caused by the transmission stage.

12.1.2.4 Background on DCT in JPEG

The Discrete Cosine Transform (DCT) is a transformation that allows an image to be reconstructed with high visual fidelity to the encoder's source image. The simplest DCT-based coding process is referred to as the baseline sequential process, which is sufficient for many applications. There are additional DCT-based processes that extend the baseline sequential process to a broader range of applications. Figures 12.1 and 12.2 illustrate a simplification of the special case of a single component image.

Figure 12.1. A simplified DCT-based encoder.

Figure 12.2. A simplified DCT-based decoder.

JPEG supports four distinct modes of operation under which the various coding processes are defined: sequential DCT-based, progressive DCT-based, lossless, and hierarchical.

Sequential Mode

A simple sequential or "baseline" JPEG file is stored as one top-to-bottom scan of the image. In this mode, 8 x 8 sample blocks are typically ordered block by block from left to right, and by block row from top to bottom. After a block has been transformed by the forward DCT, quantized and prepared for entropy encoding, all 64 of its quantized DCT coefficients can be immediately entropy encoded and output as part of the compressed image data, thereby minimizing coefficient storage requirements.
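
To make the "transformed by the forward DCT" step concrete, here is a naive, unoptimized sketch of the 8x8 forward DCT; production codecs use fast factorizations rather than this direct form, and the function name is ours:

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Forward 8x8 DCT (DCT-II). in: samples level-shifted to -128..127;
       out: frequency coefficients, with out[0][0] holding the DC term. */
    void fdct8x8(const double in[8][8], double out[8][8])
    {
        for (int u = 0; u < 8; u++)
            for (int v = 0; v < 8; v++) {
                double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
                double sum = 0.0;
                for (int x = 0; x < 8; x++)
                    for (int y = 0; y < 8; y++)
                        sum += in[x][y]
                             * cos((2 * x + 1) * u * PI / 16.0)
                             * cos((2 * y + 1) * v * PI / 16.0);
                out[u][v] = 0.25 * cu * cv * sum;
            }
    }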

Progressive Mode

Progressive JPEG divides the file into a series of scans, much like interlaced formats do. The first scan encodes a very low-quality approximation of the image, and is therefore very efficient in both computation and memory requirements. Subsequent scans gradually improve the quality, adding to the data already provided in previous scans, with the total computation and storage requirement for each scan being roughly the same.

The advantage of progressive JPEG is that if an image is being viewed on-the-fly as it is transmitted, one can see an approximation of the whole image very quickly, with gradual improvement of quality as one waits longer; this provides a much better viewing experience than a slow top-to-bottom display of the image. The disadvantage is that the decoder must keep up with the pace at which data is received.

For the progressive DCT-based mode, 8x8 blocks are typically encoded in the same order as in the sequential (or baseline) mode, but in multiple scans through the image. This is accomplished by adding an image-sized coefficient memory buffer between the quantizer and the entropy encoder. As each block is transformed by the forward DCT and quantized, its coefficients are stored in the buffer. The DCT coefficients in the buffer are then partially encoded in each of multiple scans.

There are two procedures by which the quantized coefficients in the buffer may be partially encoded within a scan. First, only a specified band of coefficients from the zig-zag sequence need be encoded. This procedure is called spectral selection, because each band typically contains coefficients that occupy a lower or higher part of the frequency spectrum for that 8x8 block. Second, the coefficients within the current band need not be encoded to their full (quantized) accuracy within each scan. On a coefficient's first encoding, a specified number of most significant bits is encoded first.

In subsequent scans, the less significant bits are encoded in a procedure called successive approximation. Either procedure (i.e., spectral selection or successive approximation) may be used separately, or they may be mixed in flexible combinations.

Except for the ability to provide progressive display, progressive JPEG and baseline JPEG are basically identical, and they work well on the same kinds of images. It is possible to convert between baseline and progressive representations of an image without any quality loss. However, a progressive JPEG file is not readable at all by a baseline-only JPEG decoder.

Hierarchical Mode

In hierarchical mode, an image is encoded as a sequence of frames of varying resolution. These frames provide reference reconstructed components that are usually needed for prediction in subsequent frames. Except for the first frame for a given component, differential frames encode the difference between source components and reference reconstructed components. The coding of the differences may be done using only DCT-based processes, only lossless processes, or DCT-based processes with a final lossless process for each component. Downsampling and upsampling filters may be used to provide a pyramid of spatial resolutions. Alternatively, the hierarchical mode can be used to improve the quality of the reconstructed components at a given spatial resolution.

Hierarchical mode offers a progressive presentation similar to the progressive DCT-based mode but is useful in multi-resolution environments. Hierarchical mode also offers the capability of progressive coding to a final lossless stage.

Table 12.4. Summary of Processes Supported by JPEG Decoders

  Baseline - Type: DCT-based. Samples: 8-bit within each component.
  Modes: sequential. Coding: Huffman, 2 AC and 2 DC tables. Scans: 1-4
  components, interleaved and non-interleaved.

  Extended - Type: DCT-based. Samples: 8-bit or 12-bit. Modes: sequential
  or progressive. Coding: Huffman or arithmetic, 4 AC and 4 DC tables.
  Scans: 1-4 components, interleaved and non-interleaved.

  Lossless - Type: predictive. Samples: P-bit (2 <= P <= 16). Modes:
  sequential. Coding: Huffman or arithmetic, 4 DC tables. Scans: 1-4
  components, interleaved and non-interleaved.

  Hierarchical - Type: multiple frames, differential or non-differential,
  using any of the above. Samples, modes, coding, and scans: all the above.

12.1.2.5 Sample Precision

For DCT-based processes, JPEG supports two alternative sample precisions: either 8 bits or 12 bits per sample. Applications that use samples with other precisions can use either 8-bit or 12-bit precision by shifting their source image samples appropriately. The baseline process uses only 8-bit precision. DCT-based implementations that handle 12-bit source image samples are likely to need greater computational resources than those that handle only 8-bit source images.

12.1.2.6 Color Space

The color space is YCbCr as defined by Comite Consultatif International des Radiocommunications (CCIR) 601 (256 levels). The RGB components calculated by linear conversion from YCbCr may not be gamma corrected (i.e., gamma = 1.0). If only one component is used, that component is Y.

YCbCr (256 levels) can be computed directly from 8-bit RGB as follows:

 Y  =  0.299 R + 0.587 G + 0.114 B
 Cb = -0.1687 R - 0.3313 G + 0.5 B + 128
 Cr =  0.5 R - 0.4187 G - 0.0813 B + 128

Y, Cb, and Cr are converted from R, G, and B and normalized so as to occupy the full 256 levels of an 8-bit binary encoding. More precisely:

 Y  = 256 * E'y
 Cb = 256 * [E'Cb] + 128
 Cr = 256 * [E'Cr] + 128

where E'y, E'Cb, and E'Cr are defined as in CCIR 601. Because values of E'y have a range of 0 to 1.0 and those of E'Cb and E'Cr have a range of -0.5 to +0.5, Y, Cb, and Cr are clamped to 255 at their maximum values.

Conversely, RGB can be computed directly from YCbCr (256 levels) as follows:

 R = Y + 1.402 (Cr - 128)
 G = Y - 0.34414 (Cb - 128) - 0.71414 (Cr - 128)
 B = Y + 1.772 (Cb - 128)
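
A direct transcription of these conversions into C, clamping results to the 0-255 range as noted above (the function and helper names are ours):

    /* Clamp a real value into an 8-bit sample, rounding to nearest. */
    static unsigned char clamp(double v)
    {
        if (v < 0.0)   return 0;
        if (v > 255.0) return 255;
        return (unsigned char)(v + 0.5);
    }

    /* 8-bit RGB to YCbCr (256 levels), per the equations above. */
    void rgb_to_ycbcr(unsigned char r, unsigned char g, unsigned char b,
                      unsigned char *y, unsigned char *cb, unsigned char *cr)
    {
        *y  = clamp( 0.299  * r + 0.587  * g + 0.114  * b);
        *cb = clamp(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.0);
        *cr = clamp( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.0);
    }
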
12.1.2.7 File Format

The format of a JFIF file conforms to the syntax for interchange format defined in Annex B of ISO DIS 10918-1. The syntax is a uniform structure and set of parameters for both classes of encoding processes (lossy or lossless), and for all modes of operation (sequential, progressive, lossless, and hierarchical). The various parts of the compressed image data are identified by special two-byte codes called markers. Some markers are followed by particular sequences of parameters, as in the case of table specifications, frame header, or scan header. Others are used without parameters for functions, such as marking the start and end of the image. When a marker is associated with a particular sequence of parameters, the marker and its parameters comprise a marker segment.

The data created by the entropy encoder are also segmented, and one particular marker, the restart marker, is used to isolate entropy-coded data segments. The encoder outputs the restart markers, intermixed with the entropy-coded data, at regular restart intervals of the source image data. Restart markers can be identified without having to decode the compressed data to find them. Because the entropy-coded segments between restart markers can be decoded independently, they have application-specific uses, such as parallel encoding or decoding, isolation of data corruption, and semirandom access of entropy-coded segments.

Compressed image data consists of only one image. An image contains only one frame in the cases of sequential and progressive coding processes; an image contains multiple frames for the hierarchical mode. A frame contains one or more scans, each having interleaved or non-interleaved components. For sequential processes, a scan contains a complete encoding of one or more image components. For progressive processes, a scan contains a partial encoding of all data units from one or more image components. Components are not interleaved in progressive mode, except for the DC coefficients in the first scan for each component of a progressive frame.

There are three compressed data formats: the interchange format, the abbreviated format for compressed image data, and the abbreviated format for table-specification data.

Interchange format markers

The interchange format includes the marker segments for all quantization and entropy-coding table specifications needed by the decoding process. This guarantees that a compressed image can cross the boundary between application environments, regardless of how each environment internally associates tables with compressed image data.

Abbreviated format for compressed image data

The abbreviated format for compressed image data is identical to the interchange format, except that it does not include all (but may include some) tables needed for decoding. This format is intended for use within applications where alternative mechanisms are available for supplying some or all of the table-specification data needed for decoding.

Abbreviated format for table-specification data

This format contains only table-specification data. It is a means by which the application may install in the decoder the tables required to subsequently reconstruct one or more images.

APP0 Markers

A JFIF file uses APP0 marker segments and constrains certain parameters in the frame header.

12.1.2.8 Proprietary Extensions

JFIF provides an extensible file format. Proprietary extensions and application-specific information can be added using the APP0 marker segment: immediately following the JFIF APP0 marker segment may be a JFIF extension APP0 marker. These marker segments do not affect the decodability or displayability of the JFIF file. Decoders skip application-specific APP0 segments that they do not recognize.

12.1.2.9 Thumbnails Coded Using JPEG

JPEG includes thumbnail support utilizing the JFIF extension APP0 marker. One extension supports thumbnails stored using one byte per pixel together with a color palette in the extension_data field. There is also support for thumbnails stored using three bytes per pixel.

12.1.3 Portable Network Graphics (PNG)

The Portable Network Graphics (PNG) format, available from the W3C, integrates multiple formats into a single widely accepted envelope framework supported by all the major Web-browsers [PNG]. PNG provides a patent-free replacement for GIF and can also replace many common uses of TIFF. Indexed-color, grayscale, and true-color images are supported, plus an optional alpha channel. Sample depths range from 1 to 16 bits. The premise behind PNG is that it provides a portable, legally unencumbered, well-compressed, well-specified standard for lossless bitmapped image files. PNG is designed to work well in online viewing applications as it is fully streamable with a progressive display option. PNG is robust, providing both full file integrity checking and simple detection of common transmission errors. PNG can also store gamma and chromaticity data for improved color matching on heterogeneous platforms.

GIF features retained in PNG include the following:

  • Colors : Indexed-color images of up to 256 colors.

  • Streamability : files can be read and written serially, thus allowing the file format to be used as a communications protocol for on-the-fly generation and display of images.

  • Progressive display : a suitably prepared image file can be displayed as it is received over a communications link, yielding a low-resolution image very quickly followed by gradual improvement of detail.

  • Transparency : portions of the image can be marked as transparent, creating the effect of a non-rectangular image.

  • Ancillary information : textual comments and other data can be stored within the image file.

  • Platform : Complete hardware and platform independence.

  • Lossless : Effective, 100% lossless compression.

PNG has some important features not available in GIF:

  • True Color : Images may have up to 48 bits per pixel.

  • Grayscale : Grayscale images may have up to 16 bits per pixel.

  • Full Alpha Channel : PNG supports RGBA colors and general transparency masks.

  • Image gamma information : PNG supports automatic display of images with correct brightness and contrast regardless of the machines used to originate and display the image.

  • Error Detection : PNG supports reliable, straightforward detection of file corruption.

  • Performance : Rapid initial presentation is supported in progressive display mode.

12.1.3.1 PNG Feature Overview
Color Values

Colors can be represented by either grayscale or RGB (red, green, blue) sample data. Grayscale data represents luminance; RGB data represents calibrated color information (if the cHRM chunk is present) or uncalibrated device-dependent color (if cHRM is absent). All color values range from zero (representing black) to most intense at the maximum value for the sample depth; the maximum value at a given sample depth is (2^sampledepth) - 1, not 2^sampledepth.

Sample values are not necessarily linear; the gAMA chunk specifies the gamma characteristic of the source device, and renderers are strongly encouraged to compensate properly. Source data with a precision not directly supported in PNG is scaled up to the next higher supported bit depth. This scaling is reversible with no loss of data, and it reduces the number of cases that decoders have to cope with.
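
One common way to perform such reversible scaling is left bit replication; a sketch under that assumption (the function name is ours):

    /* Scale a sample up from 'from_bits' to 'to_bits' (to_bits >= from_bits)
       by left bit replication: 0 maps to 0 and the maximum maps to the
       maximum, closely approximating multiplication by
       ((2^to_bits)-1)/((2^from_bits)-1). Truncating the extra low-order
       bits recovers the original sample, so no data is lost. */
    unsigned scale_sample(unsigned v, int from_bits, int to_bits)
    {
        unsigned out = v << (to_bits - from_bits);
        int filled = from_bits;
        while (filled < to_bits) {
            out |= out >> filled;   /* replicate the high-order bits */
            filled *= 2;
        }
        return out & ((1u << to_bits) - 1);
    }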

Image Layout

Conceptually, a PNG image is a rectangular pixel array, with pixels appearing left-to-right within each scan-line, and scan-lines appearing top-to-bottom (the data may be transmitted in a different order). Three pixel types are supported:

  • Index-Color Pixels : An indexed-color pixel is represented by a single sample that is an index into a supplied palette. The image bit depth determines the maximum number of palette entries, but not the color precision within the palette.

  • Grayscale Pixels : A grayscale pixel is represented by a single sample that is a grayscale level, where zero is black and the largest value for the bit depth is white.

  • True-color Pixels : A true-color pixel is represented by three samples: red (zero = black, max = red) appears first, then green (zero = black, max = green), then blue (zero = black, max = blue). The bit depth specifies the size of each sample, not the total pixel size. Optionally , grayscale and true-color pixels can include an alpha sample.

PNG permits multi-sample pixels only with 8- and 16-bit samples, so multiple samples of a single pixel are never packed into one byte. 16-bit samples are stored with the MSB first.

Scan Lines

Scan-lines always begin on byte boundaries. When pixels have fewer than 8 bits and the scan-line width is not evenly divisible by the number of pixels per byte, the low-order bits in the last byte of each scan-line are wasted. Pixels are always packed into scan-lines so that pixels smaller than one byte never cross byte boundaries; they are packed into bytes with the leftmost pixel in the high-order bits of a byte, the rightmost in the low-order bits. Permitted bit depths and pixel types are restricted so that in all cases the packing is simple and efficient.

A filter type byte is added to the beginning of every scan-line. This byte is not considered part of the image data, but it is included in the data stream sent to the compression step.
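
In other words, the serialized size of each scan-line can be computed as follows (a sketch; the function name is ours):

    #include <stddef.h>

    /* Bytes occupied by one serialized scan-line: one filter type byte plus
       the pixel bits packed into whole bytes (rounding up wastes the unused
       low-order bits of the last byte). */
    size_t scanline_bytes(size_t width, unsigned bits_per_pixel)
    {
        return 1 + (width * bits_per_pixel + 7) / 8;
    }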

Alpha channel

An alpha channel, representing transparency information on a per-pixel basis, can be included in grayscale and true-color PNG images. An alpha value of zero represents full transparency, and a value of (2^bitdepth) - 1 represents a fully opaque pixel. Intermediate values indicate partially transparent pixels that can be combined with a background image to yield a composite image (i.e., alpha is the degree of opacity of the pixel).

Alpha channels can be included with images that have either 8 or 16 bits per sample, but not with images that have fewer than 8 bits per sample. Alpha samples are represented with the same bit depth used for the image samples. The alpha sample for each pixel is stored immediately following the grayscale or RGB samples of the pixel.

The color values stored for a pixel are not affected by the alpha value assigned to the pixel. This rule is sometimes called unassociated or nonpremultiplied alpha. It is also common to store sample values premultiplied by the alpha fraction; in effect, such an image is already composited against a black background. PNG does not use premultiplied alpha.
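
A sketch of compositing with unassociated (non-premultiplied) alpha as just described (names are ours):

    /* Composite a non-premultiplied foreground sample against a background
       sample; alpha runs from 0 (transparent) to maxval = (2^bitdepth)-1
       (opaque). */
    double composite(double fg, double bg, double alpha, double maxval)
    {
        double a = alpha / maxval;
        return a * fg + (1.0 - a) * bg;
    }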

Transparency

Transparency control is also possible without the storage cost of a full alpha channel. In an indexed-color image, an alpha value can be defined for each palette entry. In grayscale and true-color images, a single pixel value can be identified as being transparent. These techniques are controlled by the tRNS ancillary chunk type.

Filtering

PNG allows the image data to be filtered before it is compressed. Filtering can improve the compressibility of the data. The filter step itself does not reduce the size of the data. All PNG filters are strictly lossless.

PNG defines several different filter algorithms, including the option not to perform filtering. The filter algorithm is specified for each scan-line by a filter type byte that precedes the filtered scan-line in the precompression data stream. An intelligent encoder can switch filters from one scan-line to the next. The method for choosing which filter to employ is up to the encoder.

Interlaced Data Order

A PNG image can be stored in interlaced order to allow progressive display. The purpose of this feature is to allow images to "fade in" when they are being displayed on-the-fly. Interlacing slightly expands the file size on average, but it gives the user a meaningful display much more rapidly. Note that decoders are required to be able to read interlaced images, whether or not they actually perform progressive display.

With interlace method 0, pixels are stored sequentially from left to right, and scan-lines are stored sequentially from top to bottom (no interlacing). Interlace method 1, known as Adam7 after its author, Adam M. Costello, consists of seven distinct passes over the image. Each pass transmits a subset of the pixels in the image. The pass in which each pixel is transmitted is defined by replicating the following 8 x 8 pattern over the entire image, starting at the upper left corner:

 1 6 4 6 2 6 4 6
 7 7 7 7 7 7 7 7
 5 6 5 6 5 6 5 6
 7 7 7 7 7 7 7 7
 3 6 4 6 3 6 4 6
 7 7 7 7 7 7 7 7
 5 6 5 6 5 6 5 6
 7 7 7 7 7 7 7 7

Within each pass, the selected pixels are transmitted left to right within a scan-line, and selected scan-lines are transmitted sequentially from top to bottom. For example, pass 2 contains pixels 4, 12, 20, etc. of scan-lines 0, 8, 16, etc. (numbering from 0,0 at the upper left corner). The last pass contains the entirety of scan-lines 1, 3, 5, and so on.

The data within each pass is laid out as though it were a complete image of the appropriate dimensions. For example, if the complete image is 16 x 16 pixels, then pass 3 contains two scan-lines, each containing four pixels. When pixels have fewer than 8 bits, each such scan-line is padded as needed to fill an integral number of bytes. Filtering is done on this reduced image in the usual way, and a filter type byte is transmitted before each of its scan-lines. The transmission order is defined so that all the scan-lines transmitted in a pass have the same number of pixels; this is necessary for proper application of some of the filters.
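
A small sketch of how a decoder can map a pixel to its Adam7 pass, by tiling the 8 x 8 pattern above over the image (names are ours):

    /* The Adam7 pattern, indexed as adam7[row][column]. */
    static const int adam7[8][8] = {
        { 1, 6, 4, 6, 2, 6, 4, 6 },
        { 7, 7, 7, 7, 7, 7, 7, 7 },
        { 5, 6, 5, 6, 5, 6, 5, 6 },
        { 7, 7, 7, 7, 7, 7, 7, 7 },
        { 3, 6, 4, 6, 3, 6, 4, 6 },
        { 7, 7, 7, 7, 7, 7, 7, 7 },
        { 5, 6, 5, 6, 5, 6, 5, 6 },
        { 7, 7, 7, 7, 7, 7, 7, 7 }
    };

    /* Pass (1-7) in which pixel (x, y) is transmitted; (0,0) is the
       upper left corner. */
    int adam7_pass(unsigned x, unsigned y)
    {
        return adam7[y & 7][x & 7];
    }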

Gamma Correction

PNG images can specify, via the gAMA chunk, the gamma characteristic of the image with respect to the original scene. PNG rendering routines are strongly encouraged to use this information, plus information about the display device they are using and room lighting, to present the image to the renderer in a way that reproduces what the image's original author saw as closely as possible. Gamma correction is not applied to the alpha channel. Alpha samples always represent a linear fraction of full opacity.

For high-precision applications, the exact chromaticity of the RGB data in a PNG image can be specified via the cHRM chunk, allowing more accurate color matching than gamma correction alone can provide.

Text Strings

A PNG file can store text associated with the image, such as an image description or copyright notice. Keywords are used to indicate what each text string represents. ISO 8859-1 (Latin-1) is the character set recommended for use in text strings. This character set is a superset of 7-bit ASCII.

Character codes not defined in Latin-1 should not be used, because they have no platform-independent meaning. If a non-Latin-1 code does appear in a PNG text string, its interpretation varies across platforms and decoders. Some systems might not even be able to display all the characters in Latin-1, but most modern systems can.

12.1.3.2 File Structure

A PNG file consists of an 8-byte PNG signature, 0x89504E470D0A1A0A, followed by a series of chunks. Chunks that are necessary for successful display of the file's contents are called critical chunks. A decoder encountering an unknown chunk whose ancillary bit is 0 must indicate to the user that the image contains information it cannot safely interpret. The image header chunk (IHDR) is an example of a critical chunk.

PNG file signature

The first eight bytes of a PNG file always contain the decimal values 137, 80, 78, 71, 13, 10, 26, and 10. This signature indicates that the remainder of the file contains a single PNG image, consisting of a series of chunks beginning with an IHDR chunk and ending with an IEND chunk.
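
A minimal sketch of this signature check (the function name is ours):

    #include <stdio.h>
    #include <string.h>

    /* Verify the 8-byte PNG signature at the current file position. */
    int has_png_signature(FILE *f)
    {
        static const unsigned char sig[8] =
            { 137, 80, 78, 71, 13, 10, 26, 10 };  /* 0x89 'P' 'N' 'G' CR LF SUB LF */
        unsigned char buf[8];
        return fread(buf, 1, 8, f) == 8 && memcmp(buf, sig, 8) == 0;
    }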

Chunk format

Each chunk consists of length, type, data and CRC.

Length

A 4-byte unsigned integer giving the number of bytes in the chunk's data field. The length counts only the data field, not itself, the chunk type code, or the CRC. Zero is a valid length. Although encoders and decoders should treat the length as unsigned, its value cannot exceed (2^31) - 1 bytes.

Chunk Type

A 4-byte chunk type code. For convenience in description and in examining PNG files, type codes are restricted to consist of uppercase and lowercase ASCII letters (A-Z and a-z, or 65-90 and 97-122 decimal). However, encoders and decoders treat the codes as fixed binary values, not character strings. For example, it would not be correct to represent the type code IDAT by the EBCDIC equivalents of those letters.

Chunk Data

The data bytes appropriate to the chunk type, if any. This field can be of zero length. The chunk data length can be any number of bytes up to the maximum; therefore, chunks are not necessarily aligned on boundaries larger than a byte. Chunks can appear in any order, subject to the restrictions placed on each chunk type. Multiple chunks of the same type can appear, but only if specifically permitted for that type.

CRC

A 4-byte CRC calculated on the preceding bytes in the chunk, including the chunk type code and chunk data fields, but not including the length field. The CRC is always present, even for chunks containing no data (see CRC algorithm below).

Chunk Naming Conventions

Chunk type codes are assigned so that a decoder can determine some properties of a chunk even when it does not recognize the type code. These rules are intended to allow safe, flexible extension of the PNG format, by allowing a decoder to decide what to do when it encounters an unknown chunk. The naming rules are not normally of interest when the decoder does recognize the chunk's type.

CRC Algorithm

Chunk CRCs are calculated using standard CRC methods with pre- and post-conditioning, as defined by ISO 3309 [ISO-3309] or ITU-T V.42 [ITU-V42]. The CRC polynomial employed is

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

The 32-bit CRC register is initialized to all 1's, and then the data from each byte is processed from the least significant bit to the most significant bit. After all the data bytes are processed, the CRC register is replaced with its ones' complement. This value is transmitted or stored in the file with the MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x^31 term.
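
A compact, bit-at-a-time sketch of this algorithm in C; the constant 0xEDB88320 is the polynomial above with the bit order reflected, matching the LSB-first processing (the function name is ours):

    #include <stdint.h>
    #include <stddef.h>

    /* PNG CRC-32: the register starts as all 1's, bytes are processed
       least significant bit first, and the final register value is
       ones-complemented. */
    uint32_t png_crc32(const unsigned char *buf, size_t len)
    {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }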

12.1.3.3 Critical Chunks

All compliant implementations are required to understand and successfully render the standard critical chunks. A valid PNG image contains an IHDR chunk, one or more IDAT chunks, and an IEND chunk.

IHDR Image Header

The IHDR chunk is the first chunk in the file. It contains the width and height of the image in pixels; each dimension is a 4-byte integer allowing values from 1 to (2^31) - 1, accommodating languages that have difficulty with unsigned 4-byte values. This chunk also specifies the bit depth; valid values are 1, 2, 4, 8, and 16, although not all values are allowed for all color types. The chunk also specifies the color type code, represented with the following bit-mask: 1 (palette used), 2 (color used), and 4 (alpha channel used). Valid values are 0, 2, 3, 4, and 6.

Encoding and decoding information is also included in the IHDR. The compression method is specified using a single-byte integer that indicates the method used to compress the image data. The filter method is specified using a single-byte integer that indicates the preprocessing method applied to the image data before compression. The interlace method is specified using a single-byte integer that indicates the transmission order of the image data.

PLTE Palette

The PLTE chunk contains from 1 to 256 palette entries, each a three-byte RGB triple. The number of entries is determined from the chunk length; a chunk length not divisible by 3 is an error. This chunk is required for some color types, and is optional for others. See the PNG specification for details.

IDAT Image Data

The IDAT chunk contains the actual image data, which is the output data stream of the compression algorithm. Creating this data requires three steps (reading requires reversing the process):

  1. Begin with image scan-lines represented as described in Image layout; the layout and total size of this raw data are determined by the fields of IHDR.

  2. Filter the image data according to the filtering method specified by the IHDR chunk.

  3. Compress the filtered data using the compression method specified by the IHDR chunk.

If there are multiple IDAT chunks, they appear consecutively with no other intervening chunks. The compressed data stream is then the concatenation of the contents of all the IDAT chunks. The compressed data stream may be divided into multiple IDAT chunks; IDAT chunk boundaries have no semantic significance and can occur at any point in the compressed data stream.
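
As a sketch of the reading direction, and assuming zlib (whose data stream format PNG compression method 0 uses) plus a buffer already holding the concatenation of all IDAT contents:

    #include <zlib.h>

    /* Inflate the concatenated IDAT contents in one call; the chunk
       boundaries carry no meaning, so only the concatenation matters.
       'raw_len' is the expected size of the filtered scan-line data,
       computed from the IHDR fields. */
    int decompress_idat(const unsigned char *idat, unsigned long idat_len,
                        unsigned char *raw, unsigned long raw_len)
    {
        uLongf out_len = raw_len;
        if (uncompress(raw, &out_len, idat, idat_len) != Z_OK)
            return -1;
        return out_len == raw_len ? 0 : -1;
    }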

IEND Image Trailer

The IEND chunk appears last. It marks the end of the PNG data stream. The chunk's data field is empty.

12.1.3.4 Ancillary Chunks

All ancillary chunks are optional, in the sense that encoders need not write them and decoders can ignore them. However, encoders are encouraged to write the standard ancillary chunks when the information is available, and decoders are encouraged to interpret these chunks when appropriate and feasible.

  • bKGD Background color : The bKGD chunk specifies a default background color to present the image against. Note that renderers are not bound to honor this chunk; a renderer can choose to use a different background.

  • cHRM Primary chromaticities and white point : Applications that need device-independent specification of colors can use the cHRM chunk to specify the 1931 CIE x,y chromaticities of the red, green, and blue primaries used in the image, and the referenced white point.

  • gAMA Image gamma : The gAMA chunk specifies the gamma of the image producer (e.g., camera), and thus the gamma of the image with respect to the original scene.

  • hIST Image histogram : The hIST chunk gives the approximate usage frequency of each color in the color palette. A histogram chunk can appear only when a palette chunk appears. The histogram may help a viewer select a subset of the palette colors for display when it cannot display them all.

  • pHYs Physical pixel dimensions : The pHYs chunk specifies the intended pixel size or aspect ratio for display of the image.

  • sBIT Significant bits : To simplify decoders, PNG specifies that only certain sample depths can be used, and that those values be scaled to the full range of values at the sample depth. The sBIT chunk is provided to store the original number of significant bits, allowing decoders to recover the original data losslessly, even for sample depths not directly supported by PNG.

  • tEXt Textual data : Textual information that the encoder wishes to record with the image can be stored in tEXt chunks. Each tEXt chunk contains a keyword and a text string.

  • tIME Image last-modification time : The tIME chunk gives the time of the last image modification (not the time of initial image creation). It is intended for use as an automatically-applied time stamp that is updated whenever the image data is changed. It is recommended that tIME not be changed by PNG editors that do not change the image data.

  • tRNS Transparency : The tRNS chunk specifies that the image uses simple transparency: either alpha values associated with palette entries (for indexed-color images) or a single transparent color (for grayscale and true-color images). Compared to full alpha channel, simple transparency requires less storage space and is sufficient for many common scenarios.

  • zTXt Compressed textual data : The zTXt chunk contains textual data, just as tEXt does; however, zTXt takes advantage of compression. zTXt and tEXt chunks are semantically equivalent, but zTXt is recommended for storing large blocks of text.

Table 12.5. Constraints Associated with Various PNG Chunks

  Chunk Name  Critical  Multiple  Ordering Constraints
  ----------  --------  --------  -----------------------------------
  IHDR        Yes       No        Must be first
  PLTE        Yes       No        Before IDAT
  IDAT        Yes       Yes       Multiple IDATs must be consecutive
  IEND        Yes       No        Must be last
  cHRM        No        No        Before PLTE and IDAT
  gAMA        No        No        Before PLTE and IDAT
  sBIT        No        No        Before PLTE and IDAT
  bKGD        No        No        After PLTE; before IDAT
  hIST        No        No        After PLTE; before IDAT
  tRNS        No        No        After PLTE; before IDAT
  pHYs        No        No        Before IDAT
  tIME        No        No        None
  tEXt        No        Yes       None
  zTXt        No        Yes       None

12.1.4 MNG

The Multiple-image Network Graphics (MNG) format is a multi-image member of the PNG family [MNG]. It is designed for animations, slide shows, or complex still frames composed of multiple PNG single-image data streams.

An MNG data stream describes a sequence of zero or more single frames, each of which can be composed of zero or more embedded images. An MNG-LC frame normally contains a two-dimensional image or a two-dimensional layout of smaller images. It could also contain three-dimensional "voxel" data arranged as a series of two-dimensional planes (or tomographic slices), each plane being represented by a PNG data stream.

The Low-Complexity (MNG-LC) version of this format is a profile that includes only a subset of the MNG features, and is often the preferred, implementation-friendly form of MNG [MNGLC]. As an example, whereas in full MNG embedded images can be PNG or JPEG Network Graphics (JNG) data streams [JNG], MNG-LC data streams do not require support for JNG images [MNGLC].

12.1.4.1 File Structure

The MNG and MNG-LC file formats use the same chunk structure that is defined in the PNG specification and share other features of the PNG format; any MNG-LC decoder is required to be able to decode valid PNG data streams. Unlike PNG, however, fields within the MNG chunk data structure have default values and can therefore be omitted. Such omission is permitted only when explicitly stated in the specification for the particular field or chunk. If a field is omitted, all the subsequent fields in the chunk are also omitted and the chunk length is shortened.

Similar to PNG, an MNG data stream consists of an 8-byte signature, 0x8a4d4e470d0a1a0a, followed by a series of chunks. As in PNG files, each chunk consists of a 4-byte data length field, a 4-byte chunk type code, data, and a CRC. The first chunk, the header, is followed by frame definitions and layer definitions. Each frame consists of one or more layers composited against whatever was already on the display. Each layer is an embedded, potentially visible image; the foreground and background layers are special purpose layers.

12.1.4.2 Critical MNG Chunks
Header Chunk

This chunk, the MHDR chunk, is always first in all MNG data streams except for those that consist of a single PNG or JNG data stream with a PNG or JNG signature. It contains the following 7 unsigned integer fields, each 4 bytes long: frame-width, frame-height, ticks-per-second, nominal-layer-count, nominal-frame-count, nominal-play-time, and simplicity-profile. In MNG-LC, decoders can ignore the optional, namely informative, fields of nominal-frame-count, nominal-layer-count, nominal-play-time, and simplicity-profile.

  • Frame Dimensions : The frame-width and frame-height fields give the intended display size (measured in pixels) and provide the default clipping boundaries. These should be set to zero if the MNG data stream contains no visible images.

  • Ticks : The ticks-per-second field is the unit of interframe delay and time-out; when the data stream contains a sequence of images, this field is non-zero. When the data stream contains exactly one frame, this field is zero.

When ticks-per-second is zero, the length of a tick is infinite, and decoders ignore any attempt to define interframe delay, time-out, or any other variable that depends on the length of a tick. If the frames are intended to be displayed one at a time under user control, such as a slide show or a multi-page FAX, the tick length can be set to any positive number, the interframe delay could be set to infinite, and the time-out set to zero. MNG renderers typically display the first frame in the data stream without user intervention.

When ticks-per-second is nonzero, and there is no other information available about interframe delay, renderers should display the sequence of frames at the rate of one frame per tick. The frame count contains the number of frames that would be displayed.

  • Profiles : The simplicity-profile field is used to set the format expectation for the data following the header. When bit 0 of the simplicity-profile field is zero, the simplicity (or complexity) of the MNG data stream is unspecified, and all bits of the simplicity profile are set to zero. The simplicity profile is not allowed to be zero in MNG-LC data streams.

If the simplicity profile is nonzero, it can be regarded as a 32-bit profile, with bit 0 (the least significant bit) being a profile-validity flag, bit 1 being a simple MNG flag, bit 2 being a complex MNG flag, bits 3, 7, and 8 being transparency flags, bit 4 being a JNG flag, bit 5 being a Delta-PNG flag, and bit 9 being a stored object buffers flag. Bit 6 is a validity flag for bits 7, 8, and 9. These three flags mean nothing if bit 6 is zero.

If a simplicity profile bit is zero, the corresponding feature is guaranteed to be absent or have no effect on the appearance of any frame. If a bit is one, the corresponding feature may be present in the MNG data stream.

When bit 1 is zero (simple MNG features are absent), the data stream does not contain the DEFI, FRAM, MAGN, or global PLTE and tRNS chunks, and filter method 64 is not used in any embedded PNG data stream. Bits 10 through 15 of the simplicity profile are reserved for future MNG versions. Bits 16 through 30 are available for private test or experimental versions. The MSB (bit 31) is zero.

  • Transparency Profiles : The transparency profile bits are defined as follows:

    • Background transparency : The application is responsible for supplying a background color or image against which the MNG background layer is composited, and the application should refresh the entire MNG frame whenever the application's background scene changes.

    • Transparency is absent or can be ignored : Either the MNG or PNG tRNS chunk is not present and no PNG or JNG image has an alpha channel, or if they are present they have no effect on the final appearance of any frame and can be ignored.

    • Semitransparency is absent : The JDAA chunk is not present. If the MNG or PNG tRNS chunk is present or if any PNG or JNG image has an alpha channel, they only contain the values 0 and the maximum (opaque) value.

    • Background transparency is absent : The first layer of every segment fills the entire frame with opaque pixels, and nothing following the first layer causes any frame to become transparent. Whatever is behind the first layer does not show through.

PLTE and tRNS Global Palette

The MNG PLTE chunk is identical to the PNG PLTE chunk; it provides a global palette that is inherited by PNG data streams that contain an empty PLTE chunk. The tRNS chunk is identical to the PNG tRNS chunk; it provides a global transparency array that is inherited along with the global palette by PNG data streams that contain an empty PLTE chunk. If the global PLTE chunk is not present, each indexed-color PNG in the data stream supplies its own PLTE (and tRNS, if it has transparency) chunks. The global PLTE chunk is not permitted in the Very Low Complexity (MNG-VLC) profile.

IHDR, IEND, PNG Chunks

A group of these chunks represent a PNG image within the MNG file. The IHDR and IEND chunks and any chunks between them are as described above for the PNG format, with some MNG-specific extensions not available in MNG-VLC. These extensions include the following:

  • Adaptive Filtering : An additional PNG filter method, adaptive filtering, features five basic types and intrapixel differencing. An intrapixel differencing transformation is introduced, which relies on integer arithmetic of sufficient precision to hold intermediate results; the result is calculated modulo 2^sampledepth. Intrapixel differencing (subtracting the green sample) is only done for color types 2 and 6, and only when the filter method is 64. This filter method is not permitted in images with color types other than 2 or 6.

  • Inheritance : If a global PLTE chunk appears in the top-level MNG data stream, a PNG data stream can contain an empty PLTE chunk to direct that the global PLTE and tRNS data be used. If no empty PLTE chunk is present, the data is not inherited. MNG applications that re-create standalone PNG files write the global PLTE chunk in place of the empty one in the output PNG file, along with the global tRNS data if it is present. The global tRNS data can subsequently be overridden by a tRNS chunk in the PNG data stream. It is an error for a PNG data stream to contain an empty PLTE chunk when the global PLTE chunk is not present or has been nullified.

  • Excluded PNG chunks : The PNG oFFs and pHYs chunks, and any chunks that attempt to set the pixel dimensions or the drawing location, are ignored by MNG renderers. Similarly, the PNG gIFg, gIFt, and gIFx chunks are ignored by renderers. Editors nevertheless copy the ignored chunks so that, when the individual PNG images are separated from the MNG aggregate, these chunks are available and can be activated.
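To make the intrapixel differencing of filter method 64 concrete, the following sketch shows how the green sample might be subtracted from, and added back to, the red and blue samples modulo 2^sampledepth. This is an illustrative simplification for a single pixel, not a reference implementation of the adaptive filter method:

    # Sketch: intrapixel differencing for one RGB(A) pixel (color types 2 and 6).
    def intrapixel_encode(r, g, b, sample_depth=8):
        mod = 1 << sample_depth        # arithmetic is modulo 2**sample_depth
        return (r - g) % mod, g, (b - g) % mod

    def intrapixel_decode(r, g, b, sample_depth=8):
        mod = 1 << sample_depth
        return (r + g) % mod, g, (b + g) % mod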

JHDR, JNG Chunks

A group of these chunks represents a JNG image within the MNG file. The extensions specified for PNG chunks apply to JNG chunks as well. MNG-LC applications are not required to support these chunks; if they are not supported, they are simply ignored.

FRAM Frame Definitions

The FRAM chunk provides information that a decoder needs for generating frames and interframe delays. The FRAM parameters (see Table 12.6) govern how the decoder is to behave when it encounters a FRAM chunk or an embedded image. The FRAM chunk also delimits subframes. If bit 1 of the MHDR simplicity profile is 0 and bit 0 is 1, the FRAM chunk is not present.

A FRAM chunk is a subframe delimiter that changes FRAM parameters for the subframe immediately following the FRAM chunk; empty FRAM chunks are delimiters that do not change FRAM properties. Non-empty FRAM chunks contain a framing-mode byte, an optional name string, a zero-byte separator, and a number of additional optional fields.

Table 12.6. FRAM Parameters

Framing-mode (1 byte):
  0  Do not change the framing mode.
  1  No background layer is generated, except for one ahead of the very first foreground layer. The interframe delay is associated with each foreground layer in the subframe.
  2  No background layer is generated, except for one ahead of the very first image. The interframe delay is associated only with the final layer in the subframe; a zero interframe delay is associated with the other layers in the subframe.
  3  A background layer is generated ahead of each foreground layer. An interframe delay is associated with each foreground layer, and a zero delay is associated with each background layer.
  4  The background layer is generated only ahead of the first foreground layer in the subframe. The interframe delay is associated only with the final foreground layer in the subframe. A zero interframe delay is associated with the background layers, except when there is no foreground layer in the subframe, in which case the interframe delay is associated with the sole background layer.

Subframe-name (varies; Latin-1): Can be omitted; if so, the subframe is nameless.

Separator (1 byte; null): Must be omitted if the subsequent fields are also omitted.

Change-interframe-delay (1 byte): This field and all subsequent fields can be omitted if no frame parameters except the framing mode or the subframe name are changed.
  0  No.
  1  Yes, for the upcoming subframe only.
  2  Yes, also reset default.

Change-time-out-and-termination (1 byte): This field can be omitted only if the previous field is also omitted.
  0  No.
  1  Deterministic, for the upcoming subframe only.
  2  Deterministic, also reset default.
  3  Decoder-discretion, for the upcoming subframe only.
  4  Decoder-discretion, also reset default.
  5  User-discretion, for the upcoming subframe only.
  6  User-discretion, also reset default.
  7  External-signal, for the upcoming subframe only.
  8  External-signal, also reset default.

Change-layer-clipping-boundaries (1 byte): This field can be omitted only if the previous field is also omitted.
  0  No.
  1  Yes, for the upcoming subframe only.
  2  Yes, also reset default.

Change-sync-id-list (1 byte): This field can be omitted only if the previous field is also omitted.
  0  No.
  1  Yes, for the upcoming subframe only.
  2  Yes, also reset default list.

Interframe-delay (4 bytes): Must be omitted if the change-interframe-delay field is zero or is omitted. The range is [0..2^31-1] ticks.

Time-out (4 bytes): Must be omitted if the change-time-out-and-termination field is zero or is omitted. The range is [0..2^31-1]; the value 2^31-1 (0x7fffffff) ticks represents an infinite time-out period.

Layer-clipping-boundary-delta-type (1 byte): This and the following four fields must be omitted if the change-layer-clipping-boundaries field is zero or is omitted.
  0  Layer clipping boundary values are given directly.
  1  Layer clipping boundaries are determined by adding the FRAM data to the values from the previous subframe.

Left-layer-cb (4 bytes): Signed integer; may also specify delta-left-layer-cb.

Right-layer-cb (4 bytes): Signed integer; may also specify delta-right-layer-cb.

Top-layer-cb (4 bytes): Signed integer; may also specify delta-top-layer-cb.

Bottom-layer-cb (4 bytes): Signed integer; may also specify delta-bottom-layer-cb.

Sync-id (4 bytes): Unsigned integer; must be omitted if change-sync-id-list is 0 and can be omitted if the new list is empty; repeat until all sync-ids have been listed. The range is [0..2^31-1].

  • Framing mode 1 : When framing-mode is 1, the decoder waits until the interframe delay for the previous frame has elapsed before displaying each image. Each foreground layer is a separate subframe and frame.

  • Framing mode 2 : Framing mode 2 is similar to mode 1, except that the interframe delay occurs between subframes delimited by FRAM chunks rather than between individual layers. All of the foreground layers between consecutive FRAM chunks make up a single subframe. Typically, the interframe delay is nonzero and multiple layers are present, so each frame is a single subframe composed of several layers. When the interframe delay is zero, the subframe is combined with subsequent subframes until one with a nonzero interframe delay is encountered.

Decoders wait until the interframe delay for the previous frame has elapsed before displaying the frame. Renderers are expected to display all of the images in a frame at once, if possible, or as fast as can be managed, without clearing the display or restoring the background.

  • Framing mode 3 : In framing mode 3, a background layer is generated and displayed immediately before each image layer is displayed; otherwise, framing mode 3 is identical to framing mode 1. Each foreground layer together with its background layer makes up a single subframe and frame. When the background layer is transparent or does not fill the clipping boundaries of the image layer, the application is responsible for supplying a background color or image against which the image layer is composited. If the MNG is being displayed against a changing scene, the application should refresh the entire MNG frame against a new copy of the background layer whenever the application's background scene changes.

  • Framing mode 4 : In framing mode 4, the background layer is generated and displayed immediately before each frame, i.e., after each FRAM chunk, with no interframe delay before each image. The decoder waits until the interframe delay for the previous frame has elapsed before displaying the background layer; otherwise, framing mode 4 is identical to framing mode 2. All of the foreground layers between consecutive FRAM chunks, together with one background layer, make up a single subframe. The delay rules for all four modes are summarized in the sketch below.
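The four framing modes differ only in where background layers are generated and which layers carry the interframe delay. The following Python sketch restates the rules of Table 12.6 in executable form for a single subframe; the function, its arguments, and the (layer, delay) representation are ours, not part of the specification:

    # Sketch: expand one subframe into (layer, delay) pairs per framing mode.
    def expand_subframe(foreground_layers, mode, delay, first_in_stream=False):
        layers = []                 # (layer, interframe delay carried by it)
        if mode in (1, 2):
            if first_in_stream:     # background only ahead of the very first layer
                layers.append(("background", 0))
            for i, fg in enumerate(foreground_layers):
                last = i == len(foreground_layers) - 1
                # Mode 1: every foreground layer carries the delay.
                # Mode 2: only the final layer in the subframe carries it.
                layers.append((fg, delay if (mode == 1 or last) else 0))
        elif mode == 3:             # fresh background ahead of each foreground layer
            for fg in foreground_layers:
                layers.append(("background", 0))
                layers.append((fg, delay))
        elif mode == 4:             # one background ahead of the whole subframe
            if not foreground_layers:
                layers.append(("background", delay))  # sole background carries it
            else:
                layers.append(("background", 0))
                for i, fg in enumerate(foreground_layers):
                    last = i == len(foreground_layers) - 1
                    layers.append((fg, delay if last else 0))
        return layers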

BACK Background

The first layer displayed by a renderer is always a background layer that fills the entire frame. The BACK chunk provides a background that the renderer can use for this purpose (or must use, if it is marked mandatory); if it is not mandatory, the renderer can choose another background. If the BACK chunk is not present, or if the background is not fully opaque or has been clipped to less than the full frame, the MNG renderer provides or completes its own background layer for the first frame. Each layer after the first is composited over the layers that precede it, until a FRAM chunk with framing mode 3 or 4 causes another background layer to be generated.

The BACK chunk suggests or mandates a background color against which transparent, clipped, or less-than-full-frame images can be displayed. This information is used whenever the application subsequently needs to insert a background layer. Three 2-byte values (red, green, and blue) specify the background color, and a mandatory_background flag specifies whether the background values are mandatory or merely recommended.

Renderers are expected to composite every foreground layer against a fresh copy of the background, when the framing mode given in the FRAM chunk is 3, and to composite the first foreground layer of each subframe against a fresh copy of the background, when the framing mode is 4. Also, when the framing mode is 3 or 4 and no foreground layer appears between consecutive FRAM chunks, a background layer alone is displayed as a separate frame.

The images and the background are both clipped to the subframe boundaries given in the FRAM chunk. Anything outside these boundaries is inherited from the previous subframe. If the background layer is transparent and the subsequent foreground layers do not cover the transparent area with opaque pixels, the application's background becomes reexposed in any uncovered pixels within the subframe boundaries.

TERM Termination Action

The TERM chunk suggests how the end of the MNG data stream should be processed when a MEND chunk is found. The TERM chunk, if present, appears either immediately after the MHDR chunk or immediately prior to a SEEK chunk. Only one TERM chunk is permitted in an MNG data stream.

12.1.4.3 Non-Critical MNG Chunks
LOOP, ENDL, SAVE, SEEK, and TERM chunks

Although these chunks are listed as critical, decoders are only required to recognize them; they are allowed to ignore them. Specifically, even though SAVE and SEEK are both listed as critical, they can be ignored by any decoder that reads files or data streams sequentially.

eXPI Export Image

The eXPI chunk takes a snapshot of an image, associates a name with that snapshot, and makes the name available to the outside world (for example, to a scripting language). Multiple instances of the eXPI chunk are permitted in an MNG data stream, and they need not have different values of snapshot-id.

pHYg Physical Pixel Size (Global)

Conceptually, an MNG renderer that processes the pHYg chunk composites each image into a full-frame layer, then applies the pHYg scaling to the layer, and finally composites the scaled layer against the frame. MNG data streams can include both the PNG pHYs chunk (either at the MNG top level or within the PNG and JNG data streams) and the MNG pHYg chunk (only at the MNG top level), so that the images are properly displayed both when rendered by an MNG renderer and when extracted into a series of individual PNG or JNG data streams and displayed by a PNG or JNG application. The pHYs and pHYg chunks would normally contain the same values, but this is not required. The MNG pHYg chunk is identical in syntax to the PNG pHYs chunk, but it applies to complete full-frame MNG layers rather than to the individual images within them.
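Since pHYg shares the pHYs syntax, which records pixels per unit along each axis plus a unit specifier, a renderer can derive an aspect-ratio correction for a full-frame layer from the ratio of the two values. A minimal sketch, assuming the layer is to be resampled horizontally for display on a square-pixel device:

    # Sketch: horizontal scale factor that squares up non-square pixels.
    def phyg_scale_factors(pixels_per_unit_x, pixels_per_unit_y):
        # Pixel aspect ratio (width/height) = ppu_y / ppu_x; only the
        # ratio matters when the unit specifier is "unknown".
        scale_x = pixels_per_unit_y / pixels_per_unit_x
        return scale_x, 1.0         # (scale_x, scale_y) for the layer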

12.1.5 Run Length Encoding

Run length encoding is an old technique for encoding graphic information. It encodes a sequence, or run, of consecutive pixels of the same color (such as black or white) as a single code-word. For example, the sequence of pixels 77 77 77 77 77 77 77 could be coded as 7 77 (for seven 77s).
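A minimal run-length encoder along the lines of this example might look like the following sketch; the (count, value) pair representation is illustrative, as real formats pack code-words more compactly:

    # Sketch: encode a pixel sequence as (run_length, value) pairs.
    def rle_encode(pixels):
        runs = []
        for p in pixels:
            if runs and runs[-1][1] == p:
                runs[-1][0] += 1        # extend the current run
            else:
                runs.append([1, p])     # start a new run
        return [(count, value) for count, value in runs]

    def rle_decode(runs):
        out = []
        for count, value in runs:
            out.extend([value] * count)
        return out

    # The example from the text: seven 77s become a single (7, 77) code-word.
    assert rle_encode([77] * 7) == [(7, 77)]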

Run length encoding can work well for bi-level images (e.g., black-and-white text or graphics) and for 8-bit images, particularly images such as cel animations that contain many runs of the same color. It does not work well for high-resolution 24-bit natural images, because long runs of the same color are rare in such images.

12.1.6 Vector Quantization

The basic idea of Vector Quantization (VQ) based image compression is to divide the image into blocks (4 x 4 pixels in YUV space for Indeo and Cinepak). Typically, some blocks (hopefully many) are similar, although usually not identical, to other blocks. The encoder identifies a class of similar blocks and replaces these with a generic block representative of the class. The encoder encodes a lookup table that maps short binary codes to the generic blocks. Typically, the shortest binary codes represent the most common blocks in the image.

The VQ decoder uses the lookup table to assemble an approximate image composed of the generic blocks in the lookup table. This approach is inherently lossy because the original blocks are replaced with best-match generic approximations.
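A toy version of this encode/decode pipeline is sketched below, assuming the image has already been split into flattened 4 x 4 blocks and using a few rounds of k-means to build the codebook (a common way to cluster similar blocks, though actual codecs use their own codebook-construction methods):

    import numpy as np

    # Sketch: toy vector quantization over flattened 4 x 4 blocks (N x 16 array).
    def vq_train(blocks, codebook_size, iters=10, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(blocks), size=codebook_size, replace=False)
        codebook = blocks[idx].astype(float)   # seed with random blocks
        for _ in range(iters):                 # plain k-means refinement
            dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            for k in range(codebook_size):
                members = blocks[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
        return codebook

    def vq_encode(blocks, codebook):
        dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)            # one code-word per block

    def vq_decode(codes, codebook):
        return codebook[codes]                 # table lookup: cheap and fast

    # Usage: reshape an image into N flattened 4 x 4 blocks before training.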

The encoding process is computationally intensive because the encoder accumulates statistics on the frequency of blocks and calculates the similarity of blocks in order to build the lookup table.

The decoding process is very light and fast because it is lookup table based. In VQ, the lookup table may be called a codebook. The binary codes that index into the table may be called code-words.

Higher compression is achieved by making the lookup table smaller, that is, by using fewer classes of similar blocks in the image. The quality of the reproduced approximate image degrades as the lookup table becomes smaller, and VQ is prone to blocking artifacts as compression is increased.

VQ is an entire subfield of signal and image processing. It goes well beyond the brief description above and is applied to uses other than video compression.

12.1.7 Portable Font Resource (PFR)

To enable interoperability, downloadable applications use compact, standard, portable font files. The portable font file format [PFR] can be stored in ROM or on disk, or transferred over a network. It is designed to minimize the size of the data representation and to optimize rendering performance.

A scalable outline font representation defines one or more mappings from a 16-bit character code, compatible with the UNICODE standard, to the glyph corresponding to that character code. Each mapping from a set of character codes to their corresponding glyphs is considered to be one font. This is sufficient for simple writing systems, in which the selection of a font and a character code uniquely defines the glyph to be rendered or displayed.

Portable fonts are defined in terms of glyphs. A glyph shape is an expression used to specify how a glyph is to be rendered. A glyph shape representation is a physical embodiment of a glyph shape (e.g., a scalable outline or a bitmap image). A glyph program string is a sequence of bytes used to define one glyph shape representation. A glyph program string may consist of a scalable outline or the actual bitmap image of the glyph at a particular size. A bitmap is a method of defining the image of a glyph by specifying the value of each pixel. A bitmap image is always specific to one size of one glyph. A compound glyph program string is a sequence of bytes that defines a scalable glyph in terms of one or more simple glyph program strings. It is commonly used to construct accented glyphs and fractions, thereby reducing the size of the font.
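The compound glyph program string can be pictured as a list of references to simple glyph program strings, each with its own placement. The structure below is hypothetical, intended only to illustrate how an accented glyph reuses shapes already present in the font:

    # Sketch: a compound glyph as references to simple glyph program strings.
    from dataclasses import dataclass

    @dataclass
    class GlyphReference:
        glyph_id: int      # index of a simple glyph program string
        dx: int            # horizontal placement offset
        dy: int            # vertical placement offset

    # A hypothetical accented "e": the base glyph plus a raised acute accent,
    # so the accent's outline is stored once and shared across the font.
    e_acute = [GlyphReference(glyph_id=12, dx=0, dy=0),    # base 'e'
               GlyphReference(glyph_id=87, dx=3, dy=40)]   # acute accent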

Rendering glyphs at low resolution is difficult due to the relatively large effect of rounding. Additional data is provided with each physical font that may be used by rendering software to improve consistency at low resolutions. This data, commonly referred to as hints, typically includes definitions of standard stroke weight and vertical alignment zones.

A PFR file is composed of the following sections:

  • Header : the header contains global information about the PFR and the fonts it contains.

  • Logical font directory : the logical font directory consists of a table of pointers to the logical fonts contained within the PFR.

  • Logical font section : the logical font section contains the logical font records themselves. Each logical font record defines the transformation (size, oblique effect, condense, expand) to be applied to a physical font. It therefore represents an instance of a physical font.

  • Physical font section : the physical font section consists of a set of physical font records. Each physical font record contains information about one physical font contained within the PFR including a table of character codes defined for that physical font. A physical font record may optionally be immediately followed by bitmap character table records associated with it.

  • Glyph program strings : the glyph program strings section contains the shape definition of each glyph defined within the font. Both outline and bitmap image shapes are defined by glyph program strings. Glyph program strings are shared across all physical fonts within a PFR.

  • Trailer : the trailer contains global information about the PFR and its fonts.


