Types of Image Formats

Raster vs. Vector

One of the most fundamental distinctions between image formats is whether the image information is represented in raster form or vector form (Figure 22-1). Raster images describe a picture by dividing the image into square or rectangular cells, called pixels, and specifying the color of each pixel. Smaller pixels can represent more detail, but this requires a larger number of pixels to represent the entire image. The resolution of an image describes the size of the cells, usually specified in dpi, or dots per inch. An image at 300 dpi contains 300 cells in one linear inch, which means 90,000 cells in every square inch.
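The resolution arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the default of three bytes per pixel is an assumption about how an uncompressed color image might be stored.

```python
def pixels_per_square_inch(dpi: int) -> int:
    """Number of cells in one square inch at the given linear resolution."""
    return dpi * dpi


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Raw size of a raster image, assuming a fixed number of bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


print(pixels_per_square_inch(300))     # 90000
print(uncompressed_bytes(4, 6, 300))   # 6480000 bytes for a 4 x 6 inch image
```

Doubling the resolution quadruples the pixel count, which is why fine detail is expensive in a raster format.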

Figure 22-1. The difference between raster and vector images.

Typically, an image displayed on a monitor has about 96 dots per inch. In contrast, a traditional film photograph can have upwards of 2000 dots per inch, depending on the grain of the film used. Digital cameras create raster images with an intermediate resolution of 200 to 400 dots per inch.

If a raster image could be exemplified by a photograph, then a vector image could be described as an "Etch-a-Sketch" drawing. Vector images are specified not by the colors of contiguous pixels, but by drawing commands such as "Move to this point," "Draw a line to this point," "Draw a circle here," and "Draw a word using this font." The major advantage of vector images over raster images is that they can be made smaller or larger and still retain perfect detail.

Not every raster image can be easily expressed as a vector image. For example, a typical photograph is not well described in terms of sequential drawing commands. However, every vector image can be converted into a raster image at a certain resolution. The higher the resolution, the more accurate the rendition as a raster image.
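To make the conversion from vector to raster concrete, here is a minimal sketch in Python. The command names (`move_to`, `line_to`), the use of inches as the coordinate unit, and the `rasterize` function are all assumptions made for this illustration; they do not correspond to any particular vector format.

```python
def rasterize(commands, width_in, height_in, dpi):
    """Render move_to/line_to commands (coordinates in inches) onto a pixel grid."""
    cols = round(width_in * dpi)
    rows = round(height_in * dpi)
    grid = [[0] * cols for _ in range(rows)]
    pen = (0.0, 0.0)
    for op, x, y in commands:
        if op == "line_to":
            # Sample the segment at roughly two points per pixel along its longer axis.
            steps = max(1, int(2 * dpi * max(abs(x - pen[0]), abs(y - pen[1]))))
            for i in range(steps + 1):
                t = i / steps
                px = min(cols - 1, int((pen[0] + t * (x - pen[0])) * dpi))
                py = min(rows - 1, int((pen[1] + t * (y - pen[1])) * dpi))
                grid[py][px] = 1
        pen = (x, y)
    return grid


# The same two drawing commands rendered at two different resolutions.
diagonal = [("move_to", 0.0, 0.0), ("line_to", 1.0, 1.0)]
for dpi in (4, 8):
    grid = rasterize(diagonal, 1.0, 1.0, dpi)
    print(f"{dpi} dpi: {len(grid)} x {len(grid[0])} grid, {sum(map(sum, grid))} pixels set")
```

The vector description stays two commands long no matter how large the output is; only the raster rendition grows with resolution.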

Ultimately, raster and vector are just two different ways of describing image information, and which method is the better depends on the type of image being described.

What about Compression?

In the previous section, we noted that a 300-dpi raster image contains 90,000 pixels in every square inch. For a large and detailed image, the size of an image file can balloon to many megabytes. Fortunately, there are patterns in the colors of the pixels: many pixels have the same color, and similarly colored pixels are often adjacent. Certain image formats exploit these patterns to compress the image data, often reducing the file size by a factor of 10 to 20. There are two ways of compressing image data: lossless and lossy. Lossless compression does not reduce file size quite as much, but it retains the exact color of each pixel when the image is decompressed. Lossy compression accepts some modification of pixel colors or brightness in return for increased compression.

As we saw in raster vs. vector, the type of image is the factor that determines the better compression method. Lossy compression works well for photographs and images that have swaths of smoothly varying color with few sharp edges. Lossless compression is preferable for clipart-type images that contain only a few distinct colors and have many edges with high contrast.
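One of the simplest lossless schemes that exploits runs of identically colored pixels is run-length encoding. It is used here purely as an illustration of the idea; formats such as GIF and PNG rely on more elaborate algorithms. The function names are hypothetical.

```python
def rle_encode(pixels):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original pixel sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


row = ["white"] * 12 + ["black"] * 3 + ["white"] * 5
encoded = rle_encode(row)
print(encoded)                        # [['white', 12], ['black', 3], ['white', 5]]
print(rle_decode(encoded) == row)     # True: nothing was lost
```

A clipart-style row of 20 pixels shrinks to three runs, while a photographic row in which nearly every pixel differs slightly from its neighbor would not shrink at all. That is the situation in which lossy methods become attractive.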

Color Palettes

Another distinction between image types is the number of colors they contain. Icons and clipart often involve no more than a few tens of different colors, whereas photographs contain smoothly varying hues that must be very closely approximated. This requirement pushes up the number of colors used in such an image; the most common types are High Color (65,536 colors) and True Color (16 million colors). Such images specify color by means of RGB (red-green-blue) triplets. Each component of the triplet describes the relative intensity of one of the three primary colors of light: red, green, and blue. True Color images use 8 bits of precision, or 256 possible values, for each of the three color channels.

If the number of distinct colors is small, however, it is a waste to store the entire triplet for each pixel. Instead, the image contains an index of colors, called a palette.

Each triplet that occurs is assigned a single number, and that number is used to represent the pixel color. It's much like a paint-by-number kit you may have used as a child. For a large image, the savings in going from three bytes to one for each pixel can really add up.
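The palette idea can be sketched in a few lines of Python. The function names are hypothetical, and the sketch ignores details that a real format must handle, such as a maximum palette size.

```python
def to_indexed(pixels):
    """Split a list of (r, g, b) triplets into a palette and per-pixel indices."""
    palette = []      # index -> triplet
    lookup = {}       # triplet -> index
    indices = []
    for rgb in pixels:
        if rgb not in lookup:
            lookup[rgb] = len(palette)
            palette.append(rgb)
        indices.append(lookup[rgb])
    return palette, indices


def from_indexed(palette, indices):
    """Rebuild the original triplets from the palette and the index list."""
    return [palette[i] for i in indices]


pixels = [(255, 255, 255)] * 6 + [(200, 0, 0)] * 3 + [(255, 255, 255)]
palette, indices = to_indexed(pixels)
print(palette)   # [(255, 255, 255), (200, 0, 0)]
print(indices)   # [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
```

Stored as triplets, these ten pixels need 30 bytes. Stored as one-byte indices plus a two-entry palette, they need 10 + 6 = 16 bytes, and the fixed cost of the palette becomes negligible as the image grows.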

A Smorgasbord: GIF, JPEG, PNG, TIFF, XPM, Etc.

There is a wide variety of image formats in common use, and each has its strengths and weaknesses. The rise of the World Wide Web and the widespread use of Web browsers have driven the popularity of three image formats in particular: GIF (Graphics Interchange Format), JPEG (Joint Photographic Experts Group), and PNG (Portable Network Graphics). These three formats alone have support in all the major Web browsers. We'll also discuss a number of other formats that you may commonly encounter.

XPM (X pixmap) is a format commonly used for icons on Linux and other Unix systems that use the X Window System. An XPM file is actually a snippet of C code conforming to a certain format. It can be included in the source code of an application, or it can be read in dynamically using Xlib. It is an uncompressed raster format and consequently can become very large for large images.

BMP (bitmap) is a format introduced in early versions of Windows. Like XPM, it is a lossless raster format and does not include any compression. It can use a palette or can be stored in True Color. Because the representation of the data in a BMP file is so similar to its required visual representation, encoding and decoding BMP images is a trivial task. However, the files can often be quite large.

GIF (Graphics Interchange Format) is one of the first standard image formats. It was originally used for exchanging graphics files on CompuServe's bulletin boards, before the days of the Internet. It is a lossless raster format and requires a palette of no more than 256 distinct colors. The low number of allowed colors keeps GIF from being an acceptable format for many photographic images. The image data is compressed using the very efficient LZW compression algorithm, so GIF files can achieve extremely small sizes (a few tens of bytes) when applied to small, icon-like images.

GIF files can also hold multiple images in a single file. GIF viewers can display these images in sequence, producing a crude animation. This has become very popular on the Web as an attention-grabbing feature, often used in advertisements.

A major drawback of the GIF format is that the LZW compression scheme is patented by Unisys. Periodically Unisys' corporate lawyers have surfaced, threatening to sue Web site operators who use GIFs on their sites and challenging them to prove that the GIFs were created using a licensed tool. The inherent limitations of GIF files, combined with the encumbrances and uncertainty so generously provided by the LZW patent, have led to the development of an alternative file format, known as PNG. The Burn All GIFs Web site is dedicated to promoting the use of PNG images rather than GIF images and to encouraging Web site operators to use the new format. The URL is provided in the references at the end of this chapter.

PNG (Portable Network Graphics) is another lossless raster image format that was designed as a patent-free successor to GIF. PNG corrects some of the deficiencies of the GIF format, including the limitation to a 256-color palette, and increases the number of compression options. PNG was specifically designed not to be animated; that is, PNG files do not contain multiple frames in a single file. Netscape Navigator and Internet Explorer both partially support PNGs, enough so that most GIFs can be replaced with PNGs with no impact on the end user.

TIFF (Tagged Image File Format) is more of a specification than a specific format. TIFF files support multiple chunks of data within a file, and each chunk has a "tag" associated with it that describes its functions. Some tags are universally recognized, and applications are free to insert their own types of tags. On occasion, this can confuse a second application that may interpret the custom tag as one of its own custom tags. TIFF provides a high degree of flexibility, but I have experienced some application support problems while using this format.

Even the venerable and versatile PostScript language is sometimes used as an image format. EPS (Encapsulated PostScript) can be loosely defined as a set of PostScript commands that leave the interpreter state unchanged after they are executed (hence "Encapsulated"). Any of the drawing operations supported by PostScript can be included within an EPS file, and it is the only format mentioned so far that supports both raster and vector image types. Since PostScript's raster support has no clear advantages over the other image formats we've mentioned, EPS is typically used more for vector images, such as line art. One advantage of EPS files is that they can be trivially integrated into PostScript output. Considering the importance of PostScript to printing on most Unix-based systems, there remains a use for this unusual format.

JPEG is named after the group that created the format specification, the Joint Photographic Experts Group. As is to be expected from the name, the format is meant for encoding photographic information. In contrast to the formats previously mentioned, JPEG accepts the trade-off of lossy compression for decreased file size. That is, converting an image to JPEG and back again produces a slightly degraded copy of the original image. The degradations accumulate if the image is converted back and forth multiple times. When encoding a JPEG image, you can also specify the amount of compression you prefer and, consequently, the amount of image loss.

The setting is commonly expressed as a quality value from 0 to 100, where higher values mean less compression and less loss. Typically the range 75 to 90 will produce the best trade-off between file size and quality.

JPEG images can achieve compression ratios of 10 or higher over the uncompressed image data. It is interesting to look briefly at the JPEG encoding algorithm to see how this magic is performed. The encoding steps are typical of the types of algorithms used in most lossy image formats, and the high-level concepts are actually quite similar to audio encoding algorithms.

The philosophy behind the JPEG encoding scheme is to preserve the parts of an image that are most important to the human eye and accept inaccuracies in the parts that are least important. In fact, this is the same philosophy behind the wildly popular MP3 format, which achieves similar compression ratios by accepting inaccuracies in the unimportant parts of an audio stream. In both cases, the relative importance of various image or audio features was determined by actual psychological studies, and this information went into the design of the encoding algorithms. In the case of images, the JPEG designers capitalized on two major facts:

· The human eye is less sensitive to changes in hue than to changes in brightness.

· Most photographs have swaths of smoothly varying color and fuzzy edges, rather than swaths of identical colors separated by sharp edges.

Given these observations, the encoding algorithm works as follows.

Step 1: Color Space Conversion. Most raster images are specified in RGB, that is, each pixel has a particular amount of red, green, and blue light. The JPEG algorithm "rotates" this color space into one grayscale and two colored components, where the gray component ranges from 0 to 1 and the colored components can range from -1 to +1. Negative values for a component can be thought of as "anti"-color; for example, red's "anti"-color would be cyan, and blue's "anti"-color would be yellow. The advantage of changing the color space in this way is that the varying values of the colored components can be encoded with less precision than the gray component, thus taking into account the first of the preceding bullet points. JPEG is built with support for several different color systems.
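The color space rotation can be sketched as follows. The weights below are the ITU-R BT.601 values used for the YCbCr color space in common JPEG files; that choice, along with the function names and the 0-to-1 input range, is an assumption of this sketch. With this particular scaling the two colored components span -0.5 to +0.5 rather than -1 to +1; the range depends only on the scale factor chosen.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert RGB values in 0..1 to one gray (luma) and two colored (chroma) components."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # brightness as perceived by the eye
    cb = (b - y) / 1.772                     # blue vs. yellow, centered on 0
    cr = (r - y) / 1.402                     # red vs. cyan, centered on 0
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr."""
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


for name, rgb in [("red", (1, 0, 0)), ("cyan", (0, 1, 1)),
                  ("blue", (0, 0, 1)), ("yellow", (1, 1, 0))]:
    y, cb, cr = rgb_to_ycbcr(*rgb)
    print(f"{name:7s} y={y:.3f} cb={cb:+.3f} cr={cr:+.3f}")
```

Red and cyan come out with opposite signs in the `cr` component, and blue and yellow with opposite signs in `cb`, which is the "anti"-color relationship described in Step 1.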

Step 2: DCT. For each of the three color components, the image is broken up into 8 x 8-pixel blocks. Each of these 64-pixel blocks then undergoes a mathematical operation known as a two-dimensional DCT (discrete cosine transform). The transform converts the square of pixels into an equal-sized square of coefficients that, when the DCT is reversed, can exactly reproduce the original pixels. But if the squares are the same size, each containing 64 numbers, it seems like nothing has been gained, right? Not so! First, when there is a smoothly varying swath of color across the pixel block, the higher-order DCT coefficients are very small compared to the lower-order ones. In fact, some of these higher-order terms can be dropped completely, and the reversed transform will still appear very close to the original image. Thus, for parts of the image without much detail, which are common according to the second fact listed earlier, the algorithm can get away with using fewer than 64 coefficients, perhaps as few as 32 to 40.
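The behavior described in Step 2 can be demonstrated with a direct, unoptimized implementation of the two-dimensional DCT in Python. This is a sketch for experimentation, not how a production encoder computes the transform; the function names and the test block (an idealized smooth gradient) are assumptions chosen for the illustration.

```python
import math

N = 8  # block size used by JPEG


def _c(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(n, k):
    """Cosine basis function of order k sampled at pixel position n."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward two-dimensional DCT (type II) of an N x N block."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse transform: rebuild the N x N block from its coefficients."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


# A smoothly varying block: brightness rises gently in both directions.
block = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)

# Keep only the 10 lowest-order coefficients and drop the other 54.
kept = [[coeffs[u][v] if u + v <= 3 else 0.0 for v in range(N)] for u in range(N)]
approx = idct_2d(kept)
max_err = max(abs(approx[x][y] - block[x][y]) for x in range(N) for y in range(N))
print(f"worst pixel error after dropping 54 of 64 coefficients: {max_err:.2f}")
```

For this gradient, whose pixel values span 100 to 142, the worst-case error stays below one brightness level. A block containing a sharp edge would need many more coefficients to reach the same accuracy.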

Step 3: Quantization. Another property of the DCT is that the coefficients do not have to be specified with the same accuracy as the original pixels. Hence, each of the remaining DCT coefficients can be reduced in accuracy by using only the high-order bits and dropping the low-order bits. This process of quantization does not have to be the same throughout the entire image or even within the same block. In fact, the colored components are quantized more aggressively than the grayscale component.
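A minimal sketch of quantization follows. Dropping low-order bits is equivalent to dividing by a power of two; dividing by an arbitrary step size and rounding generalizes the same idea. The step sizes (4 and 16) and the coefficient values are invented for illustration, and a real encoder typically uses a whole table of step sizes, one per coefficient position.

```python
def quantize(coeffs, step):
    """Replace each coefficient by the number of whole steps it contains (a small integer)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Recover approximate coefficients; the rounding error is lost for good."""
    return [[q * step for q in row] for row in levels]


# Illustrative coefficients from the low-order corner of a block.
coeffs = [[-72.9, 7.6],
          [2.3, -0.6]]

gray_levels = quantize(coeffs, step=4)      # finer steps for the gray component
color_levels = quantize(coeffs, step=16)    # coarser steps for a colored component

print(gray_levels)                  # [[-18, 2], [1, 0]]
print(color_levels)                 # [[-5, 0], [0, 0]]
print(dequantize(color_levels, 16)) # [[-80, 0], [0, 0]]
```

The coarser step turns three of the four coefficients into zero. Long runs of zeros and small integers are exactly what the final compression step handles well.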

Step 4: Compression. Each of the sets of coefficients from the previous steps is then organized and compressed losslessly using an entropy-encoding algorithm. The most commonly used are Huffman encoding and arithmetic encoding. Consult an outside resource for details about the differences between these algorithms. Suffice it to say that this step reduces the data size considerably, in some cases to a few bytes overall for each 8 x 8 block of pixels.
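To give a feel for why this step is so effective, here is a small Huffman code builder in Python. It is a generic textbook construction, not the specific coding procedure of the JPEG specification, and the sample data (a block in which most quantized coefficients are zero) is invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: only one symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)       # the two least frequent subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# 64 quantized coefficients: mostly zeros, a few small values, two larger ones.
levels = [0] * 50 + [1] * 8 + [-1] * 4 + [-18, 2]
code = huffman_code(levels)
total_bits = sum(len(code[s]) for s in levels)
print(len(code[0]), "bit for each zero")                  # 1 bit for each zero
print(total_bits, "bits total, versus", 8 * len(levels))  # 86 bits total, versus 512
```

Frequent symbols receive short codes, so a block dominated by zeros shrinks from 64 bytes to about 11.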

Step 5: Encapsulation. The image header information, various flags, the compression tables, and the compressed data are all tagged and placed into a single file. Other types of tagged elements, such as comments, copyrights, and color correction tables, can be added to the file at this step. In this respect, the outer appearance of a JPEG file is similar to that of a TIFF file. However, a JPEG decoder must reverse the steps of the algorithm to obtain the image data, which is not a trivial task!

The Future: MNG and JPEG2000

The collective experience gained from the development of the World Wide Web has suggested which types of image formats are most practical and improvements that can be made on these image formats. As we look toward the future, there appear to be some promising file formats on the horizon. None of these formats has yet been implemented in a mainstream product, but all have been successfully prototyped, and the theory behind each encoding has been proven.

MNG (Multiple-image Network Graphics), designed by many of the same developers behind PNG, is intended as a replacement and extension of animated GIFs. MNG supports multiple frame types, has excellent compression, and handles complex animations, although they are not as detailed as Macromedia's Flash format. Like PNG, MNG is an openly specified, patent-free format with free library source code available.

JPEG2000 is an exciting new image format from the Joint Photographic Experts Group, intended as a successor to the popular JPEG format. The original JPEG format achieves lossy compression with a discrete cosine transform, and the mathematical theory behind the DCT (Fourier transforms) has been known for roughly two hundred years. JPEG2000 uses the related but much more recent discovery of wavelet transforms to simultaneously achieve higher compression ratios and better image quality than the original JPEG format. JPEG2000 is also capable of compressing different parts of an image differently, based on the amount of detail in that area; it is not limited to 8 x 8 blocks and a global compression-quality parameter. Currently very few applications support JPEG2000 images, although it is the author's personal hope that this will change.

Which Image Format Is Right?

The answer to this question really depends on the qualities of the image you are working with. A few simple rules of thumb will suffice for 90% of cases.

· If you want to preserve images precisely but don't really care about displaying them in a Web browser or about having extensive application support, you can use Windows BMP format, TIFF, or PNG. Keep in mind that there are sometimes interoperability problems between different TIFF implementations; some software will insert tags that aren't recognized by other applications, which can confuse them to the point of being unable to open the image.

· If the image is a photograph and you can tolerate a little bit of loss, you can't beat the JPEG format. PNG may also be acceptable if the image is small or needs high fidelity.

· If the image is an icon or is clipart with a limited number of colors, use PNG or GIF, depending on your preferences and application support. Remember that both of the major Web browsers currently support PNG and that PNG has some advantages over GIF.

· If the image is animated, you will probably have to use animated GIFs for the moment, until there is better application support for MNG.

· If you need complex animations or line art, you may want to investigate the Flash format from Macromedia. Although the Flash player is free for most platforms as a browser plug-in, Flash is a proprietary format, and you must purchase Flash creation software.

If all else fails or you're not really sure what to do, you can always try the experimental approach. Save the image in more than one format and see which has the best combination of quality and file size. When I design Web pages, I routinely save almost every image as a GIF, PNG, and JPEG with at least two different quality settings (unless I know from experience which will work best). Then, of the ones with acceptable image quality, I pick the one with the smallest file size. The result is professional-looking images with very fast load time.