Compressing Images | Real World Adobe Photoshop CS2: Industrial-strength Production Techniques

One thing that can be said of all bitmapped images is that they're "pigs" when it comes to hard disk space. In this day and age, when you can buy a 120 GB hard disk as cheaply as a pair of shoes, saving space on disk isn't nearly as important an issue as trying to transfer that data. Whether you have a dial-up connection to the Web or a T1 line in your office, moving massive files around is somewhat painful.

Our aim, then, is to stretch out the scarce resources we have on hand, and keep files that we need to move around reasonably small. And we've got three methods to accomplish this goal: work with smaller images (no, seriously!), archive our images when we're not using them, and work with compressed file formats.

As we keep saying, bitmapped images are made simply of zeros and ones. In an 8-bit grayscale image, each pixel is defined by eight zeros or ones. If images are already reduced to this level of simplicity, how can they be reduced further? By bundling groups of bits together into discrete chunks.

Lossless Compression

Let's take the example of a 1-bit (black-and-white) bitmap, 100 pixels wide and tall. Without any compression, the computer stores the value (zero or one) for each one of the 10,000 pixels in the image. This is like staring into your sock drawer and saying, "I've got one blue sock and one blue sock and one black sock and one black sock," and so on. We can compress our description in half by saying "I've got one blue pair and one black pair."
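To put numbers on the sock drawer, here's a minimal Python sketch that works out how many bytes an uncompressed bitmap needs. The function name `uncompressed_bytes` is our own, and it deliberately ignores the file headers and row padding that real file formats add.

```python
def uncompressed_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Raw storage for every pixel with no compression (no headers, no row padding)."""
    total_bits = width * height * bits_per_pixel
    # Round up to whole bytes.
    return (total_bits + 7) // 8

print(uncompressed_bytes(100, 100, 1))   # 1-bit bitmap: 10,000 bits -> 1250 bytes
print(uncompressed_bytes(100, 100, 8))   # 8-bit grayscale -> 10000 bytes
print(uncompressed_bytes(100, 100, 24))  # 8-bit-per-channel RGB -> 30000 bytes
```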

Run Length Encoding

Similarly, we can group the zeros and ones together by counting up common values in a row (see Figure 13-13). For instance, we could say, "There are 34 zeros, then 3 ones, then 55 zeros," and so on. This is called Run Length Encoding (RLE), and it's automatically used for Macintosh PICT images (fax machines use it, too). We call it "lossless" because there is no loss of data when you compress or decompress the file: what goes in comes out the same.

Figure 13-13. Run Length Encoding lossless compression
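Here's a minimal Python sketch of the idea behind RLE, using the row from the example above. It stores runs as (value, length) pairs, which is our own simplification; the actual byte layouts used in PICT files and by fax machines are more elaborate, and this sketch doesn't attempt to reproduce them.

```python
def rle_encode(bits: list[int]) -> list[tuple[int, int]]:
    """Collapse a row of 0/1 pixel values into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            value, count = runs[-1]
            runs[-1] = (value, count + 1)   # extend the current run
        else:
            runs.append((bit, 1))           # start a new run
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into the original row."""
    row: list[int] = []
    for value, count in runs:
        row.extend([value] * count)
    return row

row = [0] * 34 + [1] * 3 + [0] * 55
encoded = rle_encode(row)
print(encoded)                        # [(0, 34), (1, 3), (0, 55)]
print(rle_decode(encoded) == row)     # True: lossless
print(len(rle_encode([0, 1, 0, 1, 0, 1])))  # 6 runs for 6 pixels: no savings
```

The last line shows RLE's weak spot: alternating values produce one run per pixel, so the "compressed" description is no shorter than the original.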

LZW, Huffman, and Zip

There are other forms of lossless compression. For instance, RLE compresses simple images (ones that have large solid-colored areas) down to almost nothing, but it can't compress more complex images (like most grayscale images) very much. LZW (Lempel-Ziv-Welch, though you really don't need to know that) and Huffman encoding work by tokenizing common strings of data.

In plain English, that means that instead of just looking for a string of the same color, these methods look for trends. If RLE sees "010101", it can't do any compression. But LZW and Huffman are smart enough algorithms to spot the pattern of alternating characters, and thereby compress that information. Photoshop can also use Zip compression when it saves PDF, layered TIFF, and PSD files. Zip is considerably smarter than LZW (smarter means it compresses better, but it may take slightly longer to do so).
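To see why spotting patterns beats counting runs, here's a minimal LZW sketch in Python. It simplifies the real thing: the starting dictionary holds only the characters that appear in the input (the LZW used in TIFF files starts with all 256 byte values and packs codes into variable-width bit fields), and the codes are left as a plain list of integers. The last few lines run similar repeating data through Python's standard `zlib` module, which implements Deflate, the method Zip compression uses.

```python
import zlib

def lzw_encode(data: str) -> list[int]:
    """Minimal LZW: emit one code for each longest string already in the table."""
    # Simplification: seed the table with only the characters present.
    table = {ch: i for i, ch in enumerate(sorted(set(data)))}
    current = ""
    codes: list[int] = []
    for ch in data:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending a known string
        else:
            codes.append(table[current])     # emit the longest known prefix
            table[candidate] = len(table)    # learn the new, longer string
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes: list[int], alphabet: list[str]) -> str:
    """Rebuild the same table on the fly and expand the codes."""
    table = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The encoder created this code on the previous step (classic special case).
            entry = previous + previous[0]
        pieces.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(pieces)

pattern = "01" * 50                      # the alternating data RLE can't shrink
codes = lzw_encode(pattern)
print(len(pattern), "characters ->", len(codes), "codes")
print(lzw_decode(codes, sorted(set(pattern))) == pattern)  # lossless round trip

# Deflate (the method behind Zip) via Python's standard zlib module:
raw = b"01" * 5000
print(len(raw), "bytes ->", len(zlib.compress(raw)), "bytes")
```

Each code LZW emits stands for a longer and longer chunk of the repeating pattern, which is why it wins where RLE sees nothing but runs of length one.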

Lossy Compression

The table of contents at the front of most books is a way of compressing information. If you ripped the table of contents out of this book and mailed it to someone else (we're not actually suggesting that you do this!), they would be able to "unpack" it and read what's in this book. But they wouldn't actually be seeing the words you're reading now. Instead, they'd read an "average" of each chapter. The more detailed chapters have more headings, so your friend would see more detail in them than he or she would in a simple-headed chapter like this one.

Bitmapped images can be similarly outlined (compressed), transmitted to someone else, and unpacked. And similarly, when you look at the unpacked version, you don't get all the detail from the original image. For example, if 9 pixels in a 3-by-3 square are similar, you could replace them all with a single averaged value. That's a nine-to-one compression. But the original data, the variances in those 9 pixels, is lost forever.
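The nine-to-one averaging described above is easy to sketch in Python. This is a toy illustration of lossy compression in general, not how JPEG works (JPEG transforms 8-by-8 blocks and rounds off the results), and the function names are hypothetical.

```python
def average_blocks(image: list[list[int]], block: int = 3) -> list[list[int]]:
    """Replace each block-by-block tile with the rounded average of its pixels."""
    height, width = len(image), len(image[0])
    averaged = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            row.append(round(sum(tile) / len(tile)))
        averaged.append(row)
    return averaged

def expand_blocks(small: list[list[int]], block: int, height: int, width: int) -> list[list[int]]:
    """'Unpack' by repeating each stored average across its whole tile."""
    return [[small[y // block][x // block] for x in range(width)]
            for y in range(height)]

tile = [[100, 102, 98],
        [101, 99, 100],
        [97, 103, 100]]
small = average_blocks(tile)              # nine values stored as one
restored = expand_blocks(small, 3, 3, 3)
print(small)                              # [[100]]
print(restored == tile)                   # False: the variations are gone for good
```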

This sort of compression is called "lossy" compression because you lose data when compressing it. By losing some information, you can increase the compression immensely. Where a ZIP-compressed TIFF might be 40 percent of the original size, a lossy-compressed file can be 2 percent or less of the original file size.

Levels of JPEG compression

Lossy compression schemes typically give you a choice of how tightly you pack the data. (The primary method is JPEG, for Joint Photographic Experts Group.) With low compression, you get larger files and higher quality. High compression yields lower quality and smaller files. How much quality do you lose? It depends on the level of the compression, the resolution of the image, and the content of the image.
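In JPEG, "how tightly you pack the data" mostly comes down to how coarsely the transformed pixel data gets rounded off (quantized). Here's a minimal Python sketch, assuming the 1-to-100 quality convention and scaling formula used by the IJG libjpeg library, along with the first row of the example luminance quantization table from the JPEG standard. Photoshop's 0-to-12 quality scale uses its own tables, so treat this as an illustration of the principle rather than of Photoshop's settings; the coefficient values are made up.

```python
# First row of the example luminance quantization table in the JPEG standard (Annex K).
BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

def ijg_scale(quality: int) -> int:
    """Map a 1-100 quality to a percentage scale factor (IJG libjpeg convention)."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def quant_steps(quality: int) -> list[int]:
    """Scaled step sizes: larger steps mean coarser rounding."""
    scale = ijg_scale(quality)
    return [max(1, min(255, (b * scale + 50) // 100)) for b in BASE_ROW]

def quantize(coeffs: list[float], steps: list[int]) -> list[int]:
    return [round(c / s) for c, s in zip(coeffs, steps)]

def dequantize(levels: list[int], steps: list[int]) -> list[int]:
    return [level * s for level, s in zip(levels, steps)]

def summarize(coeffs: list[float], quality: int) -> tuple[int, float]:
    """Return (nonzero levels kept, total absolute error after unpacking)."""
    steps = quant_steps(quality)
    levels = quantize(coeffs, steps)
    error = sum(abs(c - d) for c, d in zip(coeffs, dequantize(levels, steps)))
    nonzero = sum(1 for level in levels if level != 0)
    return nonzero, error

# Hypothetical DCT coefficients for one row of an 8x8 block (low to high frequency).
coeffs = [620, -45, 30, -12, 8, -5, 3, -2]

for quality in (90, 50, 10):
    nonzero, error = summarize(coeffs, quality)
    print(f"quality {quality}: steps {quant_steps(quality)}, "
          f"nonzero {nonzero}, total error {error}")
```

Lower quality means bigger steps, more coefficients rounded to zero (runs of zeros are what the final lossless stage packs so efficiently), and more error when the data is unpacked.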

Different programs implement JPEG differently, and with varying results. Note that JPEG is both a compression method and a file format in its own right (see "File Formats," earlier in this chapter), but both are based on similar algorithms.

JPEG warnings

Here are a few things to remember when working with JPEG. First, note that images with hard edges, high contrast, and angular areas are most susceptible to artifacts from JPEG compression. For example, a yellow square on a green background in a lower-resolution image looks pretty miserable after lossy compression. Similarly, text (rasterized, not vector) almost always looks terrible after JPEG compression because it has such hard edges.

On the other hand, compressing natural, scanned images using JPEG, especially those that are already somewhat grainy or impressionistic, probably won't hurt them much at all, particularly if you use the Maximum or High quality setting.
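Why do hard edges suffer most? JPEG describes each block as a sum of smooth cosine waves (the discrete cosine transform, or DCT). A gentle gradient needs only the first wave or two, but a sharp edge needs the high-frequency waves too, and those are exactly the ones heavy compression throws away. This Python sketch uses a single row of 8 samples and simply drops everything past the second coefficient as a crude stand-in for aggressive quantization; real JPEG works on 8-by-8 blocks and rounds coefficients rather than discarding them outright.

```python
import math

N = 8  # samples per row, matching JPEG's 8-pixel block width

def _scale(k: int) -> float:
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(samples: list[float]) -> list[float]:
    """Forward DCT-II: samples -> cosine-wave coefficients."""
    return [_scale(k) * sum(samples[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]

def idct(coeffs: list[float]) -> list[float]:
    """Inverse transform (DCT-III): coefficients -> samples."""
    return [sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def rms_error_after_truncation(samples: list[float], keep: int = 2) -> float:
    """Keep only the lowest `keep` frequencies, rebuild, and measure the damage."""
    coeffs = dct(samples)
    truncated = coeffs[:keep] + [0.0] * (N - keep)
    rebuilt = idct(truncated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(samples, rebuilt)) / N)

smooth = [100 + 5 * n for n in range(N)]   # gentle gradient, like a scanned sky
edge = [0] * 4 + [255] * 4                 # hard black-to-white edge, like type
print(f"smooth gradient RMS error: {rms_error_after_truncation(smooth):.1f}")
print(f"hard edge RMS error:       {rms_error_after_truncation(edge):.1f}")
```

With the same two coefficients kept, the gradient comes back nearly intact while the edge picks up large ringing errors, which is the row-sized version of the halos you see around JPEG'd type.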

You should only use JPEG on finished images (those on which you've finished all editing and correction). Tone or color correction on a JPEGed image exaggerates the compression artifacts. Sharpening a JPEGed image produces an effect that might one day find its way into Kai's Power Tools, but it's difficult to envisage a use for it in a production setting.

Also, compressing and decompressing images repeatedly can make images worse than just doing it once. But since we just told you that you should only JPEG finished images, the point is moot: you can just open them, look at them, and close them again.

Fractal and wavelet compression

Two other forms of lossy compression, fractal compression and wavelet compression, offer much better compression at the cost of more intense calculation. However, other than JPEG 2000 (which is based on wavelet compression), Photoshop doesn't support these compression schemes without a third-party plug-in. One example is the Genuine Fractals plug-in, originally developed by Altamira and currently sold by OnOne Software (after a long line of owners, including LizardTech), which has won a loyal following among the large-format print crowd for its ability to upsample files with noticeably less degradation than other methods. OnOne's pxl SmartScale does a similar thing, taking low- to medium-resolution images, compressing them, and then enlarging them with minimum visual degradation.

While these plug-ins' more aggressive compression is lossy, the artifacts that it creates are less objectionable to the eye than those created by JPEG, on the one hand, or overly aggressive upsampling on the other. A lossless compression method is also offered. However, while fractal compression artifacts look natural in images, they look strange on sharp synthetic edges such as type.

Bruce has used Genuine Fractals to upsample 75 MB scans to the 300+ MB required for a 30-by-40-inch print at 300 ppi, and he thinks the result looks more natural than using Photoshop's Bicubic interpolation. But while it can be very useful for this kind of upsampling, the lengthy compression times and relatively low compression ratios of Genuine Fractals make it less appealing as a general compression utility.
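The numbers in that example are easy to check. Assuming 8-bit RGB (3 bytes per pixel) and no compression, a quick Python calculation gives the pixel dimensions and file size for a 30-by-40-inch print at 300 ppi; the function name `print_size` is our own.

```python
def print_size(width_in: float, height_in: float, ppi: int,
               channels: int = 3, bytes_per_channel: int = 1) -> tuple[int, int, int]:
    """Pixel dimensions and uncompressed byte count (8-bit RGB by default)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * channels * bytes_per_channel
    return width_px, height_px, size_bytes

w, h, size = print_size(30, 40, 300)
print(w, "x", h, "pixels")                   # 9000 x 12000 pixels
print(size / 1_000_000, "MB (decimal)")      # 324.0 MB (decimal)
print(round(size / 2**20), "MB (binary, 1 MB = 1,048,576 bytes)")  # 309
```

Either way you count megabytes, the result lands comfortably above 300 MB, roughly four times the size of the 75 MB scan.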

To Compress or Not to Compress

Over the years, we've found only a few universal truths. One of those is: "Fast, Cheap, or Good: you can have any two of the three." Compression is certainly no exception to this rule. Compressing files can be a great way to save hard drive space (read: "save money") and sometimes to cut down on printing times (read: "save more money"). But compressing and decompressing files also takes time (read: "lose the money that you just saved").

Ideally, if you have way too much hard drive space and you transfer your files from place to place on DVDs, you may never need or want to compress your images. Otherwise, you may want to use a lossless compressed file format (Zip-compressed TIFF) for some of your images. If you really need to save space, you might choose a lossy file format (like JPEG).

Bruce uses ZIP-compressed TIFF (with compression on both the layers and the background) as his archival format, except for those relatively rare occasions when he's in too much of a hurry to take the speed hit. David still has restless dreams about fitting files on 400 K disks in 1985, so he tends to use lossless compression on all but the smallest files. When we absolutely need to save a file in a lossy compressed file format, we use JPEG, but we rarely use anything other than Maximum quality in JPEG; we find the increase in compression at Good, Medium, or Low simply isn't worth the degradation in quality (except for Web images). Ultimately, storage space is getting so inexpensive these days that it's almost silly to worry about compression. Writing 650 MB CDs or 5.2 GB DVDs is now so inexpensive that you might as well burn one for yourself and another for your dog. If time equals money, then time spent compressing and decompressing files (whether it's manual or automatic) is money down the big porcelain doughnut.

Archiving

You may have worked with programs such as StuffIt or ZipIt. These tools all have the same function: to compress files (any kind of files) and save space on your hard drive. This sort of compression is called archiving because people typically use it on files that they're not currently using.

Archiving a file is like folding up a piece of paper and putting it into an envelope. It takes a little time to fold it up (compress it) and a little time to unfold it (decompress it), and while it's in the envelope, you can't read it. The archive file (the "envelope" that contains the compressed file) takes up less room on your hard drive, but to work on the enclosed file, you have to decompress it. Both Mac OS X 10.3 (and later) and Windows XP have Zip compression built in.

All archival compression programs use lossless compression methods, so you never have to worry about degrading the image. However, that also means they may not compress down as much as you'd like. ZIP-compressed TIFFs, for example, generally won't get any smaller. If we need to send a number of images to someone via the Internet, we generally zip them first, but that's mostly to get them into one file, and to provide a degree of protection from the munging behavior that various Internet gateways are prone to exhibiting.
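You can see why compressing already-compressed data doesn't help with a quick Python experiment. It uses the standard `zlib` module (Deflate, the method Zip archives normally use) as a stand-in for both a Zip-compressed TIFF and a Zip archive, and a synthetic, mildly noisy "image" generated from a fixed random seed; none of this is the exact code path Photoshop or an archiver follows.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is repeatable
# Synthetic 8-bit "image": flat gray with a little noise (+/-1 level).
raw = bytes(128 + rng.choice((-1, 0, 1)) for _ in range(200_000))

once = zlib.compress(raw, 9)    # like saving a Zip-compressed TIFF
twice = zlib.compress(once, 9)  # like zipping that TIFF into an archive

print(f"original:   {len(raw):>7} bytes")
print(f"compressed: {len(once):>7} bytes")
print(f"re-zipped:  {len(twice):>7} bytes")
```

The first pass shrinks the data substantially; the second pass has nothing left to squeeze out, so the result comes out about the same size, and often a few bytes larger.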