As you learned in the "Content Compression" section, browsers and servers hold a brief conversation about what kind of content they prefer to accept and deliver. The browser tells the server that it can accept encoded content, and if the server is capable, it compresses the data and transmits it. The browser then decompresses the data and renders the page. Clients that don't understand compressed content don't request encoded files and thus receive the files uncompressed (assuming that the content is offered conditionally). By definition, HTTP 1.1-compliant browsers support gzip compression. Most modern browsers support gzip content encoding (see Table 18.3).
| Browser | Encoding Support |
|---|---|
| Microsoft Internet Explorer 4.x+ | gzip, deflate. Macintosh versions do not understand gzip or deflate encoding and do not send the "Accept-Encoding" header. There is also a caching issue with compressed content in Internet Explorer; the content compression software vendors are aware of this and know how to work around it. The only software that works incorrectly with MSIE is Microsoft Internet Information Server. |
| Netscape 4.06+ | Supports HTTP/1.0, but Netscape 4.06 and later versions send "Accept-Encoding: gzip" in the header. There are some limitations, however. It works consistently only for the content types "text/html" and "text/plain"; JavaScript and CSS files ("application/x-javascript" and "text/css") will not be decompressed properly. |
| Mozilla m14-m18, 0.6-0.9.3, Netscape 6.0-6.1, Galeon, and SkipStone | Error in implementation. |
| Mozilla 0.9.4+, Netscape 6.2+ | Good |
| Opera 5.12+ | Good |
| Lynx 2.6+ | Good |
| Konqueror | gzip only |
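The negotiation summarized in the table can be sketched in a few lines of server-side code. The helper below is hypothetical (the function name and arguments are mine, not from this chapter), shown in JavaScript as it might run in a Node.js-style server:

```javascript
// Hypothetical helper: given the client's Accept-Encoding header,
// pick an encoding the server also supports, or fall back to sending
// the file uncompressed ("identity").
function chooseEncoding(acceptEncoding, supported) {
  if (!acceptEncoding) return "identity"; // client sent no header: send raw
  var offered = acceptEncoding.split(",").map(function (token) {
    return token.split(";")[0].trim().toLowerCase(); // drop any ;q=... weight
  });
  for (var i = 0; i < supported.length; i++) {
    if (offered.indexOf(supported[i]) !== -1) return supported[i];
  }
  return "identity"; // no overlap: send the file uncompressed
}
```

For example, `chooseEncoding("gzip, deflate", ["gzip"])` returns `"gzip"`, while a client that never sends the header gets `"identity"` and receives the file as-is.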
What is the difference between gzip and deflate? Both are based on the same compression algorithm, deflate, [5] implemented in the compression library zlib. [6] Deflate encoding assumes that you are sending only the compressed data. gzip [7] adds a 10-byte header in front of the compressed data, and appends a CRC32 checksum and the length of the uncompressed data (4 + 4 = 8 bytes) to the end of the compressed file. The transferred data is thus a valid .gz file.
There is one more content-encoding compression algorithm: compress, which uses the compression algorithm of the UNIX compress utility. It is supported only by Lynx, Netscape, and Mozilla.
Because some versions of Konqueror have an error in deflate decoding and gzip is widely supported, most compression solutions use gzip content encoding.
HTML and other text files can be compressed on the server and automatically decompressed with HTTP 1.1-compliant browsers. Because HTML files must be downloaded before your content appears, fast delivery of this page framework is critical to user satisfaction.
Because HTML text files can be highly redundant (especially tables), compression rates for HTML files can be dramatic, with savings up to 90 percent. Most modern browsers support decompression of HTML files compressed with gzip.
The ZLIB Saga

After CompuServe and Unisys rattled their GIF copyright sabers in late 1995, browser manufacturers rushed to add PNG support to their browsers. [8] Luckily, the PNG format uses the public domain GZIP/ZLIB [6] compression algorithms (deflate and inflate), which are based on the older, non-proprietary Lempel-Ziv algorithm (LZ77). [9] GIFs use the less efficient Lempel-Ziv-Welch algorithm (LZW), [10] which is based on LZ78. [11] So in order to receive and display PNG files, the browser manufacturers had to add ZLIB inflation to their browsers. CompuServe subsequently backed down, but the deed was done. Now browsers had ZLIB support. Developers at Microsoft and Netscape realized that they already had ZLIB on board to handle inflating PNG files. Why not implement IETF content encoding? Why not indeed. Their first attempts went badly (browsers would report "Accept-Encoding" but then botch things when the compressed data arrived), but after a few more browser releases, they both got it right. The outcome is that any browser that can display PNG files can usually decompress anything sent with IETF content encoding: gzip.
In theory, you can also compress external style sheets using content encoding. In practice, webmasters have found that browsers inconsistently decompress .css files. Apparently, style sheets were hacked into some browsers in a non-HTTP-compliant way. So when these browsers receive a "Content-Encoding: gzip" header in the response for a .css file, they don't realize that they are supposed to decompress it first.
This is not always the case, however, and no one to my knowledge has been able to nail down which browsers can actually handle the decompression of style sheets, and under what circumstances. The problem seems to involve a mixture of variables. Therefore, I recommend that you exclude .css files from compression in the configuration files of programs such as mod_gzip:
mod_gzip_item_exclude file \.css$
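In context, that exclusion would sit alongside the rules that enable compression and select which files to compress. The directive names below are mod_gzip's own, but the rule set itself is only an illustrative sketch, not a recommended production configuration:

```apache
mod_gzip_on            Yes
# Compress markup and scripts...
mod_gzip_item_include  file  \.html$
mod_gzip_item_include  file  \.js$
mod_gzip_item_include  mime  ^text/html$
# ...but leave style sheets alone, per the caveat above.
mod_gzip_item_exclude  file  \.css$
```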
Most .css files are smaller than .js files anyway, so the need for compression is usually greater for .js files. In fact, CSS files are often so small that the HTTP headers needed for the request and response can add up to a significant portion of the total traffic (up to 750 to 1,000 bytes). So for smaller CSS files on high-traffic pages, it may be more efficient to embed them directly into your (X)HTML files or include them with SSI, where they can then be compressed along with the page.
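The SSI approach might look like the following (the filename is hypothetical). The style rules are inlined into the page at serve time, so they travel inside the compressed HTML instead of costing a separate request and response:

```html
<!-- Hypothetical SSI inclusion: the style sheet is merged into the
     HTML before delivery, so it is gzipped along with the page and
     incurs no extra HTTP headers of its own. -->
<style type="text/css">
<!--#include file="small-screen.css" -->
</style>
```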
Like HTML files, external JavaScript files can be compressed with IETF content encoding. Unlike external .css files, support for decompressing compressed JavaScript files is good in modern HTTP 1.1-compliant browsers, as long as the files are referenced within the head of your HTML documents. Although it is possible, I don't recommend using proprietary compression methods to deliver external JavaScript files (.jar, CHM/ITS, etc.). The standards-based method described in this chapter requires at most one additional file, not four, to maintain.
External scripts must be referenced in the head element of (X)HTML documents to be reliably decompressed by modern browsers. The story goes like this. Netscape's original specification for JavaScript 1.1 implied that the inclusion of JavaScript source files should take place in the head section, because that is the only place where they are pre-loaded and pre-processed. [12] For some reason, browser manufacturers stopped decompressing any compressed files referenced beyond the head.
As scripts grew larger, developers started moving script elements down into the body to satisfy impatient users. In HTML, the " head -only" rule was then relaxed to allow script elements within the body , but the die was cast. Developers subsequently discovered that certain JavaScript inclusion operations must be in the head section or problems can occur.
Browsers continue to decompress scripts only when they are located within the head element. Some companies get around this limitation by adding "_h" to the names of JavaScript include files in the head section of HTML documents. Using this technique, a script author can use server-side filtering logic to determine whether a request for a certain JavaScript file is coming from the head section of an HTML document (where it is OK to send it compressed) or from somewhere in the body (where it is not). You can optionally use the defer attribute to compensate for this requirement.
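The filtering logic amounts to a filename test. A minimal sketch follows; the helper name and the exact naming pattern are assumptions based on the convention described above:

```javascript
// Hypothetical server-side check: allow compression only for .js files
// whose names carry the "_h" head-section marker, e.g. "menu_h.js".
function okToCompress(filename) {
  return /_h\.js$/.test(filename);
}
```

A server-side filter would call a check like this before choosing between the compressed and uncompressed copy of the script.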
JavaScript Compression Gotcha: Premature onload Events

Internet Explorer 5 has a known bug when loading compressed JavaScript files. [13] IE mistakenly triggers the onload event after it downloads the compressed file, but before it is decompressed. This can lead to unexpected behavior. The way around this bug is to include an additional variable at the end of your external file and poll for its presence in the onload event handler.
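The workaround can be sketched as follows; the sentinel variable name and the polling interval are assumptions for illustration, not part of the documented bug report:

```javascript
// The compressed external file ends with a sentinel assignment, e.g.:
//   var mylibReady = true;   // hypothetical last line of the .js file

// Returns true once the sentinel exists on the global object
// (window in a browser), i.e. once the script has actually been
// decompressed and evaluated.
function isScriptReady(globalObj, sentinelName) {
  return typeof globalObj[sentinelName] !== "undefined";
}

// A browser onload handler would poll until the check passes:
//   function onloadHandler() {
//     if (!isScriptReady(window, "mylibReady")) {
//       setTimeout(onloadHandler, 50); // still inflating; try again
//       return;
//     }
//     initPage(); // now safe to call into the external script
//   }
```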