Compression shrinks the amount of data you have to send from your web server. Browsers decompress this data on the fly, which increases page display speed despite the additional work required of the browser. As external objects make up over 50 percent of the average web page, developers have naturally focused their efforts on compressing images and multimedia to reduce file size.
HTML compression is often overlooked, however. Since 1998-1999, most browsers have been equipped to support the HTTP 1.1 standard known as IETF "content-encoding." Content encoding is a publicly defined way to compress (that is, deflate) HTTP content transferred from web servers to browsers using public domain compression algorithms, like gzip. That content of course includes HTML.
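To make the negotiation concrete, here is a minimal Python sketch of what a server does with the browser's Accept-Encoding request header. The function names `accepts_gzip` and `encode_body` are illustrative assumptions, not part of any tool named in this chapter, and the header parsing is deliberately simplified (it ignores the `*` wildcard, for instance).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a nonzero q-value."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def encode_body(body: bytes, accept_encoding: str):
    """Return (headers, payload), gzipping only when the client allows it."""
    headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body, compresslevel=6)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body  # fall back to the uncompressed page
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

A browser that sends `Accept-Encoding: gzip, deflate` would receive the compressed payload along with `Content-Encoding: gzip`; a client that sends no such header gets the original bytes unchanged.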
HTML is pure ASCII text, which is highly compressible. HTML markup is also highly redundant, with a small set of tags that are used repeatedly. This redundancy makes HTML even more compressible, especially for string replacement algorithms like gzip.
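The effect of redundancy is easy to see in a small experiment. The sketch below uses Python's `zlib` module, which implements the same DEFLATE algorithm that gzip uses, to compare made-up repetitive table markup against random bytes of the same length. The markup and the helper name `savings` are assumptions for illustration only; real pages will give different figures.

```python
import os
import zlib


def savings(data: bytes, level: int = 9) -> float:
    """Percentage of bytes saved by DEFLATE-compressing data at the given level."""
    compressed = zlib.compress(data, level)
    return 100.0 * (1 - len(compressed) / len(data))


# Hypothetical markup: a table whose rows repeat the same few tags.
rows = "".join(
    f'<tr><td class="name">Item {i}</td><td class="price">{i}.00</td></tr>\n'
    for i in range(500)
)
html = f"<html><body><table>\n{rows}</table></body></html>".encode("ascii")

# Random bytes of the same length have no repeated strings to replace.
noise = os.urandom(len(html))

print(f"repetitive HTML: {savings(html):.1f}% saved")
print(f"random bytes:    {savings(noise):.1f}% saved")
```

The repeated tags give the compressor long strings to replace with short back-references, while the random data offers nothing to exploit and may even grow slightly.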
To give you an idea of the savings you can expect from HTML compression, let's take a real-world example. You can create gzipped versions of static files beforehand, and let tools like mod_gzip or PipeBoost do the negotiation for you. Or for more dynamic data, you can let these tools compress your data on the fly and do the content negotiation. (Letting the Apache or IIS servers do the negotiation on their own is possible but problematic.) For example, the home page of PopularMechanics.com can be compressed from 138,548 bytes to 21K using gzip compression (see Table 4.1).
|PopularMechanics.com (HTML)|GZIP -9|Percentage Savings|
GZIP compression typically saves from 80 to 85 percent off HTML files. We continue optimizing PM's home page in Chapter 6, "Case Study: PopularMechanics.com."
Although static files can be compressed with gzip's maximum setting of 9, mod_gzip actually uses the more moderate compression setting of 6, which gives a good compromise between file size and compression speed.
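As a rough sketch of both approaches, the Python below precompresses a static file at the maximum setting and compares compressed sizes across levels. The helper names `precompress` and `size_at_level` and the sample markup are hypothetical; this illustrates the compression levels only and is not how mod_gzip itself is implemented.

```python
import gzip
import shutil
from pathlib import Path


def precompress(path: Path, level: int = 9) -> Path:
    """Write a gzipped copy next to the original (name + '.gz') and return its path."""
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    return target


def size_at_level(data: bytes, level: int) -> int:
    """Size in bytes of data after gzip compression at the given level (1 to 9)."""
    return len(gzip.compress(data, compresslevel=level))


# Hypothetical sample markup, used only to compare the levels.
sample = b'<tr><td class="cell">value</td></tr>\n' * 2000
for level in (1, 6, 9):
    print(f"level {level}: {size_at_level(sample, level)} bytes")
```

Static files only need to be compressed once, so the extra time spent at level 9 is paid up front. For pages compressed on every request, a lower level keeps the per-request cost down.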
The net effect is dramatically smaller files, faster page response, and lower bandwidth bills. In fact, webmasters who have employed content-encoding on their servers have seen bandwidth savings of 30 to 50 percent. The compression ratio depends on the degree of redundancy in your site's content, and the ratio of text to multimedia. Because HTML text files can be highly redundant (especially tables), compression rates for HTML files can be dramatic, with savings up to 90 percent. Because compression of HTML files is almost universally supported, there's really no reason not to support content encoding for HTML files on your site. For more details on content encoding, see Chapter 18, "Compressing the Web."
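The dependence on the ratio of text to multimedia can be approximated with simple arithmetic. The sketch below assumes, for illustration only, that multimedia is already compressed and gains nothing further, so the overall saving is the text share of your traffic multiplied by the saving on that text. The function name `overall_savings` and the sample shares are hypothetical.

```python
def overall_savings(text_fraction: float, text_savings: float) -> float:
    """Fraction of total bytes saved when only the text share is compressed."""
    if not (0 <= text_fraction <= 1 and 0 <= text_savings <= 1):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return text_fraction * text_savings


# Hypothetical traffic mixes, each with text compressing by 85 percent.
for share in (0.35, 0.5, 0.6):
    saved = overall_savings(share, 0.85)
    print(f"text share {share:.0%}: {saved:.0%} of total bytes saved")
```

Under these assumptions, a site whose traffic is roughly one-third to two-thirds text would land in the 30 to 50 percent range that webmasters report.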