Most browsers released since 1998-1999 support the HTTP 1.1 feature that the IETF calls "content-encoding" (although content encoding was already included in the HTTP 1.0 specification, RFC 1945 [3]). Content encoding (also known as HTTP compression) is a publicly defined way to compress (that is, deflate) HTTP content transferred from web servers to browsers using public domain compression algorithms, like gzip.
Here's how it works
Browsers tell servers they would prefer to receive encoded content with a message in the HTTP header, like this:
Accept-Encoding: gzip
The server should then deliver the content of the requested document using an encoding that this client accepts. If the client isn't lying (as early versions of Netscape 4.x can), the browser decompresses the compressed data. Modern browsers that support HTTP 1.1 content-encoding support ZLIB inflation of deflated documents and benefit from HTTP compression. Older browsers that don't send the Accept-Encoding header automatically receive the uncompressed version of your files.
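The negotiation described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular server module works: the function names `accepts_gzip` and `encode_response` are hypothetical, the header parsing ignores the `*` wildcard, and the request headers are assumed to arrive as a plain dictionary.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value allows gzip."""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        coding = parts[0].strip().lower()
        if coding != "gzip":
            continue
        # An optional q-value of 0 means "gzip is not acceptable".
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def encode_response(body: bytes, request_headers: dict) -> tuple:
    """Return (headers, payload), compressing only when the client asked for it."""
    headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        # Clients that send no Accept-Encoding header get the plain document.
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

A client that sends `Accept-Encoding: gzip` gets a gzipped payload labeled with `Content-Encoding: gzip`; a client that sends no such header gets the original bytes unchanged.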
You can create gzipped versions of your files beforehand and let server add-ons like mod_gzip or mod_deflate do the negotiation for you (that is, .html.gz or .html, and .js.gz or .js), or let software like mod_gzip compress your data on the fly (letting Apache do the negotiation alone is possible but problematic).
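To make the precompressed-file approach concrete, here is a minimal Python sketch. It is not the logic of mod_gzip or mod_deflate; the helper names `precompress` and `choose_variant` are hypothetical, and the only convention assumed is the `.gz` suffix mentioned above.

```python
import gzip
from pathlib import Path


def precompress(path: Path) -> Path:
    """Write a gzipped copy next to the original (page.html -> page.html.gz)."""
    gz_path = path.with_name(path.name + ".gz")
    gz_path.write_bytes(gzip.compress(path.read_bytes(), compresslevel=9))
    return gz_path


def choose_variant(path: Path, client_accepts_gzip: bool) -> Path:
    """Pick the .gz variant only if the client accepts gzip and the file exists."""
    gz_path = path.with_name(path.name + ".gz")
    if client_accepts_gzip and gz_path.exists():
        return gz_path
    return path
```

Compressing ahead of time lets you spend the CPU cost once at the highest compression level, whereas on-the-fly compression pays that cost on each request but also works for dynamic content.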
The net effect is dramatically smaller files, faster page response, and lower bandwidth bills. In fact, webmasters who have employed content encoding on their servers have realized bandwidth savings of 30 to 50 percent. The compression ratio depends on the degree of redundancy in your site's content, and the ratio of text to multimedia. Most importantly, compressed pages display much faster because the browser downloads less data. Because browsers decompress compressed content with ZLIB, which uses a dictionary-based algorithm, decompression is very fast.
The easiest way to employ content encoding on your server is to use software specifically designed for this purpose. Mod_gzip, mod_hs, and mod_deflate-ru are software modules that automate the entire process for Apache, as do PipeBoost and Hyperspace i for Microsoft's IIS server. These server add-ons can work with static or dynamic content, and the predefined installation files take care of most common server and browser configurations. Later in this chapter, we'll compare the strengths and weaknesses of these and other compression modules and filters available for the Apache and IIS servers.
How good is gzip compression? Some say that the improvement can be up to 90 percent; others are happy with 25 percent. They just measure it in a different way. HTML files are typically compressed by 80 percent (5:1 ratio), while JavaScript and CSS files average 70 and 80 percent compression, respectively.
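The two ways of quoting a result, percent savings and compression ratio, are easy to convert between. The short Python sketch below shows the arithmetic and measures both on a made-up, highly repetitive HTML fragment; the sample markup and the function names are illustrative assumptions, and real pages will compress less well than this artificial input.

```python
import gzip


def savings_percent(original_size: int, compressed_size: int) -> float:
    """Percent of bytes removed by compression (80.0 means 80 percent smaller)."""
    return 100.0 * (original_size - compressed_size) / original_size


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Original size divided by compressed size (5.0 means a 5:1 ratio)."""
    return original_size / compressed_size


# An artificial, repetitive sample; not taken from any real site.
sample = b'<tr><td class="item">row</td></tr>\n' * 500
packed = gzip.compress(sample)

print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(packed)} bytes")
print(f"savings:    {savings_percent(len(sample), len(packed)):.1f} percent")
print(f"ratio:      {compression_ratio(len(sample), len(packed)):.1f}:1")
```

An 80 percent saving and a 5:1 ratio describe the same result: `savings_percent(100, 20)` is 80.0 and `compression_ratio(100, 20)` is 5.0.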
A typical HTML page consists of an HTML file, several image files, sometimes a CSS file, and a couple of JavaScript files. Images are already compressed, so it doesn't make sense to apply content compression to them. Because images and multimedia objects take more than half of total web page size [4], even if you compress the HTML file by 90 percent, the total compression ratio will be less than 50 percent.
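The reasoning here is a weighted average, which a small Python sketch makes explicit. The byte counts in the example are hypothetical round numbers chosen for illustration, not measurements.

```python
def page_savings(text_bytes: int, other_bytes: int, text_savings: float) -> float:
    """Fraction of the whole page saved when only the text part is compressed.

    text_savings is the fraction removed from the text files (0.9 = 90 percent).
    Images and other already-compressed objects are sent unchanged.
    """
    saved_bytes = text_bytes * text_savings
    return saved_bytes / (text_bytes + other_bytes)


# Hypothetical page: 40,000 bytes of HTML/CSS/JS and 60,000 bytes of images.
overall = page_savings(text_bytes=40_000, other_bytes=60_000, text_savings=0.90)
print(f"overall savings: {overall:.0%}")
```

With text making up 40 percent of the page, a 90 percent reduction in the text yields roughly a 36 percent reduction overall, which is why the total stays below 50 percent whenever images account for more than half of the bytes.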
To give you an idea of how effective compression can be, take a look at Table 18.2. This table shows the potential compression ratios for five of the most popular high-tech companies, five online newspapers, five web directories, and five sports resources.
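Web Site Type (Average for 5 Web Sites) | Number of Files | HTML, CSS, and JS Files Only: Original Size | HTML, CSS, and JS Files Only: Compressed Size | HTML, CSS, and JS Files Only: Savings | All Files Including Graphics: Original Size | All Files Including Graphics: Compressed Size | All Files Including Graphics: Savings |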
High-Tech Company | 14 | 26,531 | 5,092 | 79% | 60,650 | 39,211 | 35% |
Newspaper | 37 | 74,688 | 16,218 | 79% | 150,220 | 91,749 | 40% |
Web Directory | 11 | 36,096 | 13,296 | 69% | 50,168 | 27,368 | 46% |
Sports | 24 | 41,011 | 10,167 | 74% | 110,530 | 79,686 | 27% |
Average | 22 | 44,582 | 11,193 | 75% | 92,892 | 59,504 | 37% |
High-tech: www.cisco.com, www.hp.com, www.ibm.com, www.microsoft.com, www.oracle.com
Newspapers: www.latimes.com, www.nytimes.com, www.usatoday.com, www.washingtonpost.com, www.wsj.com
Web directories: www.altavista.com, www.looksmart.com, www.lycos.com, www.netscape.com, www.yahoo.com
Sports: www.espn.com, sports.yahoo.com, sportsillustrated.cnn.com, www.sportsnetwork.com, www.usatoday.com/sports/front.htm
On average, the text portion of these sites was compressed by 75 percent. Overall, compression would save 37 percent in total file size.
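As a quick check on these figures, the following Python sketch recomputes the Average row from the four category rows. It takes the all-files original sizes for the Newspaper and Sports rows to be 150,220 and 110,530, and it assumes the Average row is a simple column-by-column mean rounded half up; both are readings of the table, not something the text states.

```python
# Columns: type, files, text original, text compressed, text savings %,
#          all-files original, all-files compressed, all-files savings %
rows = [
    ("High-Tech Company", 14, 26531, 5092, 79, 60650, 39211, 35),
    ("Newspaper", 37, 74688, 16218, 79, 150220, 91749, 40),
    ("Web Directory", 11, 36096, 13296, 69, 50168, 27368, 46),
    ("Sports", 24, 41011, 10167, 74, 110530, 79686, 27),
]


def round_half_up(value: float) -> int:
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(value + 0.5)


def column_average(index: int) -> int:
    """Mean of one numeric column across the four category rows."""
    return round_half_up(sum(row[index] for row in rows) / len(rows))


average_row = tuple(column_average(i) for i in range(1, 8))
print(average_row)  # (22, 44582, 11193, 75, 92892, 59504, 37)
```

The recomputed values agree with the table's Average row, including the 75 percent and 37 percent savings quoted above.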
Most dial-up modems use V.42bis or V.44 compression, which is based on Lempel-Ziv dictionary algorithms (LZW in the case of V.42bis). If modems already compress data, you might ask, why do we need any additional compression? First, because of the speed and limited memory of modems, modem compression has relatively low compression ratios. In my experience, V.42bis provides a compression ratio of about 2:1 on most text files. As you learned earlier, gzip compression gives much higher compression ratios of 3:1 or higher. Second, modems do not compress SSL-encrypted files. Most importantly, dial-up connections are only one way to connect to the Internet. DSL, cable, and T1 modems as well as network cards do not have compression onboard. That is why HTTP content compression is so important.