Content Compression

Most browsers released since 1998-1999 support the HTTP 1.1 standard known as IETF "content-encoding" (although content encoding was already included in the HTTP 1.0 specification, RFC 1945 [3]). Content encoding (also known as HTTP compression) is a publicly defined way to compress (that is, deflate) HTTP content transferred from web servers to browsers using public domain compression algorithms, like gzip.

[3] Tim Berners-Lee, Roy T. Fielding, and Henrik F. Nielsen, "Hypertext Transfer Protocol -- HTTP/1.0," RFC 1945 [online], (1996), available from the Internet at http://www.ietf.org/rfc/rfc1945.txt. This RFC includes a content encoding section.

Here's how it works

Browsers tell servers they would prefer to receive encoded content with a message in the HTTP header, like this:

 Accept-Encoding: gzip 

The server should then deliver the content of the requested document using an encoding accepted by this client. If the client isn't lying (as early versions of Netscape 4.x can), the browser decompresses the compressed data. Modern browsers that support HTTP 1.1 content-encoding support ZLIB inflation of deflated documents and benefit from HTTP compression. Older browsers that don't send the Accept-Encoding header automatically receive the uncompressed version of your files.
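The negotiation just described can be sketched in a few lines of Python. This is a minimal illustration, not how any particular server implements it: the function names accepts_gzip and encode_response are hypothetical, request headers are assumed to arrive as a plain dictionary keyed by "Accept-Encoding", and only gzip is considered.

```python
import gzip

def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a nonzero q-value."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False

def encode_response(request_headers: dict, body: bytes):
    """Choose an encoding for body and return (response_headers, payload)."""
    # Vary tells caches that the response depends on the request's Accept-Encoding.
    response_headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        payload = gzip.compress(body)
        response_headers["Content-Encoding"] = "gzip"
    else:
        # Clients that never asked for gzip get the uncompressed bytes.
        payload = body
    response_headers["Content-Length"] = str(len(payload))
    return response_headers, payload
```

A browser that sends Accept-Encoding: gzip gets a compressed payload labeled Content-Encoding: gzip; a client that sends no such header gets the original bytes unchanged.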

You can create gzipped versions of your files beforehand and let server add-ons like mod_gzip or mod_deflate do the negotiation for you (that is, choose between .html.gz and .html, or between .js.gz and .js), or let software like mod_gzip compress your data on the fly. Letting Apache do the negotiation alone is possible but problematic.
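To make the "beforehand" option concrete, here is a rough Python sketch of precompressing a static file and then choosing which variant to serve. It only mimics what a server module does for you; the helper names precompress and pick_variant are hypothetical, and the .gz naming simply follows the .html.gz convention mentioned above.

```python
import gzip
import shutil
from pathlib import Path

def precompress(path: Path) -> Path:
    """Write a gzipped copy next to the file (page.html -> page.html.gz)."""
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    return gz_path

def pick_variant(path: Path, client_accepts_gzip: bool) -> Path:
    """Return the .gz variant when the client accepts gzip and the file exists."""
    gz_path = path.with_name(path.name + ".gz")
    if client_accepts_gzip and gz_path.exists():
        return gz_path
    return path
```

Precompressing pays the compression cost once per file instead of once per request, which is why it suits static content; dynamic pages still need on-the-fly compression.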

The net effect is dramatically smaller files, faster page response, and lower bandwidth bills. In fact, webmasters who have employed content encoding on their servers have realized bandwidth savings of 30 to 50 percent. The compression ratio depends on the degree of redundancy in your site's content, and the ratio of text to multimedia. Most importantly, compressed pages display much faster because the browser downloads less data. Because browsers decompress content with ZLIB, which uses a dictionary-based algorithm, decompression is very fast.

GZIP Compression

The easiest way to employ content encoding on your server is to use software specifically designed for this purpose. Mod_gzip, mod_hs, and mod_deflate-ru are software modules that automate the entire process for Apache, as do PipeBoost and Hyperspace i for Microsoft's IIS server. These server add-ons can work with static or dynamic content, and the predefined installation files take care of most common server and browser configurations. Later in this chapter, we'll compare the strengths and weaknesses of these and other compression modules and filters available for the Apache and IIS servers.

How good is gzip compression? Some say that the improvement can be up to 90 percent; others are happy with 25 percent. They just measure it in a different way. HTML files are typically compressed by 80 percent (5:1 ratio), while JavaScript and CSS files average 70 and 80 percent compression, respectively.
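If you want to measure this on your own files, a small Python sketch like the following reports savings as a percentage and converts it to an N:1 ratio. The function names are hypothetical, and the numbers you get will depend on how redundant your markup is.

```python
import gzip

def gzip_savings(data: bytes, level: int = 6) -> float:
    """Return the percentage of bytes saved by gzip at the given level."""
    if not data:
        return 0.0
    compressed = gzip.compress(data, compresslevel=level)
    return 100.0 * (1 - len(compressed) / len(data))

def ratio_from_savings(savings_percent: float) -> float:
    """Convert percent savings into an N:1 compression ratio (80 percent -> 5:1)."""
    return 100.0 / (100.0 - savings_percent)
```

Running gzip_savings over the HTML file alone and then over every byte of the page, images included, gives two very different figures for the same site, which is one reason published numbers vary so widely.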

A typical HTML page consists of an HTML file, several image files, sometimes a CSS file, and a couple of JavaScript files. Images are already compressed, so it doesn't make sense to apply content compression to them. Because images and multimedia objects account for more than half of total web page size [4], even if you compress the HTML file by 90 percent, the total compression ratio will be less than 50 percent.

[4] Eric Siegel of Keynote Systems, email to author, 25 September 2002. The median KB40 site has over 50 percent graphics, and without JavaScript, over 60 percent. See also a previous study at http://www.keynote.com/solutions/assets/applets/Performance_Analysis_of_40_e-Business_Web_Sites.pdf.

To give you an idea of how effective compression can be, take a look at Table 18.2. This table shows the potential compression ratios for five of the most popular high-tech companies, five online newspapers, five web directories, and five sports resources.

Table 18.2. Content Encoding: Average Compression Ratios for Different Web Site Categories

Web Site Type              Number     HTML, CSS, and JS Files Only       All Files Including Graphics
(Average for 5 Web Sites)  of Files   Original   Compressed   Savings    Original   Compressed   Savings
                                      Size       Size                    Size       Size
High-Tech Company          14         26,531     5,092        79%        60,650     39,211       35%
Newspaper                  37         74,688     16,218       79%        150,220    91,749       40%
Web Directory              11         36,096     13,296       69%        50,168     27,368       46%
Sports                     24         41,011     10,167       74%        110,530    79,686       27%
Average                    22         44,582     11,193       75%        92,892     59,504       37%

High-tech: www.cisco.com, www.hp.com, www.ibm.com, www.microsoft.com, www.oracle.com

Newspapers: www.latimes.com, www.nytimes.com, www.usatoday.com, www.washingtonpost.com, www.wsj.com

Web directories: www.altavista.com, www.looksmart.com, www.lycos.com, www.netscape.com, www.yahoo.com

Sports: www.espn.com, sports.yahoo.com, sportsillustrated.cnn.com, www.sportsnetwork.com, www.usatoday.com/sports/front.htm

On average, the text portion of these sites was compressed by 75 percent. Overall, compression would save 37 percent in total file size.
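The gap between text savings and overall savings is simple arithmetic, sketched here in Python. The function name overall_savings is hypothetical, and the model assumes that images and multimedia are sent uncompressed by the server.

```python
def overall_savings(text_bytes: int, other_bytes: int, text_savings: float) -> float:
    """Fraction of total page weight saved when only the text portion is compressed.

    text_savings is a fraction between 0 and 1; other_bytes (images and
    multimedia) are sent as-is.
    """
    total = text_bytes + other_bytes
    if total == 0:
        return 0.0
    return text_bytes * text_savings / total

# High-tech row of Table 18.2: 26,531 of 60,650 total is text,
# and that text shrinks to 5,092.
high_tech = overall_savings(26_531, 60_650 - 26_531, 1 - 5_092 / 26_531)

# A page that is half text, with the text compressed by 90 percent.
half_text = overall_savings(50, 50, 0.9)
```

For the high-tech row this works out to about 35 percent, matching the table. For a page that is half text, even 90 percent text compression yields only 45 percent overall, which is why the total stays under 50 percent whenever graphics make up more than half the page.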

Modem Compression Is Not Enough

Most dial-up modems use V.42bis or V.44 compression, both based on Lempel-Ziv dictionary algorithms (V.42bis uses a variant of LZW). If modems already compress data, you might ask, why do we need any additional compression? First, because of the limited speed and memory of modems, modem compression achieves relatively low compression ratios. In my experience, V.42bis provides a compression ratio of about 2:1 on most text files. As you learned earlier, gzip gives much higher compression ratios of 3:1 or more. Second, modems do not compress SSL-encrypted files. Most importantly, dial-up connections are only one way to connect to the Internet. DSL, cable, and T1 modems, as well as network cards, do not have compression onboard. That is why HTTP content compression is so important.
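The point about SSL-encrypted files can be illustrated with a short Python sketch. It is only an analogy: zlib stands in for the modem's compressor (it is not V.42bis or V.44), and random bytes from os.urandom stand in for ciphertext, since well-encrypted data looks statistically random. The function name compressed_fraction is hypothetical.

```python
import os
import zlib

def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

# Redundant markup, as a stand-in for a plain HTML page.
html_like = b"<li><a href='/news/item'>Headline text</a></li>\n" * 2000

# Random bytes of the same length, as a stand-in for encrypted traffic.
cipher_like = os.urandom(len(html_like))

text_fraction = compressed_fraction(html_like)
cipher_fraction = compressed_fraction(cipher_like)
```

The markup shrinks to a small fraction of its size, while the random bytes do not shrink at all. Compression therefore has to happen before encryption, at the HTTP layer, because nothing downstream can recover the savings.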

 



Speed Up Your Site: Web Site Optimization
ISBN: 596515081
EAN: N/A
Year: 2005
Pages: 135
