Content Compression: Server Side

So it sounds like all you need to do is compress your content on the server and deliver it, right? Unfortunately, it is not that simple. You have three options for choosing the type of compression you want to use:

  • Static pre-compressed content

  • Dynamic real-time server-side compression

  • Proxy-based compression

Static pre-compressed content makes sense for heavily loaded web sites with more or less static content. Some news web sites keep all their news in a database and convert it to static HTML files once every several minutes. These sites send responses faster than sites with ASP, JSP, or PHP pages. The best thing to do here is to keep a compressed copy of the pages ready to send. If an HTTP request contains an Accept-Encoding header, send the compressed file; otherwise, send the HTML file.
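The selection rule just described can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the function names `accepts_gzip` and `choose_variant` are made up for this example, and the `.gz` suffix for the compressed copy is an assumption.

```python
from pathlib import Path


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # a coding without an explicit q parameter counts as fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def choose_variant(html_path: Path, accept_encoding: str) -> tuple[Path, dict[str, str]]:
    """Pick the pre-compressed copy when it exists and the client accepts gzip."""
    gz_path = html_path.with_name(html_path.name + ".gz")  # assumed naming scheme
    headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding) and gz_path.exists():
        headers["Content-Encoding"] = "gzip"
        return gz_path, headers
    return html_path, headers
```

The `Vary: Accept-Encoding` header is included so that caches keep the compressed and uncompressed responses apart.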

Dynamic real-time server-side compression compresses files before sending them to the client. It can be implemented as a plug-in to a web server or as part of a standalone proxy server.

Proxy-based compression compresses files between the server and the client, alleviating the need to compress files on the server. This reduces server load and gives ISPs and enterprise operations more power and flexibility.

Content Compression in Apache

In Apache, you can pre-compress content or install a module to compress content on the fly.

There is a special administration tool written in Perl to pre-compress content on your Apache web site and properly set up the .htaccess file. It is available at http://www.chatologica.com/site/eWSA.htm.

You can pre-compress content manually as well. The process is pretty straightforward:

  1. gzip your .htm files to .htmz.

  2. Add the following line to httpd.conf:

     AddEncoding gzip htmz 
  3. Add index.htmz as a default directory index:

     DirectoryIndex index.htmz index.htm index.html index.html.var 
  4. In the mime.types file, add htmz to the text/html line:

     text/html              html htm htmz 

But how do you address links? Links in the .htmz files point to .htm; so the index file will be compressed, and the other files will not. You can copy the web site, change links to .htmz, and then compress. This will work, but it is a lot of work. There is a better way that is based on Apache content negotiation and the Multiviews feature.

Since version 1.2, the Apache web server has supported content negotiation as defined in the HTTP 1.1 specification. Apache 1.2 supports server-driven content negotiation, while Apache 1.3.4 supports transparent content negotiation. In order to negotiate a resource, the server needs to know about the variants of each resource.

Multiviews implicitly maps variants based on filename extensions, like .gz. Multiviews is a per-directory option that can be set within .htaccess files or within the httpd.conf file for one or more directories. Setting the Multiviews option within the .conf file is more efficient because the server doesn't have to access an .htaccess file every time it accesses a directory. Here's how you turn on Multiviews in the httpd.conf file:

 Options   Multiviews 

In a real configuration file, it will look like this:

 <Directory />
     Options FollowSymLinks Multiviews
     AllowOverride None
 </Directory>

Apache recognizes only encodings that are defined by an AddEncoding directive. So to let Apache know about gzip encoded files, you'd add the following directive:

 AddEncoding gzip .gz 

With Multiviews set, authors need only create filename variants of resources, and Apache does the rest. So to create gzipped compressed versions of your .html or .js files, you zip them up like this:

 gzip -9 index.html
 gzip -9 script.js

Then you link to the uncompressed .html or .js files, and Apache would negotiate to the .gz variant for capable browsers.
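If a site has many files, the pre-compression step can be scripted. The following Python sketch is one possible approach, assuming you want to keep each uncompressed file alongside its .gz variant (plain `gzip -9` replaces the original file with the compressed one). The function name `precompress` and the default suffix list are hypothetical choices for this example.

```python
import gzip
import shutil
from pathlib import Path


def precompress(root: Path, suffixes=(".html", ".js")) -> list[Path]:
    """Write a level-9 .gz variant next to each matching file, leaving the original in place."""
    written = []
    for src in sorted(root.rglob("*")):
        if not src.is_file() or src.suffix not in suffixes:
            continue
        dst = src.with_name(src.name + ".gz")
        # Skip variants that are already at least as new as their source.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        with src.open("rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Running it again after editing a page recompresses only the files whose source is newer than the existing variant.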

Some Browsers Lie

Apache (through 2.0x) checks to see whether browsers have sent the Accept-Encoding: header and assumes that any pre-compressed files are OK to send. This isn't always the case, however, because some browsers lie. Netscape 4.x sends the following message, yet it is incapable of receiving compressed data for some file types:

 Accept-Encoding: deflate, gzip 
For More Information

To learn more about content negotiation and how to combine content encoding with other features like content language, see this document: http://httpd.apache.org/docs-2.0/content-negotiation.html.

Content negotiation produces significant overhead, slowing server response by as much as 25 percent. But as long as the Apache reply time is just several milliseconds, your users will not notice the difference when requesting pages with their browsers.

Apache and Microsoft's IIS server can perform content negotiation, but to avoid any headaches due to misleading browsers, it is best to use a tool designed specifically for the job, like mod_gunzip or mod_gzip.

Serving Static Pre-Compressed Files with mod_gunzip

As long as only a tiny percentage of requests come from older browsers, it may make sense to save only compressed versions of your content. If an old browser requests a page, your server can decompress it in real time before sending.

For More Information

A mod_gunzip module is available for the Apache web server. It is available at http://www.oldach.net/. You can find configuration information for mod_gunzip at http://www.innerjoin.org/apache-compression/howto.html.

To use mod_gunzip, create a second gzipped version of your HTML or JavaScript ( .html.gz or .js.gz ) file, like this:

 gzip -9 index.html
 gzip -9 script.js

Then link to the uncompressed file as normal, using the href or src attribute:

 <a href="/index.html">Home</a>
 <script src="/script.js" type="text/javascript"></script>

Now when a browser reports that it can understand IETF content encoding, the server can deliver the compressed version of your text file.
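The idea of storing only the compressed copy and inflating it for older browsers can be illustrated with a short Python sketch. It is not mod_gunzip's code; the function name `body_for_client` is invented for this example, and the caller is assumed to have already decided whether the client accepts gzip.

```python
import gzip


def body_for_client(stored_gz: bytes, client_accepts_gzip: bool) -> tuple[bytes, dict[str, str]]:
    """Serve the stored gzip bytes as-is, or inflate them for clients that cannot decode gzip."""
    if client_accepts_gzip:
        return stored_gz, {"Content-Encoding": "gzip"}
    # Older client: decompress on the fly and send no Content-Encoding header.
    return gzip.decompress(stored_gz), {}
```

Decompression is much cheaper than compression, which is why this arrangement costs little when only a few clients need the uncompressed form.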

Serving Static Pre-Compressed Files with mod_gzip

Mod_gzip also can be used to deliver pre-compressed files to appropriate browsers. By turning on the mod_gzip_can_negotiate switch, you can deliver compressed files to the right browsers. You can run mod_gzip in "negotiation-only" mode by also omitting any include rules. These rules tell mod_gzip which files to compress dynamically. Here's a snippet from mod_gzip's Apache 1.3x configuration file:

 # use mod_gzip at all?
     mod_gzip_on                   Yes
 # ---------------------------------------------------------------------
 # let mod_gzip perform 'partial content negotiation'?
     mod_gzip_can_negotiate        Yes

Netscape 4.x in particular has spotty support for compressed JavaScript. You can use two general approaches to deal with Netscape 4:

  • Include HTTP 1.0 browsers (like Netscape 4.x) that support content encoding and exclude .js files (and optionally SSI .js files so they'll be compressed with the HTML), or

  • Require HTTP 1.1-compliant browsers to exclude Netscape 4.x and deliver compressed .html and .js files.

You can specify these two scenarios easily in the filters section of the mod_gzip configuration file. Here's how you'd set mod_gzip for the first scenario:

 ###############
 ### filters ###
 ###############
 # ---------------------------------------------------------------------
 # Required HTTP version of the client
 # Possible values: 1000 = HTTP/1.0  1001 = HTTP/1.1, ...
 # This directive uses the same numeric protocol values as Apache internally
     mod_gzip_min_http             1000
 # ---------------------------------------------------------------------
 # which files are to be compressed?
 #
 # phase 1: (reqheader, uri, file, handler)
 # ========================================
 # NO:   include files / JavaScript & CSS (due to Netscape4 bugs)
     mod_gzip_item_exclude         file       \.js$
     mod_gzip_item_exclude         file       \.css$
 #
 # phase 2: (mime, respheader)
 # ===========================
 # YES:  normal HTML files, normal text files, Apache directory listings
     mod_gzip_item_include         file       \.html$
     mod_gzip_item_include         mime       ^text/html$
     mod_gzip_item_include         mime       ^text/plain$
     mod_gzip_item_include         mime       ^httpd/unix-directory$
 ...

To exclude Netscape 4.x and include .js files for compression for the second scenario, bump up the minimum HTTP level and include .js files, like this:

 mod_gzip_min_http             1001
 mod_gzip_item_include         file       \.js$
 mod_gzip_item_exclude         file       \.css$
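To see how the two scenarios differ, it can help to model the filter decision in a few lines of Python. This is a simplified model, not mod_gzip's actual algorithm: it only looks at file rules, it assumes that an exclude match always wins over an include match, and the names `RULES` and `should_compress` are hypothetical.

```python
import re

# Hypothetical rule table modeled on the first scenario: (action, regex on the filename).
RULES = [
    ("exclude", re.compile(r"\.js$")),
    ("exclude", re.compile(r"\.css$")),
    ("include", re.compile(r"\.html$")),
]


def should_compress(filename: str, http_version: int, min_http: int = 1000, rules=RULES) -> bool:
    """Return True when the client's HTTP level is high enough and the file is included and not excluded."""
    if http_version < min_http:
        return False
    included = any(action == "include" and rx.search(filename) for action, rx in rules)
    excluded = any(action == "exclude" and rx.search(filename) for action, rx in rules)
    return included and not excluded
```

With these rules, an HTTP 1.0 client (value 1000) gets compressed .html but never compressed .js. Raising `min_http` to 1001 and turning the .js rule into an include gives the second scenario, in which HTTP 1.0 clients receive nothing compressed.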
For More Information

For more details on configuring mod_gzip, consult the documentation available at the following sites:

  • http://www.ehyperspace.com/

  • http://sourceforge.net/projects/mod-gzip/

You also can find more details at the following sites:

  • Mod_gzip mailing list: http://lists.over.net/mailman/listinfo/mod_gzip/

  • Michael Schröpl's mod_gzip site: http://www.schroepl.net/projekte/mod_gzip/

Pre-compressed content has some advantages. By pre-compressing your files, you can save CPU resources on your server. Dynamic compression of data requires some CPU horsepower. For higher loads on slower servers, dynamic compression can be too CPU intensive, so pre-compression makes more sense. On faster servers with dynamic content, however, dynamic compression is a better choice.

Pre-compressed content also has some disadvantages: It requires more maintenance and cannot deliver dynamic content.

How to Compile and Install an Apache Module

Most of the custom Apache modules including mod_gunzip are provided in source code form. If you want to get the maximum performance from your Apache server, be prepared to compile them. However, to save time I recommend that you download a pre-compiled module instead of source code when one is available.

You can attach a module to Apache in two ways: by static or dynamic linking. Statically linking to a module means that every time you want to add, remove, or update the module, you have to rebuild Apache. Dynamic linking, although it is about five percent slower, does not require recompiling. You can easily update or unplug dynamically linked modules, although you must recompile Apache to unplug static modules. Static linking is better for security reasons because it is much harder to replace part of the code, which is important for commerce sites (SSL). For flexibility, consider dynamic linking. For maximum performance and security, consider static linking.

Here are the steps for static linking:

  1. Add the following line near the bottom of the src/Configuration file:

     AddModule modules/mod_gunzip/mod_gunzip.c 
  2. Put the module source code here:

     modules/mod_gunzip/mod_gunzip.c 
  3. Rebuild the Apache server.

For a dynamic link, you only need to build the module. Here are the steps:

  1. Decompress the source with tar if it is compressed:

     tar -zxvf mod_gunzip.tar.gz 
  2. Compile the module:

     /usr/local/apache/bin/apxs -i -a -c -lz mod_gunzip.c 

    Option -c compiles, -i installs, and -a activates. Activating means adding the proper lines to the Apache config file and restarting.

    The apxs tool is described here:

    http://httpd.apache.org/docs/programs/apxs.html.

  3. Place the resulting file mod_gunzip in the /usr/httpd/modules directory (or the /usr/lib/apache/ directory, depending on your version or distribution). For Windows, it will be something like this:

     C:\Program Files\Apache Group\Apache2\modules\. 
  4. Add the following line into the httpd.conf file:

     LoadModule gunzip_module /usr/httpd/modules/mod_gunzip.so 
For More Information

More information about compiling Apache modules can be found at http://httpd.apache.org/docs-2.0/dso.html.

Dynamic Compression

If you need to generate every page on the fly, there is no way you can statically pre-compress it. Pre-compressed content wouldn't work well for a site like Google, for example.

Fortunately there are a number of real-time Apache compression modules. Notice, however, that none are perfect (see Table 18.4).

Table 18.4. Apache Modules for Dynamic Content Compression

| Module | Advantages | Disadvantages |
| --- | --- | --- |
| Mod_gzip from Remote Communications, Inc. (now an open source SourceForge project) | Lots of documentation. Lots of parameters for fine-tuning. Free. | Saves compressed content to a temporary file; works slower than other modules. |
| Mod_gzip from VIGOS AG | The first product on the market; includes whitespace removal, improved gzip algorithm (30 percent faster than RCI's mod_gzip), only in-memory compression, and browser auto-detection. | Only available for Apache 1.3. A commercial product. Same name as SourceForge mod_gzip. VIGOS encourages using their reverse proxy Website Accelerator instead. |
| Mod_deflate ru by Khrustalev and Sysoev | Lots of parameters for fine-tuning. Fast and efficient. | Documentation is poorly translated to English. Name violation with Apache mod_deflate. Distributed in source only. No Windows version available. Incompatible with Apache 2. |
| Mod_deflate from the Apache Software Foundation | Included in Apache 2.0. No compilation required. | Not flexible. |
| Mod_hs from HyperSpace Communications, Inc. | Improved commercial version of mod_gzip. Fast and efficient. | Incompatible with Apache 2. |
| Gzip_cnc by Michael Schröpl | Can be installed on shared hosting with restricted access to Apache settings. | Written in Perl; does not handle dynamic content. |

There are two different mod_deflate modules. Which module is the real one? Igor Sysoev claims he created his module in April 2001, while Apache created its module in August 2001. In any event, make sure that you use the right documentation for the appropriate module.

These modules all use essentially the same gzip algorithm, so the compression ratio is the same. I recommend that you use the maximum compression level of 9 instead of the default of 6 when possible. It may save you 100 bytes on a 100KB HTML file. The modules vary in how fast they compress and deliver content, and how specifically you can target browsers and file types.
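You can check how much a higher level buys you on your own pages. The sketch below uses Python's zlib module, which implements the same deflate algorithm that gzip uses (its output differs from a gzip file only by a few bytes of header and trailer). The sample markup is made up; substitute the contents of a real page to get meaningful numbers.

```python
import zlib


def compressed_sizes(data: bytes, levels=(1, 6, 9)) -> dict[int, int]:
    """Return the deflate output size in bytes for each compression level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Made-up, highly repetitive sample; real pages usually compress less well.
sample = b"<tr><td class='cell'>row</td><td>value</td></tr>\n" * 2000
sizes = compressed_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

On typical HTML the gap between levels 6 and 9 is small compared with the gap between no compression and any compression at all.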

What is the speed difference for compressed versus uncompressed content? It is hard to believe, but Apache with a compression module responds only about two times slower than Apache configured to send pre-compressed content.

Dynamic compression of content does have some advantages. In full content negotiation mode, mod_gzip and similar modules handle everything: They negotiate the compression of content on the fly and then deliver the data. Maintenance costs are reduced because separate compressed files are not necessary. More importantly, you can deliver compressed dynamic content, which is not possible with pre-compression.

Dynamic compression also has some disadvantages. It requires some CPU power to compress files on the fly. However, mod_gzip and others deflate very efficiently with their compiled C code. CPU concerns are a factor only on slower servers.
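In outline, a dynamic compression filter does something like the following for each response. This Python sketch is an illustration under stated assumptions, not the code of any of the modules above: the function name `compress_response`, the `min_size` threshold, and the rule that only `text/*` types are compressed are all choices made for this example.

```python
import gzip


def compress_response(body: bytes, content_type: str, client_accepts_gzip: bool,
                      level: int = 6, min_size: int = 256):
    """Gzip a freshly generated text response when the client can decode it."""
    headers = {"Content-Type": content_type, "Vary": "Accept-Encoding"}
    # Assumed policy: compress only text types, and skip very small bodies.
    compressible = content_type.startswith("text/") and len(body) >= min_size
    if client_accepts_gzip and compressible:
        body = gzip.compress(body, compresslevel=level)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers
```

Because the page is compressed after it is generated, no separate compressed files have to be maintained; the cost is the CPU time spent in `gzip.compress` on every request.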

Mod_deflate, Apache Version

Mod_deflate from Apache is the easiest solution for dynamic content compression because you don't need to find or compile it; it's already there in Apache 2.0. Just add the following lines to httpd.conf:

 LoadModule deflate_module modules/mod_deflate.so
 SetEnv gzip-only-text/html 1
 SetOutputFilter DEFLATE

And enjoy the compression!

All the HTML files requested with appropriate Accept-Encoding headers will be compressed from now on.

Unfortunately, there is no way you can fine-tune the Apache mod_deflate module to configure which file types are compressed and which browsers are targeted.

mod_deflate is included in the Apache 2.0 distribution. Its documentation is available at http://httpd.apache.org/docs-2.0/mod/mod_deflate.html.

Mod_gzip: Configurable Compression

If you really need fine-tuning and advanced statistics, use mod_gzip instead. Mod_gzip is currently an open source SourceForge project at http://sourceforge.net/projects/mod-gzip/.

Mod_gzip has one minor disadvantage: It temporarily saves the compressed versions of files to disk before serving them, which can slow the response by up to 30 percent. On the other hand, it is the most extensively tested module with proven reliability.

To install mod_gzip, see the section, "How to Compile and Install an Apache Module," and substitute mod_gzip filenames for mod_gunzip. Note that recent versions of mod_gzip at SourceForge are no longer in a single-file distribution.

You can make the entire compression process transparent by letting mod_gzip do the compression on the fly. Without pre-compressed variants, you can turn off negotiation to save one file access for each request, as shown here:

 # let mod_gzip perform 'partial content negotiation'?
     mod_gzip_can_negotiate        No

Next, set mod_gzip's include rules to specify which types of files to compress:

 mod_gzip_item_include         file       \.js$
 mod_gzip_item_exclude         file       \.css$
 mod_gzip_item_include         file       \.html$
 ...

Once installed, mod_gzip will compress the data that you specify automatically. Browsers that can handle compressed data will receive it, and browsers that don't, won't.

Although static files can be compressed with gzip's maximum setting of 9, mod_gzip actually uses the more moderate compression setting of 6, which gives a good compromise between file size and decompression speed.

Mod_hs: The Commercial Version of mod_gzip

mod_hs is a commercial product based on mod_gzip that was created by HyperSpace Communications, Inc. HyperSpace claims a 30 percent performance increase achieved by in-memory compression and elimination of disk I/O operations. For more information, see http://www.ehyperspace.com.

Mod_deflate ru

If you need fine-tuning and the best possible performance, try mod_deflate from sysoev.ru. It is more flexible than mod_gzip and allows you to add the compression method (deflate or gzip), uncompressed file size, compressed file size, and compression ratio to the Apache log file.
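The logged figures are simple to derive. The following Python sketch shows one way to compute them for a single response; the function name `compression_stats` and the field names are hypothetical and do not reflect mod_deflate's actual log format, and the ratio is expressed here as the percentage of bytes saved, which is only one of several common conventions.

```python
import gzip


def compression_stats(body: bytes, level: int = 6) -> dict:
    """Compute the figures a compression-aware access log might record for one response."""
    compressed = gzip.compress(body, compresslevel=level)
    return {
        "method": "gzip",
        "bytes_in": len(body),         # uncompressed size
        "bytes_out": len(compressed),  # compressed size
        # Ratio expressed as the percentage of bytes saved; 0.0 for an empty body.
        "saved_pct": round(100 * (1 - len(compressed) / len(body)), 1) if body else 0.0,
    }
```

Tracking these numbers over time shows which pages benefit from compression and which (already compressed downloads, tiny responses) are not worth the CPU time.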

According to the authors, the module's installation fixes some bugs in the Apache source code, so you'll need to re-build Apache after you configure the module.

To install mod_deflate:

  1. Unpack it with tar and run the configuration script:

     tar zxf mod_deflate-mod_deflate-1.0.12.tar.gz
     cd mod_deflate-mod_deflate-1.0.12
     ./configure --with-apache=<apache_dir> --with-zlib=<zlib_dir>
     make

    ZLIB is available for free download here: http://www.gzip.org/zlib/.

  2. Rebuild Apache and activate the module:

     cd <apache_dir>
     ./configure ... --activate-module=src/modules/extra/mod_deflate.o ...
  3. The default configuration works great, but I recommend that you change the compression level from 1 to 9, or at least 6, for higher compression:

     DeflateCompLevel 9 

Mod_deflate ru from Khrustalev and Sysoev is available at the following sites:

  • http://sysoev.ru/mod_deflate/mod_deflate-1.0.15.tar.gz

  • http://pflanze.mine.nu/~chris/mod_deflate/mod_deflate_readme_EN.html (documentation)

  • http://sysoev.ru/mod_deflate/ (in Russian)

Gzip_cnc: For Those Without Apache System Access

Most web sites are hosted on shared servers. In this case, you cannot access the httpd.conf and "modules" folder, so there is no way to install an Apache module. Web hosting companies usually refuse to install additional components, especially third-party products. I understand them; they care about the 99.9 percent uptime they promised.

There is still a way to compress content if you can edit your .htaccess files. Gzip_cnc is not an Apache module, but rather a content handler written in Perl. This program requires that gzip be on your server in a place where you can access it.

To install Gzip_cnc, follow these steps:

  1. Copy the source code from http://www.schroepl.net/projekte/gzip_cnc/program.htm.

  2. Create a new file gzip_cnc.pl in your cgi-bin folder and paste the source code there.

  3. Add the following lines into the .htaccess file and test it:

     <Files ~ \.html?$>
       Action text/html /cgi-bin/gzip_cnc.pl
     </Files>

    If it does not work, verify the path to gzip in the source code.

     my $gzip_path              = '/usr/bin/gzip'; 

You can specify parameters for Gzip_cnc in two ways. The first way is by modifying the code; all the settings are at the very top of the program. The alternative way is to set up environment variables.

For More Information

To learn more about gzip_cnc, see Michael Schröpl's site at http://www.schroepl.net/projekte/gzip_cnc/.

Choosing a Compression Module

How do you choose the appropriate compression module for your Apache server? It's easy for Apache 2; mod_deflate is the only current option. For Apache 1.3, if you are looking for a commercial product, try mod_hs or mod_gzip from VIGOS; otherwise, try mod_gzip (RCI) or mod_deflate ru for maximum speed. On a shared server, try gzip_cnc.

There are some other ways to compress the output in the Apache server:

  • For PHP version 4.0.4+, add the line:

     output_handler = ob_gzhandler ; 

    in php.ini to turn on content compression (but only for PHP files, of course).

  • To turn on compression only for particular files, add the following statement:

     ob_start("ob_gzhandler"); 
  • If you use an older version of PHP, you will have to implement a gzip handler by yourself. It's easy:

     function gzip_output($output) {
         return gzencode($output);
     }
     // if browser supports compression
     if (strstr($HTTP_SERVER_VARS['HTTP_ACCEPT_ENCODING'], 'gzip')) {
         ob_start("gzip_output");           // set handler
         header("Content-Encoding: gzip");  // tell browser
     }
For More Information

More information about PHP compression is available at http://zend.com/zend/art/buffering.php.

Here are some tools to compress Perl output for Apache:

  • Apache::Dynagzip: http://search.cpan.org/author/SLAVA/Apache-Dynagzip-0.06/Dynagzip.pm

  • Apache::GzipChain: http://search.cpan.org/author/ANDK/Apache-GzipChain-0.06/GzipChain.pm

I recommend that you use an Apache compression module or a reverse proxy instead because they offer more complete solutions for compressing all your content.

Content Compression in Microsoft's IIS Server

With some help, Microsoft's IIS server can deliver compressed content to HTTP 1.1-compliant browsers. The best way to enable content encoding on IIS is to use an ISAPI filter specifically designed for this purpose. For pre-compressed content, IIS doesn't have any built-in mechanism like Apache's Multiviews.

Dynamic Content Compression for the IIS Server

Microsoft's IIS server has content compression onboard. To turn it on, follow these steps:

  1. Open the Computer Management window.

  2. Select Internet Information Services.

  3. Right-click and select Properties (see Figure 18.1).

    Figure 18.1. Setting up compression for IIS.
  4. In Internet Information Services Properties, click Edit in the Master Properties section.

  5. In WWW Service Master Properties, check the Compress application files option and click OK (see Figure 18.2)

    Figure 18.2. Setting up compression for IIS: final step.
  6. Restart the IIS server.

Ironically, this works with all browsers except Microsoft Internet Explorer. It is a caching problem. Internet Explorer displays the compressed file properly for the first time, but the refresh button ruins it. The file is there, and you can save it to disk, but you cannot properly see it.

A number of dynamic content compression solutions allow you to work around this limitation. All these tools are implemented as an ISAPI filter. The difference is flexibility.

PipeBoost

PipeBoost is a powerful, flexible, and easy-to-use commercial content compression solution for Microsoft IIS.

In PipeBoost, you can configure literally everything. You can set individual configurations for every web site and every folder. You can assign individual content type handling rules for each browser (see Figure 18.3).

Figure 18.3. PipeBoost: setting up browsers.

You can even set custom compression levels for each file type (see Figure 18.4).

Figure 18.4. PipeBoost: setting up MIME types.

The other good thing about PipeBoost is that it includes a powerful analyzing and monitoring system (see Figure 18.5), a cache performance analyzer, and some other useful tools.

Figure 18.5. PipeBoost performance monitor.
For More Information

To find out more about PipeBoost, see http://www.pipeboost.com/.

Inner Media SqueezePlay

SqueezePlay from Inner Media, Inc. is another commercial content compression solution for Microsoft IIS. It consists of Configurator and Accelerometer.

Configurator is a GUI tool with a lot of dialogs and settings. You can configure browser types and content types (see Figure 18.6).

Figure 18.6. SqueezePlay Configurator.

The Accelerometer is a graphics analysis tool with column diagrams. It shows sent data size and savings.

In addition to the HTTP content compression, SqueezePlay provides image optimization. An LZW license to optimize GIF files is not included; you have to buy it separately.

For More Information

To find out more about SqueezePlay, see http://www.innermedia.com/.

HyperSpace i

If you don't mind editing text configuration files, try HyperSpace i from eHyperSpace (http://ehyperspace.com/solutions/hyperspacei.html).

This tool is pretty flexible. You can set up browser types and versions, MIME types, minimum file size to be compressed, and so on. You also can add information like compressed file size and compression ratio to your log file.

This tool is not for everybody, however. There is no GUI, which means that editing its configuration files is similar to editing Apache's httpd.conf for mod_gzip.

VIGOS IIS Accelerator

This ISAPI filter has all the configuration options from the VIGOS Website Accelerator, excluding the proxy specific ones. The filter can remove whitespace before compression, auto-detects browsers and MIME-types to avoid incompatibilities, and includes graphics optimization. It is configured with a text file, but comes with a handy configuration tool for editing this file.

HTTP Burst

There is one more tool in the market. HTTP Burst is free and looks pretty powerful and flexible. It is not easy to use, however; the authors suggest configuring it with regedit.exe.

The other problem: there is no English documentation. You can find more information at the following sites:

  • http://www.timax.net.ua/~httpburst/download.htm

  • http://www.timax.net.ua/~httpburst/opisanie.htm

Summary: IIS ISAPI Compression Filters

Because all of this software uses the same compression algorithm and the same ISAPI technology, you'll find no relevant difference in performance or compression ratio. Just keep in mind that the tools will slow down server response by about 1.5 times (for example, 1.5 seconds versus 1 second of latency). On the majority of web sites, however, this added overhead is more than made up for by the time saved sending compressed content. Therefore, I recommend that you make a decision based on the functionality you need, the added maintenance you are ready to take on, and the price you are willing to pay.
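A rough calculation shows why the extra server time usually pays off. The Python sketch below models delivery time as server latency plus transfer time. Every number in it is an assumption chosen for illustration: a 100,000-byte page, a 56 Kbps modem, compression to a quarter of the original size, and a server reply time that grows from 10 to 15 milliseconds.

```python
def delivery_time(size_bytes: int, bandwidth_bps: float, server_latency_s: float) -> float:
    """Server latency plus transfer time over a link of the given speed, in seconds."""
    return server_latency_s + (size_bytes * 8) / bandwidth_bps


# Assumed figures, not measurements.
plain = delivery_time(100_000, 56_000, server_latency_s=0.010)
packed = delivery_time(25_000, 56_000, server_latency_s=0.015)  # 1.5 times the latency
print(f"uncompressed: {plain:.1f} s, compressed: {packed:.1f} s")
```

Under these assumptions the uncompressed page takes about 14.3 seconds and the compressed page about 3.6 seconds, so the extra 5 milliseconds on the server is negligible. On a fast local network the transfer term shrinks, and the trade-off is much closer.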

 



Speed Up Your Site[c] Web Site Optimization
ISBN: 596515081
EAN: N/A
Year: 2005
Pages: 135
