38.3. Squid: History and Overview

Squid is a caching proxy server. It normally sits "between" a web surfer and a web server. The surfer requests a web page from the proxy server. The proxy server either makes the request for the surfer to the web server (a proxy request) or serves the client the page directly if it's already saved in the proxy server's disk cache. Because the proxy server is between the client and the server, a number of options are available, including logging, filtering, and other access control. Squid can do all of these things and more.

Squid was originally based on the Harvest project, which is an ARPA-funded set of tools for building a standards-compliant web crawler. Squid is currently maintained by open source programmers around the world and is licensed under the GNU General Public License. For more information on Harvest, visit http://webharvest.sourceforge.net/ng, and for more information on Squid, visit the Squid home page at http://www.squid-cache.org.

You can install Squid either from source or from a binary package included with your distribution. As of this writing, the latest recommended version of Squid is 2.5.STABLE12. Squid has far fewer compile-time options of interest than Apache does, so a binary Squid package from your Linux distribution usually works fine. If, however, you want to see exactly which compile-time options are available, you can read about them in the Squid documentation at http://squid-docs.sourceforge.net/latest/html/x220.html.

If you've installed a Squid package from your Linux distribution, the files are probably laid out in your filesystem like this:


/etc/squid

Squid configuration directory and home to the main configuration file, squid.conf


/usr/lib/squid

Programs that Squid uses to communicate with external authentication sources


/usr/sbin/squid

The Squid binary itself


/usr/share/doc/squid-2.5/

Squid documentation


/usr/share/squid

Locale-specific errors and icons


/var/log/squid

Squid log directory


/var/spool/squid

Home of the Squid disk cache

We will go through some basic options in the squid.conf file. The default configuration already contains a relatively sane set of options. If you just want to use Squid as a basic caching proxy server, with no access control, you need to change only a few defaults. More advanced configurations require more extensive changes to the configuration file.

38.3.1. http_port option

The default port on which Squid listens for connections is TCP port 3128. You can change this to any port you want, but you must make sure that each of your clients is configured to talk to Squid on that port. For example, in the Firefox web browser, select Edit > Preferences > Connection Settings. Then, assuming your Squid server is named squidserver and listens on port 3128, enter squidserver as the HTTP proxy and 3128 as the port, as shown in Figure 38-3.

Figure 38-3. Setting proxy options in Firefox to use Squid proxy
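
Browsers are not the only clients that may need to be pointed at the proxy. As a rough sketch, a Python script could route its HTTP requests through Squid with the standard library's urllib.request module. The host name squidserver and port 3128 come from the example above; the helper name build_proxy_opener is hypothetical.

```python
import urllib.request

# Assumed proxy location, taken from the example: host "squidserver", port 3128.
PROXY_URL = "http://squidserver:3128"


def build_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Typical use (requires a reachable proxy, so it is not executed here):
#   opener = build_proxy_opener()
#   with opener.open("http://www.squid-cache.org/", timeout=10) as response:
#       print(response.status)
```

If http_port is changed in squid.conf, the port in PROXY_URL has to change with it, just as it does in the browser settings.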


38.3.2. cache_dir option

This option dictates not only where Squid stores its cache, but also what kind of storage system is used, how much disk Squid is allowed to use, and how the directory structure is set up. The format is:

 cache_dir storage_type directory-name megabytes L1 L2 [options] 

The default storage type is ufs, which is the original Squid storage setup. There are other options available, but they are really necessary only if you have some specific problem with ufs. For most purposes, ufs is sufficient.

The default amount of space used is 100 MB, which you almost certainly want to change. For any reasonably sized disk cache (and considering that hard drives have never been cheaper), you probably want to allocate at least a couple of gigabytes to Squid. Squid won't be able to work effectively as a cache if it doesn't have enough disk space.

L1 sets how many top-level subdirectories are created in the cache_dir directory (the default is 16), and L2 sets how many second-level subdirectories are created under each of them (the default is 256).

So a standard line would look like this:

 cache_dir ufs /var/spool/squid 10000 16 256 

This sets a ufs cache directory at /var/spool/squid and allocates 10 GB to it.
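
To make the field layout concrete, here is a small Python sketch (not part of Squid; the names CacheDir and parse_cache_dir are hypothetical) that splits a cache_dir line into the fields described above and works out how many second-level directories the L1 and L2 values produce.

```python
from typing import NamedTuple


class CacheDir(NamedTuple):
    storage_type: str
    directory: str
    megabytes: int
    l1: int
    l2: int


def parse_cache_dir(line: str) -> CacheDir:
    """Split a cache_dir line into its five required fields.

    Illustrative only: any trailing [options] are ignored.
    """
    fields = line.split()
    if len(fields) < 6 or fields[0] != "cache_dir":
        raise ValueError(f"not a cache_dir line: {line!r}")
    _, storage_type, directory, megabytes, l1, l2 = fields[:6]
    return CacheDir(storage_type, directory, int(megabytes), int(l1), int(l2))


entry = parse_cache_dir("cache_dir ufs /var/spool/squid 10000 16 256")
# Each of the L1 top-level directories holds L2 second-level directories.
second_level_total = entry.l1 * entry.l2
print(entry.directory, entry.megabytes, "MB,", second_level_total, "second-level directories")
```

With the default values of 16 and 256, Squid spreads cached objects across 16 x 256 = 4,096 second-level directories.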

38.3.3. cache_mem option

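This option lets you designate how much memory Squid can use for holding cached objects in RAM. The more memory, the more responsive your cache is. Note that this is not a hard cap on the size of the Squid process, which also needs memory for other purposes. Objects that do not fit within this limit must be served from the disk cache, which is considerably slower.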

38.3.4. cache_access_log option

This option lets you designate a logfile that will keep a record of every request processed by Squid. The format of the logfile is configurable (you can even make it look like a standard Apache logfile), but it is recommended that you leave the format in the default Squid format. There are a number of good third-party reporting tools that can parse Squid log files and provide you with reports.

38.3.5. acl option

This is probably the most complex part of the Squid setup process. Access control lists (ACLs) allow you to determine not only who gets to use your proxy server (via IP address, domain name, or username and password) but also what sites they get to visit. Once you understand the concept, the implementation is relatively straightforward. The most important thing to remember is that, by default, the squid.conf file is configured with ACL lines that deny access to everything except the localhost. To allow anyone else to use your cache, you have to create some ACLs and allow them access.

In our test office environment, let's say that you have a Linux box with two network cards installed. One network card is connected to a DSL router, and the other one to a switch. Also connected to this switch are 10 office computers. You want to set up Squid on this system to cache web requests for the office to save on bandwidth. You also want to log all access, require authentication, and block access to certain web sites. The external IP address of the DSL router is dynamic, and the internal IP address is 192.168.1.1/255.255.255.0.

Here is an example squid.conf file for this setup. It provides some default access control (all intranet users can access the cache) and enables cache access and logging.

 http_port 192.168.1.1:3128
 cache_mem 128 MB
 cache_access_log /var/log/squid/access.log
 cache_dir ufs /var/spool/squid 1000 16 256
 acl all src 0.0.0.0/0.0.0.0
 acl manager proto cache_object
 acl localhost src 127.0.0.1/255.255.255.255
 acl to_localhost dst 127.0.0.0/8
 acl intranet src 192.168.1.0/24
 acl SSL_ports port 443 563
 acl Safe_ports port 80          # http
 acl Safe_ports port 21          # ftp
 acl Safe_ports port 443 563     # https, snews
 acl Safe_ports port 70          # gopher
 acl Safe_ports port 210         # wais
 acl Safe_ports port 1025-65535  # unregistered ports
 acl Safe_ports port 280         # http-mgmt
 acl Safe_ports port 488         # gss-http
 acl Safe_ports port 591         # filemaker
 acl Safe_ports port 777         # multiling http
 acl CONNECT method CONNECT
 http_access allow manager localhost
 http_access deny manager
 http_access deny !Safe_ports
 http_access deny CONNECT !SSL_ports
 http_access allow localhost
 http_access allow intranet
 http_access deny all

The three key lines related to ACLs in the file are:

 acl intranet src 192.168.1.0/24
 http_access allow intranet
 http_access deny all

The first ACL line defines an access control list named intranet that includes all systems with a source IP address in the range 192.168.1.0 through 192.168.1.255. This defines all of our internal office machines. The next line applies that ACL to a directive, in this case the http_access directive. This allows any system that matches the intranet ACL to have HTTP access to the cache. Finally, the last line denies access to any system not already explicitly allowed. This is usually good practice when setting up an ACL list, whether it's for a proxy cache, a firewall, or a router. Always have a "default deny" rule at the end, forcing you to explicitly allow anything that you want to provide access to. If your default policy is to allow, it's too easy to make a mistake in your configuration and let more through than you intend.
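
The effect of these three lines can be modeled in a few lines of Python. This is a simplified sketch of the first-match behavior, not Squid's actual implementation, and the names RULES and http_access below are made up for illustration.

```python
import ipaddress

# Models: acl intranet src 192.168.1.0/24
INTRANET = ipaddress.ip_network("192.168.1.0/24")
# Models: acl all src 0.0.0.0/0.0.0.0
ALL = ipaddress.ip_network("0.0.0.0/0")

# Models the ordered rules:
#   http_access allow intranet
#   http_access deny all
RULES = [("allow", INTRANET), ("deny", ALL)]


def http_access(client_ip: str) -> str:
    """Return the action of the first rule whose network contains the client."""
    address = ipaddress.ip_address(client_ip)
    for action, network in RULES:
        if address in network:
            return action
    # Not reached for IPv4 clients, because ALL matches every address;
    # denying here is simply the conservative choice for this sketch.
    return "deny"


print(INTRANET[0], "through", INTRANET[-1])
print(http_access("192.168.1.33"))
print(http_access("10.0.0.5"))
```

An office machine such as 192.168.1.33 matches the intranet rule and is allowed; any other address falls through to the final rule and is denied.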

Now that we have a working configuration, let's add some settings to it. The first thing you might want to do is to restrict access to certain web sites. Whatever your company's Internet access policy is, there are probably some web sites that you don't want your employees visiting. This is easy to implement in Squid using an ACL. Here is an example ACL that defines some potentially undesirable web sites:

 acl blocked_sites dstdomain .espn.com espn.go.com .hotmail.com 

Now that you've defined the ACL, apply it to the http_access directive:

 http_access deny blocked_sites 

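Squid processes http_access directives in order and stops at the first one that matches, so the position of this line matters. Place it before http_access allow intranet; otherwise, office machines match the allow rule first and the block never takes effect. For the same reason, http_access deny all must remain the last http_access line, because any allow directive placed after it would never be reached.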
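
The Python sketch below models how rule order changes the outcome for the blocked_sites ACL. It is a simplified illustration under stated assumptions, not Squid's code: the function names are hypothetical, and the dstdomain matching is reduced to the basic rule that a pattern with a leading dot matches the domain itself and its subdomains, while a pattern without one matches only that exact host name.

```python
import ipaddress

# Models: acl blocked_sites dstdomain .espn.com espn.go.com .hotmail.com
BLOCKED_SITES = [".espn.com", "espn.go.com", ".hotmail.com"]
# Models: acl intranet src 192.168.1.0/24
INTRANET = ipaddress.ip_network("192.168.1.0/24")


def matches_dstdomain(host: str, pattern: str) -> bool:
    """Simplified dstdomain test: a leading dot also matches subdomains."""
    host = host.lower()
    pattern = pattern.lower()
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def is_blocked(client_ip: str, host: str) -> bool:
    return any(matches_dstdomain(host, p) for p in BLOCKED_SITES)


def is_intranet(client_ip: str, host: str) -> bool:
    return ipaddress.ip_address(client_ip) in INTRANET


def match_all(client_ip: str, host: str) -> bool:
    return True


def evaluate(rules, client_ip: str, host: str) -> str:
    """Return the action of the first rule whose ACL matches the request."""
    for action, acl in rules:
        if acl(client_ip, host):
            return action
    return "deny"


# deny blocked_sites listed before allow intranet: the block takes effect.
good_order = [("deny", is_blocked), ("allow", is_intranet), ("deny", match_all)]
# deny blocked_sites listed after allow intranet: never reached for office machines.
bad_order = [("allow", is_intranet), ("deny", is_blocked), ("deny", match_all)]

print(evaluate(good_order, "192.168.1.33", "www.espn.com"))  # deny
print(evaluate(bad_order, "192.168.1.33", "www.espn.com"))   # allow
```

In the second ordering, the office machine is allowed as soon as the intranet rule matches, so the deny rule for blocked sites is never consulted.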

After you've added these two lines to your squid.conf file, restart Squid and attempt to access www.espn.com. You should see a screen like Figure 38-4.

Figure 38-4. Access denied by Squid on host basis


Also, you should see a line like this in your Squid access.log:

 1136490718.221    870 192.168.1.33 TCP_DENIED/403 1419 GET http://www.espn.com/ - NONE/- text/html 
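
As a rough illustration of how a reporting tool might read this format, the following Python sketch splits the sample line into its ten whitespace-separated fields. The field labels are descriptive names chosen for this example, not identifiers defined by Squid, and the parser assumes that no field contains embedded spaces.

```python
from datetime import datetime, timezone

# Descriptive labels for the ten columns of the default log format.
FIELD_NAMES = [
    "timestamp", "elapsed_ms", "client", "result", "bytes",
    "method", "url", "ident", "hierarchy", "content_type",
]


def parse_access_line(line: str) -> dict:
    """Turn one access.log line into a dictionary of named fields."""
    values = line.split()
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    record = dict(zip(FIELD_NAMES, values))
    # The first column is seconds since the UNIX epoch, with milliseconds.
    record["time"] = datetime.fromtimestamp(float(record["timestamp"]), tz=timezone.utc)
    record["elapsed_ms"] = int(record["elapsed_ms"])
    record["bytes"] = int(record["bytes"])
    # The result column combines Squid's result code and the HTTP status.
    record["cache_result"], record["http_status"] = record["result"].split("/")
    return record


sample = (
    "1136490718.221    870 192.168.1.33 TCP_DENIED/403 1419 "
    "GET http://www.espn.com/ - NONE/- text/html"
)
record = parse_access_line(sample)
print(record["time"].date(), record["client"], record["cache_result"],
      record["http_status"], record["url"])
```

Here TCP_DENIED/403 shows that Squid refused the request from 192.168.1.33 and returned an HTTP 403 (Forbidden) response without contacting the web server.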

You can add as many domains as you wish to an ACL like this. You can also filter on IP addresses and on strings in URLs.



LPI Linux Certification in a Nutshell
ISBN: 0596005288
Year: 2004
Pages: 257
