10.2. There Are Too Many Users Accessing the Internet from the Office
High-speed, business-quality Internet connections can be expensive. While some companies try to regulate personal use of the corporate Internet connection, that can be difficult, because many employees need the Internet for their work. Companies that allow employees to use the Internet during their free time (e.g., during the lunch hour) may find them overloading the connection. Some may even abuse the connection for their own purposes, such as downloading music or movies.
Whatever the cause, anything that reduces the overall demand on a high-speed Internet connection can save significant amounts of money. A proxy server can help reduce the load on your Internet connection in two ways: by caching commonly accessed data locally, and by regulating access to unwanted sites, such as those with "sex" in the URL.
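The caching half of that claim is easy to see in a toy model. The sketch below is not Squid's code. It is a minimal least-recently-used (LRU) cache, which is a common replacement policy and Squid's default one. It assumes every object is the same size and never expires, and it counts how many requests actually have to cross the Internet link.

```python
from collections import OrderedDict


class ToyProxyCache:
    """Toy least-recently-used (LRU) cache that counts trips to the Internet.

    Illustration only, not Squid's code: every object is assumed to be the
    same size and never to expire.
    """

    def __init__(self, capacity):
        self.capacity = capacity          # how many objects fit in the cache
        self.store = OrderedDict()        # URL -> body, least recently used first
        self.upstream_fetches = 0         # requests that used the Internet link

    def get(self, url):
        if url in self.store:
            # Hit: serve the local copy and mark it as recently used.
            self.store.move_to_end(url)
            return self.store[url]
        # Miss: pretend to download the object, then keep a copy.
        self.upstream_fetches += 1
        body = f"<contents of {url}>"
        self.store[url] = body
        if len(self.store) > self.capacity:
            # Over capacity: drop the least recently used object.
            self.store.popitem(last=False)
        return body


cache = ToyProxyCache(capacity=2)
for url in ["http://a/", "http://b/", "http://a/", "http://a/", "http://c/", "http://b/"]:
    cache.get(url)
print(cache.upstream_fetches, "of 6 requests went out over the Internet link")
```

In this run, the two repeat requests for http://a/ are served from the cache. The real savings depend on how often your users revisit the same content.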
The standard open source tool for regulating Internet connections in this way is the Squid Web Proxy Cache, documented at http://www.squid-cache.org. Commercial support for Squid is available through ViSolve, at http://squid.visolve.com/squid/. If you want content filtering in addition to Squid's caching, you'll also want to install SquidGuard, a combined filter, redirector, and access-control plug-in for Squid (http://www.squidguard.org).
Squid can also spread the load across multiple computers. A Squid server can forward requests to a parent or sibling cache, defined with the cache_peer directive, which is itself configured as a Squid proxy server.
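A cache_peer line names the peer's hostname, its type (parent or sibling, or multicast), its HTTP port, and its ICP port, followed by any options. The sketch below simply assembles such lines so the field order is explicit.

- The hostnames are hypothetical.
- The defaults of 3128 and 3130 are assumed to be Squid's standard HTTP and ICP ports.
- The options shown (default, no-query, proxy-only) are examples; check them against your version's squid.conf documentation.

```python
PEER_TYPES = ("parent", "sibling", "multicast")


def cache_peer_line(host, peer_type, http_port=3128, icp_port=3130, options=()):
    """Build a squid.conf line: cache_peer HOST TYPE HTTP-PORT ICP-PORT [OPTIONS]."""
    if peer_type not in PEER_TYPES:
        raise ValueError(f"unsupported peer type: {peer_type!r}")
    fields = ["cache_peer", host, peer_type, str(http_port), str(icp_port)]
    fields.extend(options)  # e.g. "proxy-only", "no-query", "default"
    return " ".join(fields)


# Hypothetical hosts: one upstream parent, one neighboring sibling.
print(cache_peer_line("parent.example.com", "parent", options=["default", "no-query"]))
print(cache_peer_line("sibling.example.com", "sibling", options=["proxy-only"]))
```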
10.2.1. Squid System Requirements
Any web proxy places heavy demands on a computer. When you're caching pages for a group of users who browse the Internet, you need a lot of disk space, and you need fast access to that locally cached data. If the cached data is already in RAM, so much the better. As described in the Squid documentation available from http://squid-docs.sourceforge.net/, caching stresses some hardware components more than others. The key characteristics, in order of decreasing importance, are:
10.2.2. Installing Squid
As of this writing, the latest stable version of Squid is version 2.5, which is available as a standard package on our selected distributions; version 3.0 is in beta and is available in RPM format on the SUSE Linux installation CDs. The installation process is straightforward. I recommend that you use your distribution's package-management tool (SUSE's YaST, Fedora's yum, or Debian's apt) so that dependencies are taken care of for you.
Squid package names follow a slightly different format: stable releases include the word STABLE in the version string. For example, the Squid package on my Fedora Core 4 system is named squid-2.5.STABLE11-2.FC4.i386.rpm.
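RPM filenames follow the pattern name-version-release.arch.rpm, so the STABLE marker lives in the version field. As a sketch, the following splits a filename such as the one above into its parts and reports whether it is a stable release. It assumes the package name itself may contain hyphens but the version and release do not.

```python
def parse_rpm_filename(filename):
    """Split NAME-VERSION-RELEASE.ARCH.rpm into its fields."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM filename: {filename!r}")
    stem = filename[: -len(".rpm")]
    nvr, arch = stem.rsplit(".", 1)               # architecture is the last dotted field
    name, version, release = nvr.rsplit("-", 2)   # name may itself contain hyphens
    return {
        "name": name,
        "version": version,
        "release": release,
        "arch": arch,
        "stable": "STABLE" in version,            # Squid's naming convention
    }


print(parse_rpm_filename("squid-2.5.STABLE11-2.FC4.i386.rpm"))
```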
The companion squidguard package (version 1.2 as of this writing) is available from the Debian repositories as well as the SUSE installation CDs; if you want squidguard for Fedora/Red Hat, you'll have to install it from a third-party repository, such as http://dag.wieers.com/packages/squidguard/.
Debian organizes Squid packages somewhat differently; files that are included in the standard Squid package on other distributions are divided up into separate packages on Debian:
10.2.3. Configuring Squid
Once the Squid proxy server is installed, it's fairly easy to configure. For a minimal setup, all you need to do is change three settings in the basic configuration file, squid.conf, in the /etc/squid directory:
If you're using a SquidGuard database, you'll want to make sure that squid.conf points to the associated configuration file. Normally, it's /etc/squid/squidguard.conf; if you've installed the SUSE Linux version of this package, it's /etc/squidguard.conf. To make the redirection work, add the following directive to /etc/squid/squid.conf:
redirect_program /usr/bin/squidguard /etc/squid/squidguard.conf
Or, if you're working on SUSE Linux:
redirect_program /usr/sbin/squidguard /etc/squidguard.conf
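It may help to see what Squid expects from any program named in redirect_program. As I understand Squid 2.5's redirector interface:

- Squid writes one request per line on the program's standard input, in the form "URL ip-address/fqdn ident method".
- The program answers each line with either a replacement URL or a blank line, which leaves the request unchanged.

The sketch below is a deliberately crude stand-in for SquidGuard that blocks any URL containing "sex", the example used earlier in this section. The block page URL is hypothetical.

```python
import re
import sys

BLOCK_PAGE = "http://localhost/blocked.html"  # hypothetical "access denied" page
BLOCKED = re.compile(r"sex", re.IGNORECASE)   # crude pattern from the text's example


def rewrite(request_line):
    """Return the replacement URL, or "" to let Squid fetch the original URL."""
    fields = request_line.split()
    if not fields:
        return ""
    url = fields[0]  # Squid 2.5 sends: URL ip-address/fqdn ident method
    return BLOCK_PAGE if BLOCKED.search(url) else ""


def process(lines):
    """Apply rewrite() to a batch of request lines, one reply per line."""
    return [rewrite(line) for line in lines]


def main():
    # Squid keeps the redirector running and waits for one reply line per
    # request, so flush after every answer instead of buffering output.
    for line in sys.stdin:
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()
```

To try it as a real redirector, you would add a call to main() at the bottom of the script and point redirect_program at it instead of at squidguard. Note that a plain substring match also blocks innocent URLs such as www.essex.ac.uk; avoiding that kind of false positive is one reason to use SquidGuard's maintained blacklists.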
Once you've configured Squid, create the basic cache directory structure in /var/spool/squid (/var/cache/squid on SUSE Linux) with the following command:
This command also serves as a configuration check; for example, if you haven't set the visible_hostname directive, this command exits with an error message. The directory structure may look a little weird; the following is the output from an ls /var/spool/squid command:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
If you look further in these directories, you'll find a large number of additional directories, ready for caching. To see this for yourself, run the following command:
Once Squid starts caching files, you'll actually begin to see stuff in directories such as /var/spool/squid/00/00.
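The two-level layout comes from the cache_dir directive in squid.conf, which sets the number of first-level and second-level directories. Assuming the stock values of 16 and 256 (check the cache_dir line in your own squid.conf), the sketch below lists every directory Squid creates. The names are two-digit uppercase hexadecimal, which matches the 00 through 0F listing above.

```python
def cache_subdirs(root="/var/spool/squid", l1=16, l2=256):
    """List the L1/L2 cache directories, named in two-digit uppercase hex."""
    return [f"{root}/{a:02X}/{b:02X}" for a in range(l1) for b in range(l2)]


dirs = cache_subdirs()
print(len(dirs))           # 16 * 256 directories
print(dirs[0], dirs[-1])   # first and last directory paths
```

On SUSE Linux, pass root="/var/cache/squid" instead.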
10.2.4. Starting Squid
As with any service, you'll want to start it once configured, and make sure it starts the next time you reboot your system. In our selected distributions, you can start the Squid Web Proxy daemon with the following command:
The actions you take to make sure this service starts upon every system boot depend on whether you're on Red Hat/Fedora/SUSE or Debian. On the first set of distributions, run the following command, which configures the Squid daemon to start in the standard multiuser and multiuser-with-GUI runlevels (3 and 5):
/sbin/chkconfig --level 35 squid on
Naturally, the runlevels you list should include your default runlevel, which is set on the initdefault line (labeled id) in /etc/inittab. If, for some reason, you want to keep Squid from starting in any runlevel (e.g., if you're not ready to implement Squid on your network), or you just want to keep it from starting when you next reboot your system, run the following command (which deactivates the Squid service in all runlevels):
/sbin/chkconfig squid off
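Each /etc/inittab entry has four colon-separated fields: id, runlevels, action, and process. The default runlevel is the second field of the entry whose action is initdefault, for example id:5:initdefault:. The sketch below reads that value from the file's text and checks whether a chkconfig-style level string such as "35" covers it.

```python
def default_runlevel(inittab_text):
    """Return the runlevel from the initdefault entry, or None if there isn't one."""
    for raw in inittab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")  # id:runlevels:action:process
        if len(fields) >= 3 and fields[2] == "initdefault":
            return fields[1]
    return None


def covers_default(levels, inittab_text):
    """True if a level string such as "35" includes the default runlevel."""
    level = default_runlevel(inittab_text)
    return level is not None and level in levels


sample = "# Default runlevel.\nid:5:initdefault:\n"
print(default_runlevel(sample), covers_default("35", sample))
```

On a real system you would pass in the contents of /etc/inittab, for instance open("/etc/inittab").read().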
On Debian distributions, by contrast, new services are configured to start by default on the next reboot. To confirm this, I found the following file in the /etc/rc2.d directory, which lists services that are killed and started in runlevel 2 (the default runlevel as defined in my Debian /etc/inittab):
Links for scripts that are started in a runlevel begin with an S; links for scripts that are killed in a runlevel begin with a K. If you want to keep Squid from starting in any runlevel, run the following command:
update-rc.d -f squid remove
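After the removal, no S or K link for squid should remain in any /etc/rc?.d directory. Each link name is an S or K, a two-digit sequence number that sets the order in which scripts run, and the script name. The sketch below applies that convention to a directory listing. The example names such as S20squid are assumptions for illustration.

```python
import re

RC_LINK = re.compile(r"^([SK])(\d{2})(.+)$")  # S|K, two-digit order, script name


def rc_action(link_names, service):
    """Return ("start", order), ("kill", order), or None for one runlevel directory."""
    for name in link_names:
        m = RC_LINK.match(name)
        if m and m.group(3) == service:
            action = "start" if m.group(1) == "S" else "kill"
            return action, int(m.group(2))
    return None


# Hypothetical listings of /etc/rc2.d before and after "update-rc.d -f squid remove".
print(rc_action(["README", "S10sysklogd", "S20squid"], "squid"))
print(rc_action(["README", "S10sysklogd"], "squid"))
```

On a live system, the listing would come from os.listdir("/etc/rc2.d").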
When you're ready to restore Squid, run the following command, which makes sure it starts in runlevels 2 through 5 (don't forget the dot at the end of the command):
update-rc.d -f squid start 20 2 3 4 5 .
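In that command, start 20 asks for start (S) links with sequence number 20, the runlevels follow, and the dot ends the list. The sketch below is a simplified model, not update-rc.d itself. It works out which links a start/stop argument list requests, each of which would be a symlink to /etc/init.d/squid, and it rejects a list that is missing its terminating dot.

```python
def planned_links(service, args):
    """Links requested by update-rc.d-style args: start|stop NN LEVEL... . (repeatable)."""
    links = []
    i = 0
    while i < len(args):
        action = args[i]
        if action not in ("start", "stop") or i + 1 >= len(args):
            raise ValueError(f"expected 'start NN' or 'stop NN' at {args[i:]!r}")
        order = int(args[i + 1])
        prefix = "S" if action == "start" else "K"
        i += 2
        while True:
            if i >= len(args):
                raise ValueError("runlevel list must end with '.'")
            if args[i] == ".":
                i += 1  # skip the terminating dot
                break
            links.append(f"/etc/rc{args[i]}.d/{prefix}{order:02d}{service}")
            i += 1
    return links


print(planned_links("squid", "start 20 2 3 4 5 .".split()))
```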
In Debian distributions, there may be additional Squid-related daemons. Based on the packages described earlier, they include the Squid PreFetch and Squidtail services.
10.2.5. Connecting Clients
Naturally, Squid is only one part of the equation. For Squid to do its work, you need to make sure that clients, specifically web browsers, are configured to use the Squid proxy server.
If you're testing Squid, you can configure a web browser on one of your systems to use the proxy server. For example, Figure 10-1 shows where to configure Firefox to connect through a proxy server at IP address 192.168.0.1 on the LAN.
Figure 10-1. Configuring a manual proxy on Firefox
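You can also test the proxy from a script before touching any browsers. The sketch below builds a Python urllib opener that sends HTTP requests through the proxy at 192.168.0.1, as in Figure 10-1. The port 3128 is an assumption (Squid's default http_port), so adjust it to match your squid.conf.

```python
import urllib.request

PROXY_HOST = "192.168.0.1"  # proxy address from Figure 10-1
PROXY_PORT = 3128           # assumption: Squid's default http_port


def proxy_opener(host=PROXY_HOST, port=PROXY_PORT):
    """Return a urllib opener that routes plain HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    return urllib.request.build_opener(handler)
```

For example, proxy_opener().open("http://www.squid-cache.org/", timeout=10) should succeed only if the proxy is reachable. With Squid's default settings, the response should carry a Via header that names your proxy.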
One way to implement this configuration for all Firefox clients on your network is to copy the appropriate configuration file, prefs.js, from the ~/.mozilla/firefox/*.default directory into each user's profile. If you want to make the change permanent, you can remove the user's write permission on that file.
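If you'd rather distribute just the relevant settings than a whole prefs.js file, the sketch below generates the three user_pref lines for a manual HTTP proxy. The preference names are the ones Firefox uses for its manual proxy settings, where network.proxy.type 1 means manual configuration. The port 3128 is again an assumed value (Squid's default), not something taken from the figure.

```python
import json


def proxy_prefs(host, port):
    """Return prefs.js lines that point Firefox at a manual HTTP proxy."""
    prefs = [
        ("network.proxy.type", 1),          # 1 = manual proxy configuration
        ("network.proxy.http", host),
        ("network.proxy.http_port", port),
    ]
    # json.dumps quotes strings and leaves integers bare, as prefs.js expects.
    return "".join(f'user_pref("{name}", {json.dumps(value)});\n' for name, value in prefs)


print(proxy_prefs("192.168.0.1", 3128), end="")
```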