9.3. Squid

As already mentioned, the most commonly used Linux proxy server is squid. This server has been around for quite a while, and during this time it has gathered numerous features. There is no task that I could not do using squid.

The main configuration file for squid is /etc/squid/squid.conf (in some systems, it is /etc/squid.conf). The file is very large and it would make no sense to list it all here, because a large part of it is detailed comments on how to use its directives.

I will consider the main directives for controlling the proxy server. As usual, all parameters affecting performance and security will be considered in detail. Other settings will be given only a cursory look; you can obtain detailed information on them from the comments in the configuration file.

9.3.1. HTTP Tags

One of the main reasons for connecting to the Internet is to browse Web pages. When a Web connection is through a proxy server, HTTP has to be properly configured. The following tags are used for configuring HTTP in squid.

The first things that need configuring are the ports on which the server will listen for connections from clients. These directives have the xxxx_port format. In the http_port n tag, the n parameter specifies the port number to be used for HTTP connections. The directive looks as follows:

 http_port 8080 

Then you have to configure the browser on the client computer by specifying the IP address of the server running squid and the port allocated by this directive.

The hierarchy_stoplist tag provides a list of words that, if found in a URL, cause the request to be handled directly by this server, which retrieves the data from the source server instead of querying neighbor caches. I recommend that you add to this list the cgi-bin string and a question mark (hierarchy_stoplist cgi-bin ?). URLs containing this text point to scripts that may be executed on the server, and it is better not to cache their results.

Consider an example. Suppose that you retrieved a Web page with the URL www.servername.com/cgi-bin/ping.cgi that allows the ping command to be executed through the Web interface. Assume that the first time you pinged address 18.1.1.1. The result will be saved in the proxy server's cache. The next time, you run the script to ping address 18.1.1.18; the browser, however, will return the result of the first ping, because it will retrieve it from the cache.

Pages containing scripts return varying results, depending on the situation and the parameters specified by the user. Caching these pages will cause the browser to always return the same result: the one stored in the cache. So instead of the convenience offered by a proxy, you get a headache.

The question mark is often used to pass parameters to PHP scripts; thus, these pages should not be cached either.

The hierarchy_stoplist tag only keeps such pages from being retrieved from neighbor caches. The following two entries prohibit caching pages whose URLs contain the cgi-bin string or a question mark altogether.

 acl QUERY urlpath_regex cgi-bin \?
 no_cache deny QUERY

I believe you can see that there is no need to cache material that must be provided by the server; this would only waste disk space.
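
The HTTP settings discussed in this section can be collected into one fragment of squid.conf. This is only a sketch: the port number 8080 is the one used in the text, and directive names may differ between squid releases, so check the comments in your own configuration file before copying it.

```conf
# Port on which squid listens for HTTP requests from clients
http_port 8080

# Handle URLs that look like scripts directly instead of asking neighbors
hierarchy_stoplist cgi-bin ?

# Match URL paths containing "cgi-bin" or a question mark ...
acl QUERY urlpath_regex cgi-bin \?
# ... and never store the replies to such requests in the cache
no_cache deny QUERY
```

With these entries in place, the ping.cgi script from the example is run again on every request, so each address pinged gets its own result.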

9.3.2. FTP Tags

There are several tags to control FTP proxy operations. The following are some of the main ones:

  • ftp_passive on|off Specifies the operation mode. If set to on, it enables the passive mode. This is the default setting.

    The squid server allows you to work with FTP but may require some additional configuring. For example, if squid is located behind a firewall that does not allow the passive mode, the value of this parameter should be changed to off:

     ftp_passive off 
  • ftp_user address Specifies the email address to be used as the password during the authentication procedure on anonymous FTP servers.

    No server can determine whether the email address entered really exists, so most servers accept any string. Some FTP servers, however, verify that the address is well formed. The default string is ftp_user squid@.

    However, by default, this entry is commented out in the /etc/squid/squid.conf file. The default email address should also be changed to a complete one, such as ftp_user squid@hotmail.com.

    Any FTP server will accept this address as correct, because it complies with all rules for email addresses.

  • ftp_list_width n The n number sets the width of the FTP listing. This value should be set to fit the width of a standard browser. Setting it too small may cut off long file names.
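
The three FTP tags might be combined as in the following sketch. It assumes a firewall that blocks the passive mode; the email address is the one suggested in the text, and the listing width of 32 is only an illustrative value.

```conf
# Use the active mode because the firewall (assumed here) blocks passive FTP
ftp_passive off

# Password sent to anonymous FTP servers: a complete, well-formed address
ftp_user squid@hotmail.com

# Width of FTP listings; raise it if long file names are cut off
ftp_list_width 32
```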

9.3.3. Cache Configuration Tags

The efficiency and convenience of proxy server operation depend on how its cache is configured. I will try to explain in detail all pertinent tags.

  • cache_dir type directory size L1 L2 options Specifies parameters of the directory in which the cache will be stored. The main parameters are type, directory, and size (in megabytes). The L1 and L2 parameters set the number of first-level and second-level subdirectories created under the cache directory. In most cases, the value of type is set to ufs. It can be set to aufs for asynchronous input/output. I do not recommend doing this, because it may cause unreliable operation.

    The directory should be located in the largest partition, so that information will not be spread over several disks. But if all you have is one disk with one partition on it, the location makes no difference.

    The default size of the directory is 100 MB. This is sufficient to speed up work for three users. If there are many users in your network and they all have different tastes or jobs (meaning, they visit different sites), the size should be increased. I allocate at least 1 GB of disk space to cache. If the server is allowed to cache large files, the allocated space will be filled in no time.

  • cache_mem n MB Specifies the maximum amount of RAM to use as a memory cache for objects. The default value is 8 MB. If your machine is only used to run the proxy server, this value can be specified as the difference between the total amount of RAM and the memory needed by the operating system. For example, 64 MB is more than enough for an operating system working in the text mode. Thus, if you have 512 MB of RAM installed, you can give 448 MB of it (512 - 64 = 448) to the proxy server; the more memory the proxy server has, the faster it can serve frequently requested pages.

  • cache_swap_low n Sets the low-water mark of cache filling. When the percentage of cache filling exceeds the n value, the server starts cleaning it up, removing old objects until the cache filling percentage falls back to the acceptable level.

  • cache_swap_high n Sets the high-water mark of cache filling. This tag is similar to the previous one, only object eviction is more intensive to prevent cache overflow.

  • minimum_object_size n KB Specifies the minimum size of objects that can be cached. The default value is 0, meaning there is no minimum threshold.

  • maximum_object_size n KB Specifies the maximum size of objects that should be cached. The default value is 4,096 KB (4 MB). Set this value low to increase the server's speed; however, you will pay an increased traffic penalty for this. If you want to save bandwidth, keep this value high.

  • maximum_object_size_in_memory n KB Specifies the maximum size of objects to be kept in the memory cache. The default value is 8 KB.

  • ipcache_size n Specifies the IP address cache size as a number of entries. The default value is 1,024 entries.

  • ipcache_low n and ipcache_high n Specify the minimum and maximum IP address cache filling percentage, respectively.

  • reference_age parameter Specifies objects' lifetime in the cache. Objects whose lifetime exceeds this value can be deleted. The following are a few examples:

     reference_age 1 week
     reference_age 3.5 days
     reference_age 4 months
     reference_age 2.2 hours

    The default value is 1 week:

     reference_age 1 week 
  • quick_abort_min n KB A situation may arise, in which the connection is broken while an object is being retrieved. If less than the n value remains to be retrieved, retrieval of the object is completed anyway: When the connection is restored, there will be no need to repeat the retrieval. The default value is 16. Setting it to -1 disables the feature.

  • quick_abort_max n KB The same situation as with quick_abort_min, only if more than the specified value remains to be retrieved, the retrieval of the object is aborted. The default value is 16.

  • quick_abort_pct n The same situation as with the previous two tags, but if more than n percent of the object has already been retrieved, the retrieval will be completed.

  • negative_ttl n minutes TTL for failed requests. Negative requests (such as "Connection refused" or "404 Page Not Found") may be temporary, and they should not be cached for long periods. The default value is 5 minutes. If a request to the same address is made after this time, the server will attempt to retrieve the page from the source server instead of serving it from the cache.

  • positive_dns_ttl n hours TTL in hours for successful DNS lookups. During this period, repeated lookups of the same name will be served from the cache instead of being sent to the DNS server. The default value is 6 hours, which can be increased to 24 hours. A few years ago, IP addresses had a tendency to change often, so TTL values had to be set low. Nowadays, most sites have static addresses, which change only with a change of the host, and major portals have their own permanent IP addresses. Setting this parameter to 0 disables squid's IP address caching feature.

  • negative_dns_ttl n minutes TTL for failed DNS lookups. A failure to resolve a DNS address may be caused by some temporary problems with the DNS server and not with the address. These problems are usually fixed in 2 to 3 minutes, and failed lookups should not be cached for longer than this. I set this parameter to 1 or 0, so that users could load the site they need as soon as the DNS server is back in order.

  • range_offset_limit n KB Controls how requests for a part of an object (range requests) are fetched and cached. If set to -1, the server will download the entire object so that it may cache it. If set to 0, squid does not fetch more than the client requested. A value greater than 0 specifies how far into the file a range request may begin and still make squid prefetch the whole file. If the range begins beyond this limit, the proxy does not cache the result of the range request.
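
The main cache tags from this list could be put together as in the following sketch for a dedicated proxy with 512 MB of RAM. The directory /var/spool/squid, the subdirectory counts 16 and 256, and the water marks 90 and 95 are assumptions made for illustration; the other values follow the figures given in the text.

```conf
# 1 GB (1024 MB) of disk cache in the ufs format; 16 and 256 are the
# numbers of first-level and second-level subdirectories (L1 and L2)
cache_dir ufs /var/spool/squid 1024 16 256

# Memory cache for objects: 512 MB installed minus 64 MB for the system
cache_mem 448 MB

# Start removing old objects at 90% filling, remove intensively at 95%
cache_swap_low 90
cache_swap_high 95

# Size limits for cached objects
minimum_object_size 0 KB
maximum_object_size 4096 KB
maximum_object_size_in_memory 8 KB

# Forget failed requests and failed DNS lookups quickly
negative_ttl 5 minutes
negative_dns_ttl 1 minute

# Keep successful DNS lookups for a day
positive_dns_ttl 24 hours
```

On a machine that also runs other services, the cache_mem value should be reduced accordingly, because the arithmetic above assumes that squid is the only significant consumer of memory.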

9.3.4. Log Tags

There are several parameters in the squid's configuration file dealing with logs (the latter can be viewed in any text editor). These are the following:

  • cache_access_log file Specifies the path to the file in which all user activity (namely, HTTP and ICP requests) is logged. The default value is /var/log/squid/access.log.

  • cache_log file Specifies the path to the file in which general information about the cache activity is logged. The default value is /var/log/squid/cache.log.

  • cache_store_log file Specifies the path to the file in which the activities of the store manager are logged. The log shows which objects are saved to the cache and for how long, and which are evicted. The default value is /var/log/squid/store.log. However, no utility exists for analyzing the data stored in this log. Besides, there is no practical use for this data; you only waste disk space and system resources saving it. So you will be better off disabling this log by setting the file parameter value to none.

  • log_mime_hdrs on|off Indicates whether Multipurpose Internet Mail Extension (MIME) headers will be tracked. If set to on, MIME headers will be recorded in the access log.

  • useragent_log path/filename Specifies the file, in which the User Agent field from each HTTP request is logged. There is no practical use for this field, and it can be faked easily; thus, it is disabled by default.

Linux logs and various services are discussed in Section 12.5. In Section 12.5.4, the contents of squid's main log, /var/log/squid/access.log, are considered.
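
As a sketch, the log tags described in this section might be set as follows. The paths are the default values given in the text, and the store log is switched off as recommended there.

```conf
# Log of client requests (HTTP and ICP)
cache_access_log /var/log/squid/access.log

# General information about cache activity
cache_log /var/log/squid/cache.log

# Store manager log disabled to save disk space and system resources
cache_store_log none

# Do not record MIME headers in the access log
log_mime_hdrs off
```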

9.3.5. Cache-Sharing Tags

For several squid servers to be able to communicate with each other to share their cache contents, the corresponding protocol has to be properly configured. This is done using the following tags:

  • icp_port n Specifies the port number to be used for ICP. The default value is 3130. Setting the n value to 0 disables the protocol.

  • htcp_port n Specifies the port number to be used for the Hypertext Caching Protocol (HTCP), an alternative to ICP for communication between caches. The default value is 4827. Setting the n value to 0 disables the protocol.

  • cache_peer hostname type http_port icp_port option Specifies other caches in the hierarchy. The hostname parameter is set to the name or address of the cache to be queried. The http_port parameter specifies the port where that cache listens for proxy requests. It corresponds to the http_port parameter in the squid configuration file of the remote system. The icp_port parameter specifies the port number used by squid to send and receive ICP queries to and from neighbor caches. It corresponds to the icp_port parameter in the squid configuration file of the remote system. The type parameter can have one of the following values:

    • parent The topmost cache in the hierarchy. Forwards cache misses on behalf of a child cache.

    • sibling May only request objects already held in the cache. Cannot forward cache misses on behalf of a peer.

    • multicast Can query one or more neighboring caches.

The option parameter can take on many different values, and considering them is beyond the scope of this book. Detailed information for each option value can be found in the configuration file comments.

  • icp_query_timeout n Specifies the timeout value in milliseconds. Most often, proxy servers are located in local networks, which have high access speed; thus, there is no need to set this value to more than 2000 milliseconds (2 seconds).

  • cache_peer_domain cache_host domain Limits the domains, for which the neighbor caches can be queried. For example, the following entry allows retrieval from the cache only of data requested from the com domain:

     cache_peer_domain parent.net .com

Requests to other domains will be ignored to avoid overloading the proxy server. This tag can be used to configure several proxy servers, each responsible for its own domain.
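
A minimal sketch of a child cache that shares work with one parent is shown here. The host name parent.net is taken from the example in the text; the parent's HTTP port 8080 is an assumption, and the ICP and HTCP ports are the default values given in the text.

```conf
# Ports for the inter-cache protocols
icp_port 3130
htcp_port 4827

# Parent cache: proxy requests go to port 8080, ICP queries to port 3130
cache_peer parent.net parent 8080 3130

# Wait at most 2000 milliseconds for ICP replies
icp_query_timeout 2000

# Query parent.net only for objects in the .com domain
cache_peer_domain parent.net .com
```

The two port numbers in the cache_peer line must match the http_port and icp_port values in the configuration file on parent.net, not the values on the local machine.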

9.3.6. Miscellaneous Tags

The following tags I could not place into any specific category, but they are of certain importance and need to be considered:

  • redirect_rewrites_host_header on|off Enables (on) or disables (off) rewriting of host headers in redirect requests. If rewriting is enabled, the server works in the autonomous mode; otherwise, it is in the transparent mode. The autonomous mode requires extra overhead to implement but allows only one IP address to be used for the external connection for any size of network. The transparent mode is faster but requires each computer to have an IP address to work with the Internet.

  • redirector_access allow|deny Specifies which requests are sent to the redirector process. By default, all requests are sent.

  • cache_mgr email Indicates the email address, to which a notification will be sent should there be problems in the squid operation.

  • append_domain domain Indicates the default domain. Because users generally request pages from the com domain, it would be logical to specify this domain in the directive: append_domain .com (the value must begin with a period). Then, if a user enters an address such as redhat, squid will automatically append the domain, taking the user to the redhat.com site.

  • smtp_port n Sets the port number, on which to listen for SMTP requests on sending messages. SMTP is the type of protocol that does not require caching, and using a proxy server will not save on traffic. The feature may come in handy when a gateway cannot be installed and only a proxy is allowed.

  • offline_mode on|off Indicates the operating mode. If set to on, squid will work with the cache without accessing the Internet. If the cache does not contain the page requested, an error message will be issued. To allow squid to address the Internet, the parameter must be set to off, which is the default setting.
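
A few of these tags are shown together in the following sketch. The administrator address is a placeholder, not a value from the text; the other settings follow the descriptions in the list.

```conf
# Address for notifications about problems (placeholder address)
cache_mgr admin@example.com

# Complete bare host names: "redhat" becomes "redhat.com"
append_domain .com

# Normal operation: fetch pages missing from the cache from the Internet
offline_mode off
```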



Hacker Linux Uncovered
ISBN: 1931769508
Year: 2004
Pages: 141
