3.2 Request Blocking

Request blocking refers to the act of denying certain requests based on some part of the request itself (usually the URL). Like it or not, a fair amount of the content available on the Web is generally considered to be offensive, the most obvious example being pornography. Some organizations that connect to the Internet feel it necessary to prevent their users from accessing these sites. A web cache is a logical place to implement per-request blocking.

The issues surrounding request blocking fall mostly into the political realm. Furthermore, these issues are not new or unique to the Web. Just as many companies say employees should not make phone-sex calls while at work, they also say workers should not view pornographic web sites. Similarly, a parent might say that children should not have easy access to sexually explicit material, whether in the form of a magazine, video, or web site. It is a policy decision, for employers and parents, whether and to what extent request blocking should be enabled. Classifying material into offensive or inoffensive categories is a political and ideological issue and far beyond the scope of this book. Even if the classification is not controversial, no particular technique or implementation is likely to be perfect. Some legitimate sites may be incorrectly blocked. Similarly, sites that should be blocked may still be allowed through.

Several request-blocking products and services are available. Some of these are "plug-ins" for web cache products; others are full proxy implementations that can be used alone or in series with an existing web cache. The companies offering these products also provide a list of sites (or URLs) to be blocked. Usually these products require a subscription fee to receive list updates. However, some allow new sites to be added manually. A typical blocking list probably includes 100,000 or more entries.

The World Wide Web Consortium (http://www.w3c.org) has developed a content labeling scheme known as the Platform for Internet Content Selection (PICS). PICS is simply a standard way to label web pages rather than rate them. In other words, PICS specifies the structure of a label, not what to put inside it. However, PICS is often associated with content filtering, because that was one of the primary reasons for its development. A PICS-aware web cache can filter out requests based on one or more rating schemes.
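
To make the idea of filtering on a rating scheme more concrete, here is a minimal sketch in Python. It does not parse real PICS label syntax. It assumes a label has already been parsed into a dictionary holding the rating service's URL and a set of category values. The service URL, the category names, and the limits are all hypothetical, as is the choice to block anything labeled under an unknown scheme.

```python
# Hypothetical policy: for each rating service we trust, the highest
# value we are willing to allow in each category.
POLICY = {
    "http://ratings.example/v1": {"violence": 2, "language": 3},
}


def allowed(label, policy=POLICY):
    """Return True if an already-parsed label passes the policy.

    `label` is assumed to look like:
        {"service": "<rating service URL>",
         "ratings": {"<category>": <integer value>, ...}}
    """
    limits = policy.get(label.get("service"))
    if limits is None:
        # Unknown rating scheme: this sketch blocks by default.
        return False
    ratings = label.get("ratings", {})
    # A category missing from the label is treated as over the limit.
    return all(
        ratings.get(category, limit + 1) <= limit
        for category, limit in limits.items()
    )
```

Whether an unlabeled or partially labeled page should be blocked or allowed is a policy decision; the sketch blocks in both cases only to keep the logic short.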

Request blocking has the potential to adversely affect latency. Depending on the type of list and the searching algorithm, a large list could add noticeable delay to every request. In general, web cache systems are I/O-bound and do not require top-of-the-line processors. However, checking every requested URL against a blocking list increases the demand for CPU resources and can change the web cache into a CPU-bound application. The three most popular methods for blocking requests are exact string matches, substring matches, and regular expressions. Exact string matching is straightforward and can be implemented easily in an efficient algorithm. However, exact string matching is essentially useless for blocking URLs, because a single site can be reached through a practically unlimited number of distinct URLs. Both substring matching and regular expressions are much more flexible, but at the same time, they are difficult to implement efficiently. The simple (but inefficient) algorithm for searching a list of regular expressions involves sequentially checking every entry in the list. If your cache needs to check requests against a large blocking list, you might want to consider a machine with a fast processor.
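
The three matching methods can be compared with a short Python sketch. The list entries and function names below are made up for illustration, and the lists are tiny; a real blocking list would be far larger, which is where the cost of the sequential scans becomes noticeable.

```python
import re

# Hypothetical sample entries for each kind of blocking list.
EXACT_URLS = {"http://blocked.example/index.html"}
SUBSTRINGS = ["blocked.example", "/adult/"]
PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"^https?://([^/]+\.)?blocked\.example(/|$)",
        r"/xxx[^/]*\.html$",
    )
]


def blocked_exact(url):
    # A hash lookup is fast, but any variation in the URL evades it.
    return url in EXACT_URLS


def blocked_substring(url):
    # Sequential scan: the cost grows with the number of entries.
    return any(s in url for s in SUBSTRINGS)


def blocked_regex(url):
    # The simple (but inefficient) approach: try every pattern in turn.
    return any(p.search(url) for p in PATTERNS)
```

For example, blocked_exact() matches http://blocked.example/index.html but misses the same page requested as http://blocked.example/index.html?x=1, while both blocked_substring() and blocked_regex() catch it. The price of that flexibility is that each request may be compared against every entry in the list.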

If you do employ request blocking on your caching proxy, it's best to be up-front about it. Tell your users that access to certain web sites is being blocked and why this is the case. Provide them with a copy of your organization's policy on appropriate use of your facilities. Periodically (e.g., once per year) remind your users about the policy and its enforcement. Be sure to tell them what to do in case the software incorrectly blocks legitimate content. Finally, be sure that you and your colleagues know how to make exceptions in the filtering rules.
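
How exceptions are made depends entirely on the product you use. As a rough illustration only, the following Python sketch shows one common arrangement, in which a locally maintained exception list is consulted before the blocking list. The host names and the is_blocked() function are hypothetical and do not correspond to any particular product.

```python
from urllib.parse import urlsplit

# Hypothetical entries: a vendor-supplied blocking list and the local
# exceptions an administrator has added on top of it.
BLOCKED_HOSTS = {"blocked.example"}
EXCEPTION_HOSTS = {"health.blocked.example"}


def is_blocked(url):
    host = (urlsplit(url).hostname or "").lower()
    if host in EXCEPTION_HOSTS:
        # Local exceptions take precedence over the blocking list.
        return False
    # Block the host itself or any host under a listed parent domain.
    parts = host.split(".")
    return any(
        ".".join(parts[i:]) in BLOCKED_HOSTS for i in range(len(parts))
    )
```

With these lists, a request for http://www.blocked.example/ is denied, while http://health.blocked.example/page is allowed through because of the exception.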

Web Caching
ISBN: 156592536X
Year: 2001
Pages: 160
