6.4 Special Techniques for Web Servers

[Danger level: 5]

This section covers some special techniques for Web servers that are especially effective against crackers who change the pages that a site displays. See also "Scouting Out Apache (httpd) Problems" on page 275.

These techniques are designed to protect against attacks such as the successful attacks in 1999 and 2000 against the main Web sites of the U.S. government's Federal Bureau of Investigation, the Central Intelligence Agency, the Justice Department, the White House (suspected), Congress, NASA, the American Broadcasting Company (a U.S. television network), C-SPAN (a major U.S. cable network), and many others.

Did that scare you? More than 300,000 credit card numbers were stolen from CD Universe.[4] A cracker calling himself Maxus claimed responsibility and tried to extort $100,000 to not publish the numbers. When not paid, Maxus published many of the numbers.[5] A number of other sites, including SalesGate.com, had more than 20,000 credit card numbers stolen.

[4] Some time after the theft, the site insisted that customer credit card numbers were safe.

[5] Thanks to CNET News.com for the March 2, 2000, article on news.cnet.com, which provided some source material for this section.

All of these major breaches were on Microsoft platforms and were due to a bug in Microsoft's software that allowed any remote user to copy the entire credit card database, which was world-readable and not encrypted.

A cracker calling himself Curador, later arrested, claimed responsibility and allegedly left this message:

"Also Greetz [greetings] to my friend Bill Gates, I think that any guy who sells Products Like SQL Server, with default world readable permissions can't be all BAD"


Microsoft did offer a patch in mid-1998, but apparently it was not widely installed by customers. It is hard for overworked SysAdmins to find the time to take systems down, install new patches every week, and test them for new bugs and installation errors. Some of this speaks to the importance of an adequate budget for security work. Linux and UNIX systems are vulnerable too: a survey of Linux and UNIX systems in Australia in 2000 showed that about half of them were vulnerable to the remote root vulnerability in named (the BIND DNS daemon) discovered in late 1999. The more services on a box, the greater the likelihood that one of them has a vulnerability that allows unauthorized access, not just to that service but to others too.

6.4.1 Build Separate Castles

[Danger level: 5]

For all but one-person operations, it is recommended that you use a separate dedicated box for each of Web pages, CGI programs, important databases, and e-mail. This prevents an intrusion into one of these services from affecting the others. Past experience indicates that sendmail and CGI scripts are the components most vulnerable to intrusion, while Web pages and databases are the data that intruders most want to affect. Certainly, you will want a firewall between the Internet and your systems; this is discussed in "Firewalls with IP Chains and DMZ" on page 514.

Using different machines for different servers is a really good idea.


6.4.2 Do Not Trust CGIs

[Danger level: 5]

Many CGIs are "quick hacks" written by people who are not knowledgeable about security. At the majority of sites, where this is true, security might be better served simply by not trusting the CGIs. This means not having the CGIs manipulate a database directly but rather having them operate as front ends to another program that deals with the database. This other program, because there would be only one, could be written more carefully to be secure.
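To make the idea concrete, here is a minimal sketch in C of such an untrusted front end. The gateway's private address and port, the one-line request protocol, and the custid field all are hypothetical examples: the point is that the CGI allow-list-validates one field and forwards it, and only the separate, carefully audited gateway program ever touches the database.

 /*
  * Sketch: a CGI front end that never touches the database.  It
  * validates one field against an allow-list and forwards it over
  * a private network to a single gateway program that does the
  * real database work.  The gateway address, port, and one-line
  * protocol are hypothetical.
  */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <sys/socket.h>

 #define GATEWAY_IP   "192.168.9.2"     /* private Ethernet (assumed) */
 #define GATEWAY_PORT 7001              /* gateway's port (assumed)   */

 int main(void)
 {
         const char         *q = getenv("QUERY_STRING"); /* "custid=1234" */
         char                reply[512];
         struct sockaddr_in  a;
         int                 fd, n;

         printf("Content-type: text/plain\n\n");
         /* Accept only "custid=" followed by 1 to 10 digits. */
         if (!q || strncmp(q, "custid=", 7) != 0
             || strspn(q + 7, "0123456789") != strlen(q + 7)
             || strlen(q + 7) < 1 || strlen(q + 7) > 10) {
                 printf("Bad request.\n");
                 return 0;
         }
         memset(&a, 0, sizeof(a));
         a.sin_family      = AF_INET;
         a.sin_port        = htons(GATEWAY_PORT);
         a.sin_addr.s_addr = inet_addr(GATEWAY_IP);
         if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0
             || connect(fd, (struct sockaddr *) &a, sizeof(a)) < 0) {
                 printf("Service unavailable.\n");
                 return 0;
         }
         /* One rigidly formatted line out; the gateway does the rest. */
         write(fd, q + 7, strlen(q + 7));
         write(fd, "\n", 1);
         n = read(fd, reply, sizeof(reply) - 1);
         reply[n < 0 ? 0 : n] = '\0';
         printf("%s\n", reply);
         close(fd);
         return 0;
 }

Because the gateway accepts only this one rigid request format on a private network, auditing it is far easier than auditing every CGI on the site.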

Having this program and the database server (such as PostgreSQL) on a different system, accessible only through a private Ethernet that no other systems can reach, will greatly increase security. See "One-Way Credit Card Data Path for Top Security" on page 302 for the scoop on this. Starting with version 1.2, Apache offers the suEXEC program, which allows different CGIs to run under different UIDs. This allows various parts of the system to be protected from the various CGIs; low-security (less trusted) CGIs can be isolated from those that handle high-security data.

CGIWrap is an alternative to suEXEC. Another solution, not limited to CGIs, is the SubDomain kernel module, which implements fine-grained access controls on a per-program basis. It is worth a look; see "The Seven Most Deadly Sins" on page 27.

6.4.3 Hidden Form Variables and Poisoned Cookies

[Danger level: 5]

Many e-commerce sites store merchandise prices and weights in HTML forms that are generated dynamically. Any halfway knowledgeable person simply could save the page with his browser's Save As HTML function, edit the file to reduce the prices and weights (to save on shipping charges), drop it back into the browser as

 
 file://foo.html 

and thus cheat you, probably legally, and most likely without being caught unless you later scoured your billing records. Similar exploits await sites that trust the cookies they leave behind.
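The defense is to treat everything coming back from the browser as untrusted and to keep the authoritative prices on the server. A minimal sketch in C, with a made-up two-item catalog: the client-supplied item number is used only as a lookup key, and any price or weight fields the browser sends back are simply ignored.

 /*
  * Sketch: look prices up on the server instead of trusting
  * hidden form fields.  The catalog here is a made-up example.
  */
 #include <stdio.h>
 #include <string.h>

 struct item {
         const char *id;     /* item number the form submits     */
         int         cents;  /* authoritative price, server side */
         int         grams;  /* authoritative shipping weight    */
 };

 static const struct item catalog[] = {
         { "PC-100", 129900, 9000 },
         { "PC-200", 219900, 9500 },
         { NULL, 0, 0 }
 };

 /* Return the catalog entry for an item, or NULL if unknown. */
 static const struct item *lookup(const char *id)
 {
         const struct item *p;

         for (p = catalog; p->id; p++)
                 if (strcmp(p->id, id) == 0)
                         return p;
         return NULL;    /* unknown item: reject the order */
 }

 int main(void)
 {
         /* In a real CGI this key would be parsed from the form
          * data; any client-supplied "price" field is discarded. */
         const struct item *it = lookup("PC-100");

         if (it)
                 printf("charge %d.%02d\n",
                     it->cents / 100, it->cents % 100);
         return 0;
 }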

6.4.4 Take Our Employees, Please

[Danger level: 5]

Many organizations put confidential information on their sites without stopping to think that they really do not want this information to be generally available. One of the most common collections of confidential information that agencies and companies put up is the list of their employees, titles, and phone numbers. Large organizations will include each employee's location.

This makes it very easy for headhunters and competitors to phone employees whose titles fit their needs and try to hire them away. I have noticed this on the Web sites of organizations ranging from tiny companies to the U.S. Federal Aviation Administration. All anyone needs to do is save the Web page and print it out.

Besides allowing these employees to be hired away, more nefarious purposes are possible, such as various scams and harassment by old flames or wannabe flames. Because most agencies will connect a caller with someone if the target's full name is known, perhaps the solution is a simple database lookup facility. It could require the first and last name of the person and, possibly, that person's location name, and it could be a simple Perl script or even a shell script; a sketch of the idea follows.
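Here is a minimal sketch of such a lookup in C (a Perl or shell script would serve as well, per the suggestion above). The QUERY_STRING format (first=NAME&last=NAME, letters only, no %-decoding) and the /usr/local/lib/phones file with lines of first:last:extension are assumptions for illustration.

 /*
  * Sketch: a lookup CGI that answers only exact first+last name
  * queries instead of publishing the whole phone list.  The
  * query format and the phones file are made-up examples.
  */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 int main(void)
 {
         const char *q = getenv("QUERY_STRING");
         char        first[21], last[21], f[21], l[21], ext[21];
         FILE       *fp;

         printf("Content-type: text/plain\n\n");
         /* Accept only "first=NAME&last=NAME", letters only. */
         if (!q || sscanf(q, "first=%20[A-Za-z]&last=%20[A-Za-z]",
             first, last) != 2) {
                 printf("Please supply both first and last name.\n");
                 return 0;
         }
         if (!(fp = fopen("/usr/local/lib/phones", "r"))) {
                 printf("Directory unavailable.\n");
                 return 0;
         }
         while (fscanf(fp, " %20[^:]:%20[^:]:%20s", f, l, ext) == 3)
                 if (strcmp(f, first) == 0 && strcmp(l, last) == 0) {
                         printf("%s %s: %s\n", f, l, ext);
                         fclose(fp);
                         return 0;
                 }
         fclose(fp);
         printf("No such person.\n");
         return 0;
 }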

The Web site also might want to list a contact telephone number and, possibly, a name for each department. Anyone with a legitimate need to find someone will show up with a badge or search warrant. Similarly, many sites show "call for current prices" rather than give all this information to their competitors. This is a double-edged sword, because many people like to comparison shop and cannot be bothered to telephone everyone.

6.4.5 Robot Exclusion of Web Pages

[Danger level: 2]

A robot (also known as a Web crawler, crawler, spider, or wanderer) is a program that searches the Web looking for pages. It parses each page, usually adds the data to a search engine's database, then follows the page's links and repeats the process. There might be areas of your Web site that you do not want indexed. The way to arrange this is to create what is called a robot exclusion, instructing robots to avoid certain areas.

This is useful not only to prevent "good" robots from stepping into a black hole trap intended to capture evil robots,[6] but also to keep them out of private areas and areas that people are not meant to reach directly, such as shopping cart checkouts, acknowledgments for filling out guestbooks, and the like. It also can be used to keep robots from building public indices of the employee name and phone number lists that you probably do not want available to the public (headhunters and competitors). The recognized standard for robot exclusion is at

[6] See "Spoofing Spam Robots" on page 186 for details on this.

http://info.webcrawler.com/mak/projects/robots/norobots.html

which defines a robots.txt file that you place on your site to instruct robots where they should not go.

To summarize, you would create a robots.txt file at the top of your Web site's name space, for example,

www.pentacorp.com/robots.txt

that might have the following entries:

 
 # robots.txt for http://www.pentacorp.com/
 User-agent: *
 Disallow: /overthrow_government/  # Hide from world
 Disallow: /tmp/                   # Hopefully, not needed.
 Disallow: /employees.html         # spam 'bot trap

There is an alternative standard whereby a Web page designer puts the following META tag in her documents that the robots should leave alone:

 
 <META name="ROBOTS" content="NOINDEX, NOFOLLOW"> 

The NOINDEX tag instructs the robots not to index this page; the NOFOLLOW tag instructs them not to follow any links on this page. Either may be used by itself as appropriate. Clearly, this is more expensive for you if there are a lot of pages you want left alone because the robots must read each one rather than a single robots.txt file.

A clever trick for detecting evil robots is to place a "hidden" link, using a tiny graphic that is the same color as the page's background, near the top of the page and to list that link in the robots.txt exclusion file. Because humans will not see it and "good" robots will not visit it, any visitor that does is evil and may be dealt with accordingly. You may want to block the robots' sites. Sites that generate pages dynamically can show such a visitor the door by serving it a single subsequent page with no additional links.

6.4.6 Dangerous CGI Programs Lying Around

[Danger level: 5]

The /httpd/cgi-bin directory is where CGI (Common Gateway Interface) programs are kept; they receive the data from HTML forms and do processing based on it. On many systems, programs with severe security holes have been put there unintentionally. The normal Linux security model holds that a program without the set-UID and set-GID bits set is harmless, because a user can harm only herself. The problem is that this does not apply to CGI programs, because they are invoked on behalf of an untrusted client and normally run with the privileges of the user that starts Apache. Although "production" CGI programs and scripts hopefully are carefully analyzed and tested for security, temporary CGI programs may have been forgotten, and some may have been placed in the CGI directory during installation and never noticed. (This problem has occurred on non-Linux platforms too.)

It is important to use ls to list these programs and decide which ones are not being used. The following ls command lists them, showing the time each was last used and sorted by that time. Although this is not a guaranteed way to determine which ones are used, it is a start.

 
 ls -ltu /httpd/cgi-bin 

Attackers have been seen attempting to invoke the following CGI programs with their own data hoping to find vulnerable systems to probe and take over. Some of these may have .cgi or .pl appended to their names.

 
 /cgi-bin/test-cgi
 /cgi-bin/perl
 /cgi-bin/sh
 /cgi-bin/query
 /cgi-bin/counterfiglet
 /cgi-bin/phf
 /cgi-bin/handler

Certainly, perl and sh should not be in your cgi-bin directory, because that is the equivalent of removing the password from the account that Apache runs under, typically http. The test-cgi program seems to be a "mistake" committed at some sites. The query problem is discussed in the next section. The counterfiglet program is discussed in "CGI Counterfiglet Program Exploit" on page 291, and the phf program in "CGI phf Program Exploit" on page 292.

6.4.7 CGI Query Program Exploit

[Danger level: 5]

One exploit that has been attempted (and to which Linux and Apache may be vulnerable) is to send active HTTP source pages to the query CGI program. The query program is installed by default as part of the Apache httpd installation on some distributions. The particular exploit that has been reported is

 
 /cgi-bin/query?x=<!--#exec cmd="/usr/bin/id"--> 

This can be generated by a Web page (*.html file), put on any server, that has a form generating this GET query. All it takes is either a hidden variable "x" with this value, or a text variable "x" that lets the cracker supply different commands. The same attack also has been seen as %-encoded HTML, for example

 
 /cgi-bin/query?x=%3C%21%2D... 

A program for "unhexing" such text hidden by %xy encoding is provided in "Unhexing Encoded URLs" on page 290.

Clearly, the intent is to invoke the id program to report the UID of the user running the query program, hoping to find a system running it as root. When one is found, a few more commands are supplied to take over the system. Hand-crafting a second inetd.conf file and starting a second inetd will produce a root shell service rather quickly, for example. For details on this, see "Popular Trojan Horses" on page 680. The solution is to remove the query program or move it to an inaccessible directory.

On one system I saw, it had been renamed to something else, but that relies on "Security by Obscurity," which is weak security. Certainly, if the program is used by a Web page, the new name could be found easily. If you do need it, be sure that it does not have this vulnerability, and disable the features that you do not need.

6.4.8 Unhexing Encoded URLs

[Danger level: 3]

HTTP supports a hexadecimal encoding of characters that might be special to some of the software that a URL (Uniform Resource Locator) passes through. Although useful, it also gets abused by those trying to sneak past filters that look for certain patterns. These patterns might be used to filter out (reject) certain types of URLs, such as those containing "jobs" or "employment," to make it harder for one's employees to look for other work on company time.

The following program, called unhex.c, copies its standard input to its standard output, decoding %xy sequences. It is on the CD-ROM. It is useful for analyzing Apache logs and the like.

 
 /*
  * Copyright 2001 Bob Toxen.  All rights reserved.
  * This program may be used under the terms of the
  * GNU GENERAL PUBLIC LICENSE Version 2.
  *
  * Offered as is with no warranty of any kind.
  */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 char    hex[] = "0123456789ABCDEF";

 /* Decode the two hex digits following a '%'; c1 is the first. */
 int
 decode(int c1)
 {
         int     c2;
         char    *p1;
         char    *p2;

         if (c1 == EOF || (c2 = getchar()) == EOF) {
                 printf("<incomplete %% sequence>\n");
                 return EOF;
         }
         if (c1 < ' ' || c1 > 126) {
                 printf("<non-printable first char \\%03o after %%>",
                     c1);
                 return -2;
         }
         if (c2 < ' ' || c2 > 126) {
                 printf("<non-printable second char \\%03o after %%%c>",
                     c2, c1);
                 return -2;
         }
         p1 = strchr(hex, c1);
         p2 = strchr(hex, c2);
         if (!p1 || !p2) {       /* not hexadecimal digits */
                 printf("<invalid %%%c%c sequence>", c1, c2);
                 return -2;
         }
         return (p1 - hex) * 16 + (p2 - hex);
 }

 int
 main(void)
 {
         int      c;

         while ((c = getchar()) != EOF) {
                  if (c == '%') {
                           c = decode(getchar());
                           if (c == EOF)
                                    break;
                           if (c == -2)
                                    continue;
                  }
                  if ((c >= ' ' && c < 127) || c == '\n' || c == '\t')
                           putchar(c);
                  else
                           printf("\\%03o", c);
         }
         exit(0);
 }

6.4.9 CGI Counterfiglet Program Exploit

[Danger level: 3]

The CGI counterfiglet program appears to be a Perl or awk script that is vulnerable to crackers using it to execute arbitrary programs of their own. The following exploit has been seen:

 
 www.pentacorp.com - - [15/Mar/2000:23:41:23 -0500]
   "GET /cgi-bin/counterfiglet/nc/f=;echo;echo%20{_begin-counterfiglet_};uname%20-a;id;w;echo%20{_end-counterfiglet_};echo HTTP/1.0" 404 301

Note that this exploit tests for the vulnerability, determines the victim's operating system and the user that httpd is running as (hoping for root), and sees who is logged on. Clearly, this request gets sent to lots of systems in search of vulnerable ones. Any system where httpd runs as root and which has a T1 connection or better makes a fine DDoS launch point. This query should show up in the Web server's logs.

6.4.10 CGI phf Program Exploit

[Danger level: 3]

The CGI phf program appears to have vulnerabilities similar to counterfiglet's. The following attempted exploits have been seen:

 
 www.pentacorp.com - - [15/Mar/2000:23:23:59 -0500]
   "POST /cgi-bin/phf?Qname=x%0a/bin/sh+-s%0a HTTP/1.0" 404 205

 http://www.somewhere.somesystem/cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd

6.4.11 CGI Scripts and Programs

[Danger level: 5]

These are rich in vulnerabilities because they are really the only programs on the system that will interact with any random person in the world who can connect to TCP port 80 (or 443). Consider that telnet, ftp, and even Apache (httpd) itself all have very strictly enforced, limited, and carefully reviewed rules for interacting with potential crackers. Scripts are defined roughly as ASCII text that is interpreted directly by a shell, Perl, awk, or a similar program; programs typically are compiled from C, C++, or similar languages into a binary that the hardware executes directly. I will refer to both here as programs.

CGI programs are frequently thrown together without a careful security analysis by someone with security training. Even someone who has security training can easily make mistakes, and CGI programs rarely are tested to see whether they can withstand attacks. I include "active" Web pages, such as those with server-side includes (SSI) ending in .shtml, with CGI programs as a danger unless done very carefully.

I admit that my recent review of a CGI program I wrote in C in 1996 turned up a buffer overflow bug (now fixed), even though I always try to be very careful to check input data. Back then, buffer overflow attacks were unusual, and that part of the code had not been touched since.


Statistically, you might not be able to get all the bugs out of your CGI programs. Because of that risk, the "Rings of Security" model certainly should be applied here to minimize vulnerability. In this case, Rings of Security means that the user under which the CGI programs run should be different from any other user on the system, including the user that has write access to static Web pages. An even better idea is to run all the CGI programs on a separate system (or even several systems, to separate those doing database operations from those doing routine, less critical processing). The effort here is trivial, requiring only that the FORM ACTION attribute in the respective Web pages specify the new system. Similar changes will be needed in Java and JavaScript programs.

Thus, if someone does find a vulnerability in a CGI program, there is a limit to the damage that can be done: the intruder will not even be able to alter Web pages to "leave a calling card" for all to see. One statistic I saw stated that Web page defacements in 1999 increased almost tenfold over the previous year. "Writing Secure Programs" on page 748 lists some online resources to help with writing more secure programs, including set-UID programs.

If you maintain a large site with many users, or if some of your CGIs do trusted operations such as interfacing with databases, you might want to add another "Ring of Security" by having different CGIs run under different effective UIDs, depending on how much trust each is given. For ISPs, this is a necessity. It may be used, for example, to allow only a few carefully written CGIs to run under a UID that is allowed to talk to a database of customer data. It is assumed that this carefully written CGI code is audited by multiple people who know what sorts of security problems to look for.

Apache's suEXEC feature may be used to make a CGI run as the user (UID) that owns the CGI rather than as the user that Apache runs under. The suEXEC feature is documented in detail at

www.apache.org/docs/suexec.html

It is disabled by default so that someone who is unfamiliar with set-UID programs does not use it, which is just as well. The feature first appeared in Apache 1.2. The suEXEC program should be invoked only by Apache and the directory that it is in should be executable only by the user that runs Apache; suEXEC runs set-UID to root and does a number of sanity checks to ensure that it is not being used by crackers.

The program that suEXEC is asked to invoke should not be specified as an absolute path, nor should its path contain /../, or suEXEC will refuse to invoke it. It will not run on behalf of a system account (low-numbered UID), nor will it run a CGI that resides in a directory writable by anyone other than the owner of the CGI it is to run.

So what kinds of vulnerabilities must a CGI program defend against? To sum up: unanticipated data. Many a programmer forgets that even though her CGI program is carefully fed from a form that is "well behaved," it is trivial for a cracker to submit a modified version of the form that is not well behaved. Many CGI programs accept data from the browser and use it to construct a command to be executed. One example would be a CGI that lets the client browsing the site send e-mail to the Webmaster, with a copy of the e-mail sent back to the client.

You might have a form with a TEXTAREA type of variable to accept multiple lines of comments and a text type of variable to accept the client's e-mail address; use the variable names comment and email. If you write the CGI as a shell script, a simple version might look like

 
 #!/bin/sh
 ...
 # Parameters parsed and stored in $comment and $email
 # DON'T DO THIS!  SECURITY HOLE!!!
 (echo ~c $email;echo "$comment") \
   | Mail -I -s comments webmaster@pentacorp.com

This example puts Mail in interactive mode (with "-I") so that the ~c escape may be used to specify a carbon copy to $email. The problem is that if any comment line begins with a "~", it too will be treated as an escape. Thus a cracker could use the ~! escape sequence to execute an arbitrary command or even multiple commands.

One solution would be to prevent "~" characters from starting lines via

 
 #!/bin/sh
 ...
 # Parameters parsed and stored in $comment and $email
 # DON'T DO THIS!  SECURITY HOLE!!!
 (echo ~c $email;echo "$comment" | sed 's/^/ /') \
   | Mail -I -s comments webmaster@pentacorp.com

The sed command inserts a space at the beginning of each line. This stops that vulnerability, but a cracker merely has to throw some semicolons into her e-mail address to open a different one. If she specified an e-mail address of

 
 x;/bin/rm -rf / 

the command generated would be

 
 (echo ~c x;/bin/rm -rf /;echo "$comment" | sed 's/^/ /') \
   | Mail -I -s comments webmaster@pentacorp.com

As you can see, she got /bin/rm -rf / into the command stream. Although this script certainly should not be running as root, on most Web servers all Web programs run as http, and this user also owns all the HTML documents, so within a few minutes this Web server would be completely worthless.

What would you do to fix this? How about adding quotes around $email to get

 
 #!/bin/sh
 ...
 # Parameters parsed and stored in $comment and $email
 # DON'T DO THIS!  SECURITY HOLE!!!
 (echo "~c $email";echo "$comment" | sed 's/^/ /') \
   | Mail -I -s comments webmaster@pentacorp.com

What is to stop a cracker from getting newlines into $email? Although the Webmaster's nice form specified email as a text type, which generates a one-line string, a cracker could create her own form that changes it to a TEXTAREA to get newlines into it, generating

 
 x
 ~!/bin/rm -rf /

This still would trash the system. What about using tr to eliminate any pesky newlines? You might try

 
 #!/bin/sh
 ...
 # Parameters parsed and stored in $comment and $email
 # DON'T DO THIS!  SECURITY HOLE!!!
 (echo "~c $email" | tr -d '\n';echo '';\
   echo "$comment" | sed 's/^/ /') \
   | Mail -I -s comments webmaster@pentacorp.com

Are we there yet? What if the cracker's form data was

 
 `cp /bin/sh /tmp;chmod 4777 /tmp/sh` 

It depends. If $email was set via

 
 email="$some_other_variable" 

the backprime expansion would happen at that time and the cracker's requested program would be invoked. (The backprimes cause the text between them to be executed as if it were a command line given to the shell; the text then is replaced by the output of the command.)

What is the lesson to learn here? The shell (regardless of which one you use for scripts) simply is too powerful and too trusting for handling user input data. With C or Perl, you have more control over the data. Even with these, however, mistakes can easily be made. Additionally, you need to check input data for characters "special" to any programs that will handle it, such as the shell or Mail.
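One reasonable approach is an allow-list check: rather than trying to enumerate every dangerous character, accept only the short list of characters a field genuinely needs and reject everything else. A sketch in C; the particular allow-list and length limit here are examples to adapt per field.

 /*
  * Sketch: allow-list validation of one CGI input field.  The
  * permitted set and length limit are examples; tighten them to
  * what each field genuinely needs (see Table 6.3 below for why
  * so many other characters are dangerous).
  */
 #include <string.h>

 /* Return 1 if the field looks safe, 0 if it should be rejected. */
 int field_ok(const char *s, size_t maxlen)
 {
         static const char ok[] =
             "abcdefghijklmnopqrstuvwxyz"
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "0123456789.@-_";

         if (!s || strlen(s) == 0 || strlen(s) > maxlen)
                 return 0;       /* empty or excessively long: reject */
         return strspn(s, ok) == strlen(s);
 }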

Also, it is extremely important for the programs not to be vulnerable to buffer overflows.[7] This alone is a good reason to avoid shell scripts, in which it is hard or impossible to detect and repel buffer overflow attacks. (The tcsh shell will tell you the number of characters in a variable, say $foo, with the ${%foo} sequence; this can be used in an if expression.) Check all data received from the browser for length, and reject excessively long data in a sensible way, treating it as an attempted intrusion.

[7] A buffer overflow is where a program reserves a fixed amount of space to hold data and accidentally tries to store more data there than there is space for. Some crackers are so sophisticated that they know what "extra" data to ask a program to store so that this specific data overwrites parts of memory deliberately to modify the program's behavior to breach security.

The most common bug that creates a buffer overflow vulnerability is the use of the gets() function, which offers no way to specify how big a buffer is and to limit stored data to that size. Replacing it with fgets() is a common fix. Several standard utilities running as root have had this vulnerability. Some still may have it.
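For illustration, a minimal sketch of that fix (the buffer size is arbitrary):

 /*
  * Sketch of the common fix: gets() cannot be told the size of
  * buf, but fgets() stops after sizeof(buf) - 1 bytes.
  */
 #include <stdio.h>
 #include <string.h>

 int main(void)
 {
         char buf[128];          /* arbitrary size for illustration */

         /* DON'T DO THIS!  SECURITY HOLE:  gets(buf);  */

         /* fgets() stores at most sizeof(buf) - 1 bytes plus '\0'. */
         if (fgets(buf, sizeof(buf), stdin)) {
                 buf[strcspn(buf, "\n")] = '\0'; /* drop the newline */
                 printf("read %d safe bytes\n", (int) strlen(buf));
         }
         return 0;
 }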

The warn program, presented below, can be used to report such attempts. All CGIs should be tested for their vulnerability to buffer overflow attacks. For C programs, lots of debugging printf() calls, and possibly stepping through the critical code in a debugger, are recommended. In C, one frequently uses the user-supplied e-mail address in a sprintf() to generate the parameter to a popen() library call that starts the Mail program, database program, etc. Again, newlines, semicolons, backprimes (`), and other characters may be used by a cracker to execute arbitrary commands unless one is extremely careful.

The popen() library call uses the shell to execute the command, and so it opens up the CGI program to the various cracker attempts to insert the cracker's own commands via special shell characters. One solution is first to scan for these characters and, if they are present, deny the request; generate an alert via e-mail, pagers, X10, etc., to notify the SysAdmins of an attempted breach of security; and lock the offending IP out of the system so that the cracker cannot make further attempts to break security.
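Another way to close this hole is to avoid popen() altogether: build the pipe yourself and execl() the mailer directly, so that no shell ever parses the address and characters such as ";" and backprimes arrive as inert argv data. A sketch, assuming a mailer at /bin/mail that takes -s for the subject (adjust for your system), and assuming the address already passed the character checks above:

 /*
  * Sketch: mail a comment without popen(), so no shell ever
  * parses the (already validated) address.  /bin/mail and its
  * -s flag are assumptions; check your own system's mailer.
  * Without -I, Mail reading from a pipe should not honor "~"
  * escapes.
  */
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/wait.h>

 int mail_comment(const char *addr, const char *comment)
 {
         int   pfd[2];
         pid_t pid;

         if (pipe(pfd) < 0 || (pid = fork()) < 0)
                 return -1;
         if (pid == 0) {                 /* child: becomes the mailer */
                 dup2(pfd[0], 0);        /* read the body from the pipe */
                 close(pfd[0]);
                 close(pfd[1]);
                 /* The address is a plain argv element; the shell
                  * never sees it, so shell specials stay harmless. */
                 execl("/bin/mail", "mail", "-s", "comments", addr,
                     (char *) 0);
                 _exit(127);             /* exec failed */
         }
         close(pfd[0]);                  /* parent: write the body */
         write(pfd[1], comment, strlen(comment));
         close(pfd[1]);
         return waitpid(pid, NULL, 0) < 0 ? -1 : 0;
 }

 int main(void)
 {
         return mail_comment("webmaster@pentacorp.com",
             "a test comment\n");
 }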

If you choose to implement Adaptive TCP Wrappers, as discussed in "Adaptive Firewalls: Raising the Drawbridge with the Cracker Trap" on page 559, your CGIs could invoke the blockip program (which does all of this alerting for you) via a set-UID-to-root C program when they detect an intrusion attempt. Certainly, there is a risk that allowing a CGI program to invoke a set-UID-to-root program opens a vulnerability of uncontrolled root access. A cracker on a large system, such as America Online, could create a DoS (Denial of Service) attack that locks out all of America Online by deliberately being bad. Additionally, although it is hard to spoof an IP address for TCP to modern Linux kernels, a determined cracker may do so to create a DoS attack by spoofing the site he desires to have blocked. See "Defeating TCP Sequence Spoofing" on page 246 for details on this.

The following program, called warn, is believed to be safe to use; however, as with any set-UID program, there always is a risk. Although it does not lock the intruder out of HTTP, because Apache presently does not use the /etc/hosts.allow and /etc/hosts.deny configuration files, it will protect the other services. The blockip program could easily be modified to update Apache's httpd.conf configuration file (typically in the /httpd/conf directory) to deny the intruder further access by adding deny directives. For example, with Apache 1.3, if you want to deny HTTP access to the .po.rootkit.com network and the 212.226.253.1 IP, the following entries in httpd.conf will do this. You will need to send a HUP (hangup) signal to Apache (via /etc/rc.d/init.d/httpd reload,[8] which sends a HUP signal to the parent httpd daemon) after adding these lines to httpd.conf.

[8] Some startup scripts in /etc/rc.d/init.d have the executable bit set, allowing them to be invoked directly by name. Some do not have this bit set, requiring them to be invoked as the first argument to the shell thusly:

 sh /etc/rc.d/init.d/httpd reload 

In this latter case, this annoyance may be cured via

 chmod ugo+x /etc/rc.d/init.d/httpd 

 
 order allow,deny
 deny from .po.rootkit.com
 deny from 212.226.253.1
 allow from all

The warn program accepts a single argument, which should be the name of the program or other information to identify what the cracker tried to break into. This single argument should not exceed 20 characters and should not contain any characters that are special to the shell.

It determines the abuser's numeric IP and host name from environment variables provided by Apache. Apache determines the host name from a reverse lookup of the numeric IP, and it determines the numeric IP from the source address in the TCP packet; again, this is hard to spoof to modern Linux systems.

The warn program is presented here. The warn.c source is on the CD-ROM.

 
 /*
  * warn: invoke blockip to warn of attempted break-in and lock out
  *
  * Copyright 2001 Bob Toxen.  All Rights Reserved.
  *
  * Purchasers of the book "Real World Linux Security:
  * Intrusion Prevention, Detection, and Recovery" may copy this
  * script as needed to install and use on any system that they
  * administer.  Others may not copy or use it without obtaining
  * specific written permission by contacting the author at
  * book@verysecurelinux.com.
  *
  * Offered as is with no warranty of any kind.
  *
  * Arguments we pass to /usr/local/secbin/blockip:
  *      1. TCP Wrappers' %h expanded (host name)
  *      2. TCP Wrappers' %a expanded (IP)
  *      3. TCP Wrappers' %d expanded (service)
  *      4. TCP Wrappers' %c expanded (user@sys)
  *      5. TCP Wrappers' %u expanded (user name, if known)
  */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>

 #define BLOCKIP "/usr/local/secbin/blockip"

 const char    valid[] = "abcdefghijklmnopqrstuvwxyz"
   "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._/";

 /* Copy string, replacing suspicious characters and truncating. */
 char *
 fix(char *string, char *null)
 {
         char    *s;

         if (!string)
                 return null;
         string = strdup(string);
         if (!string)
                 return "memerr";
         for (s = string; *s; s++)
                 if (!strchr(valid, *s))
                         *s = '_';
         if (strlen(string) > 20)
                 string[20] = '\0';
         return string;
 }

 /* Validate IP. */
 char *
 chkip(char *string)
 {
         char    *s;
         char    *s0;
         int     part;

         if (!string)
                 goto bad;
         s = string;
         for (part = 0; part < 4; part++) {
                 s0 = s;
                 while (*s >= '0' && *s <= '9')
                         s++;
                 if (s - s0 < 1 || s - s0 > 3
                   || *s++ != "...\0"[part] || atoi(s0) > 255)
                         goto bad;
         }
         return string;
 bad:    return "255.255.255.255";
 }

 int
 main(int argc, char **argv)
 {
         char    *host   = fix(getenv("REMOTE_HOST"), "");
         char    *ip;
         char    *service;

         if (argc <= 1)
                 service = "missing";
         else
                 service = fix(argv[1], "missing");
         ip = chkip(getenv("REMOTE_ADDR"));
         if (geteuid() || setuid(0) < 0) {
                 fprintf(stderr, "Must be set-uid root\n");
                 exit(-1);
         }
         execl(BLOCKIP, "blockip", host, ip, service, "", "",
             (char *) 0);
         exit(2);
 }

Table 6.3 lists the most dangerous characters in input data, dangerous because they are special characters to common Linux programs. This is not an exhaustive list. Generally, these characters should be handled either by rejecting the request, removing them from the input, or changing them to something harmless such as the underscore (_) character. Values are in decimal; backslash escape sequences are those defined in the C language.

Table 6.3. Dangerous CGI Input Characters

Character   Danger
0-31        control characters
127         control character
128-255     not ASCII
\n          newline: ends current line, allowing additional comments or data
\7          bell: sometimes used by programs as a delimiter because it should not be present in most data
\t          tab: field separator, corrupts single-field entry
' '         space: field separator, corrupts single-field entry
'           apostrophe: starts or ends multiword data, can break shell commands
"           quote: starts or ends multiword data, can break shell commands
?           shell single-character wildcard
[           shell start of range wildcard
]           shell end of range wildcard
(           shell start of subshell
)           shell end of subshell
{           shell start of combinatorial expansion
}           shell end of combinatorial expansion
|           shell pipe
^           shell pipe on some UNIX systems
<           shell input redirection
>           shell output redirection
!           csh history substitution and I/O force redirection
;           shell command separator
&           shell command separator
`           backprime: shell "replace with command output" operator
$           shell variable substitution
#           shell comment
~           Mail tilde escape; user home directory expansion
,           common field separator
\           Linux general escape


6.4.12 Enforcing URL Blocking

[Danger level: 2]

Some organizations now block their employees' access to some HTTP sites by blocking requests whose URLs match certain patterns. The sites blocked typically offer adult content, job listings, or hate material. Squid, a popular free open source Web proxy cache, offers this feature, and it is customizable. An even better solution is to combine Squid with SquidGuard, the latter acting specifically as a filtering "front end" for Squid. Some firewalls also offer filtering. All these ideas are expanded in "Stateful Firewalls" on page 540.

The filtered sites fight back by using HTTP's hex encoding to sneak around the filters. A "%" character followed by two hexadecimal digits is translated into the ASCII character represented by that value. Most filters are not smart enough to translate these sequences before performing pattern matching. One workaround is to block URLs containing the "%" character, as it normally is not part of a URL. Alternatively, open source code such as Squid easily could be modified to do the translation before performing pattern matching; a sketch of the idea follows.
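A sketch in C of the translate-then-match idea; the blocked pattern and URL are made-up examples, and a production filter also would need to handle "+", repeated encoding, and case differences:

 /*
  * Sketch: decode %xy sequences before pattern matching, so
  * that "%6A%6F%62%73" cannot sneak "jobs" past the filter.
  * The pattern and URL are made-up examples.
  */
 #include <ctype.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 /* Decode %xy hex escapes in place. */
 static void url_decode(char *s)
 {
         char *o = s;

         for (; *s; s++)
                 if (s[0] == '%' && isxdigit((unsigned char) s[1])
                     && isxdigit((unsigned char) s[2])) {
                         char hex[3] = { s[1], s[2], '\0' };

                         *o++ = (char) strtol(hex, NULL, 16);
                         s += 2;
                 } else
                         *o++ = *s;
         *o = '\0';
 }

 int main(void)
 {
         char url[] = "/search?topic=%6A%6F%62%73";  /* hides "jobs" */

         url_decode(url);
         if (strstr(url, "jobs"))
                 printf("blocked: %s\n", url);
         return 0;
 }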

6.4.13 Detecting Defaced Web Pages

[Danger level: 5]

This author is surprised at how often a site's Web pages get defaced in a successful penetration by a cracker and nobody at that site notices for hours or days. There is a simple resolution to this, based on some research I did under contract to GTE Laboratories on detecting when systems operate erratically. The rather simple idea is that you generate test requests and compare the answers to the expected results. If they differ, an alarm is sounded, typically by generating e-mail and paging the SysAdmin, and probably by "cracking the whip" at the Webmistress as well. Actual flashing lights and ringing bells could occur too. (See "Adaptive Firewalls: Raising the Drawbridge with the Cracker Trap" on page 559 for details on generating these alarms.)

Defaced Web pages can be detected automatically, using the set of programs and methods discussed in "Detecting Defaced Web Pages Automatically" on page 661.


To read the Web pages periodically and check for differences, you need to know only that HTTP is a very simple protocol built on top of TCP, usually directed at port 80. A request for a page consists of GET and the page's path on the server. To check www.pentacorp.com/index.html, do the equivalent of

 
 telnet www.pentacorp.com 80
 GET /index.html HTTP/1.0
 Accept: text/plain
 Accept: text/html
 User-Agent: Mozilla/4.7 (Macintosh; U; PPC)
 (blank line)

Be sure to include the blank line to terminate the request. The telnet program does not work in pipes, and you might not want to fork a new process to check each page anyway. This is an excellent application for a Perl script. A C program would work well too; a sketch follows.
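Here is a sketch in C of such a checker for a single page. The host, path, saved-copy file name, and alarm action are stand-ins for illustration; a real version would loop over the configuration file described next and filter out legitimately varying data before comparing.

 /*
  * Sketch: fetch one page with a raw HTTP/1.0 GET over TCP port
  * 80 and compare the body to a saved "known good" copy.  HOST,
  * PATH, and GOOD are hypothetical stand-ins.
  */
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <netdb.h>
 #include <netinet/in.h>
 #include <sys/socket.h>

 #define HOST "www.pentacorp.com"
 #define PATH "/index.html"
 #define GOOD "index.good"      /* saved correct copy of the page */

 int main(void)
 {
         struct hostent     *h;
         struct sockaddr_in  a;
         FILE               *net, *good;
         char                buf[1024];
         int                 fd, c1, c2;

         if (!(h = gethostbyname(HOST)))
                 return 2;                      /* DNS failure */
         memset(&a, 0, sizeof(a));
         a.sin_family = AF_INET;
         a.sin_port   = htons(80);
         memcpy(&a.sin_addr, h->h_addr_list[0], h->h_length);
         if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0
             || connect(fd, (struct sockaddr *) &a, sizeof(a)) < 0)
                 return 2;                      /* server unreachable */
         snprintf(buf, sizeof(buf),
             "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", PATH, HOST);
         write(fd, buf, strlen(buf));
         net = fdopen(fd, "r");
         if (!net || !(good = fopen(GOOD, "r")))
                 return 2;
         /* Skip the HTTP response headers: stop at the blank line. */
         while (fgets(buf, sizeof(buf), net))
                 if (buf[0] == '\r' || buf[0] == '\n')
                         break;
         /* Compare the body byte for byte with the saved copy. */
         do {
                 c1 = getc(net);
                 c2 = getc(good);
         } while (c1 == c2 && c1 != EOF);
         if (c1 != c2) {
                 fprintf(stderr, "ALARM: %s%s was changed!\n",
                     HOST, PATH);
                 return 1;      /* caller e-mails and pages the SysAdmin */
         }
         return 0;
 }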

You will want a configuration file listing the URLs to be tested. Each entry or "row" might consist of three parts: the fully qualified host name, such as www.pentacorp.com; the path to the page on that host; and the name of the file where you keep the "correct" version of the page.

To handle Web pages that vary by including the current date and time, temperature, or similar data, you will need to filter the varying data out. If you want to be extra careful, you might test even this data for accuracy.

It is desirable that this "checking" system be on a completely separate network, accessing your Web server over the Internet. This allows it to detect other failures such as DNS cache poisoning, an ISP failure, router and switch attacks, etc. It also could measure response time, to detect the server being overloaded or even subjected to a DoS attack. The same testing technique should be applied to FTP servers, mail transponders, "fax back" systems, etc.

Electronic commerce systems, such as shopping carts, can be tested as well by creating dummy accounts and placing dummy orders from them. This allows testing of most components. There should be at least two checks so that there is no single point of failure that might allow a cracker actually to get merchandise delivered and not be billed for it.

A particularly large outfit might want to test by placing real orders on real credit cards, with the address of the Quality Assurance department. This would allow analysis of the reliability of all phases of the operation, including the shipping companies used and the credit card clearing software and networks.


   