17.6 Detecting Defaced Web Pages Automatically

Now you have secured your Web server tighter than the FBI and CIA (both of their sites have been broken into), you have set up your firewall, and you have programs to scan the log files for suspicious entries. Are you done yet? The numbers for known Web site defacements are shown in Table 17.1; they show this problem continuing to increase rapidly. The estimate for 2000 is mine: I very conservatively multiplied the numbers for January through April by three, which makes no allowance for the clearly increasing rate.

Table 17.1. Web Site Defacements (per year)

 Year    Defacements[*]
 1997                40
 1998               244
 1999              3736
 2000    4881 (est.); 5822 (actual)

[*] The Attrition organization monitors Web site defacements. This table is excerpted from www.attrition.org/mirror/attrition/os.html

Many large and well-known sites get defaced and remain that way for hours or longer because no one notices the problem; most sites simply do not monitor their Web pages for alterations. Crackers enjoy the publicity, and some SysAdmins have discovered the problem only when they read about it in the morning paper. Some companies will be happy to monitor your site for a fee. One Web site advocates updating the Web pages from another system every hour. This still leaves a 59-minute window, and you will not even know whether the site has been defaced (cracked). Worse, a smart cracker will notice the modification times (create times) of the pages and simply redeface them five minutes after the SysAdmin has updated them. If a cracker manages to intercept communication upstream from the Web server, this hourly update scheme will fail completely because it will not be the server itself that was cracked.

The solution is to monitor your site's Web pages yourself. This is easier than it sounds, and small sites can set it up in an hour or two. For the highest security, the system used to monitor the Web site should be on a completely separate network using a different ISP. This allows you to detect intrusions that intercept your site upstream from the Web server; for example, it will catch a DNS attack in which the cracker alters your DNS information or attacks your ISP. If your monitoring system is inside your firewall and uses your own DNS servers, you will not detect these problems. For medium and large sites, the slight extra cost is well worth the security.
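As a sketch of what such an outside monitoring system can check beyond the pages themselves, it could verify that DNS still hands out the correct address for the Web server. The known-good address, the mail recipient, and the parsing of the host command's output below are placeholders to adapt to your own site.

 #!/bin/csh -f
 # Sketch: from the outside monitoring host, confirm that DNS still points
 # the Web server's name at the address we expect (192.0.2.80 is a placeholder).
 set goodip=192.0.2.80
 set nowip=`host www.pentacorp.com | awk '/has address/ { print $NF; exit }'`
 if ( "$nowip" != "$goodip" ) then
         echo "www.pentacorp.com now resolves to $nowip (expected $goodip)" \
           | Mail -s "WWW ERR: DNS mismatch" admin@pentacorp.com
 endif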

I created the tcpread.c program to do the hard part of reading a specified Web page. It is available from the CD-ROM. It can output the page to standard out to enable you to make a "reference" snapshot or to allow you to do your own "diff" to compare the page to what it is expected to be. Alternatively, it will do the "diff" itself and return a 0 exit status if it successfully read the page and it matched. It will return various other exit codes for the different failures. By default, it will read from TCP port 80 and will timeout and return an error if it does not complete the download within 60 seconds. It accounts for the date field varying and will handle binaries as well, such as images and sound files.

In its simplest usage, it expects the Web server's host name and the page to download. This page name is what you would supply to a browser, starting with the slash that follows the host name. It accepts an optional third argument specifying how many seconds the entire download is allowed to take, rather than the default of 60 seconds; if the site is too slow, tcpread will generate an error that you can detect rather than hanging, which also lets you detect DoS attacks that slow your system to an unacceptable level. The optional fourth argument is the port number to use, rather than the standard 80.
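For example, with a placeholder page name of /index.html, the arguments line up like this:

 tcpread www.pentacorp.com /index.html               # port 80, 60-second limit
 tcpread www.pentacorp.com /index.html 30            # allow only 30 seconds
 tcpread www.pentacorp.com /index.html 30 8080       # nonstandard port 8080
 tcpread www.pentacorp.com /index.html > index.ref   # save a reference snapshot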

The -f correct_file argument allows you to specify a file with the contents that the page should contain. Instead of outputting the page to standard output, tcpread will compare the page to correct_file and will return a status code of 0 if they match, 1 if they differ, 4 if tcpread timed out before reading the entire page, or a different value on other errors. The -o flag may be used with -f to cause the page contents to be sent to standard out anyway. The -b flag, used when the page contents are sent to standard out, outputs only the body of the Web page; this is handy for capturing an image that varies in real time so that a person can view it and confirm that it is correct.
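Continuing the placeholder example, a comparison run and a visual check of a changing image might look like this:

 # Compare the live page against a previously captured reference copy.
 tcpread -f index.ref www.pentacorp.com /index.html
 if ( $status == 1 ) echo "index.html differs from the reference copy"

 # Capture only the body of a real-time image (a placeholder name) for viewing.
 tcpread -b www.pentacorp.com /images/lobbycam.jpg > lobbycam.jpg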

To use this program, it is necessary first to generate a list of files to watch. Let us assume that the root of the Apache document tree is /httpd/htdocs on the Web server and that on the monitoring machine the root of the correct_file tree is watch. On the Web server, issue these commands; root permission will not be necessary. The script wwwgenlist may be invoked instead; it is on the CD-ROM.

 
 #!/bin/csh -f
 cd /httpd/htdocs
 find . -type d ! -name . -print \
   | sed 's/..//' > $HOME/m_dirs          # Find and strip leading "./"
 find . -type f -print | sed 's/..//' > $HOME/m_files

Edit these files to remove anything that you do not want to monitor, transport them to the monitoring system, place them in the watch directory, cd to watch, and create the directory tree.

 
 csh
 umask 077
 foreach i ( `cat m_dirs` )
         echo $i
         mkdir $i
 end
 exit

Build the tcpread program and place it in your personal bin directory; because it makes it too easy to steal Web pages, it should not be generally available.

 
 make tcpread
 mv tcpread $HOME/bin/.

Create your reference files with this script, called capture. We allow a timeout of 120 seconds per file; hopefully your site is faster than this. A log of the errors will be placed in the capture.log file.

 
 #!/bin/csh -f
 umask 077
 touch capture.log
 foreach i ( `cat m_files` )
         echo $i
         tcpread www.pentacorp.com /$i 120 >! $i
         if ( $status != 0 ) then
                 echo == COULD NOT CAPTURE $i \
                   | tee -a capture.log
         endif
 end
 echo ===== errors
 cat capture.log

The following script, called wwwscan, will scan the static Web pages and send e-mail if any pages do not match. Dynamic pages could be checked by adding some parsing with perl, sed, etc., to allow for variance in the dynamic parts; a sketch follows the script. For safety, it may be worth keeping the most important pages static so that exact-match testing is possible. The script could easily be modified to page you, or even to connect to the appropriate systems with SSH and switch to the SecBack (security backup) Web server or shut the server down pending repair. These shutdown actions are suitable only for certain sites, because a momentary network delay could trigger them. Because the program returns a different exit code for a timeout than for a difference, the script can choose its action accordingly.

 
 #!/bin/csh -f
 #
 # Copyright 2001 Bob Toxen.  All rights reserved.
 # This program may be used under the terms of the
 # GNU GENERAL PUBLIC LICENSE Version 2.
 #
 # Offered as is with no warranty of any kind.
 set email=(admin@pentacorp.com joe@homesys.com)
 set tmp=tmp$$
 set host=www.pentacorp.com
 foreach i ( `cat m_files` )
         echo $i
         tcpread -o -f $i $host /$i >&! $tmp
         set x=$status
         if ( $x != 0 ) then
                 if ( $x == 1 ) then
                         set subj=diff
                 else
                         if ( $x == 4 ) then
                                 set subj=timeout
                         else
                                 set subj=unknown
                         endif
                 endif
                 set bad=$i.bad
                 mv $tmp $bad
                 diff $i $bad | Mail -s \
                   "WWW ERR: ${subj}: $bad" $email
 # Could generate pages: see blockip.csh
         else
                 /bin/rm $tmp
         endif
 end
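As a sketch of allowing for a dynamic element, suppose a page's only varying part is a hit counter in an HTML comment such as <!--COUNT:1234--> (a made-up marker). Both the live copy and the reference copy can be run through the same sed expression before comparing:

 #!/bin/csh -f
 # Sketch: strip a varying hit counter from both copies, then diff what is left.
 # The marker format and the page name are placeholders.
 set tmp=tmp$$
 tcpread www.pentacorp.com /index.html >! $tmp
 sed 's/<!--COUNT:[0-9]*-->//' $tmp       > $tmp.live
 sed 's/<!--COUNT:[0-9]*-->//' index.html > $tmp.ref
 diff $tmp.ref $tmp.live > /dev/null
 if ( $status != 0 ) echo "index.html differs beyond its counter"
 /bin/rm -f $tmp $tmp.live $tmp.ref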

The monitoring system also could conduct automated tests of other components such as e-mail, filling out various forms, conducting test transactions, etc. To test a "GET" style of form, fill out the form once using Netscape and execute it. When the results are displayed, Netscape displays the URL used, which will be the name of the CGI program, followed by a "?" and the form variables separated by "&". You then would highlight this data and "drop it" into a script that invokes tcpread. If your contents vary over time, you will need to use some combination of perl, sed, grep, and awk to filter out the changes to get a successful "diff".

The following is a typical URL for a form.

www.cavu.com/cgi-bin/sunset?loctype=ID&loc=JFK&date=today

Highlight the portion of this text that starts after the host name; it will become the second argument that you pass to tcpread. You will need to select a file in which to store the reference output. The scripts discussed earlier will not handle this page name because the "?" and "&" characters are special to the shell, so you will need to quote this argument when you pass it to tcpread. Similarly, e-mail could be tested by generating a message and then reading the /var/spool/mail/testusr e-mail file.
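Using the URL above, a reference capture and a later comparison might look like the following; sunset.ref is a placeholder file name, and note the quotes around the page argument.

 # Capture a reference copy; the quotes keep the shell from treating ? and & specially.
 tcpread www.cavu.com '/cgi-bin/sunset?loctype=ID&loc=JFK&date=today' > sunset.ref

 # Later, compare the live output against the reference.
 tcpread -f sunset.ref www.cavu.com '/cgi-bin/sunset?loctype=ID&loc=JFK&date=today'

Because this particular page's output changes with the date, the varying parts would have to be filtered out, as described above, before the comparison could succeed.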

You could get more sophisticated and monitor traffic in response to your query. Because you know what network traffic should be generated, you could detect whether an e-mail with your test credit card number gets mailed to some unknown site.
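One possibility is to run a sniffer such as ngrep on a machine that can see the Web server's outbound traffic and watch for your test credit card number leaving the site; the interface name and the card number below are placeholders.

 # Watch all TCP traffic for the test credit card number (4111111111111111 is
 # a commonly used Visa test number).  Run this where the Web server's
 # outbound traffic is visible; eth0 is a placeholder interface name.
 ngrep -q -d eth0 '4111111111111111' tcp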


   