5.5. Mapping Out the Entire Web Site

I want to return to our own exploration of current web sites. By following the links contained in a set of pages, either on the original server or within a local copy of the site, you can start to see the architecture of the entire site. Not only can that offer insight into how the site functions, but the directory structure itself may serve as a signature for that particular scam. Seeing the same structure on a second site may allow you to link the two together.

Making a local copy of a site using wget and looking at the directories that are created are easy ways to get an overview of its structure. But this shows you only the pages and files that are directly visible from a web browser. In those same directories, hidden from view, may be other scripts or data files that might offer up information about the operators of the site.
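One way to compare the structure of two mirrored sites is to reduce each local copy to the set of its relative directory paths. The following is a minimal Python sketch of that idea, not a technique taken from the text. It assumes each site has already been copied into its own local directory, and the function names directory_signature and signature_overlap are hypothetical.

```python
import os


def directory_signature(root):
    """Return the set of directory paths under root, relative to root."""
    signature = set()
    for current, subdirs, _files in os.walk(root):
        for name in subdirs:
            full = os.path.join(current, name)
            # Use forward slashes so signatures compare across platforms.
            signature.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return signature


def signature_overlap(sig_a, sig_b):
    """Jaccard similarity of two signatures: 1.0 means identical layouts."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```

A high overlap between two mirrors is only a hint that the sites may be related. Common software packages produce similar layouts on unrelated sites, so treat the score as a prompt for closer inspection rather than as proof.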

5.5.1. Directory Listings

If you are lucky, a directory listing, also known as an index, may be made available by your target web site. You can view this from your browser by supplying a URL that ends in a directory name, rather than that of a specific web page. Figure 5-3 is an example of what this looks like.

Figure 5-3. Example of a web server directory listing


This shows us all the files in the directory /autorank/images/.../template/ on a site. It contains mostly image files with two PHP scripts, a stylesheet, a JavaScript file, and a regular web page. The listing offers up the size of each file and the date and time when each file was last modified.

Files listed like this do not need to have been linked to, or included in, any other web page on the site. The listing is equivalent to running the ls command on a directory on a Unix system. Whether or not you are able to see listings depends on the directives specified in the web server configuration file, which for Apache is httpd.conf.
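If you have saved the HTML returned for a directory URL, a short script can pull out the file names for you. This is a rough Python sketch that assumes the page follows the default Apache listing layout: a title beginning "Index of", column-sorting links that start with "?", and a parent directory link given as an absolute path. Sites can customize or rename all of these, so the checks are heuristics, and the names ListingParser and listing_entries are hypothetical.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the page title and link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def listing_entries(html):
    """Return entry names if html looks like a listing, else None."""
    parser = ListingParser()
    parser.feed(html)
    # Assumption: default Apache listings are titled "Index of <path>".
    if not parser.title.strip().startswith("Index of"):
        return None
    # Skip column-sort links ("?C=N;O=D") and the absolute parent link.
    return [h for h in parser.hrefs if not h.startswith(("?", "/"))]
```

Returning None for a page that does not look like a listing keeps it distinct from an empty list, which would mean a listing with no entries.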

Two things need to be in place before the server will provide a listing for a specific directory. First, listings must have been enabled for that directory or for one that contains it. This is controlled by the Indexes option, which is preceded by a plus (+) sign to enable listings or a minus (-) sign to disable them:

     <Directory /path/to/directory>
         Options +Indexes
     </Directory>

The second component is the absence of a DirectoryIndex file such as index.html. Apache expects that most directories will have a specific page that is the entry point to the other content that they contain. The classic example is a file called index.html. If this is present, then a URL that ends in a directory name will return that page rather than the listing. This behavior is defined in this Apache configuration block:

     <IfModule mod_dir.c>
         DirectoryIndex index.html index.htm index.shtml index.php \
             index.php4 index.php3 index.phtml index.cgi
     </IfModule>

This defines an ordered list of candidate files. The first one in the list that the server finds in a given directory will be returned in place of the directory listing. Only if none of them is present, and directory listings are enabled, will we see the list of files.
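The decision the server makes for a directory URL can be modeled in a few lines. The Python sketch below is a simplified illustration, not Apache's actual implementation, and the function name resolve_directory_request is hypothetical. The "forbidden" outcome reflects my understanding that Apache refuses the request when no index file exists and listings are disabled; the text above does not cover that case.

```python
def resolve_directory_request(files, directory_index, indexes_enabled):
    """Model what a URL ending in a directory name returns.

    files: names present in the directory
    directory_index: names from the DirectoryIndex directive, in order
    indexes_enabled: True if Options +Indexes applies to the directory
    """
    present = set(files)
    # The first DirectoryIndex name found in the directory wins.
    for name in directory_index:
        if name in present:
            return ("page", name)
    # No index file: show a listing only if the Indexes option is on.
    if indexes_enabled:
        return ("listing", sorted(present))
    # Assumption: with listings disabled the server refuses the request.
    return ("forbidden", None)
```

For example, a directory holding both index.php and index.htm returns index.htm under the configuration shown above, because index.htm appears earlier in the DirectoryIndex list.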

You would think all the dodgy web sites would have this feature disabled, and many of them do, but happily for us just as many have it enabled. In some cases the scammer may not have any choice in the matter. Many phishing sites are placed within other legitimate sites, typically as a result of the site being successfully attacked. But in order to change the main Apache configuration file, an attacker would typically need to obtain root privileges on the target system. That is considerably more difficult than simply gaining access and placing a set of files into the web hierarchy. As a result, most of these parasitic sites are stuck with whatever configuration has been applied to the host site.

Visible directory listings combined with laziness or oversight on the part of the scammer can offer up a host of information about a scam. They can be so useful that the first thing I do when I visit a phishing site is to see which directories are visible.
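To check which directories are visible, you need the list of directory URLs that lie between a given page and the root of the site. The following Python sketch builds that list from a single URL. The function name candidate_directories is hypothetical, and the code assumes the final path segment is a file name unless the URL ends in a slash.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_directories(url):
    """Return directory URLs from the page's own directory up to the root."""
    parts = urlsplit(url)
    # Drop the final segment (a file name, or empty after a trailing slash).
    segments = parts.path.split("/")[1:-1]
    candidates = []
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        # Query strings and fragments are discarded deliberately.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates
```

Each URL in the result can then be entered into a browser, deepest directory first, to see whether the server returns a listing.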



Internet Forensics
ISBN: 059610006X
Year: 2003
Pages: 121