5.6. Hidden Directories

Even if a web server provides directory listings, you may not get to see all the directories. By default, Unix does not report any directories, or files for that matter, whose names begin with a period. This allows the system to store things such as application configuration files in users' home directories without appearing to clutter them up. In a Unix shell, these can be revealed with the command ls -a. The same convention applies to web server directory listings, so even if that feature is enabled, a directory called .ebay, for example, will not be visible in the listing of the enclosing directory.
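That this hiding is purely a display convention is easy to demonstrate. The following minimal Python sketch shows that the operating system reports dot files just like any others; it is ls that filters them out by default:

    import os

    # os.listdir() returns every entry in the directory, including
    # names that begin with a period; the hiding is a display
    # convention of ls, not a property of the filesystem.
    entries = os.listdir(".")

    visible = [e for e in entries if not e.startswith(".")]  # what "ls" shows
    hidden = [e for e in entries if e.startswith(".")]       # what "ls -a" adds

    print("visible:", visible)
    print("hidden: ", hidden)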

This trick is widely used by phishing sites to avoid discovery, both by people exploring the site with a web browser and by system administrators working directly on the server filesystem. Here are some examples from real sites that use it:

  • http://www.citysupport.nl/preview/top/.PayPal

  • http://ebay.updates-aw-confirm.com/.eBay

  • http://aospda.free.frandt.com/.signin.ebay.com

It is important to note that only the directory names are hidden, not their contents. So if you know the name, you can enter it as part of a URL and reveal what the directory holds.

5.6.1. Guessing Directory Names

Knowing the names of hidden directories, or making an inspired guess about them, can prove to be very useful when you are trying to map out a web site. Some directories are created by the software tools used to build the site and are given standard names. Others are created manually and given obvious names or names that fit with standard conventions.

Guessing these names is a process of trial and error, but once in a while you get lucky and reveal a hidden directory that the operator might not want you to know about. Here are some names that you might try (a short script for probing them follows the list):


images, icons, pics

Used to hold images used in web pages but often used as a place to store other files


css

A standard place to hold stylesheet files


javascript, js

A standard place to store JavaScript files


log, logs

Sometimes used to store data captured by phishing sites


_vti_cnf, _vti_bin, _vti_log, etc.

Directories created by Microsoft FrontPage server extensions


_notes

Contains XML format files created by Macromedia Dreamweaver
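Trial and error lends itself to a simple script. The following sketch, in Python using only the standard library, takes a base URL and requests each name from a list of common candidates, reporting the HTTP status for each. The base URL here is a hypothetical placeholder; substitute the site you are examining.

    import urllib.error
    import urllib.request

    # Hypothetical target; substitute the site you are examining.
    BASE_URL = "http://www.example.com"

    # Common directory names worth trying, drawn from the list above.
    CANDIDATES = ["images", "icons", "pics", "css", "javascript", "js",
                  "log", "logs", "_vti_cnf", "_vti_bin", "_notes"]

    for name in CANDIDATES:
        url = "%s/%s/" % (BASE_URL, name)
        try:
            # A 200 response means the directory exists and is readable;
            # if listings are enabled, the body will be an index page.
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(resp.status, url)
        except urllib.error.HTTPError as e:
            # 404 means no such name; 403 often means the directory
            # exists but listing or access is forbidden.
            print(e.code, url)
        except urllib.error.URLError as e:
            print("error", url, e.reason)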

Directories like these are typically used for legitimate and often mundane purposes, but that should not deter you from looking at what they contain. In the case of sites that have been hijacked, the attacker may well use one of these directories to hold their files, banking on the fact that they won't attract scrutiny by the operator of the site.

You can find many examples of these supposedly hidden directories on sites around the world, courtesy of Google. For example, using the string _vti_cnf as a Google query will return more than a million hits, most of which represent directory listings. Using search engines to locate directories and files like these has become a favored tool of those who want to break into web servers. The presence of certain files can indicate that a site is vulnerable to a specific type of attack. This is one more reason why you should disable directory listings on your site, or enable the feature only for specific directories.

One way to ensure that web crawlers such as GoogleBot do not index a site is to create a robots.txt file and add to it all the directories that you want to remain hidden. But in doing so, you are inadvertently disclosing the names of all those directories to anyone on the Internet. You can see many examples of these files by entering the query inurl:robots.txt filetype:txt into Google.

Here is the file for one of the check-cashing web sites that I will discuss in the section "In-Depth Example: Directory Listings," later in this chapter.

    User-agent: *
    Disallow: /archive_notices/
    Disallow: /cgi-bin/
    Disallow: /collections/e2k/
    Disallow: /collections/government/
    Disallow: /collections/news/
    Disallow: /collections/now/
    Disallow: /collections/pioneers/
    Disallow: /collections/sep11/
    Disallow: /collections/web/
    Disallow: /db_dir/
    Disallow: /images/
    Disallow: /live_dir/
    Disallow: /privage_pages/
    Disallow: /spec/
    Disallow: /web/
    Disallow: /e2k/

Some of these directories contain files that are linked to by visible web pages on the site, but others are not. Were it not for the operator explicitly revealing these directory names, nobody would know they existed.
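Harvesting this kind of disclosure is trivial to automate. Here is a minimal sketch, again in Python with only the standard library, that fetches a site's robots.txt and prints the paths named in its Disallow lines. The host is a hypothetical placeholder.

    import urllib.request

    # Hypothetical host; substitute the site you are examining.
    url = "http://www.example.com/robots.txt"

    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8", errors="replace")

    # Every Disallow line names a path the operator wants crawlers to
    # skip, and thereby reveals that path to anyone who reads the file.
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                print(path)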

The correct way to hide directories under a web tree is to disable directory listings, to ensure that no other visible files link to them, and to add some form of access control to each of them.
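On an Apache server, for example, the first and third of these measures take only a few configuration directives. The following is a sketch rather than a complete recipe: it assumes Apache with basic authentication available, and the directory path and password file location are placeholders.

    # In the server configuration (httpd.conf). Inside an .htaccess
    # file, the same directives apply without the <Directory> wrapper.
    <Directory "/var/www/html/private">
        # Turn off automatic directory listings.
        Options -Indexes

        # Require a username and password for every request.
        AuthType Basic
        AuthName "Restricted area"
        AuthUserFile "/etc/apache2/htpasswd"
        Require valid-user
    </Directory>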

5.6.2. Ethical Question

Once you start guessing at directory names and looking at files that the operator has not explicitly linked to from their home page, you are entering a gray area in terms of ethics and etiquette. You should consider the different sides of this issue and come to your own conclusions.

A fundamental precept of the Internet is that you can visit any page on any web site without explicitly asking permission beforehand. As long as the site has not password-protected the page, you will be able to retrieve the content. So you can argue that if you have the URL of a page, then you have the right to visit it. It doesn't matter whether that URL is a link from the site's home page or one that you have generated as a way to probe hidden content. If the operator of the site has made that page available, whether they realize it or not, then you are within your rights to access it.

The counterargument is that you should only access pages that are explicitly linked to from the site home page or elsewhere. All other pages are off limits. Just because you are able to access them does not give you the right to do so. Those pages should be viewed as private and confidential. The act of viewing the pages is on a different level than intentionally breaking into a computer, but you are exploiting ignorance or oversight on the part of the site operators in order to access content that they would not want you to see.

How you feel about this issue will depend on the nature of the web site that you are interested in. I feel little or no hesitation about poking around a site that is involved in some kind of scam. In part, a sense of what is right drives me to uncover the scam, and I also feel safe knowing that the people behind it are not likely to complain and risk revealing themselves.

On the other hand, I would not dream of using the same tactics to look for hidden files on the site of a non-profit group, a company, or a government department. I would feel that I was exploiting the innocent mistakes of the site operator. I would also be wary that, for government sites in particular, I might attract the unwanted scrutiny of their security staff.

The evolution of search engines is making this difficult issue even more complex. Google caches most of the pages that it indexes and makes those available to the public. The Internet Archive holds old versions of entire web sites. If content that would otherwise be hidden turns up by mistake in one of these resources, is it ethical to use that information or should you ignore it?

As you work your way through this book, I hope that you will think about these issues and figure out where you stand on them.


