Section 5.3. Comparing Pages

In the case of phishing sites, the fake bank login page that the original email directs you to will have been copied from the real bank web site. The person behind the scam will then have added an HTML form, or a link to another page, that asks for your account information. An easy way to see what has been added is to download the same page from the real bank site, compare the two files, and look at the differences.
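The same comparison can be sketched in Python with the standard difflib module, as an alternative to the command-line tool. This is a minimal illustration, not the author's method: the function name added_lines is hypothetical, and it assumes both pages have already been saved locally (for example as fake.html and real.html).

```python
import difflib

def added_lines(fake_html, real_html):
    """Return lines of the fake page that have no counterpart in the real one.

    Both arguments are the full text of a page. Lines that were changed,
    as well as lines that were inserted, are reported.
    """
    fake = fake_html.splitlines()
    real = real_html.splitlines()
    # ndiff prefixes lines found only in the first sequence with "- ".
    return [line[2:] for line in difflib.ndiff(fake, real)
            if line.startswith("- ")]

# Hypothetical usage with two pages saved to disk:
# with open("fake.html") as f, open("real.html") as r:
#     for line in added_lines(f.read(), r.read()):
#         print(line)
```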

For this, you can use the standard Unix command diff to compare the two files line by line. Lines that differ are output and identical lines are ignored. If several consecutive lines differ, diff outputs them as one block from each file, rather than as pairs of individual lines.
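To make the grouping into blocks concrete, here is a rough Python analogue built on difflib.SequenceMatcher. It is only a sketch: the name diff_blocks is hypothetical, and SequenceMatcher uses a different matching algorithm from diff, so on some inputs it may align lines differently.

```python
from difflib import SequenceMatcher

def diff_blocks(a_lines, b_lines):
    """Group differing lines into blocks, skipping runs of identical lines.

    Returns (tag, lines_from_a, lines_from_b) tuples, where tag is
    'replace', 'delete' or 'insert' as reported by SequenceMatcher.
    """
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Consecutive differing lines arrive as one opcode, i.e. one block.
            blocks.append((tag, a_lines[i1:i2], b_lines[j1:j2]))
    return blocks

print(diff_blocks(["same", "old 1", "old 2", "same too"],
                  ["same", "new 1", "new 2", "same too"]))
```

The two changed lines come back as a single 'replace' block holding both lines from each file, not as two separate pairs.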

The amount of whitespace at the start and end of lines can vary between otherwise similar files downloaded from different sources. This may be a side effect of the browser used to save the page, or of later editing of the content. Either way, it can cause diff to report every line as different, which is not what you want. The -b option tells diff to ignore changes in the amount of whitespace; the -w option goes further and ignores whitespace altogether.
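The rule behind -b can be sketched as a normalization applied to each line before lines are compared. This follows the GNU diff manual's description of -b (whitespace at line end is ignored, and other runs of whitespace are treated as equivalent); it is an approximation for illustration, not diff's own code, and both function names are hypothetical.

```python
import re

def ignore_space_change(line):
    """Approximate the comparison key GNU diff uses for -b.

    Trailing whitespace is dropped and every remaining run of whitespace
    becomes a single space. Leading whitespace is collapsed, not removed,
    so an indented line still differs from an unindented one.
    """
    return re.sub(r"\s+", " ", line.rstrip())

def lines_match_b(a, b):
    """True if two lines would be considered equal under the -b rule."""
    return ignore_space_change(a) == ignore_space_change(b)

print(lines_match_b("    <b>x</b>", "\t<b>x</b>"))  # True: indentation amount differs
print(lines_match_b("<b>x</b>", "  <b>x</b>"))      # False: indentation added
```

The second case is where -w, which discards all whitespace, would behave differently.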

Here is its output when comparing a fake login page for keybank.com with the equivalent real page. The output has been edited down for readability.

     % diff -b fake.html real.html
     7c7
     < <link rel="stylesheet" href="http://accounts.keybank.com//ib2/css/kco2obi.css" type="text/css" media="all" />
     ---
     > <link rel="stylesheet" href="/ib2/css/kco2obi.css" type="text/css" media="all"/>
     46c46
     < <a href="login.htm?requester=signon" >Sign On</a>
     ---
     > <a href=https://accounts1.keybank.com/ib2/Controller?requester=signon >Sign On</a>
     88,89d88
     < <!-- text below generated by server. PLEASE REMOVE --></object>
     [...]
     < <IMG src="http://geo.yahoo.com/serv?s=76001068&t=1111102403" ALT=1 WIDTH=1 HEIGHT=1>

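The output of this command can be difficult to read. Each block begins with a header giving the line number, or range of line numbers, that the difference covers in each of the two files. The letter between the two sets of numbers says whether the difference is a change (c), a deletion (d), or an addition (a). A left-angle bracket (<) precedes each line of text from the first file, a right-angle bracket (>) precedes each line from the second file, and a line of three dashes (---) separates the two halves of a change. So 88,89d88 means that lines 88 and 89 of the first file are absent from the second file, where they would have come after line 88.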
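Reading these headers by hand gets tedious on a long diff. The sketch below splits a normal-format header into its parts; the regular expression, the function name parse_header, and the word labels are hypothetical illustrations, not part of diff itself.

```python
import re

# Normal-format diff header: range, action letter, range, e.g. "88,89d88".
HEADER = re.compile(r"(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?")
ACTIONS = {"a": "add", "c": "change", "d": "delete"}

def parse_header(header):
    """Split a header such as '7c7' into (action, first_range, second_range).

    Ranges are inclusive (start, end) line numbers. For 'd' the second
    range is the line after which the deleted text would have appeared;
    for 'a' the first range plays that role.
    """
    m = HEADER.fullmatch(header.strip())
    if m is None:
        raise ValueError(f"not a normal-format diff header: {header!r}")
    a_start, a_end, letter, b_start, b_end = m.groups()
    first = (int(a_start), int(a_end or a_start))
    second = (int(b_start), int(b_end or b_start))
    return ACTIONS[letter], first, second

print(parse_header("88,89d88"))  # ('delete', (88, 89), (88, 88))
```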

This output shows three types of difference, two of which are commonly found in the fake pages used in phishing attempts. The first block shows that line 7 differs between the two files. The line is a link to the stylesheet used by the page. In the second file, the original page, this is a relative link (/ib2/css/kco2obi.css) to a file in the same document tree. In the first file, the fake version, it has been changed into an absolute link that points to the same file on the keybank.com site. Left as a relative link, it would have been resolved against the fake site, which probably does not host a copy of the stylesheet; the absolute link lets the fake page borrow the real bank's styling directly.
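The effect of that change can be checked with Python's urllib.parse, which resolves links the way a browser does. The fake page's address below (phish.example) is hypothetical; the two href values are the ones from line 7 of the diff.

```python
from urllib.parse import urljoin, urlsplit

def resolved_host(page_url, href):
    """Return the host a browser would contact for href found on page_url."""
    return urlsplit(urljoin(page_url, href)).hostname

# Hypothetical address where the fake page is hosted.
FAKE_PAGE = "http://phish.example/keybank/index.html"

print(resolved_host(FAKE_PAGE, "/ib2/css/kco2obi.css"))
# phish.example
print(resolved_host(FAKE_PAGE, "http://accounts.keybank.com//ib2/css/kco2obi.css"))
# accounts.keybank.com
```

The relative form would send the browser to the scammer's server, while the absolute form fetches the stylesheet from the bank.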

The second block reports a difference on line 46 in the two files. In the original version, this is a link to a "Sign On" page on the bank site. In the first, fake file it has been replaced with a link to the page login.htm on the fake site. That page contains an HTML form that asks for personal and account information. Downloading and comparing that page with the real login page would reveal yet more differences that distinguish the fake site.
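Redirected links and forms like this can also be listed mechanically. The sketch below uses Python's html.parser to collect link and form targets, resolves them against the page's address, and flags any whose host is not the bank's domain or one of its subdomains. The function and class names, the fake host phish.example, and the form action collect.cgi are hypothetical; treating every non-bank host as suspicious is a simplifying assumption, since a real bank may legitimately use other domains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class TargetCollector(HTMLParser):
    """Record where each link and form on a page would send the user."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.targets = []  # (tag, absolute URL) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.targets.append(("a", urljoin(self.page_url, attrs["href"])))
        elif tag == "form":
            # A form without an action submits back to the page's own URL.
            action = attrs.get("action") or ""
            self.targets.append(("form", urljoin(self.page_url, action)))

def offsite_targets(html, page_url, bank_domain):
    """List targets whose host is neither bank_domain nor a subdomain of it."""
    collector = TargetCollector(page_url)
    collector.feed(html)
    collector.close()
    flagged = []
    for tag, url in collector.targets:
        host = urlsplit(url).hostname or ""
        if host != bank_domain and not host.endswith("." + bank_domain):
            flagged.append((tag, url))
    return flagged
```

Run on line 46 of the fake page, saved at the hypothetical address http://phish.example/kb/index.html, it reports http://phish.example/kb/login.htm?requester=signon; the real page's link to accounts1.keybank.com is not flagged.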

The third block, which I have edited down significantly, shows text that is present at the bottom of the fake page but missing from the original. While such additions could be part of the scam, this particular one is code added by the web server that hosts the fake site. It inserts a blank image, 1 pixel wide by 1 pixel high, which is used to track how often the page is visited: every time a browser requests the page, it also requests the image. The URL of the image (http://geo.yahoo.com/serv?s=76001068&t=1111102403) contains unique identifiers for the account on the web server (76001068) and for the specific page (1111102403). Some web-hosting companies add page-tracking code like this as a service to their customers. Because it ties the page to a specific hosting account, it could be very useful in tracking down the people behind a scam such as this one.
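Tracking images of this kind can be pulled out of a saved page with a short script. The sketch below uses Python's html.parser to find image tags declared as 1 by 1 pixel and splits their query strings into parameters. The names are hypothetical, and the size test is only a heuristic: a tracking image can also be sized with CSS or left unsized, and the meaning of parameters such as s and t depends on the service that generated them.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

class PixelFinder(HTMLParser):
    """Collect the src of <img> tags declared as 1 pixel by 1 pixel."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so IMG/WIDTH match.
        if tag != "img":
            return
        attrs = dict(attrs)
        if attrs.get("width") == "1" and attrs.get("height") == "1" and attrs.get("src"):
            self.sources.append(attrs["src"])

def tracking_pixels(html):
    """Return (url, query parameters) for each 1x1 image in the page."""
    finder = PixelFinder()
    finder.feed(html)
    finder.close()
    return [(src, parse_qs(urlsplit(src).query)) for src in finder.sources]

snippet = ('<IMG src="http://geo.yahoo.com/serv?s=76001068&t=1111102403" '
           'ALT=1 WIDTH=1 HEIGHT=1>')
print(tracking_pixels(snippet))
```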



Internet Forensics
ISBN: 059610006X
Year: 2003
Pages: 121
