With the advent of the Web, a new kind of data is being generated and made available for analysis. It includes Internet server data in the form of log files, along with the data left by mechanisms known as cookies and Web bugs (the latter also called invisible graphics). When a browser visits a Web server, multiple transactions are recorded, both in the server's log files and on the hard disk of the browser's computer. This data can provide a trail of the sites a computer user has visited.
Almost all Web servers generate log files as ASCII text files delimited by commas, spaces, or tabs. Every transaction between the server and a browser is recorded there with the date and time, the domain name or IP address of the host making the request for each page on the Web site, the status of that request, the number of bytes transferred to the requester, the location from which the visitor arrived at the Web site (such as a search engine, along with the keywords used), the browser type, and a cookie field. The breakdown of these log files usually follows this format:
The domain name or IP address of the visitor's Internet provider, such as the following:
prominer.com or 204.58.155.58
The identification field, usually a hyphen or a dash:
-
The AuthUser field, usually the ID or password required for accessing a protected area:
:prominer :secreto
The date, time, and GMT [Greenwich Mean Time] of the transaction, such as the following:
Thu July 17 12:38:09 2001
The method of transaction, usually as follows:
"GET /index/products.html
The status or error code of the transaction, usually as follows:
200 (successful transaction)
The size in bytes of the transaction:
4565
Location where visitor came from, such as the following:
http://search.google.com/bin/search?p=profiling+criminals -> /index.html
The agent log, which identifies the browser type, such as:
Mozilla/2.0 (Win98; I)
Lastly, the cookie field, such as the following:
secure.webconnect.net FALSE /cgi-bin FALSE 1234117888 C113 010218233632550410021001
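The fields listed above can be pulled apart programmatically. The following is a minimal Python sketch, assuming the server writes the widely used Apache-style "combined" layout (host, identity, user, timestamp, request, status, bytes, referrer, user agent) rather than the exact layout shown above. The sample line is hypothetical, assembled from the example values in this section, and LOG_PATTERN, parse_log_line, and search_keywords are illustrative names, not part of any server's API.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: Apache-style "combined" log layout, one entry per line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Servers commonly log "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

def search_keywords(referrer, param="p"):
    """Extract search terms from the query string of a referrer URL.

    The parameter name "p" matches the referrer example in the text;
    other search engines may use a different name.
    """
    query = parse_qs(urlparse(referrer).query)
    terms = query.get(param, [])
    return terms[0].split() if terms else []

# Hypothetical log line built from the example field values above.
sample = (
    '204.58.155.58 - - [17/Jul/2001:12:38:09 +0000] '
    '"GET /index/products.html HTTP/1.0" 200 4565 '
    '"http://search.google.com/bin/search?p=profiling+criminals" '
    '"Mozilla/2.0 (Win98; I)"'
)

record = parse_log_line(sample)
keywords = search_keywords(record["referrer"])
print(record["host"], record["status"], record["size"], keywords)
```

Returning None for lines that do not match lets an analyst count and inspect malformed entries separately instead of silently discarding them.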
What is important about the cookie is that a server writes it to the hard disk of the browser's computer, where it is stored as a small text entry in a cookies.txt file under the Windows directory, usually for purposes of identification, tracking, and personalization. This mechanism allows a Web site to follow visitors and recognize returning browsers; however, it also leaves digital tracks on the visitor's PC. For example, the following cookies.txt file shows which Web sites this user has visited, along with the unique identification numbers each server assigned to this machine.
# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file! Do not edit.
secure.webconnect.net FALSE /cgi-bin FALSE 1234117888 C113 010218233632550410021001
www.3dfiles.com FALSE /board FALSE 1010723160 LastLoginDT 01-10-2001%2011%3A25%20PM
.snap.com TRUE / FALSE 2145916832 u_edition_0_0 clubvaio
.doubleclick.net TRUE / FALSE 1920499321 id 80000000e6fc269
.flycast.com TRUE / FALSE 1293753789 atf 1_49546875499
.avenuea.com TRUE / FALSE 1279843247 AA002 964545236-17582523/965764864
.yahoo.com TRUE / FALSE 1271361644 B dhvvr4ksnrn98&b=2&f=s
.acxiom.com TRUE / FALSE 2051222650 SITESERVER ID=5032d01e60ae1b242a1ab0f7dc7fddc5
.mediaplex.com TRUE / FALSE 1245628800 svid 9669550721593387491061573154
www.landsend.com FALSE / FALSE 1597685661 cust_ck 63.70.82.34.6268966965538269
63.236.54.72 FALSE / FALSE 1577837100 NewChannel C158693-166.204.10.196
63.236.54.72:80 FALSE / FALSE 1577837101 NewChannel C158699-166.204.10.196
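A file like this one can be turned into a list of visited sites with a few lines of code. The following is a minimal Python sketch, assuming each entry holds seven whitespace-separated fields (tab-separated in a real Netscape-format file) and treating the second field as the domain-wide access flag described in the Netscape cookie specification. The Cookie tuple and parse_cookies_txt are hypothetical helper names, and the sample reuses three entries from the file above.

```python
from collections import namedtuple

# Field names are illustrative labels for the seven columns.
Cookie = namedtuple(
    "Cookie", "domain subdomains path secure expires name value"
)

def parse_cookies_txt(text):
    """Parse Netscape-format cookie lines, skipping comments and blanks."""
    cookies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on any whitespace, keeping the value field intact.
        fields = line.split(None, 6)
        if len(fields) != 7:
            continue  # malformed or truncated entry
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append(Cookie(domain, subdomains == "TRUE", path,
                              secure == "TRUE", int(expires), name, value))
    return cookies

# Three entries taken from the cookies.txt listing above.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".doubleclick.net\tTRUE\t/\tFALSE\t1920499321\tid\t80000000e6fc269\n"
    ".yahoo.com\tTRUE\t/\tFALSE\t1271361644\tB\tdhvvr4ksnrn98&b=2&f=s\n"
    "www.landsend.com\tFALSE\t/\tFALSE\t1597685661\tcust_ck\t"
    "63.70.82.34.6268966965538269\n"
)

cookies = parse_cookies_txt(sample)
# Drop the leading dot so each site is listed once under a plain name.
sites = sorted({c.domain.lstrip(".") for c in cookies})
print(sites)
```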
For an investigator, a cookies.txt file from a suspect's PC or laptop provides a clear map of the sites visited by that individual. In addition, most cookies are assigned a unique value, placed in the last field of the cookie, which can potentially be matched against the server's log files to reconstruct the paths taken and the items viewed or purchased by that individual. The following is a breakdown of the standard cookie data format:
.acxiom.com TRUE / FALSE 2051222650 SITESERVER ID=51e60ae15 | |
.acxiom.com | The domain of the Web site that created and issued the cookie. |
TRUE | Domain flag: TRUE if all hosts within the domain can read the cookie, FALSE if only the named host can. |
/ | The path within the domain for which the cookie is valid; / covers the entire site. |
FALSE | Secure flag: FALSE means the cookie may be sent over an unencrypted connection; TRUE restricts it to secure (HTTPS/SSL) connections. |
2051222650 | Expiration time, in seconds since January 1, 1970 (GMT). |
SITESERVER | Name of the cookie. |
ID=51e60ae15 | Value of the cookie, usually a unique ID number. |
In Table 2.4, for example, looking up ID=51e60ae15 on the acxiom.com server would make it possible to reconstruct a synopsis of what this browser looked at while visiting that site.
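The look-up described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any particular server stores its logs: the log_records list is invented for the example, the cookie field is assumed to be logged as name=value, and pages_for_cookie is a hypothetical helper. The sketch also converts the expiration field from seconds since January 1, 1970 into a calendar date.

```python
from datetime import datetime, timezone

# The sample cookie line from Table 2.4.
cookie_line = ".acxiom.com TRUE / FALSE 2051222650 SITESERVER ID=51e60ae15"
domain, _flag, path, secure, expires, name, value = cookie_line.split(None, 6)

# The expiration field counts seconds since January 1, 1970 (GMT).
expiry = datetime.fromtimestamp(int(expires), tz=timezone.utc)

# Hypothetical server log records: (timestamp, requested page, cookie field).
log_records = [
    ("2001-07-17 12:38:09", "/index.html", "SITESERVER=ID=51e60ae15"),
    ("2001-07-17 12:39:41", "/products.html", "SITESERVER=ID=51e60ae15"),
    ("2001-07-17 12:40:02", "/index.html", "SITESERVER=ID=77aa01bc2"),
]

def pages_for_cookie(records, cookie_name, cookie_value):
    """Return, in log order, the pages requested with the given cookie."""
    needle = f"{cookie_name}={cookie_value}"
    return [page for _time, page, cookie in records if cookie == needle]

trail = pages_for_cookie(log_records, name, value)
print(expiry.date(), trail)
```

Because the records are kept in log order, the returned list doubles as the path the browser took through the site.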
Yet another Internet mechanism used by Web sites to track visitors, usually in conjunction with cookies, is the clear GIF, or Web bug. Web bugs are bits of code that are invisible to visitors of Web sites and recipients of e-mail, yet they transmit important information about a browser, such as its IP address and the time of the visit, along with some demographics (gender, age, ZIP code, etc.) when visitors complete a form. Web bugs can also be placed in e-mail so that a site or a marketing company can report on when messages are opened or when recipients click on embedded links. Web bugs and cookies are just two of the Internet mechanisms used by Web sites to track and identify visitors, each generating important information about browsing behavior.
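To make the idea concrete, the following Python sketch shows how the information carried by a Web bug request might be decoded. Everything specific in it is an assumption made for illustration: the tracker.example host, the clear.gif image name, and the page, uid, and zip parameters are invented, and real Web bugs vary in what they encode. The decode_web_bug helper is likewise hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def decode_web_bug(url):
    """Split a Web bug image URL into the tracking host and its parameters."""
    parts = urlparse(url)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"tracker": parts.netloc, "image": parts.path, "params": params}

# Hypothetical request for an invisible 1x1 image embedded in a page.
url = (
    "http://tracker.example/clear.gif"
    "?page=%2Fproducts.html&uid=80000000e6fc269&zip=94105"
)

info = decode_web_bug(url)
print(info["tracker"], info["params"])
```

The visitor's IP address and the time of the visit do not appear in the URL at all; the tracking server learns them from the request itself, just as it would for any other entry in its log file.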
Still another Internet technique used by Web sites to capture and store information about their visitors is the online form. A form may be as simple as the keyword prompt on a search engine, or it may collect an assortment of information through a series of questions, for anything from an application to a contest to a survey about preferences. Forms are used in conjunction with programs that store the information entered by visitors in databases. Again, if an examination of a cookies.txt file identifies the sites visited, and those sites use forms, important information may have been captured there. For example, if a line in the cookies.txt file shows that the user of a PC went to the American Airlines site, it is possible that the user completed a form and purchased tickets online, and important information about that user's itinerary may be found.
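As a rough illustration of how form data might end up tied to a cookie value, the following Python sketch stores a submitted form in an in-memory SQLite database and retrieves it by cookie ID. The table layout, the store_form helper, and the itinerary values are all hypothetical; an actual site would use its own schema and storage system.

```python
import sqlite3

# In-memory database standing in for a site's form-data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions (cookie_id TEXT, field TEXT, value TEXT)"
)

def store_form(conn, cookie_id, form):
    """Save each form field as one row keyed by the visitor's cookie ID."""
    conn.executemany(
        "INSERT INTO submissions VALUES (?, ?, ?)",
        [(cookie_id, field, value) for field, value in form.items()],
    )
    conn.commit()

# Hypothetical form submission by the browser holding this cookie value.
store_form(conn, "ID=51e60ae15", {"origin": "DFW", "destination": "JFK"})

# Later look-up: everything this cookie ID submitted through forms.
rows = conn.execute(
    "SELECT field, value FROM submissions WHERE cookie_id = ? ORDER BY field",
    ("ID=51e60ae15",),
).fetchall()
print(rows)
```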