2.8 Government Data


A vast amount of government statistical data is also available through portals such as FedStats.gov, which supports keyword searches across agency sites. FirstGov.gov, a portal to 30 million government pages, offers a similarly large body of aggregate and statistical information. Appendix A lists additional sites with government data.



2.9 Internet Data

With the advent of the Web, a new kind of data is being generated and made available for analysis. This includes Internet server data in the form of log files, as well as mechanisms known as cookies and Web bugs (also called invisible graphics). When a browser visits a Web site, multiple transactions are recorded, both in the server's log files and on the hard disk of the browser's machine. This kind of data can provide a trail of the sites a computer user has visited.

2.9.1 Log Files

Almost all Web servers generate log files as ASCII comma-, space-, or tab-delimited text files. Every transaction between the server and browsers is recorded here with a date and time, the domain name or IP address of the client making the request for each page on a Web site, the status of that request, the number of bytes transferred to that requester, the location from which a visitor arrived at the Web site (such as a search engine and the keywords used), the browser type used, and a cookie field. The breakdown of these log files usually follows this format:

  1. Internet provider IP address, such as the following:

       prominer.com or 204.58.155.58
     

  2. The identification field, usually a hyphen or a dash:

       -
     

  3. The AuthUser field, usually the ID or password required for accessing a protected area:

       :prominer :secreto
     

  4. The date, time, and GMT [Greenwich Mean Time] of the transaction, such as the following:

       Thu July 17 12:38:09 2001
     

  5. The method of transaction, usually as follows:

       "GET /index/products.html
     

  6. The status or error code of the transaction, usually as follows:

       200 (successful transaction)
    
     

  7. The size in bytes of the transaction:

       4565
     

  8. Location where visitor came from, such as the following:

       http://search.google.com/bin/search?p=profiling+criminals -> /index.html
     

  9. The agent log, which identifies the browser type, such as:

       Mozilla/2.0 (Win98; I)
     

  10. Lastly, the cookie field, such as the following:

       secure.webconnect.net    FALSE    /cgi-bin    FALSE
       1234117888     C113     010218233632550410021001
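
The field breakdown above can be sketched as a small parser. This is only a minimal illustration, assuming the server writes the common NCSA "combined" layout (bracketed timestamp, quoted request, referrer, and agent); the sample line, the HTTP/1.0 protocol token, the +0000 offset, and the function names parse_log_line and search_keywords are hypothetical and not taken from any particular server.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout: NCSA "combined" log format. Real servers can be configured
# differently, so the pattern must be adapted to the log at hand.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields

def search_keywords(referrer, param="p"):
    """Extract the search phrase from a referrer URL, if the parameter is present."""
    values = parse_qs(urlsplit(referrer).query).get(param, [])
    return values[0] if values else None

# Hypothetical line assembled from the sample values in the list above.
sample = (
    '204.58.155.58 - - [17/Jul/2001:12:38:09 +0000] '
    '"GET /index/products.html HTTP/1.0" 200 4565 '
    '"http://search.google.com/bin/search?p=profiling+criminals" '
    '"Mozilla/2.0 (Win98; I)"'
)
entry = parse_log_line(sample)
print(entry["host"], entry["status"], entry["size"])
print(search_keywords(entry["referrer"]))
```

The query parameter that carries the search phrase differs between search engines, so the param argument would need to be adjusted for each referrer.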
     

2.9.2 Cookies

What is important about the cookie is that a server writes it to the hard disk of the browser's machine, where it is stored as a small text entry in a cookies.txt file under the Windows directory, usually for purposes of identification, tracking, and personalization. This mechanism allows the Web site to follow visitors and recognize returning browsers; however, it also leaves digital tracks on the visitor's PC. For example, the following cookies.txt file shows which Web sites this user has visited, along with the unique identification numbers each server assigned to this machine.

# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file!  Do not edit.

secure.webconnect.net  FALSE  /cgi-bin  FALSE  1234117888  C113  010218233632550410021001

www.3dfiles.com  FALSE  /board  FALSE  1010723160  LastLoginDT  01-10-2001%2011%3A25%20PM

.snap.com  TRUE  /  FALSE  2145916832  u_edition_0_0  clubvaio

.doubleclick.net  TRUE  /  FALSE  1920499321  id   80000000e6fc269

.flycast.com  TRUE  /  FALSE  1293753789  atf  1_49546875499

.avenuea.com  TRUE  /  FALSE  1279843247  AA002  964545236-17582523/965764864

.yahoo.com  TRUE  /  FALSE  1271361644  B  dhvvr4ksnrn98&b=2&f=s

.acxiom.com  TRUE  /  FALSE  2051222650  SITESERVER  ID=5032d01e60ae1b242a1ab0f7dc7fddc5

.mediaplex.com  TRUE  /  FALSE  1245628800  svid  9669550721593387491061573154

www.landsend.com  FALSE  /  FALSE  1597685661  cust_ck  63.70.82.34.6268966965538269

63.236.54.72  FALSE  /  FALSE  1577837100  NewChannel  C158693-166.204.10.196

63.236.54.72:80  FALSE  /  FALSE  1577837101  NewChannel  C158699-166.204.10.196
 

For an investigator, a cookies.txt file from a suspect's PC or laptop provides a clear map of the sites visited by that individual. In addition, most cookies are also assigned a unique value, which is placed in the last field of the cookie and can potentially be matched against the server log files to recreate what paths and items were viewed and purchased by that individual. The following is a breakdown of the cookie standard data format:

Table 2.5: Anatomy of a Cookie

.acxiom.com  TRUE  /  FALSE  2051222650  SITESERVER  ID=51e60ae15

Field          Meaning
-----          -------
.acxiom.com    Domain of the Web site that created and issued the cookie.
TRUE           Domain flag: TRUE if every host within the domain can read the cookie, FALSE if only the issuing host can.
/              Path within the domain for which the cookie is valid; / covers the entire site.
FALSE          Secure flag: FALSE means the cookie may be sent over an unencrypted connection; TRUE restricts it to HTTPS (SSL) connections.
2051222650     Expiration date, in seconds since January 1, 1970.
SITESERVER     Name of the cookie.
ID=51e60ae15   Value of the cookie, usually a unique ID number.

In Table 2.5, for example, a look-up on the acxiom.com server for the value ID=51e60ae15 could reconstruct a synopsis of what this browser looked at while at that site.
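
The seven-field layout of a cookies.txt entry, and the look-up of a cookie value against server records, can be sketched as follows. This is a minimal illustration under stated assumptions: the entry is split on whitespace because the printed listing uses spaces (actual files are normally tab-delimited), and the log_records list, its path and cookie keys, and the function names are hypothetical stand-ins for whatever the server's logs actually contain.

```python
from datetime import datetime, timezone

# Field order of one cookies.txt entry; "flag" is the second TRUE/FALSE column.
FIELD_NAMES = ("domain", "flag", "path", "secure", "expires", "name", "value")

def parse_cookie_line(line):
    """Split one cookies.txt entry into its seven named fields."""
    parts = line.split(None, 6)  # whitespace split; the value keeps any remainder
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    record = dict(zip(FIELD_NAMES, parts))
    record["flag"] = record["flag"] == "TRUE"
    record["secure"] = record["secure"] == "TRUE"
    # The expiration is stored as seconds since January 1, 1970 (UTC).
    record["expires"] = datetime.fromtimestamp(int(record["expires"]), tz=timezone.utc)
    return record

def requests_for_cookie(log_records, cookie_value):
    """Return the paths of all log records whose cookie field contains the value."""
    return [r["path"] for r in log_records if cookie_value in r.get("cookie", "")]

cookie = parse_cookie_line(
    ".acxiom.com  TRUE  /  FALSE  2051222650  SITESERVER  ID=51e60ae15"
)
print(cookie["domain"], cookie["name"], cookie["value"], cookie["expires"].date())

# Hypothetical server-side records, invented purely for illustration.
log_records = [
    {"path": "/index.html", "cookie": "SITESERVER=ID=51e60ae15"},
    {"path": "/products.html", "cookie": "SITESERVER=ID=51e60ae15"},
    {"path": "/index.html", "cookie": "SITESERVER=ID=0000000000"},
]
print(requests_for_cookie(log_records, cookie["value"]))
```

A simple substring match is used here only for brevity; a real comparison would parse the cookie field of each log entry and match the name and value exactly.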

2.9.3 Web Bugs

Yet another Internet mechanism used by Web sites to track visitors, usually in conjunction with cookies, is the clear GIF, or Web bug. Web bugs are bits of code that are invisible to the visitors of Web sites or recipients of e-mail; nevertheless, they are present and report important information about a browser, such as its IP address, the time of the visit, and some demographics when visitors complete a form (gender, age, zip code, etc.). Web bugs can also be placed in e-mail so that a site or a marketing company can report on when messages are opened or when recipients click on embedded links. Web bugs and cookies are just two of the Internet mechanisms used by Web sites to track and identify visitors, each generating important information about browsing behavior.
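
To make the mechanism concrete, the sketch below shows the kind of record a tracking server could assemble from a single request for a Web bug image. It is only an illustration: the URL, host name, parameter names, IP address, and the function describe_web_bug_request are all hypothetical, since each tracking company uses its own conventions.

```python
from urllib.parse import urlsplit, parse_qs

def describe_web_bug_request(url, client_ip, timestamp):
    """Summarize what one request for an invisible image reveals to the tracker."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "client_ip": client_ip,          # taken from the network connection
        "timestamp": timestamp,          # time the image was fetched
        "tracker_host": parts.hostname,  # server that hosts the invisible image
        # Keep the first value of each parameter embedded in the image URL.
        "parameters": {key: values[0] for key, values in query.items()},
    }

# Hypothetical request: a 1x1 GIF whose URL carries form data the visitor supplied.
record = describe_web_bug_request(
    "http://tracker.example/clear.gif?page=checkout&zip=94110&gender=f",
    client_ip="166.204.10.196",
    timestamp="2001-07-17 12:38:09",
)
print(record)
```

The image itself carries no information; everything of interest travels in the request for it, which is why the mechanism works equally well in a Web page or an e-mail message.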

2.9.4 Internet Forms

Still another Internet technique used by Web sites to capture and store information about their visitors is the online form. A form may be as simple as the keyword prompt on a search engine, or it may collect an assortment of information from a visitor through a series of questions, for anything from an application to a contest to a survey about preferences. Forms are used in conjunction with programs that store all the information entered by visitors in databases. Again, if an examination of a cookies.txt file identifies the sites visited, and those sites use forms, it is possible that important information has been captured. For example, if a line in the cookies.txt file shows that the user of a PC went to the American Airlines site, it is possible that the user completed a form and purchased tickets online, and important information may be found about that user's itinerary.