2.17 Standardizing Criminal Data


One of the obstacles to data preparation, integration, and analysis is dealing with the unstructured free text captured in crime reports. As we saw in the case study in the previous chapter (1.15.3 Data Selection, Cleaning, and Coding), crime reports may contain free-text fields that make automated data analysis difficult. Police officers and investigators may use widely varying styles and formats in describing crime scenes and modus operandi. Spellings and abbreviations also vary from report to report, making it difficult to structure the data in these fields into categories that can be imported into data mining software.

To reduce this inconsistency, police departments and agencies may want to standardize their crime reports by eliminating or reducing free-text fields in favor of checklist and categorical fields. For example, rather than allowing investigators to enter free-form variations such as "Southern acent," "Southern accent," "accent southern," "local accent," "accent: Southern," or "not local accent," the crime report would use a checklist format such as this:

     CRIME REPORT
     Perpetrator Accent
     [  ] Local
     [  ] Not local
     [  ] Southern
     [  ] etc.

     Perpetrator Race
     [  ] White
     [  ] Black
     [  ] Hispanic
     [  ] Asian
     [  ] etc.

     Perpetrator Height
     [  ] 5'
     [  ] 5' 2"
     [  ] 5' 4"
     [  ] 5' 6"
     [  ] 5' 8"
     [  ] 5' 10"
     [  ] 6'
     [  ] etc.

     Perpetrator Age
     [  ] 14
     [  ] 16
     [  ] 18
     [  ] 20
     [  ] 22
     [  ] 24
     [  ] 26
     [  ] 28
     [  ] 30
     [  ] etc.

     Perpetrator Build
     [  ] Slim
     [  ] Medium
     [  ] Heavy
     [  ] etc.

     Perpetrator Hair Color
     [  ] Dark
     [  ] Light
     [  ] etc.

     Perpetrator Hair Length
     [  ] Short
     [  ] Long
     [  ] etc.
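
A checklist like the one above could be carried into software as a fixed set of allowed values per field, so that anything outside those values is flagged before the record reaches a data mining tool. The following is a minimal sketch under stated assumptions: the field names, the `ALLOWED_VALUES` table, and the `validate_report` function are hypothetical and only mirror the sample form; they are not taken from any actual crime reporting system.

```python
# Hypothetical schema: field names and allowed values mirror the sample
# checklist; a real form would define its own complete lists.
ALLOWED_VALUES = {
    "accent": {"Local", "Not local", "Southern"},
    "race": {"White", "Black", "Hispanic", "Asian"},
    "build": {"Slim", "Medium", "Heavy"},
    "hair_color": {"Dark", "Light"},
    "hair_length": {"Short", "Long"},
}


def validate_report(report):
    """Return the (field, value) pairs that are not valid checklist entries."""
    problems = []
    for field, value in report.items():
        allowed = ALLOWED_VALUES.get(field)
        # Unknown fields and values outside the checklist are both rejected.
        if allowed is None or value not in allowed:
            problems.append((field, value))
    return problems


clean = validate_report({"accent": "Southern", "build": "Slim"})
flagged = validate_report({"accent": "Southern acent"})
```

Here `clean` is an empty list, while `flagged` contains the misspelled accent entry, which is the kind of value a free-text field would have let through.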
 

Another way to standardize free-text descriptions from crime reports is to use a text mining tool. Text mining software can process the free-form text summaries found in crime reports and group their content into major categories. This is one possible solution where there is a large volume of historical crime reports.
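
To give a rough sense of what mapping free text onto categories involves, the sketch below normalizes the accent descriptions used as examples earlier in this section. It is only a simple stand-in for a text mining tool, not a description of how any particular product works: it relies on approximate string matching from Python's standard `difflib` module, and the `KEYWORDS` table, the `categorize_accent` function, and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical keyword table: lowercase keyword -> standardized category.
KEYWORDS = {"southern": "Southern", "local": "Local"}


def categorize_accent(text):
    """Map a free-text accent description to a standardized category."""
    # Lowercase the text and keep only alphabetic tokens, which drops
    # punctuation such as the colon in "accent: Southern".
    tokens = re.findall(r"[a-z]+", text.lower())
    for i, token in enumerate(tokens):
        # Approximate matching tolerates small misspellings of a keyword.
        match = difflib.get_close_matches(token, list(KEYWORDS), n=1, cutoff=0.8)
        if not match:
            continue
        category = KEYWORDS[match[0]]
        # Treat "not" immediately before "local" as a negation.
        if category == "Local" and i > 0 and tokens[i - 1] == "not":
            return "Not local"
        return category
    return "Unknown"


entries = ["Southern acent", "accent southern", "accent: Southern",
           "local accent", "not local accent"]
categories = [categorize_accent(entry) for entry in entries]
```

Entries that match no keyword come back as "Unknown" so they can be reviewed by hand rather than silently assigned to a category.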



Chapter 3: Link Analysis: Visualizing Associations

3.1 How Link Analysis Works

Finding the criminal associations hidden among all of the commercial and government databases is like finding the proverbial needle in a haystack. This is where data mining technologies such as link analysis can help law enforcement investigators and intelligence analysts graphically examine anomalies and inconsistencies and connect the networks of relationships and contacts hidden in the data. Link analysis is the first level at which networks of people, places, organizations, vehicles, bank accounts, telephone numbers, e-mail addresses, and other tangible entities can be discovered, linked, assembled, examined, and analyzed.

Effectively combining multiple sources of data can help law enforcement investigators and government analysts discover patterns that make their investigations more proactive. Link analysis is a good starting point for mapping terrorist activity and criminal intelligence because it visualizes associations between entities and events. It often involves viewing, on a chart or map, the associations between suspects and locations, whether physical or on a network or the Internet. The technology is often used to answer questions such as: Who knows whom? When and where have they been in contact?
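
The data structure behind such a chart can be sketched as a set of links between pairs of entities, each link carrying the dates and places of contact. The example below is a minimal illustration under stated assumptions, not the method of any particular link analysis product: the contact records are invented placeholders, and the function names `build_links`, `associates`, and `connected` are hypothetical.

```python
from collections import defaultdict

# Invented placeholder records: (entity_a, entity_b, date, location).
CONTACTS = [
    ("Suspect A", "Suspect B", "2002-03-01", "Warehouse"),
    ("Suspect B", "Suspect C", "2002-03-04", "Phone call"),
    ("Suspect A", "Suspect B", "2002-03-09", "Cafe"),
]


def build_links(contacts):
    """Group contact events by the unordered pair of entities involved."""
    links = defaultdict(list)
    for a, b, date, location in contacts:
        links[frozenset((a, b))].append((date, location))
    return links


def associates(links, entity):
    """Who knows whom: entities directly linked to the given entity."""
    return sorted({other for pair in links if entity in pair
                   for other in pair if other != entity})


def connected(links, start):
    """Entities reachable from start through any chain of contacts."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for other in associates(links, current):
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen - {start}


links = build_links(CONTACTS)
direct = associates(links, "Suspect B")
meetings = links[frozenset(("Suspect A", "Suspect B"))]
network = connected(links, "Suspect A")
```

In this sketch `direct` answers who knows whom, `meetings` answers when and where two entities were in contact, and `network` shows that Suspect A is indirectly tied to Suspect C through Suspect B even though the two have no recorded contact.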