Data mining is a process that uses a variety of statistical and pattern-recognition techniques to discover patterns and relationships in data. It does not include business intelligence tools such as query and reporting tools, online analytical processing (OLAP), or decision support systems. Those tools report on data and answer predefined questions, whereas data mining tools search for previously unknown patterns and relationships among variables; here, the purpose is detecting and preventing criminal activity. Some will argue that forensics applies only to the sciences used in court to obtain convictions, but the objective of recognizing threats and crime is also extremely important.
Unlike criminology, which re-enacts a crime in order to solve it, criminal analysis uses historical observations to arrive at solutions. In criminal analysis, the frequency of specific crimes is examined statistically to evaluate the security of property and persons. Criminal analysis involves careful evaluation of the location, time, and type of crimes committed at a building, neighborhood, beat, city, county, and so on. Crime statistics, risks, and probabilities are at the core of criminal analysis. Data mining shares the same overall goal as criminal analysis: the detection and prevention of crimes. The following scenario illustrates how criminal analysis works. A security professional in a large office building maintains records of all criminal activity that has taken place on the property over three years, including the following incidents:
Auto thefts: 179
Office thefts: 142
Auto break-in thefts: 211
Robberies: 17
Burglaries of offices: 46
Aggravated assaults: 21
Rapes: 2
Murders: 0
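As a first step in breaking down these figures, the counts can be ranked and converted to shares of the total. The Python sketch below does only that arithmetic on the numbers listed above; the function name `breakdown` is illustrative and not part of any particular tool.

```python
# Incident counts over three years, as listed in the scenario.
incidents = {
    "Auto thefts": 179,
    "Office thefts": 142,
    "Auto break-in thefts": 211,
    "Robberies": 17,
    "Burglaries of offices": 46,
    "Aggravated assaults": 21,
    "Rapes": 2,
    "Murders": 0,
}


def breakdown(counts):
    """Return (type, count, percent of total) tuples, most frequent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n, 100.0 * n / total) for name, n in ranked]


if __name__ == "__main__":
    total = sum(incidents.values())
    print(f"Total incidents: {total}")
    for name, n, pct in breakdown(incidents):
        print(f"{name:<22} {n:>4} {pct:5.1f}%")
    # Vehicle-related incidents: auto thefts plus auto break-in thefts.
    vehicle = incidents["Auto thefts"] + incidents["Auto break-in thefts"]
    print(f"Vehicle-related share: {100.0 * vehicle / total:.1f}%")
```

Run on these counts, it reports 618 incidents in total, with auto break-in thefts and auto thefts together accounting for about 63 percent, the kind of concentration that directs the analysis toward the parking garage.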
One of the most important tasks of criminal analysis is to break down the pattern of crimes to evaluate when, where, and why they occur. In the case of this building, for example, the objective is to reduce crime by improving security. This type of analysis is less offender-specific than target-specific; in other words, it raises the question "Why is the garage a target for such a high rate of thefts?" By focusing on when, where, and why auto break-in thefts take place, preventive security measures can be taken to deter future criminal acts. Through research, documentation of crimes, and categorization by type of offense, location, and time, patterns and trends gradually emerge and lead to preventive solutions. This type of criminal analysis can be automated with data mining, which can uncover subtle patterns in large data sets.
Understanding the environment in which crime takes place is clearly very important in criminal analysis. In this example, examining where crimes take place is critical; locations must be grouped into main areas, such as the main entrance, side entrances, offices, common areas, walkways from the garage to the building, walkways from the streets, and the parking garage. The surrounding areas must also be considered, such as adjoining buildings, strip malls, parks, and residential neighborhoods.
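Categorizing incidents by offense type, location, and time amounts to a cross-tabulation. The following minimal Python sketch counts incidents per (offense, location, time band) combination, using location names drawn from the main areas listed above. The `Incident` record layout, the three time bands and their boundaries, and the sample records are all hypothetical assumptions for illustration, not data from the scenario.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    offense: str   # offense type, e.g. "auto break-in"
    location: str  # one of the main areas, e.g. "parking garage"
    hour: int      # hour of day the incident occurred, 0-23


def time_band(hour: int) -> str:
    """Bucket an hour into a coarse shift (hypothetical band boundaries)."""
    if 6 <= hour < 14:
        return "day"
    if 14 <= hour < 22:
        return "evening"
    return "night"


def cross_tab(incidents):
    """Count incidents by (offense, location, time band)."""
    return Counter((i.offense, i.location, time_band(i.hour)) for i in incidents)


# Hypothetical sample records, for illustration only.
sample = [
    Incident("auto break-in", "parking garage", 23),
    Incident("auto break-in", "parking garage", 1),
    Incident("auto break-in", "parking garage", 2),
    Incident("auto theft", "parking garage", 19),
    Incident("office theft", "offices", 12),
    Incident("robbery", "walkway from garage", 21),
]

if __name__ == "__main__":
    for (offense, location, band), n in cross_tab(sample).most_common(3):
        print(f"{n} x {offense} in {location} ({band})")
```

With the sample records, the most frequent combination is auto break-ins in the parking garage at night (three incidents). Run over three years of real incident logs, the same counting step is where recurring hot spots and time windows would begin to surface; data mining can extend it to many more variables at once.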
To gauge the level of crime at this building, the analyst can compare its crime statistics with those of surrounding jurisdictions. For example, how does the property's auto theft rate compare with the rate for the same crime reported by law enforcement at the beat, district, precinct, city, county, metropolitan statistical area (MSA), state, and national levels? Using the FBI's Uniform Crime Reporting (UCR) classification system, rate comparisons can be made by categories such as the following:
Motor vehicle theft
Forgery and counterfeiting
Stolen property (buying, receiving, possessing)
Weapons (carrying, possessing, etc.)
Prostitution and commercialized vice
Drug abuse violations
Offenses against the family and children
Driving under the influence
All other offenses
Curfew and loitering laws (persons under 18)
Runaways (persons under 18)
To compute the comparison crime rates, the following formulas can be used:
Violent crime rate (VCR) for the building property:
VCR = (total violent crime / average daily traffic) × 1,000
Violent crime rate (VCR) for beat, city, county, state, and nation:
VCR = (total violent crime / population) × 1,000
Property crime rate (PCR) for the building property:
PCR = (total property crime / number of targets) × 1,000
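The three formulas share one shape, a count divided by an exposure base and scaled to 1,000, so they can be written as thin wrappers around a single helper. This Python sketch applies them to the scenario's counts, grouped by the FBI UCR Part I split: murder, rape, robbery, and aggravated assault as violent crimes; burglary, theft, and motor vehicle theft as property crimes. The scenario does not give the building's average daily traffic or number of targets, so the two denominators below are hypothetical placeholders. The counts also cover three years, so the resulting rates are per three-year period unless annualized.

```python
# Scenario counts (three years), grouped by the UCR Part I violent/property split.
VIOLENT = {"Murders": 0, "Rapes": 2, "Robberies": 17, "Aggravated assaults": 21}
PROPERTY = {
    "Auto thefts": 179,           # motor vehicle theft
    "Office thefts": 142,         # larceny-theft
    "Auto break-in thefts": 211,  # larceny-theft (theft from vehicles)
    "Burglaries of offices": 46,  # burglary
}


def rate_per_1000(count: float, base: float) -> float:
    """Shared shape of all three formulas: (count / base) x 1,000."""
    if base <= 0:
        raise ValueError("the base (traffic, population, or targets) must be positive")
    return count / base * 1000


def vcr_building(total_violent: float, average_daily_traffic: float) -> float:
    return rate_per_1000(total_violent, average_daily_traffic)


def vcr_jurisdiction(total_violent: float, population: float) -> float:
    # Beat, city, county, state, or nation: population is the base.
    return rate_per_1000(total_violent, population)


def pcr_building(total_property: float, number_of_targets: float) -> float:
    # Property crime is target-specific, so the base is the number of targets.
    return rate_per_1000(total_property, number_of_targets)


if __name__ == "__main__":
    # Hypothetical denominators; the scenario does not supply them.
    AVERAGE_DAILY_TRAFFIC = 5_000
    NUMBER_OF_TARGETS = 2_000  # e.g. parked vehicles plus offices

    total_violent = sum(VIOLENT.values())    # 40
    total_property = sum(PROPERTY.values())  # 578
    print(f"VCR (building): {vcr_building(total_violent, AVERAGE_DAILY_TRAFFIC):.1f}")
    print(f"PCR (building): {pcr_building(total_property, NUMBER_OF_TARGETS):.1f}")
```

Applied to beat, city, county, state, or national figures, `vcr_jurisdiction` produces the population-based rates against which the building's rates would be compared.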
Because property crimes are target-specific rather than committed against individuals, their rate must be computed differently, with the number of targets rather than traffic or population as the base. Criminal analysis is deeply concerned with statistics, rates of occurrence, risk, probabilities, trends, and patterns, all of which data mining can help refine for detection and deterrence. A similar understanding of the environment and the targets of crime can be applied to other situations; rather than a building, for instance, we might perform a criminal analysis inventory of an e-commerce Web site for illegal hacking intrusions into a server.
The next phase of this type of criminal analysis is to use data mining, because a security expert or law enforcement investigator examining digital crimes must deal with hundreds of thousands of transactions, e-mails, system calls, wire transfers, and the like. This calls for an automated methodology for behavioral profiling through pattern-recognition techniques. Data mining can add a new dimension to criminal analysis, especially for digital crimes such as identity theft; credit card, insurance, Internet, and wireless fraud; and money laundering, where investigators and analysts must sift through large volumes of transactions in large databases. Data mining has traditionally been used to predict consumer preferences and to profile prospects for products and services; in the current environment, however, there is a compelling need to apply the same technology to discover, detect, and deter criminal activity and so improve the security of property, people, and countries.
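As one small illustration of automated pattern recognition over transactions, the Python sketch below flags unusually large wire transfers within a single account using a robust, median-based z-score (the modified z-score, with the commonly used 3.5 cutoff). This is a simple statistical outlier test chosen for illustration, not the behavioral-profiling method of any particular system. The transfer amounts are hypothetical, and real profiling would combine many more attributes, such as timing, counterparties, and locations.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Medians are used instead of the mean and standard
    deviation so that a single extreme value does not inflate the spread
    estimate and hide itself.
    """
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # a MAD of zero leaves the score undefined; flag nothing here
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical transfer amounts for one account, for illustration only.
transfers = [250, 300, 275, 310, 260, 290, 9_800, 280]

if __name__ == "__main__":
    for i in flag_outliers(transfers):
        print(f"transfer #{i}: {transfers[i]} flagged for review")
```

For these amounts only the 9,800 transfer (index 6) is flagged. In practice a flag like this is a lead for an analyst to review, not evidence of a crime.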