Lars J. Kangas, [1] Kristine M. Terrones, Robert D. Keppel, and Robert D. La Moria
Battelle Pacific Northwest Division, MS K7-22, Richland, WA 99352
Attorney General of Washington, Criminal Division
When a serial offender strikes, it usually means that the investigation is unprecedented for that police agency. The volume of incoming leads and pieces of information in the case(s) can be overwhelming, as evidenced by the thousands of leads gathered in the Ted Bundy murders, the Atlanta child murders, and the Green River murders. Serial cases can be long-
CATCH (Computer Aided Tracking and Characterization of Homicides) is being developed to assist crime investigations by assessing likely characteristics of unknown offenders, by relating a specific crime case to other cases, and by providing a tool for clustering similar cases that may be attributed to the same offenders.
CATCH is a collection of tools that assist the crime analyst in the investigation process by providing advanced data mining and visualization capabilities. These tools include clustering maps, query tools, geographic maps,
The clustering tools in CATCH are based on artificial neural networks (ANNs). The ANNs learn to cluster similar cases from approximately 5,000 murders and 3,000 sexual assaults residing in a database. The clustering algorithm is applied to parameters describing modus operandi (MO), signature characteristics of the offenders, and other parameters describing the victim and offender. The proximity of cases within a two-dimensional representation of the clusters allows the analyst to identify similar or serial murders and sexual assaults.
CATCH is being developed to provide crime analysts enhanced means for interpreting large databases of crime data. These databases store a large number of crimes with each case described in a large number of details. Battelle Memorial Institute's Pacific Northwest Division developed CATCH in collaboration with the Attorney General of Washington, Criminal Division. Investigators at the Criminal Division are currently evaluating CATCH.
The development of CATCH was made possible with the HITS (Homicide Investigation Tracking System) database system. Police involved in the infamous Green River and Ted Bundy murder investigations in the State of Washington developed HITS circa 10
CATCH provides analysts tools for
There are two versions of CATCH, one for murders and one for sexual assaults. Although the version of CATCH described here is custom configured
CATCH uses ANNs for analysis. The benefit of ANNs is often described by means of their information (sensor) fusion capabilities. Information fusion is the process of extracting information from several data sources in parallel. More information can frequently be
The clustering algorithm in CATCH is based on self-organizing maps (SOMs). These networks are also called
The SOMs belong to the
The HITS Unit staff at the Attorney General of Washington, Criminal Division use standard forms to record the large number of details describing each crime, which are then entered into the HITS database. CATCH processes these crime details and generates data vectors for numerical analysis. Each data vector includes more than 200 details of each crime.
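As a rough illustration of how data vectors like these can be arranged on a two-dimensional map, the following is a minimal self-organizing map sketch in plain Python. It is not the CATCH implementation: the learning-rate and neighborhood schedules, the function names, and the tiny four-detail vectors are all assumptions chosen for readability.

```python
import math
import random


def best_matching_cell(weights, vector):
    """Return (row, col) of the cell whose weight vector is closest to `vector`."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, vector))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(vectors, rows, cols, epochs=20, seed=0):
    """Train a small SOM; weights[r][c] is the weight vector of cell (r, c)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # assumed schedule: learning rate decays linearly
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # neighborhood shrinks
            br, bc = best_matching_cell(weights, vectors[i])
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood: cells near the winner move the most.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius * radius))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (vectors[i][k] - w[k])
            step += 1
    return weights


# Hypothetical binary crime-detail vectors (1 = detail present).
cases = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
som = train_som(cases, rows=4, cols=4)
cells = [best_matching_cell(som, v) for v in cases]
print(cells)
```

Cases with identical vectors always land in the same cell, and after training, similar vectors tend to land in nearby cells, which is the property the clustering maps rely on.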
The SOM in CATCH has 4,096 cells organized as a 64 x 64 grid (see Figure 12.6). The learning phase
Figure 12.6:
The SOM represents about 5,000 murders in the HITS database.
The self-organizing map in Figure 12.6 represents about 5,000 murders in the HITS database. Each of the cells in the 64 x 64 map typically contains eight or fewer crimes. The black cells contain no crimes. The lighter the cell
The tools in CATCH are of two types. First, there are database mining tools to give the crime analyst a better understanding of the content of the database. Second, there are tools that let the analyst retrieve and compare specific crimes.
The self-organizing map is like a window into the database. Each crime in the database has a location on the SOM and the clusters on the SOM link together similar crimes in the database. Thus, the database can be mined for
Figure 12.7:
Crimes are mapped by modus operandi descriptions.
Figure 12.8:
Order and description of crimes such as rape, serial and rituals can be queried.
The SOM is overlaid by boundaries around areas of common crime details. The small window shows which details are selected and the
In Figure 12.8, the depicted tool emphasizes cells containing crimes for which all selected details correctly describe the crimes. The cells in the SOM are colored lighter according to the correlation of the selected crime details (i.e., lighter cells have higher correlation with the selected crime details).
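One way such shading could be computed is sketched below. The scoring rule (the fraction of selected details that hold, averaged over the crimes in a cell) is an assumption made for illustration, as are the data layout and the detail names; the text does not specify how CATCH computes the correlation.

```python
def cell_brightness(cell_cases, selected):
    """Map each SOM cell to a 0..1 brightness for the selected details.

    cell_cases: dict mapping (row, col) -> list of cases, where each case is a
                dict of detail name -> bool (hypothetical layout).
    selected:   list of detail names chosen by the analyst.
    """
    brightness = {}
    for cell, cases in cell_cases.items():
        if not cases or not selected:
            brightness[cell] = 0.0  # empty cells stay dark
            continue
        # Fraction of selected details that hold, averaged over the cell's cases.
        scores = [sum(1 for d in selected if case.get(d, False)) / len(selected)
                  for case in cases]
        brightness[cell] = sum(scores) / len(scores)
    return brightness


# Hypothetical cells and detail names.
grid = {
    (0, 0): [{"bound": True, "weapon": True}],
    (0, 1): [{"bound": True, "weapon": False}],
    (1, 1): [],
}
shades = cell_brightness(grid, ["bound", "weapon"])
print(shades)
```

A cell scores 1.0 only when every selected detail holds for all of its crimes, so the lightest cells are the ones the tool is meant to emphasize.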
The "starmap" of crimes in CATCH is shown in Figure 12.9. This representation of all crimes in the database is a three-dimensional cube, where the data vectors describing the crimes have been reduced to three eigenvalues. The cube is
Figure 12.9:
The figure shows all the crime data vectors as points in a three-dimensional eigenspace.
Figure 12.9 shows all the crime data vectors as points in a three-dimensional eigenspace. The cube of crimes is viewed in any two of the three dimensions. This cube of crimes gives an alternate view of the clusters and structure of the crimes in the database. Similar crimes form denser areas of "stars" in the cube. The highlighted crimes within the overlaid square are selected into the current working set.
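A projection of this kind can be sketched as follows, assuming the three-dimensional eigenspace is spanned by the three leading eigenvectors of the data covariance matrix (a principal component projection). The text does not confirm that CATCH computes it this way; power iteration with deflation is used here only to keep the example free of external libraries, and the sample vectors are hypothetical.

```python
import math
import random


def top_components(vectors, k=3, iters=200, seed=0):
    """Return (mean, components): k leading eigenvectors of the covariance matrix."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(dim)]
           for a in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # Deflation: remove directions already found, then apply the covariance.
            for comp in components:
                dot = sum(x * y for x, y in zip(vec, comp))
                vec = [x - dot * y for x, y in zip(vec, comp)]
            vec = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in vec))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            vec = [x / norm for x in vec]
        components.append(vec)
    return mean, components


def project(vectors, mean, components):
    """Project each vector onto the components, giving one 3-D point per case."""
    return [[sum((v[j] - mean[j]) * comp[j] for j in range(len(mean)))
             for comp in components] for v in vectors]


# Hypothetical binary crime-detail vectors.
cases = [
    [1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0],
    [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0],
]
mean, comps = top_components(cases, k=3)
stars = project(cases, mean, comps)
print(stars[0])
```

Each case becomes one point ("star") with three coordinates; plotting any two of the three coordinates gives one face of the cube.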
The geographic map in CATCH is shown in Figure 12.10, with crimes placed as pins at the locations where they were committed. This map also allows the user to select an area and retrieve all the crimes in that area, or the user can crop or remove crimes from the current working set of crimes.
Figure 12.10:
Crimes can be mapped along highways.
The geographic map tool in Figure 12.10 places the current set of crimes on the map as pins (see examples in the rectangle). The user can select pins to view additional information about specific crimes.
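The area selection described for the geographic map amounts to a bounding-box filter over the working set. The sketch below assumes each crime carries latitude and longitude fields named `lat` and `lon`; the field names, the function names, and the coordinates are hypothetical.

```python
def in_box(case, south, west, north, east):
    """True if the case's coordinates fall inside the rectangle."""
    return south <= case["lat"] <= north and west <= case["lon"] <= east


def crop(working_set, box):
    """Keep only the crimes inside the selected area."""
    return [c for c in working_set if in_box(c, *box)]


def remove(working_set, box):
    """Drop the crimes inside the selected area from the working set."""
    return [c for c in working_set if not in_box(c, *box)]


# Hypothetical cases; box is (south, west, north, east).
working = [
    {"id": 1, "lat": 47.6, "lon": -122.3},
    {"id": 2, "lat": 45.5, "lon": -122.7},
    {"id": 3, "lat": 47.0, "lon": -122.9},
]
box = (46.0, -123.0, 48.0, -121.0)
print([c["id"] for c in crop(working, box)])
print([c["id"] for c in remove(working, box)])
```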
The tools described above, together with some additional tools (e.g., a timeline tool), allow the crime analyst to retrieve crime data from the database without having to write queries. CATCH automatically generates SQL queries to retrieve
While the data mining tools are used for
Most of the data visualization tools in CATCH show the crime details in grids that are enhanced by color and order of significance. The color enhancement in the grids is used to give the user improved perception of the data without having to focus on numerical values. Typically, grid values representing crime details are lighter in color if the crime detail has a higher numerical value or if the crime detail holds true for a specific crime. The grids can also be sorted to bring more significant details to the top of the grids. The significance of each detail is dynamically computed in the sorting algorithms.
Figures 12.11 and 12.12 show two tools for comparing crime cases based on labels assigned to sexual offenders. These labels—Power Reassurance, Power Assertive, Anger Retaliatory, and Anger Excitation—were conceived by the FBI to describe the behavior of sexual offenders [1–5]. Dr. Robert Keppel [7–10], chief criminal
Figure 12.11:
Similarity of crimes can be viewed and measured via a grid.
Figure 12.12:
Comparison of crime types can be measured.
The grid in Figure 12.11 shows several crimes, one on each row, which have been determined by CATCH to be similar to a crime being analyzed (
The tool shown in Figure 12.12 allows the crime analyst to compare two crimes side by side according to the sexual offender labels: Power Reassurance, Power Assertive, Anger Retaliatory, and Anger Excitation. The figure shows the individual weights assigned to each of the details and the four labels describing each of the two crimes. The details of the two crimes in the figure are sorted to bring the significant details to the top. The two crimes compared in the figure are both described to have "unusual
CATCH was developed to identify serial offenders by recognizing that serial offenders tend to repeat certain aspects of their crimes. Because the neural-network algorithm clusters similar data vectors, we expect the crimes by the same offenders to be clustered close together. The graphs in Figures 12.13 and 12.14 show the summary of distances found between any pair of crimes committed by the same known offenders for murders and sexual assaults, respectively. Distances are measured as the number of cells between two crimes on the self-organizing map. A distance of zero indicates that both crimes are in the same cell, a distance of one indicates that the two crimes are in adjacent cells, etc.
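The distance measure can be sketched as follows. The text does not say whether diagonal neighbors count as adjacent; this example assumes they do (Chebyshev distance), and the cell coordinates and offender labels are hypothetical.

```python
from itertools import combinations


def cell_distance(a, b):
    """Number of cell steps between two (row, col) positions.

    Assumes diagonal neighbors are adjacent (Chebyshev distance).
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def same_offender_distances(cells_by_offender):
    """Distances between every pair of crimes committed by the same offender."""
    distances = []
    for cells in cells_by_offender.values():
        for a, b in combinations(cells, 2):
            distances.append(cell_distance(a, b))
    return distances


def fraction_within(distances, n):
    """Fraction of same-offender pairs that lie at most n cells apart."""
    return sum(1 for d in distances if d <= n) / len(distances)


# Hypothetical SOM cells for crimes by two known offenders.
cells_by_offender = {
    "A": [(3, 4), (3, 4), (10, 6)],
    "B": [(0, 0), (1, 1)],
}
distances = same_offender_distances(cells_by_offender)
print(distances)                      # [0, 7, 7, 1]
print(fraction_within(distances, 1))  # 0.5
```

Plotting `fraction_within` against n gives a cumulative curve of the kind summarized in the graphs.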
Figure 12.13:
Probability and distance of crimes by the same perpetrator can be graphed.
Figure 12.14:
The solid line in the graph shows the probability of finding two sexual assaults by one serial rapist n number of cells apart.
The results shown in Figure 12.13 are based on 189 serial murders committed by 81 known offenders. The graph shows that 50% of serial murders by the same offenders are within 15 cells of each other. The results shown in Figure 12.14 are based on 412 serial sexual assaults committed by 154 known offenders. The graph shows that 50% of serial sexual assaults by the same offenders are within eight cells of each other.
The solid line in Figure 12.13 shows the probability of finding two murders by one serial murderer n number of cells apart. Of the related serial murders, 50% are found within 15 cells of each other. The dashed line, in comparison, shows the distance between the same murders as they would appear if
The solid line in Figure 12.14 shows the probability of finding two sexual assaults by one serial rapist n number of cells apart. Of the related serial sexual assaults, 50% are found within eight cells of each other. The dashed line, in comparison, shows the distance between the same sexual assaults as they would appear if randomly placed into cells in the self-organizing map. A two-tailed t-test rejects, with greater than 99% confidence, the hypothesis that these two probability distributions have the same mean.
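The comparison against random placement can be sketched as below. Several choices here are assumptions rather than details from the study: the baseline draws independent random cell pairs on a 64 x 64 map, distances use the Chebyshev measure, the test statistic is Welch's unequal-variance t, and the p-value uses a normal approximation that is reasonable only for large samples. The "observed" distances are synthetic placeholders, not data from the study.

```python
import math
import random
import statistics


def random_pair_distances(n_pairs, size=64, seed=0):
    """Cell distances between pairs of crimes dropped uniformly at random on the map."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        a = (rng.randrange(size), rng.randrange(size))
        b = (rng.randrange(size), rng.randrange(size))
        out.append(max(abs(a[0] - b[0]), abs(a[1] - b[1])))
    return out


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def two_tailed_p_normal(t):
    """Two-tailed p-value using a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Synthetic placeholder for same-offender distances (NOT the study's data).
observed = [d % 12 for d in range(200)]
baseline = random_pair_distances(2000)

t = welch_t(observed, baseline)
p = two_tailed_p_normal(t)
print(t, p)
```

A p-value below 0.01 corresponds to the greater-than-99% confidence level quoted for the two distributions.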
Crime analysts at the Attorney General of Washington, Criminal Division, are currently evaluating CATCH. Thus, a statement regarding the utility of CATCH must
Preliminary evaluations suggest that the clustering algorithms and visualization tools in CATCH could be of considerable value to crime analysts. A new version of CATCH is planned to
This work was supported by the National Institute of Justice.
1. (1995), "Coals to Newcastle? Part 1: A Study of Offender Profiling: Police Research
2. (1997), "Articulating a Systematic Approach to Clinical Crime Profiling," Criminal Behaviour and Mental Health, 1997.
3. (1992), Crime Classification Manual, New York: Lexington Books.
4. (1997), "Antisocial Personality Disorder, Sexual Sadism, Malignant Narcissism, and Serial Murder," Journal of Forensic Sciences, (1), pp. 49–60.
5. (1996), Practical Homicide Investigation: Tactics, Procedures, and Forensic Techniques, Third Edition, CRC Publishing, Miami, Florida.
6. Data Exploration Using Self-Organizing Maps, Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 82, Espoo.
7. (1997), "Time and Distance as Solvability Factors in Murder Cases," Journal of Forensic Sciences, (2), pp. 386–401.
8. (1997), Signature Killers, New York: Pocket Books.
9. (1995), "Signature Murders: A Report of Several Related Cases," Journal of Forensic Sciences, (4), pp. 658–662.
10. (1995), The Riverman: Ted Bundy and I Hunt the Green River Killer, New York: Pocket Books.
11. (1997), Self-Organizing Maps, Second Edition, Berlin, Springer-Verlag.
[1] Correspondence: Email: <lars.kangas@pnl.gov>; Telephone: (509) 375-3905