7.3 Decision Trees



A decision tree is a graphical representation of the relationship between a dependent variable (the output) and a set of independent variables (the inputs), drawn as a tree-shaped structure that represents a sequence of decisions. The tree may be binary (each node splits into exactly two branches) or multibranched, depending on the algorithm used to segment the data. Each internal node of the tree represents a test on one input variable that the algorithm uses to split the data.

To see how this works, consider the following example. A database holds 19,186 search records. In 10,523 of them (54.8% of the sample) the search found nothing; these are classified as low alerts. In 7,975 (41.6%) the search turned up a minor infraction; these are classified as medium alerts. In the remaining 688 (3.6%) the search led to a contraband arrest; these are classified as high alerts.
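The percentages above follow directly from the counts. As a quick check, the short sketch below recomputes them; the counts come from the text, while the label names and variable names are chosen only for illustration.

```python
# Class counts taken from the text; the label names are illustrative.
counts = {"low": 10_523, "medium": 7_975, "high": 688}

total = sum(counts.values())  # should equal the 19,186 records in the database

# Share of each alert level, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:>6}: {counts[label]:>6,} records ({pct}%)")
```

The three counts sum to 19,186, and the rounded shares match the 54.8%, 41.6%, and 3.6% quoted in the text.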

The top node of the tree represents all of these records and shows their total number (19,186). It then splits into multiple branches according to the make of the automobile. That the first split is on vehicle make indicates that, of all the attributes (fields) in the database, make is the most important one for predicting and targeting a potential smuggler (see Figure 7.1).

Figure 7.1: Decision tree used to predict probability of smuggling by make of auto.

What is interesting in this decision tree is that the rate of high alerts (arrests) rose from the 3.6% average to 11.1% when the make of the auto was Jeep, Land Rover, Lincoln, Mercury, or Oldsmobile. Conversely, when the make was Daewoo, Kia, Porsche, Subaru, or Volvo, the rate dropped to 1.9%, and for Jaguar it was 0.0%. How could such a decision tree assist customs and immigration inspectors? Rather than inspecting 100% of the autos at a crossing, inspectors can use this kind of segmentation, derived from seized-vehicle data with machine-learning algorithms, to concentrate searches on the segments with the highest alert rates.
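One way to read these figures is as lift, the ratio of a segment's high-alert rate to the overall rate. The sketch below is only illustrative arithmetic on the percentages quoted above; the dictionary keys and the `lift` helper are made up for the example, and because the text does not give the number of records in each segment, no counts are computed.

```python
# High-alert rates (percent) quoted in the text; the dictionary keys are illustrative.
BASELINE_RATE = 3.6  # rate across all 19,186 records

segment_rates = {
    "Jeep/Land Rover/Lincoln/Mercury/Oldsmobile": 11.1,
    "Daewoo/Kia/Porsche/Subaru/Volvo": 1.9,
    "Jaguar": 0.0,
}

def lift(segment_rate, baseline_rate=BASELINE_RATE):
    """Ratio of a segment's rate to the baseline; values above 1 mean above average."""
    return segment_rate / baseline_rate

lifts = {segment: lift(rate) for segment, rate in segment_rates.items()}

for segment, value in lifts.items():
    print(f"{segment}: lift = {value:.2f}")
```

On these figures, a search of a vehicle in the first group ends in an arrest roughly three times as often as average, and in the second group about half as often.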

A decision tree partitions data into progressively smaller segments; the final segments, called terminal nodes or leaves, are as homogeneous as possible with respect to a target variable, such as high alerts. The partitions are defined in terms of other variables, called input variables, such as the make of an auto. The tree thereby defines a predictive relationship between the inputs (vehicle characteristics) and the target output the system is attempting to predict.
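To make "homogeneous with respect to a target variable" concrete, the sketch below scores candidate splits with Gini impurity and picks the input variable whose branches are purest. This is an assumption for illustration: the text does not say which algorithm or splitting criterion produced Figure 7.1, and Gini impurity is only one common choice. The records, field names (`make`, `color`, `alert`), and function names are all hypothetical toy values, not data from the book.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity: 0.0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_by(records, attribute, target):
    """Group target labels by attribute value (one branch per distinct value)."""
    branches = defaultdict(list)
    for record in records:
        branches[record[attribute]].append(record[target])
    return dict(branches)

def weighted_impurity(branches):
    """Average impurity of the branches, weighted by branch size."""
    n = sum(len(labels) for labels in branches.values())
    return sum(len(labels) / n * gini(labels) for labels in branches.values())

def best_attribute(records, attributes, target):
    """Pick the attribute whose split leaves the purest branches."""
    return min(
        attributes,
        key=lambda a: weighted_impurity(split_by(records, a, target)),
    )

# Hypothetical toy records, invented purely to exercise the functions.
records = [
    {"make": "A", "color": "red", "alert": "high"},
    {"make": "A", "color": "blue", "alert": "high"},
    {"make": "B", "color": "red", "alert": "low"},
    {"make": "B", "color": "blue", "alert": "low"},
    {"make": "C", "color": "red", "alert": "medium"},
    {"make": "C", "color": "blue", "alert": "low"},
]

print(best_attribute(records, ["make", "color"], "alert"))
```

In this toy data, splitting on `make` yields two perfectly pure branches and one mixed branch, while splitting on `color` leaves both branches mixed, so `make` is chosen for the first split. A real tree builder would repeat the same selection inside each branch until the leaves are pure enough or too small to split, and would usually correct for the tendency of multiway splits to favor attributes with many distinct values.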




Investigative Data Mining for Security and Criminal Detection
ISBN: 0750676132
Year: 2005
Pages: 232
Authors: Jesus Mena
