Representation of Decision Trees
Before looking into what a DT can do, or even how we can build one, we need to understand the basic concepts behind the data structure.
A DT outputs a prediction based on some input attributes. There are two different kinds of DTs: classification and regression trees. There is no difference in the types of inputs; both can take continuous or symbolic values. It's the output that determines the type of tree: a DT with continuous outputs is known as a regression tree, whereas classification trees output categorical values instead, as summarized in Table 26.1.
Table 26.1. Two Different Varieties of DTs Distinguished by the Type of Result Returned

    Type of Tree          Type of Result Returned
    Classification tree   Categorical values (discrete symbols)
    Regression tree       Continuous values (numbers)
The first subsection explains data samples and their relevance. These are the foundation of both classification and regression trees.

Data is a fundamental requirement of pattern recognition, and even machine learning in general; without representative samples, there is nothing to learn from. Essentially, a data sample is a set of attributes, each with a corresponding value. These attributes can be symbolic (taking one of a set of discrete values) or continuous (taking numeric values).
There is an additional attribute with a special meaning, known as a response variable (or dependent variable). It can be a symbol representing discrete categories (for classification) or a continuous value (for regression). This is the criterion for making decisions about each of the samples, in both classification and regression problems.
Table 26.2 shows the problem of predicting the total damage inflicted based on the properties of the weapon. Damage is used as the response variable, so the others are predictor variables. Weight and type are categorical, whereas the rest are continuous.
Table 26.2. Four Data Samples with Multiple Attributes
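To make this concrete, here is a minimal C++ sketch of a data sample along the lines of Table 26.2, assuming damage as the continuous response variable. The range and speed attributes are hypothetical stand-ins for the remaining continuous predictors; only type, weight, and damage are named in the text.

    #include <string>

    // One data sample for the weapon problem: a set of predictor
    // attributes plus one response variable.
    struct WeaponSample {
        // Predictor variables.
        std::string type;    // categorical (for instance, "sword")
        std::string weight;  // categorical in this example
        float       range;   // continuous (hypothetical attribute)
        float       speed;   // continuous (hypothetical attribute)
        // Response variable: continuous, making this a regression problem.
        float       damage;
    };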
Intrinsically, there is no difference between the predictor and response variables; almost any attribute can serve as the response. The variable used as a basis for the classification process will be chosen according to the problem at hand.
Data samples are commonly used throughout AI, including for other techniques suited to pattern recognition and prediction, notably neural networks. We sidestepped the problem of classification during Chapter 17, "Perceptrons," and Chapter 19, "Multilayer Perceptrons" (focusing mostly on function approximation instead).
A DT is basically a tree, in the standard computer science meaning of the word. The data structure is made up of nodes connected together by branches; every node except one has a single parent, and decision nodes have one or more children, as shown in Figure 26.1.
Figure 26.1. A simple DT with the root node at the top, and each decision node as a circle. The leaves are represented as boxes.
One special node is known as the root . This is conceptually the base of the tree, and it's possible to traverse the tree from any node to the root. Another particular kind of node forms the end of each branch: leaf nodes.
This is a very broad description of a tree data structure. Anyone with reasonable experience of programming languages should be familiar with this. However, for DTs in AI, there are particular meanings associated with each of these concepts.
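As an illustration, such a node could be represented as follows. This is a minimal pointer-based sketch of the structure just described, not the only possible implementation.

    #include <vector>

    struct Node {
        Node*              parent = nullptr;  // null only for the root
        std::vector<Node*> children;          // empty only for leaves

        bool isRoot() const { return parent == nullptr; }
        bool isLeaf() const { return children.empty(); }
    };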
Each level in the tree can be understood as a layer of decisions. Every decision node stores a conditional test over the attributes of a data sample, and the result of that test determines which branch is followed next.
Fundamentally, each conditional test sorts the data samples so that each sample corresponds to only one branch. If we consider all the samples as one large data set, the decisions split this set into distinct, mutually exclusive subsets, as shown in Figure 26.2.
Figure 26.2. Splitting the data set into mutually exclusive subsets using the decision nodes.
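A minimal sketch of this routing process follows, assuming each decision node stores a test that maps a sample to exactly one branch index. Because every sample matches exactly one outcome, sibling branches receive mutually exclusive subsets of the data. The Sample layout here is a hypothetical stand-in for the attribute container sketched earlier.

    #include <functional>
    #include <vector>

    // A tiny stand-in for the data sample described earlier.
    struct Sample { float weight = 0.0f; bool magical = false; };

    struct DecisionNode {
        std::function<int(const Sample&)> test;     // maps a sample to a branch index
        std::vector<DecisionNode*>        children; // one child per possible result

        bool isLeaf() const { return children.empty(); }
    };

    // Walk from the root down to the single leaf this sample falls into.
    const DecisionNode* descend(const DecisionNode* node, const Sample& s) {
        while (!node->isLeaf())
            node = node->children[node->test(s)];
        return node;
    }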
Naturally, there are many possible ways to represent the decisions. These depend on the type of the attribute as well as the operation used in the conditional test. Because the attributes can be symbols or continuous values, the tests can be Boolean conditions as well as continuous relations. The number of possible results of the test corresponds to the number of branches starting at this node. Common conditional tests include the following:

- Boolean conditions on binary attributes (for instance, B==true)
- Equality tests between a symbolic attribute and a constant
- Comparisons of a continuous attribute with a threshold value
- Tests combining several of these conditions over multiple attributes
Typically, a unique conditional test over only one attribute is used (for instance, B==true). This often suffices because decisions can be combined hierarchically in the tree to form more complex decisions. As an extension to DTs, however, some conditional tests involve more attributes (for instance, testing A==1 and B<0), which can improve the accuracy of the tests, at the cost of simplicity. In this case, there are more combinations of possible results, so there are more branches. See Table 26.3.
Table 26.3. Four Examples of Simple Conditional Tests, as Well as a Decision Over Multiple Attributes
Generally, the more predictor variables used in a test, the more branches are needed. However, it's possible to use an arbitrary number of attributes and map them onto a single Boolean decision. This requires evaluating expressions such as (A and B) or C. Similar equations can be used to convert continuous values into a single number. The problem with such complex expressions is that they are harder to learn!
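The following sketches show what these different tests might look like in code, reusing the attribute names A, B, and C from the examples above; the sample layout itself is an assumption. Each test returns the index of the branch to follow.

    // Hypothetical attribute layout matching the text's examples.
    struct Sample { int A = 0; float B = 0.0f; bool C = false; };

    // Single-attribute Boolean test: two branches.
    int testBoolean(const Sample& s) { return s.C ? 0 : 1; }

    // Continuous comparison against a threshold: two branches.
    int testThreshold(const Sample& s) { return (s.B < 0.0f) ? 0 : 1; }

    // Decision over multiple attributes (A==1 and B<0): each combination
    // of results is a separate outcome, so there are four branches.
    int testCombined(const Sample& s) {
        int a = (s.A == 1) ? 1 : 0;
        int b = (s.B < 0.0f) ? 1 : 0;
        return a * 2 + b;  // 0..3, one branch per combination
    }

    // Arbitrary attributes collapsed into one Boolean, such as (A and B) or C.
    // This keeps the branch count at two, but such expressions are harder
    // to learn, as noted above.
    int testExpression(const Sample& s) {
        return (((s.A == 1) && (s.B < 0.0f)) || s.C) ? 0 : 1;
    }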
Leaves and Response Variables
A response variable is associated only with the leaves of the tree, one value per leaf. These response values, too, can be either discrete or continuous.
Each of the leaves corresponds to a particular subset of the data samples (see Figure 26.3). For classification trees, the response variable matches the estimated category for all the samples in this leaf. On the other hand, in regression trees, the continuous response variable generally holds an estimate for the samples in that leaf, commonly their average value.
Figure 26.3. The value stored in each leaf corresponds to the response variables of the samples.
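A small sketch of the two kinds of leaf storage, assuming the common conventions of keeping the shared category for classification and the mean of the samples' responses for regression:

    #include <numeric>
    #include <string>
    #include <vector>

    struct ClassificationLeaf {
        std::string category;   // estimated category for all samples here
    };

    struct RegressionLeaf {
        float response = 0.0f;  // for instance, the mean damage

        static RegressionLeaf fromSamples(const std::vector<float>& responses) {
            RegressionLeaf leaf;
            if (!responses.empty())
                leaf.response = std::accumulate(responses.begin(),
                                                responses.end(), 0.0f)
                                / responses.size();
            return leaf;
        }
    };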
The type of response variable does not affect the structure of the tree, although a different data type container is needed in the leaves, obviously! There are also key differences in the way the tree is built, but we'll get to that after we know how to use it.