Representation of Decision Trees
Before looking into what a DT can do, or even how we can build one, we need to understand the basic concepts behind the data structure.
A DT outputs a prediction based on some input attributes. There are two different kinds of DTs: classification and regression trees. There is no difference in the types of inputs; both can take continuous or symbolic values. It's the output that determines the type of tree: a DT with continuous outputs is known as a regression tree, whereas classification trees output categorical values instead, as summarized in Table 26.1.
Table 26.1. Two Different Varieties of DTs Distinguished by the Type of Result Returned

    Type of Tree          Type of Result Returned
    Classification tree   Categorical values (discrete symbols)
    Regression tree       Continuous values (numbers)
The first subsection explains data samples and their relevance. These are the foundation of both classification and regression trees.

Data is a fundamental requirement of pattern recognition, and even machine learning in general; without representative samples, there is nothing to learn from. Essentially, a data sample is a set of attributes, each with a corresponding value. These attributes can be symbolic (taking one of a set of discrete values) or continuous (taking numeric values).
There is an additional attribute with a special meaning, known as a response variable (or dependent variable). It can be a symbol representing discrete categories (for classification) or a continuous value (for regression). This is the criterion for making decisions about each of the samples, in both classification and regression problems.
Table 26.2 shows the problem of predicting the total damage inflicted based on the properties of the weapon. Damage is used as the response variable, so the others are predictor variables. Weight and type are categorical, whereas the rest are continuous.
Table 26.2. Four Data Samples with Multiple Attributes
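To make this concrete, here is a minimal C++ sketch of a data sample along the lines of Table 26.2, assuming damage as the continuous response variable. The range and speed attributes are hypothetical stand-ins for the remaining continuous predictors; only type, weight, and damage are named in the text.

    #include <string>

    // One data sample for the weapon problem: a set of predictor
    // attributes plus one response variable.
    struct WeaponSample {
        // Predictor variables.
        std::string type;    // categorical (for instance, "sword")
        std::string weight;  // categorical in this example
        float       range;   // continuous (hypothetical attribute)
        float       speed;   // continuous (hypothetical attribute)
        // Response variable: continuous, making this a regression problem.
        float       damage;
    };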
Intrinsically, there is no difference between the predictor and response variables; almost any attribute can serve as the response. The variable used as a basis for the classification process will be chosen according to the problem at hand.
Data samples are commonly used throughout AI, including for other techniques suited to pattern recognition and prediction, notably neural networks. We sidestepped the problem of classification during Chapter 17, "Perceptrons," and Chapter 19, "Multilayer Perceptrons" (focusing mostly on function approximation instead).
A DT is basically a tree, in the standard computer science meaning of the word. The data structure is made up of nodes connected together by branches; every node except one has a single parent, and decision nodes have one or more children, as shown in Figure 26.1.
Figure 26.1. A simple DT with the root node at the top, and each decision node as a circle. The leaves are represented as boxes.
One special node is known as the root . This is conceptually the base of the tree, and it's possible to traverse the tree from any node to the root. Another particular kind of node forms the end of each branch: leaf nodes.
This is a very broad description of a tree data structure. Anyone with reasonable experience of programming languages should be familiar with this. However, for DTs in AI, there are particular meanings associated with each of these concepts.
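As an illustration, such a node could be represented as follows. This is a minimal pointer-based sketch of the structure just described, not the only possible implementation.

    #include <vector>

    struct Node {
        Node*              parent = nullptr;  // null only for the root
        std::vector<Node*> children;          // empty only for leaves

        bool isRoot() const { return parent == nullptr; }
        bool isLeaf() const { return children.empty(); }
    };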
Each level in the tree can be understood as a layer of decisions. Every decision node stores a conditional test over the attributes of a data sample, and the result of that test determines which branch is followed next.
Fundamentally, each conditional test sorts the data samples so that each sample corresponds to only one branch. If we consider all the samples as one large data set, the decisions split this set into distinct, mutually exclusive subsets, as shown in Figure 26.2.
Figure 26.2. Splitting the data set into mutually exclusive subsets using the decision nodes.
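A minimal sketch of this routing process follows, assuming each decision node stores a test that maps a sample to exactly one branch index. Because every sample matches exactly one outcome, sibling branches receive mutually exclusive subsets of the data. The Sample layout here is a hypothetical stand-in for the attribute container sketched earlier.

    #include <functional>
    #include <vector>

    // A tiny stand-in for the data sample described earlier.
    struct Sample { float weight = 0.0f; bool magical = false; };

    struct DecisionNode {
        std::function<int(const Sample&)> test;     // maps a sample to a branch index
        std::vector<DecisionNode*>        children; // one child per possible result

        bool isLeaf() const { return children.empty(); }
    };

    // Walk from the root down to the single leaf this sample falls into.
    const DecisionNode* descend(const DecisionNode* node, const Sample& s) {
        while (!node->isLeaf())
            node = node->children[node->test(s)];
        return node;
    }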
Naturally, there are many possible ways to represent the decisions. These depend on the type of the attribute as well as the operation used in the conditional test. Because the attributes can be symbols or continuous values, the tests can be Boolean conditions as well as continuous relations. The number of possible results of the test corresponds to the number of branches starting at this node. Common conditional tests include the following:

- Boolean conditions on binary attributes (for instance, B==true)
- Equality tests between a symbolic attribute and a constant
- Comparisons of a continuous attribute with a threshold value
- Tests combining several of these conditions over multiple attributes
Typically, a unique conditional test over only one attribute is used (for instance, B==true). This often suffices because decisions can be combined hierarchically in the tree to form more complex decisions. As an extension to DTs, however, some conditional tests involve more attributes (for instance, testing A==1 and B<0), which can improve the accuracy of the tests, at the cost of simplicity. In this case, there are more combinations of possible results, so there are more branches. See Table 26.3.
Table 26.3. Four Examples of Simple Conditional Tests, as Well as a Decision Over Multiple Attributes
Generally, the more predictor variables used in a test, the more branches are needed. However, it's possible to use an arbitrary number of attributes and map them onto a single Boolean decision. This requires evaluating expressions such as (A and B) or C. Similar equations can be used to convert continuous values into a single number. The problem with such complex expressions is that they are harder to learn!
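The following sketches show what these different tests might look like in code, reusing the attribute names A, B, and C from the examples above; the sample layout itself is an assumption. Each test returns the index of the branch to follow.

    // Hypothetical attribute layout matching the text's examples.
    struct Sample { int A = 0; float B = 0.0f; bool C = false; };

    // Single-attribute Boolean test: two branches.
    int testBoolean(const Sample& s) { return s.C ? 0 : 1; }

    // Continuous comparison against a threshold: two branches.
    int testThreshold(const Sample& s) { return (s.B < 0.0f) ? 0 : 1; }

    // Decision over multiple attributes (A==1 and B<0): each combination
    // of results is a separate outcome, so there are four branches.
    int testCombined(const Sample& s) {
        int a = (s.A == 1) ? 1 : 0;
        int b = (s.B < 0.0f) ? 1 : 0;
        return a * 2 + b;  // 0..3, one branch per combination
    }

    // Arbitrary attributes collapsed into one Boolean, such as (A and B) or C.
    // This keeps the branch count at two, but such expressions are harder
    // to learn, as noted above.
    int testExpression(const Sample& s) {
        return (((s.A == 1) && (s.B < 0.0f)) || s.C) ? 0 : 1;
    }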
Leaves and Response Variables
A response variable is associated only with the leaves of the tree, one value per leaf. These response values, too, can be either discrete or continuous.
Each of the leaves corresponds to a particular subset of the data samples (see Figure 26.3). For classification trees, the response variable matches the estimated category for all the samples in this leaf. On the other hand, in regression trees, the continuous response variable generally holds an estimate for the samples in that leaf, commonly their average value.
Figure 26.3. The value stored in each leaf corresponds to the response variables of the samples.
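A small sketch of the two kinds of leaf storage, assuming the common conventions of keeping the shared category for classification and the mean of the samples' responses for regression:

    #include <numeric>
    #include <string>
    #include <vector>

    struct ClassificationLeaf {
        std::string category;   // estimated category for all samples here
    };

    struct RegressionLeaf {
        float response = 0.0f;  // for instance, the mean damage

        static RegressionLeaf fromSamples(const std::vector<float>& responses) {
            RegressionLeaf leaf;
            if (!responses.empty())
                leaf.response = std::accumulate(responses.begin(),
                                                responses.end(), 0.0f)
                                / responses.size();
            return leaf;
        }
    };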
The type of response variable does not affect the structure of the tree, although a different data type container is needed in the leaves, obviously! There are also key differences in the way the tree is built, but we'll get to that after we know how to use it.