Discussion

Decision trees are extremely efficient at computing estimates for unknown samples. The data structure is simple and small, and requires little effort to traverse. DTs are particularly suited to data-mining problems, so mass processing of information in real time is not a problem. In addition, due to the small amounts of information needed, the footprint for storing DTs in memory is very small. After the knowledge has been encoded in the tree, there is often little need to keep the data samples.
Finally, DTs are relatively flexible. They can be used on continuous and symbolic
However, some disadvantages are worth noting. Decision trees are well suited to batch processing data sets. When it comes to learning them online, however, existing algorithms can be somewhat clumsy (both memory
The recursive partitioning algorithm is greedy in nature. This
Last but not least, there can be problems dealing with overfitting. Secondary pruning phases can be used to remedy the problem, but this requires additional computation and additional data to validate the result. Integrated algorithms, sadly, lose the benefit of simplicity.
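To make the cost of a secondary pruning phase concrete, here is a minimal sketch of one common approach, reduced-error pruning, written in Python. The `Leaf` and `Node` classes, the numeric threshold tests, and the function names are hypothetical and chosen only for illustration. The sketch also simplifies the usual scheme in two ways, both of which are assumptions: the replacement leaf takes the majority label of the validation samples reaching the node, and a subtree that no validation sample reaches is left untouched.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    response: str                 # categorical response stored at the leaf


@dataclass
class Node:
    attribute: int                # index of the attribute tested here
    threshold: float              # values <= threshold follow the left branch
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict(tree, sample):
    """Follow the conditional tests down to a leaf and return its response."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.response


def errors(tree, samples, labels):
    """Count the validation samples that the tree misclassifies."""
    return sum(predict(tree, s) != l for s, l in zip(samples, labels))


def prune(tree, samples, labels):
    """Bottom-up pruning against a separate validation set.

    A subtree is replaced by a single leaf whenever the leaf makes no more
    validation errors than the subtree does.
    """
    if isinstance(tree, Leaf) or not samples:
        return tree               # nothing to prune, or no data to judge with
    goes_left = [s[tree.attribute] <= tree.threshold for s in samples]
    left_s = [s for s, g in zip(samples, goes_left) if g]
    left_l = [l for l, g in zip(labels, goes_left) if g]
    right_s = [s for s, g in zip(samples, goes_left) if not g]
    right_l = [l for l, g in zip(labels, goes_left) if not g]
    # Prune the children first, so decisions are made from the leaves upward.
    tree = Node(tree.attribute, tree.threshold,
                prune(tree.left, left_s, left_l),
                prune(tree.right, right_s, right_l))
    leaf = Leaf(Counter(labels).most_common(1)[0][0])
    if errors(leaf, samples, labels) <= errors(tree, samples, labels):
        return leaf
    return tree
```

Every node is revisited with the validation samples that reach it, which is where the additional computation and the additional data mentioned above come from.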
Summary

There are two kinds of decision trees: Classification trees result in categorical responses, and regression trees return continuous values. The representation of DTs is very intuitive, and almost identical in both cases:
The simulation algorithm uses a data sample to traverse the tree according to the results of each conditional test. The training (or induction) algorithm operates with recursive partitioning:
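Both procedures can be sketched in a few lines of Python for a classification tree. This is an illustration under stated assumptions, not a definitive implementation: the `Leaf` and `Node` classes, the restriction to numeric attributes with threshold tests, and the use of Gini impurity as the splitting criterion are all choices made for this example. The sample data is hypothetical as well.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    response: str                 # categorical response stored at the leaf


@dataclass
class Node:
    attribute: int                # index of the attribute tested here
    threshold: float              # values <= threshold follow the left branch
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def simulate(tree, sample):
    """Traverse the tree according to the result of each conditional test."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.response


def gini(labels):
    """Gini impurity of a list of labels (0.0 means all labels agree)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(samples, labels):
    """Greedily pick the (attribute, threshold) with the lowest weighted impurity."""
    best, best_score = None, gini(labels)   # a split must beat the unsplit set
    for attribute in range(len(samples[0])):
        for threshold in sorted({s[attribute] for s in samples}):
            left = [l for s, l in zip(samples, labels) if s[attribute] <= threshold]
            right = [l for s, l in zip(samples, labels) if s[attribute] > threshold]
            if not left or not right:
                continue                     # degenerate split, nothing separated
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (attribute, threshold), score
    return best


def induce(samples, labels):
    """Recursive partitioning: split the data set, then recurse on each subset."""
    split = best_split(samples, labels)
    if split is None:                        # pure set, or no split helps
        return Leaf(Counter(labels).most_common(1)[0][0])
    attribute, threshold = split
    left = [i for i, s in enumerate(samples) if s[attribute] <= threshold]
    right = [i for i, s in enumerate(samples) if s[attribute] > threshold]
    return Node(attribute, threshold,
                induce([samples[i] for i in left], [labels[i] for i in left]),
                induce([samples[i] for i in right], [labels[i] for i in right]))


# Hypothetical data: (distance to enemy, health) mapped to a behavior.
samples = [(1.0, 5.0), (2.0, 4.0), (8.0, 5.0), (9.0, 4.0)]
labels = ["flee", "flee", "attack", "attack"]
tree = induce(samples, labels)
print(simulate(tree, (1.5, 4.5)))
```

Because `best_split` only looks one test ahead, the partitioning is greedy. A regression tree could reuse the same structure by storing a continuous value in each leaf and replacing the impurity measure with a variance-based one.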
The best way to improve the training is to manage the data set, using additional validation and testing sets. Pruning also is a very effective option in computer
Because DTs are capable of finding patterns in data, and learning to recognize them, the