Improving a CART Tree by Reducing the Number of Predictors
The basic questions are:
If CART is a great variable selector, why should I do any variable selection at all? Isn't it better to let CART do everything automatically?
If I have already built a CART tree using a given list of variables, why would rebuilding with fewer predictors sometimes yield a better-performing tree? Didn't CART already make the best possible decisions regarding which variables to use in any part of the tree?
A number of points should be made regarding these issues. The first, and the simplest to understand, is that CART is a myopic model builder: it evaluates only the split it is currently working on. CART does not look ahead to future splits to be made on the children and grandchildren of the current split.

Consider just the root node for the sake of argument. Suppose that we have five relatively strong splitters for the root. CART chooses the split generating the greatest reduction in Gini impurity (by default) and then goes on to build the entire tree using the same split-selection criterion. Now suppose we were instead to split the root on the second-best root node splitter. It could happen that the tree generated from this start performs better on test data than the default tree.
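The greedy criterion can be stated in a few lines of code. The sketch below is illustrative only (the function names and the toy data are ours, not CART's): a candidate root split is scored purely by its immediate Gini reduction, and nothing downstream is ever consulted.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def root_gain(xs, ys, threshold):
    """Gini reduction from splitting one feature at `threshold` -- the only
    quantity greedy CART scores; descendant splits are never consulted."""
    n = len(ys)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return gini(ys) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Two hypothetical root candidates on a toy binary problem:
ys = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 1, 0, 1, 1, 1]   # splits the classes 3-1 / 1-3
f2 = [0, 0, 1, 1, 0, 0, 1, 1]   # splits the classes 2-2 / 2-2
print(root_gain(f1, ys, 0.5))   # 0.125
print(root_gain(f2, ys, 0.5))   # 0.0
```

Scoring splits this way, CART would prefer f1 at the root, with no regard for which of the two choices leads to the better children.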
Why is this? Making a slightly suboptimal root node split can yield a better overall tree, much as in chess a less aggressive opening move might ultimately lead to a stronger position. If this is the case, then why don't we use a look-ahead algorithm to build CART trees? There are two answers. First, look-ahead algorithms require enormous computing power, not very different from what is required for a chess-playing computer. With greater computing power, limited look-ahead will become increasingly practical. Second, in early experiments with look-ahead, the benefits did not appear great enough to justify the added computational cost. In one study that used a "split re-visitation" approach, in which trees were grown partially and then tested for possible modifications, performance improvements were typically less than 5 percent in classification accuracy.
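To see the mechanism, here is a sketch of one-step look-ahead on a contrived worst case for greedy selection (the function names and the toy XOR data are invented for this example, and real look-ahead costs grow much faster on realistic data): each root candidate is scored not by its own impurity reduction but by the impurity remaining after each of its children is also split greedily.

```python
from collections import Counter

def gini(ys):
    n = len(ys)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())

def partition(rows, ys, f):
    """Split on binary feature f at threshold 0.5."""
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= 0.5]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > 0.5]
    return left, right

def impurity_after(rows, ys, f):
    """Weighted impurity after one split -- what greedy CART minimizes."""
    n = len(ys)
    return sum(len(side) / n * gini([y for _, y in side])
               for side in partition(rows, ys, f))

def impurity_after_lookahead(rows, ys, f, features):
    """Weighted impurity after splitting on f AND splitting each child greedily."""
    n = len(ys)
    total = 0.0
    for side in partition(rows, ys, f):
        if not side:
            continue
        srows = [r for r, _ in side]
        s_ys = [y for _, y in side]
        total += len(side) / n * min(impurity_after(srows, s_ys, g) for g in features)
    return total

def make_xor(copies=10, flips=3):
    """XOR target on features 0 and 1; feature 2 is a noisy copy of the label."""
    rows, ys = [], []
    for a in (0, 1):
        for b in (0, 1):
            y = a ^ b
            for i in range(copies):
                x3 = 1 - y if i < flips else y  # matches the label 70% of the time
                rows.append((a, b, x3))
                ys.append(y)
    return rows, ys

rows, ys = make_xor()
features = [0, 1, 2]
greedy_pick = min(features, key=lambda f: impurity_after(rows, ys, f))
lookahead_pick = min(features, key=lambda f: impurity_after_lookahead(rows, ys, f, features))
print(greedy_pick)     # 2 -- the noisy feature wins when scored one step at a time
print(lookahead_pick)  # 0 -- looking one level deeper prefers a true XOR input
```

The extra cost is already visible here: one-step look-ahead evaluates every (root candidate, child candidate) pair instead of every root candidate, and each additional level multiplies the work again.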
So to answer the principal question simply:
Removing some predictors from consideration can improve a CART tree because those predictors generated unfortunate splits that made it harder for CART to ultimately reach a good model. By eliminating some less auspicious splits from the CART tree, the end result is a better-performing model overall.
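This answer can be made concrete with a toy experiment. The tiny tree grower below is an illustrative sketch, not CART itself (no pruning, a fixed depth limit, and binary features are assumed). On an XOR-style target, a noisy third predictor wins the root split greedily but leaves the children unimprovable, while simply removing that predictor from the allowed list lets the same greedy procedure solve the problem exactly.

```python
from collections import Counter

def gini(ys):
    n = len(ys)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())

def best_split(rows, ys, features):
    """Best (gain, feature, threshold) over the allowed features, or None."""
    parent, n, best = gini(ys), len(ys), None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, ys, features, depth):
    """Greedy depth-limited tree; splits whenever the node is impure
    (CART likewise grows a large tree first and prunes afterwards)."""
    if depth == 0 or gini(ys) == 0.0:
        return Counter(ys).most_common(1)[0][0]
    best = best_split(rows, ys, features)
    if best is None:
        return Counter(ys).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow([rows[i] for i in li], [ys[i] for i in li], features, depth - 1),
            grow([rows[i] for i in ri], [ys[i] for i in ri], features, depth - 1))

def predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def accuracy(tree, rows, ys):
    return sum(predict(tree, r) == y for r, y in zip(rows, ys)) / len(ys)

def make_xor(copies=10, flips=3):
    """XOR target on features 0 and 1; feature 2 is a noisy copy of the label."""
    rows, ys = [], []
    for a in (0, 1):
        for b in (0, 1):
            y = a ^ b
            for i in range(copies):
                x3 = 1 - y if i < flips else y  # matches the label 70% of the time
                rows.append((a, b, x3))
                ys.append(y)
    return rows, ys

train_rows, train_ys = make_xor()
test_rows, test_ys = make_xor()  # a fresh draw from the same process (deterministic here)

full = grow(train_rows, train_ys, features=[0, 1, 2], depth=2)
reduced = grow(train_rows, train_ys, features=[0, 1], depth=2)
print(accuracy(full, test_rows, test_ys))     # 0.7 -- the noisy splitter hijacks the root
print(accuracy(reduced, test_rows, test_ys))  # 1.0 -- dropping feature 2 frees the tree to solve XOR
```

The noisy predictor is exactly the kind of "unfortunate split" described above: individually attractive at the root, but a dead end for everything grown beneath it.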