 # Linear Combination Splits in Decision Trees

Decision trees, and CART® in particular, were originally designed to split data using a single predictor. For continuous predictors the split is the form:

if X = c hen go left

else go right

For categorical predictors the split is of the form:

if X is in { value1, value2,....} then go left

else go right

The advantage of such splitting rules lies principally in their simplicity and understandability, and the terminal nodes of the tree are conjunctions of such spliters that define rules (or data segments) that may also be easy to understand. But for continuous variables the advantages go far beyond comprehensibility as the split is essentially unchanged when the predictor in question is transformed via common transforms such as log(X) or SQRT(X) etc. Technically, so long as the transform does change the rank ordering of the values of the predictor (or simply inverts the order) then the splitting rule is essentially unchanged. Further, extreme values of the predictor should not affect the selection of the best splitter as the splitter is generally towards the interior of the split variable distribution. These characteristics go a long way to making CART such an effective analytical tool even in the presence of flawed data.

In spite of the advantages of single variable splits, sometimes called orthogonal splits, there has long been an interest in linear combination splits (also known as "oblique splits") starting with the original CART monograph. (Actually Friedman discusses oblique splits in his 1975 paper on the precursor to CART). Linear combination splits are of the following form:

if a1X1 + a2X2 +a3X3 <= c then go right

else go left

A linear combination split can only be constructed from numerical variables and in the original version of CART is limited to no more than 5 predictors. A serious limitation of linear combination splits is that they depend on the mathematical form of the predictors. If we were to replace the predictor X1 with its logarithm (log(X1)) the form of the splitter would change, perhaps even leading to the dropping of the predictor from the split. Also, the linear combinations will be influenced by outliers, and if missing values occur in many predictors then the linear combination may based on a small subset of the data since all variables in the linear combination must be not missing for the linear combination to be evaluated. Given these limitations, if a modeler wished to experiment with linear combinations (LCs) CART software always made it possible.

In our initial work with CART in the early 1990s we quickly discovered both the charms and limitations of LCs in decision trees. On the one hand, allowing LCs occasionally suggested interesting new splitters that we considered adding to our database. LCs thus functioned as a form of automatic feature construction. On the other hand, LCs frequently created oddball constructions, mixing variables that made intuitive sense with other variables that we preferred to exclude from the LC. To facilitate better LC construction we introduced the concept of the Linear Combination List or LCLIST in 1996.

Linear Combination Lists (LCLISTS)

LCLISTS are controls that limit searches for linear combination splits to the variables that appear on a single list. A modeler is free to create as many LCLISTS as desired and the lists need not be mutually exclusive. The LCLIST mechanism works by searching separately for LCs in every LCLIST defined. So, suppose the modeler creates 3 LCLISTS as in:

LCLIST X1, X2, X3, X4

LCLIST X5, X6

LCLIST X7, X8, X9

These commands will not only activate the search for LCs but ti will also limit the LCs to those than can be constructed from only variables appearing in one of the lists. Whereas the original CART LCs might have combined X1, X6 and X9, the above commands would prohibit this LC as the its variables are drawn from more than one LCLIST.

What can you do with LCLISTS?

The most important thing you can do with LCLISTs is to group your predictors together into meaningfully related subsets of predictors. In the above example, variables X1 - X4 might represent four related measures of essentially the same thing. A linear combination of these four measures is both understandable and statistically defensible and might be an ideal predictor. Further, once we have discovered the optimal weights we might want to add the newly constructed feature to the database for future direct use. Another benefit of LCLISTs is that they permit construction of very small lists (e.g. two or three variables).

Once you have opened a data set and selected a dependent or target variable go to the METHOD tab of the Model SetUp dialog. On the right hand side place a check mark next to "Use Linear Combination Splitting." This is the most basic control that simply turns on legacy CART linear combination splits.

[J#61:1603]