The Salford Predictive Modeler™ suite (SPM) includes a number of automated tools to assist in the process of feature selection under the BATTERY mechanism. For example,
BATTERY KEEP selects a subset of features at random and builds a model from this random subset only. The GUI will guide you through this option, but from the command line you would issue something like:
BATTERY KEEP=100, 15
This requests 100 models, each of which includes 15 randomly selected predictors. If we are sure that we want certain variables included in every such model, the command would look like:
BATTERY KEEP=100, 15 CORE= X1, X2, X3, X4, X5
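To make the idea behind the two commands above concrete, here is a minimal Python sketch of random-subset selection with a fixed core. It is not SPM code and does not reproduce SPM's internals: the function name `random_feature_subsets`, the 50-predictor pool, and the seed are all hypothetical. It also assumes the core variables are added on top of the 15 random picks; the text above does not say whether SPM counts them toward the 15.

```python
import random

def random_feature_subsets(predictors, n_models, n_keep, core=(), seed=0):
    """Return n_models predictor lists (hypothetical helper, not an SPM API).

    Each list holds every core predictor plus n_keep predictors drawn at
    random, without replacement, from the non-core pool.
    ASSUMPTION: core variables are extra to the n_keep random picks.
    """
    rng = random.Random(seed)
    core = list(core)
    pool = [p for p in predictors if p not in core]
    if n_keep > len(pool):
        raise ValueError("n_keep exceeds the number of non-core predictors")
    return [core + rng.sample(pool, n_keep) for _ in range(n_models)]

# Illustrative data: 50 predictors named X1..X50 (made up for this sketch).
predictors = [f"X{i}" for i in range(1, 51)]
subsets = random_feature_subsets(
    predictors, n_models=100, n_keep=15, core=["X1", "X2", "X3", "X4", "X5"]
)
```

Each entry of `subsets` would then be handed to whatever model-building routine is in use, and the resulting models compared to see which random subsets perform well.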
When we began work on our first release of CART (a 1993 command-line version running very nicely on UNIX), I was startled by some (now long-forgotten) articles claiming to describe a new technology that was more accurate, or faster, on some class of analytic problem. At the time, I assumed that such articles needed to be taken seriously because they represented peer-reviewed, solidly researched scientific advances.
The most recent versions of Salford Predictive Modeler™ SPM PRO EX include a new BATTERY to invoke bootstrapped replication of most model types available in SPM. One of our reasons for adding this BATTERY was to provide access to the full CART engine when generating RandomForests® (RF) models. The principal advantages of this are:
Breiman's original RF uses a stripped-down and simplified tree-growing algorithm designed for speed. It lacks tree-growing options and missing-value handling, and for many users Breiman's RF is confined to classification problems. By accessing the full CART engine with all of its Salford extensions and customized controls, modelers can accomplish far more sophisticated analyses, handle missing values with surrogates, and apply penalties and constraints. Most importantly for those interested in continuous dependent variables, BATTERY BOOTSTRAP gives access to both Least Squares (LS) and Least Absolute Deviation (LAD) regression trees.
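For readers less familiar with the LS and LAD distinction, the sketch below shows the textbook definitions for a single tree node: an LS tree predicts the node mean and scores the node by squared error, while a LAD tree predicts the node median and scores it by absolute error. This is an illustration only, not the CART engine's code; the function names and the sample values are hypothetical.

```python
from statistics import mean, median

def ls_node(values):
    """Least Squares: predict the node mean, cost = sum of squared errors."""
    pred = mean(values)
    return pred, sum((v - pred) ** 2 for v in values)

def lad_node(values):
    """Least Absolute Deviation: predict the node median,
    cost = sum of absolute errors."""
    pred = median(values)
    return pred, sum(abs(v - pred) for v in values)

# Made-up target values in one node, including a single large outlier.
node = [1.0, 2.0, 3.0, 4.0, 100.0]
ls_pred, ls_cost = ls_node(node)
lad_pred, lad_cost = lad_node(node)
```

With these values the LS prediction is pulled to 22.0 by the outlier, while the LAD prediction stays at 3.0, which is why LAD trees are often preferred when the target has heavy tails.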
The principal drawback of BATTERY BOOTSTRAP is that the extra machinery comes with a computational price: RF runs under BATTERY BOOTSTRAP are much slower than under Breiman's RF. The extra robustness, ability to handle huge problems, and added controls should often make the slower runs worthwhile. Also note that at the moment the RF post-model visualization machinery is not available.
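The bootstrapped replication that this BATTERY relies on can be sketched in a few lines of Python. This is a generic illustration of bootstrap resampling, not SPM's implementation; the helper name `bootstrap_indices`, the row count, the number of replicates, and the seed are assumptions made for the example.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw one bootstrap replicate (hypothetical helper, not an SPM API).

    Returns the in-bag row indices, sampled with replacement, and the
    out-of-bag indices, i.e. the rows that were never drawn.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# 25 replicates of a made-up 1000-row data set; one model would be
# grown on each in-bag sample.
replicates = [bootstrap_indices(1000, rng) for _ in range(25)]
```

On average roughly 37% of the rows are left out of each replicate, so the out-of-bag rows can serve as a test sample for the model grown on that replicate.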
SPM offers some degree of automatic type detection when it reads a database, but this support may still require some additional effort on the part of a user. How SPM works depends on the file type being read: