Sanitizing Data: Keep the Details of Your Data Mining Project Private
The SPM software suite includes a handy utility for changing all the variable names in your data to uninformative labels such as X1, X2, etc. To convert a data set this way just follow the pattern:
The classic output reports the name changes and the names of the input and output files, thus:
Data Step (RUN)
Salford Predictive Modeler(R) software suite: SPM(R) Data Step version 18.104.22.1689
Records Read: 506
Records Kept: 506
\\psf\Home\Desktop\Demos\mystery.csv created with 506 records, 14 variables.
There are several possible motivations for this type of file conversion. A common one is sending examples of your data to tech support when you need to keep the details of your analysis private.
Note that the conversion changes only variable names, not variable contents. If you had a variable named STATE$ with values like "AZ", "CA", "NY", changing the name will do little to obscure the true content of the data. But renaming a large number of continuous variables is quite effective in making your data very difficult for anyone to understand.
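For readers who want to see the idea outside of SPM, here is a minimal Python sketch of the same kind of renaming. It is not the SPM utility and does not use SPM syntax; the function names, the sample columns, and the assumption that column names are unique are all made up for illustration. It replaces only the header row with X1, X2, ... and copies every data value unchanged, which also shows why a column like STATE$ stays recognizable after renaming.

```python
import csv
import io

def anonymize_header(header):
    """Map each original column name to X1, X2, ... in column order.

    Assumes column names are unique (hypothetical helper, not part of SPM).
    """
    return {name: f"X{i}" for i, name in enumerate(header, start=1)}

def sanitize_csv(src, dst):
    """Copy CSV rows from src to dst, replacing only the header row.

    Returns the old-name -> new-name mapping so the data owner can
    translate results back to the real variable names later.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    mapping = anonymize_header(header)
    writer.writerow([mapping[name] for name in header])
    writer.writerows(reader)  # data values are copied unchanged
    return mapping

# Hypothetical three-column sample, held in memory instead of on disk.
sample = io.StringIO("STATE$,INCOME,AGE\nAZ,52000,41\nNY,61000,35\n")
out = io.StringIO()
mapping = sanitize_csv(sample, out)

print(mapping)
print(out.getvalue())
```

In the sanitized output the first column is now called X1, but its values are still "AZ" and "NY", so anyone can guess what it holds. Keep the returned mapping somewhere private if you will need to interpret results against the original names.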