Add Random Characteristics

 

Adding random characteristics can be useful for several reasons, such as randomly dividing data into training and test data, or including several random characteristics in an analysis to determine which characteristics have less predictive power than the random variables. The Add Random Characteristics tool always adds the random characteristics to the Training Table. It also adds them to the Test Table if one has been selected and it is separate from the Training Table. Some examples of random characteristics and how to create them are below:

 

 

Example 1

How to Create: 1 characteristic, TRUE 50% of the time

Possible Use: Dividing data into training and test data

Explanation: This creates one new field in the Training Table (and the Test Table, if selected) that is randomly TRUE for 50% of the records and FALSE for the other 50%. A Filter can then be used to select records where this field = TRUE as training data and records where this field = FALSE as test data. The user can specify the percentage of records that will be randomly TRUE; it does not have to be 50%. For larger datasets, a smaller percentage of TRUE values may be appropriate.

Example 2

How to Create: 10 characteristics, each TRUE 10% of the time

Possible Use: Testing significance of analysis characteristics

Explanation: This creates ten new fields (random1, random2, ... , random10), each of which is randomly TRUE for 10% of the records and FALSE for the other 90%. Running an analysis that includes these fields as Generic Characteristics lets the user see whether the characteristics thought to be predictive have higher significance than the random characteristics. If a random characteristic shows higher significance than a characteristic thought to be predictive, that characteristic's inclusion in the model should be investigated. If a characteristic is less significant than all ten random characteristics, this is roughly analogous to a 90% confidence level that the characteristic is statistically insignificant.
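The comparison against random characteristics can be sketched in a few lines of Python. This is only an illustration: the significance scores are made-up numbers, the function name compare_to_random is hypothetical, and it assumes that a higher score means higher significance and that every random characteristic's name starts with the chosen naming prefix.

```python
def compare_to_random(significance, random_prefix="random"):
    """Count, for each real characteristic, how many random
    characteristics scored at least as high.

    Assumption: any name starting with `random_prefix` is a random
    characteristic, and a higher score means higher significance.
    """
    random_scores = [score for name, score in significance.items()
                     if name.startswith(random_prefix)]
    report = {}
    for name, score in significance.items():
        if name.startswith(random_prefix):
            continue  # skip the random characteristics themselves
        report[name] = sum(1 for r in random_scores if r >= score)
    return report


# Hypothetical significance scores from an analysis run.
scores = {
    "age": 0.42,
    "region": 0.04,
    "tenure": 0.01,
    "random1": 0.03,
    "random2": 0.05,
    "random3": 0.02,
}
report = compare_to_random(scores)

# Characteristics matched or outscored by at least one random characteristic.
needs_review = [name for name, beaten_by in report.items() if beaten_by > 0]

# Characteristics matched or outscored by every random characteristic.
below_all_random = [name for name, beaten_by in report.items()
                    if beaten_by == 3]
```

In this made-up example, "region" and "tenure" would both be flagged for review, and only "tenure" falls below every random characteristic.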

 

To create random characteristics, select Tools -> Add Random Characteristics.

 

First, choose the number of random characteristics to add:

 

 

Next, choose the fraction of the records to flag as TRUE. Note that this is a fraction, not a percentage: the 0.1 below means 10% (not 0.1%).

 

 

Finally, choose the naming prefix for the new characteristics. The actual name of each characteristic is the naming prefix with a number appended on the right to differentiate between the characteristics (e.g. random1, random2, ...). If a column with that name already exists, the pre-existing column will be overwritten with the new random characteristic column.

 

 

The selections in these screenshots would create 10 random variables named random1, random2, ... , random10, each TRUE 10% of the time and FALSE the other 90% of the time.
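As a rough illustration of what these three settings produce, the following Python sketch adds flag fields to a small in-memory table. It is not the tool's actual implementation: the function add_random_characteristics and its parameters are hypothetical, the table is modeled as a list of dictionaries, and each record is assumed to be flagged independently with the given probability, so the share of TRUE values is approximate rather than exact.

```python
import random


def add_random_characteristics(rows, count=10, fraction=0.1,
                               prefix="random", seed=None):
    """Add `count` boolean fields named prefix1..prefixN to every row.

    Assumption: each record is flagged independently with probability
    `fraction`, so the share of TRUE values is approximate, not exact.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1 (0.1 means 10%)")
    rng = random.Random(seed)
    for row in rows:
        for i in range(1, count + 1):
            # Assigning to an existing key overwrites it, in the same way
            # that a pre-existing column with the same name is overwritten.
            row[f"{prefix}{i}"] = rng.random() < fraction
    return rows


# Hypothetical training table: 10,000 records with a single field.
training = [{"id": n} for n in range(10_000)]
add_random_characteristics(training, count=10, fraction=0.1, seed=0)

# Share of records flagged TRUE in random1 (close to 0.1, not exactly 0.1).
share_true = sum(row["random1"] for row in training) / len(training)

# A single 50% flag can drive a train/test split, as a Filter would.
add_random_characteristics(training, count=1, fraction=0.5,
                           prefix="split", seed=1)
train_rows = [row for row in training if row["split1"]]
test_rows = [row for row in training if not row["split1"]]
```

Fixing the seed makes the flags reproducible between runs, which is convenient when the same split needs to be recreated later.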

 

After adding the random characteristics, they will be available along with the rest of the fields in the Training Data when selecting characteristics in the Column Selection window: