After an analysis has completed, this tool can be used to simulate outcomes for a specific dataset based on the model. The simulation gives an idea of the variability and distribution of the Target variable given the record characteristic values.
In order to use an analysis to run a simulation, both a Detail Key and Test Data must be used in the analysis. The completed analysis to be used needs to be open before selecting Tools -> Simulation from the menu bar to begin the simulation process. The Simulation dialog box below will appear.
·Table Containing Data For Simulation: The table containing the simulation set. This table must be in the same database as the analysis being used. A filter (Where Clause) may be applied to exclude certain records from the analysis.
·SQL Output Table Name: This is the name of the table MultiRate will create in SQL containing the simulation results.
·Number of Paths: The number of distinct simulations run for each record in the simulation set. For example, if the simulation set includes 100 records, one path will simulate one Target value for each record. The second path will simulate another 100 values, one for each record, and so on. Each path is independent of all other paths.
·Include Curve Modification: Check this box to include Curve Modification in the simulation.
When the simulation has finished, the simulated Target value for each record and path will be written out to a Simulation Table with the name specified.
The first step in the simulation is to apply the model factors to the simulation set. The same process that is used in the Apply Factors to Other Data tool is used here. This provides the expected values (means) for each simulated value.
Next, the Test Data is used to create lists of modifying factors for each possible value of each characteristic. These lists are then randomized (see footnote 1 below) and applied to the modeled Target values of the simulation set.
In more detail, when an analysis is run with Test Data, the modeled factors are applied to the Test Data. This information, along with the specific value and factor for each characteristic, is stored in the Test Detail Table. From this table, the Actual Target is divided by the Model Target (or the Modified Target if "Include Curve Modification" is selected) for each record. This "modifier" indicates how far off the model is for each particular record.
Note: When the Actual Target is 0, a modifier of 1 is assigned and a "Probability of Zero" for that modifier is calculated. See footnote 2 below for further discussion.
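As a rough illustration of the modifier calculation described above, the following Python sketch computes a modifier for a single Test Data record. The function name `compute_modifier` and the returned `is_zero` flag are hypothetical and are not part of MultiRate; the probability of zero itself is not computed here.

```python
def compute_modifier(actual_target, model_target):
    """Return (modifier, is_zero) for one Test Data record.

    model_target is the Model Target, or the Modified Target when
    "Include Curve Modification" is selected.
    """
    if actual_target == 0:
        # A zero actual gets a neutral modifier of 1; the zero itself is
        # handled separately through a probability of zero.
        return 1.0, True
    # Otherwise the modifier is the ratio of actual to modeled target.
    return actual_target / model_target, False
```

For example, an Actual Target of 150 against a Model Target of 100 gives a modifier of 1.5.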
In order to apply variability based on record characteristic values, the error above needs to be decomposed into various characteristic components.
The algorithm determines how much of the error each characteristic is responsible for. This is done by taking the absolute value of the natural log of each characteristic factor used, summing these values, and dividing each value by the sum to assign a "decomposition exponent" to each characteristic. So for each characteristic, i, this decomposition exponent is calculated using the following formula:
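Based on the description above and the square root referred to in the next paragraph, the formula can be written as follows. The symbols are notation assumed here: f_i is the characteristic factor used for characteristic i, n is the number of characteristics, and e_i is the decomposition exponent.

```latex
e_i = \sqrt{\frac{\left|\ln f_i\right|}{\sum_{j=1}^{n} \left|\ln f_j\right|}}
```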
The ratio inside the square root reflects the relative impact of each characteristic compared with the others (i.e., characteristic factors further from 1 are assigned a larger share of the error in the decomposition). The square root reflects the reshuffling of these factors that will generally occur across a typical dataset. This assumes independence of characteristics. If this adjustment were not included, independent combinations of these characteristics would fail to capture the volatility observed, except when all characteristics are fully correlated.
Once these decomposition exponents are calculated, the original modifier is raised to this power for each characteristic to calculate a "decomposed modifier". So for each characteristic, i, the decomposed modifier is calculated using the following formula:
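The two steps above (exponent, then decomposed modifier) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MultiRate's implementation: the function names are hypothetical, and the case where every factor equals 1 (so that all logs are zero) is not covered by the description, so the sketch simply raises an error.

```python
import math

def decomposition_exponents(factors):
    """Square root of each factor's share of the total |ln(factor)|."""
    logs = [abs(math.log(f)) for f in factors]
    total = sum(logs)
    if total == 0:
        # Assumption: the source does not say what happens when all
        # factors are 1, so this sketch refuses to guess.
        raise ValueError("all factors equal 1; exponents are undefined")
    return [math.sqrt(v / total) for v in logs]

def decomposed_modifiers(modifier, factors):
    """Raise the record's modifier to each characteristic's exponent."""
    return [modifier ** e for e in decomposition_exponents(factors)]

# Two characteristics with factors 2.0 and 0.5 have equal |ln| values,
# so each receives the same exponent, sqrt(0.5).
exponents = decomposition_exponents([2.0, 0.5])
parts = decomposed_modifiers(1.5, [2.0, 0.5])
```

Because of the square root, the exponents generally sum to more than 1, so the product of the decomposed modifiers for one record is not the original modifier.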
Once this has been done for every record in the Test Data, there will be some number of decomposed modifiers (from here on, just referred to as "modifiers") for each characteristic and value combination. These are then grouped into lists and their order is randomized (see footnote 1 below).
Finally, these modifiers are applied to the modeled targets from the simulation set to imitate the variability observed in the Test Data. This is done by first randomizing the order of the simulation set (see footnote 1 below), then going through it record by record and applying a modifier from the appropriate characteristic and value list based on the value of each characteristic for the given record. The modifier chosen from the list is determined by matching two randomly assigned indexes: the index of the simulation set and the index of the characteristic and value list. If there are more records in the simulation set than in a given characteristic and value list, the index used from the list starts over at 1. The same is done for each characteristic. The modeled target is then multiplied by each of these modifiers to arrive at a "Simulated Target". This process is completed for every record in the simulation set.
Note: When one or more of the modifiers applied has a corresponding "Decomposed Zero Probability", all of these probabilities are summed, and a random number between 0 and 1 is generated to determine whether or not to set the "Simulated Target" equal to zero. See footnote 2 below for further discussion.
For example, if for the first record in the simulation set, "Characteristic1" took the value "A", the first modifier from the "Characteristic1 and A" list would be assigned. Then, if for the second record in the simulation set, "Characteristic1" took the value "B", the second modifier from the "Characteristic1 and B" list would be assigned. If for the 100th record of the simulation set, "Characteristic1" took the value "B", but the "Characteristic1 and B" list only included 99 modifiers, the first modifier from the list would be assigned.
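The index matching and wrap-around in this example can be sketched in Python as follows. The function name and the dictionary layout are assumptions made for illustration; MultiRate performs the equivalent steps in SQL tables.

```python
def simulate_target(modeled_target, record_index, record_values, modifier_lists):
    """Apply one modifier per characteristic to a modeled target.

    record_index   -- 1-based position of the record in the shuffled simulation set
    record_values  -- {characteristic: value} for this record
    modifier_lists -- {(characteristic, value): [modifiers in shuffled order]}
    """
    simulated = modeled_target
    for characteristic, value in record_values.items():
        modifiers = modifier_lists[(characteristic, value)]
        # Wrap around when the record index exceeds the list length, so
        # record 100 against a 99-item list picks the first modifier.
        position = (record_index - 1) % len(modifiers)
        simulated *= modifiers[position]
    return simulated

# Hypothetical list of three modifiers for "Characteristic1 and B".
lists = {("Characteristic1", "B"): [1.1, 0.9, 1.2]}
second = simulate_target(200.0, 2, {"Characteristic1": "B"}, lists)   # uses 0.9
wrapped = simulate_target(200.0, 4, {"Characteristic1": "B"}, lists)  # wraps to 1.1
```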
One path produces one "Simulated Target" for each record in the simulation set. Each path is independent of all other paths.
After the specified number of paths has finished running (or the user halts the analysis with only a portion of the paths run), the simulated target values for each unique record in the simulation set are rebalanced so that the average target value equals the expected value (mean). This is done by dividing the modeled target by the average "Simulated Target" value across all paths. This produces a factor, which is then multiplied by all of the "Simulated Target" values to arrive at the final "Balanced Simulated Target". The results of these calculations are stored in the Simulation Table. This accomplishes two things. First, it accounts for the probability of a simulated zero record (it avoids the need to gross up all non-zero simulated values to their value conditional on being non-zero). Second, it ensures that, regardless of the number of simulations being run, the first moment for each record is exactly as predicted by the model.
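The rebalancing step for a single record can be sketched in Python as follows. The function name is hypothetical, and the handling of a record whose every path simulated zero is an assumption, since the description does not cover that case.

```python
def rebalance(modeled_target, simulated_targets):
    """Scale one record's simulated targets so their mean equals the modeled target."""
    average = sum(simulated_targets) / len(simulated_targets)
    if average == 0:
        # Assumption: every path simulated zero, so no factor can be
        # computed; the values are returned unchanged.
        return list(simulated_targets)
    factor = modeled_target / average
    return [s * factor for s in simulated_targets]

# Three paths for one record with a modeled target of 100.
# The average is 80, so the balancing factor is 1.25.
balanced = rebalance(100.0, [0.0, 150.0, 90.0])
```

Note that the simulated zero stays at zero after rebalancing, while the non-zero paths are scaled up so that the record's mean still matches the model.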
Footnote 1: Technically, these algorithms generate pseudo-random numbers that are completely independent from the task being performed. These pseudo-random numbers are expected to be "sufficiently random" for this purpose. In this process, two different types of randomization are required:
1) Randomizing the order of a list
This is done by adding a "uniqueidentifier" column to the table in SQL, ordering by that column, then adding an "IDENTITY" column to give a randomly generated integer from 0 to (number of records - 1).
2) Generating a random decimal between 0 and 1
This is done by using the following formula, which takes the natural log of an integer determined by the row index and path, and uses the numbers starting with the 7th decimal place (the 100 is added simply to move the integer away from zero when the row index and path are small):
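Both kinds of randomization can be sketched in Python as follows. This is only an illustration under stated assumptions: `uuid.uuid4()` stands in for the SQL "uniqueidentifier" column, and the way the row index and path are combined into an integer (`100 + row_index * path`) is a guess, since the description only says that the integer is determined by both and that 100 is added.

```python
import math
import uuid

def shuffled_indexes(records):
    """Order records by a random key, then number them from 0 to n - 1."""
    # Sorting on a freshly generated random identifier shuffles the list;
    # enumerate then plays the role of the IDENTITY column.
    keyed = sorted(records, key=lambda _: uuid.uuid4().int)
    return list(enumerate(keyed))

def pseudo_random_decimal(row_index, path):
    """Digits of ln(n) from the 7th decimal place onward, as a value in [0, 1)."""
    n = 100 + row_index * path  # assumed combination of row index and path
    # Shifting six decimal places left and keeping the fractional part
    # leaves the digits that start at the 7th decimal place.
    return (math.log(n) * 1_000_000) % 1.0

indexed = shuffled_indexes(["a", "b", "c"])
value = pseudo_random_decimal(1, 1)
```

The second function is deterministic: the same row index and path always give the same decimal.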
Footnote 2: Decomposed Zero Probability
Technically, a zero result is not possible with a multiplicative model of non-zero factors, such as the models generated by MultiRate. Therefore, decomposing errors in a multiplicative fashion cannot generate zero-valued simulated paths.
The problem of zero-valued targets is addressed as follows. First, a modifier of 1 instead of 0 is assigned to any record with an actual target value of 0. Then, for each of these records, a "decomposed zero probability" value is also calculated for each characteristic. This value determines the likelihood that each characteristic caused the actual target to be 0.
These values are calculated for each characteristic by taking the inverse of the characteristic factor for that characteristic and dividing by the sum of the inverse characteristic factors for all characteristics. So for each characteristic, i, the decomposed zero probability is calculated using the following formula:
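Read so that the values for one record sum to 1, the formula can be written as follows. This is a reconstruction that assumes the denominator is the sum of the inverse factors. The symbols are notation assumed here: f_i is the characteristic factor for characteristic i, n is the number of characteristics, and z_i is the decomposed zero probability.

```latex
z_i = \frac{1 / f_i}{\sum_{j=1}^{n} 1 / f_j}
```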
For example, a factor of 0.2 is much more likely to be the cause of the actual target being 0 than a factor of 2. Hypothetically, if these were the only two factors, the decomposed zero probability values would be:
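Assuming each inverse factor is divided by the sum of the inverse factors (an assumption, chosen so that the two values sum to 1), the arithmetic for factors of 0.2 and 2 works out as:

```latex
z_{0.2} = \frac{1/0.2}{1/0.2 + 1/2} = \frac{5}{5.5} \approx 0.909,
\qquad
z_{2} = \frac{1/2}{1/0.2 + 1/2} = \frac{0.5}{5.5} \approx 0.091
```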
Then, when one or more of the modifiers applied to a record in the simulation set have a decomposed zero probability value, all of these decomposed amounts are summed across characteristics. A random number is then generated between 0 and 1, and if the number generated is less than the sum of the decomposed zero probability values, the simulated target is set to 0. This results in a similar proportion of 0's in the simulated target as was found in the Test Data.
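The zero decision described above can be sketched in Python as follows. The function name and the `rng` parameter are hypothetical, and Python's `random` module stands in for the pseudo-random decimal described in footnote 1.

```python
import random

def apply_zero_probability(simulated_target, zero_probabilities, rng=random):
    """Set the simulated target to 0 with probability equal to the summed values.

    zero_probabilities holds the decomposed zero probability of each applied
    modifier that has one; the list is empty when none of them do.
    """
    if not zero_probabilities:
        return simulated_target
    threshold = sum(zero_probabilities)
    # Draw a number in [0, 1) and zero out the target if it falls below
    # the summed decomposed zero probabilities.
    if rng.random() < threshold:
        return 0.0
    return simulated_target
```

If the summed values reach 1 or more, the simulated target is always set to 0 in this sketch.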