Fuzzy K

The Fuzzy K statistic is a generalized parameter count. It measures the number of meaningful values for a given Characteristic or, in aggregate, the total amount of specification in an analysis model. The concept is similar to fuzzy logic, in which variables are not limited to a truth value of 0 or 1 but can take a truth value anywhere between 0 and 1. Likewise, a parameter need not count as exactly one toward Fuzzy K; it can count as a fraction.

For example, a factor may be developed for a characteristic value with very little exposure relative to the credibility standard. That factor, which will be very close to one, is technically a parameter, but its Fuzzy K will be close to zero, reflecting how little it contributes to the overall model specification.

Range of possible values for Fuzzy K:

| Data Type | Minimum Fuzzy K | Maximum Fuzzy K |
| --- | --- | --- |
| Generic Characteristic; Grouped; Variable Gradient Smoothing | 0 | Number of bins for the Characteristic (including the NULL bin) |
| Linear on Bin Number; Linear on Average Value; Linear on Log of Average Value | 0 | 1 |

Tip: To avoid overfitting your data, try to reduce Fuzzy K wherever the reduction does not considerably worsen Significance or RMSE.

Calculation of Fuzzy K

Calculation for Generic Characteristic, Grouped, or Variable Gradient Smoothing

We first calculate the Fuzzy K value for each individual bin (a bin is a single unique value for a Generic Characteristic, or a set of values for a Grouped characteristic):

where E = Exposure, Z = Credibility Factor and G is the group.

If smoothing is applied, the exposure for the group will be larger than the exposure for the individual bin alone. Calculating Fuzzy K in this way avoids double counting data that smoothing shares between bins.

If smoothing is not applied, the group is the same as the bin and therefore Fuzzy K for the bin will simply be the credibility factor for the bin.
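The bin-level calculation can be sketched in a few lines. This is a minimal illustration, not the product's implementation: it assumes the bin formula scales the group's credibility by the bin's share of group exposure, and because the text does not say how Z is computed, the `credibility` helper and its `full_standard` parameter are hypothetical placeholders using a square-root rule.

```python
import math


def credibility(exposure: float, full_standard: float) -> float:
    """Hypothetical credibility rule (assumption, not from the source).

    Square-root rule Z = min(1, sqrt(E / full_standard)), where full_standard
    is the exposure needed for full credibility.
    """
    return min(1.0, math.sqrt(exposure / full_standard))


def bin_fuzzy_k(bin_exposure: float, group_exposure: float, full_standard: float) -> float:
    """Fuzzy K for one bin, assuming K_i = Z_G * E_i / E_G."""
    z_group = credibility(group_exposure, full_standard)
    # Each bin claims only its exposure share of the group's credibility,
    # so data shared through smoothing is not counted twice.
    return z_group * bin_exposure / group_exposure


# No smoothing: the group is the bin itself, so Fuzzy K is the bin's credibility.
print(bin_fuzzy_k(250.0, 250.0, 1000.0))   # 0.5
# Smoothing: the bin's group also contains 750 units of neighbouring exposure.
print(bin_fuzzy_k(250.0, 1000.0, 1000.0))  # 0.25
```

Under this assumption, the Fuzzy K values of all bins sharing one group add up to that group's credibility, which is the sense in which shared data is not double counted.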

We can then calculate Fuzzy K for the characteristic in total:

$$K = \sum_i K_i - Z_{total}$$

This reflects the balancing that is performed across bins. Because the average of all the factors is forced to equal one, balancing eliminates one parameter when the data are fully credible. Since the total itself will not generally be fully credible, Z_{total}, the credibility factor for the total, is subtracted rather than a full 1.
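As a rough illustration of the characteristic total for an unsmoothed characteristic, the sketch below sums the bin values and subtracts Z_total. It reuses the same hypothetical square-root `credibility` placeholder (an assumption; the text does not define how Z is computed), and the function name `characteristic_fuzzy_k` is illustrative only.

```python
import math


def credibility(exposure: float, full_standard: float) -> float:
    # Hypothetical square-root rule (assumption; the source does not define Z).
    return min(1.0, math.sqrt(exposure / full_standard))


def characteristic_fuzzy_k(bin_exposures: list[float], full_standard: float) -> float:
    """Fuzzy K for an unsmoothed characteristic: sum of bin values minus Z_total."""
    # Without smoothing, each bin's Fuzzy K is simply its own credibility factor.
    bin_ks = [credibility(e, full_standard) for e in bin_exposures]
    # Balancing removes one parameter, discounted by the credibility of the total.
    z_total = credibility(sum(bin_exposures), full_standard)
    return sum(bin_ks) - z_total


# Three fully credible bins: 3 - 1 = 2 effective parameters.
print(characteristic_fuzzy_k([1000.0, 2000.0, 4000.0], 1000.0))
# Replace one bin with a thin one (Z of about 0.1): roughly 1.1 effective parameters.
print(characteristic_fuzzy_k([1000.0, 2000.0, 10.0], 1000.0))
```

The second call shows the effect described for thinly exposed values: the low-exposure bin still has a factor, but it adds only about 0.1 to Fuzzy K.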

Calculation for Linear Models (Linear on Bin Number, Linear on Average Value, or Linear on Log of Average Value)

For these models, essentially one factor (a slope parameter) is being determined. The credibility associated with this single parameter is based on the total amount of exposure for the entire analysis.
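
$$K = Z_{total}$$

where Z_{total} is the credibility factor based on the total exposure for the entire analysis.
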
For each individual bin within a linear characteristic, the credibility is included only for illustrative and consistency purposes and is calculated as:

$$K_i = 2\,Z_{total}\,\frac{E_i}{E_{total}}$$

reflecting the 2 parameters (slope and intercept) that are calculated prior to balancing.

As with the other models (Variable Gradient Smoothing or no smoothing), the relationship between the characteristic Fuzzy K and the Fuzzy K values for the individual bins (after balancing) is:

$$K = \sum_i K_i - Z_{total}$$

which simplifies to the basic formula given first for Linear Models:

$$K = Z_{total}$$
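The simplification for linear models can be checked numerically. This sketch assumes the illustrative per-bin value spreads the two pre-balancing parameters (slope and intercept) across bins by exposure share, i.e. 2 · Z_total · E_i / E_total; the `credibility` helper is again a hypothetical square-root placeholder, and `linear_fuzzy_k` is an illustrative name.

```python
import math


def credibility(exposure: float, full_standard: float) -> float:
    # Hypothetical square-root rule (assumption; the source does not define Z).
    return min(1.0, math.sqrt(exposure / full_standard))


def linear_fuzzy_k(
    bin_exposures: list[float], full_standard: float
) -> tuple[float, list[float], float]:
    """Return (Z_total, per-bin Fuzzy K values, characteristic Fuzzy K)."""
    e_total = sum(bin_exposures)
    z_total = credibility(e_total, full_standard)
    # Assumed per-bin formula: two parameters spread across bins by exposure share.
    bin_ks = [2.0 * z_total * e / e_total for e in bin_exposures]
    # Same balancing relationship as the other models.
    characteristic_k = sum(bin_ks) - z_total
    return z_total, bin_ks, characteristic_k


z_total, bin_ks, k = linear_fuzzy_k([100.0, 300.0, 600.0], 4000.0)
print(z_total, bin_ks, k)
```

Because the per-bin values always sum to 2 · Z_total, subtracting Z_total leaves Z_total no matter how the exposure is distributed across bins, which is why the linear models have a maximum Fuzzy K of 1.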