Classifiers for predicting disease often need to account for classification costs. These costs are determined by the type of disease, its associated classification cost matrix, and/or the target population on which the classifier will be used. Diabetes has higher costs associated with false negatives than with false positives, as the disease can progress very rapidly when left untreated. There are two ways to skew a classifier toward a given classification cost matrix: (1) changing the classification probability value, P*, based on the classification cost matrix, or (2) rebalancing the training set to introduce more negative cases. Using a diabetes data set, this paper compares the two methods. The results indicate comparable predictive accuracy and expected classification costs for either method. However, the P* method works better when the p-value is less than 0.2. Hence, for diabetes classification cost matrices, the P* method is recommended.
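As an illustration of method (1), the sketch below derives a probability cutoff from a two-class cost matrix using the standard expected-cost rule: predict "positive" whenever doing so has the lower expected cost. This is not necessarily the exact formulation used in the paper. The function names and the example costs (a false negative costing four times a false positive) are hypothetical and chosen only for illustration.

```python
def cost_threshold(cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Return the cutoff P* implied by a 2x2 cost matrix.

    Predicting positive is cheaper in expectation when
        (1 - p) * cost_fp + p * cost_tp <= (1 - p) * cost_tn + p * cost_fn,
    which rearranges to p >= P* as computed below.
    """
    gain_if_negative = cost_fp - cost_tn  # cost avoided by a correct "negative"
    gain_if_positive = cost_fn - cost_tp  # cost avoided by a correct "positive"
    total = gain_if_negative + gain_if_positive
    if total <= 0:
        raise ValueError("errors must cost more than correct decisions")
    return gain_if_negative / total


def classify(prob_positive, p_star):
    """Label each case 1 (disease) when its predicted probability reaches P*."""
    return [1 if p >= p_star else 0 for p in prob_positive]


# Hypothetical costs: a missed diabetes case is 4x as costly as a false alarm.
p_star = cost_threshold(cost_fp=1.0, cost_fn=4.0)
labels = classify([0.1, 0.3, 0.6], p_star)
```

With these assumed costs the cutoff falls well below the default of 0.5, so borderline cases are flagged as positive more readily, which is the intended effect when false negatives are the more expensive error.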
Available at: http://works.bepress.com/biswadip_ghosh/3/