BACKGROUND: Predicting health care costs for individuals and populations is essential for managing care. However, the comparative power of diagnostic and drug data for predicting future costs has not been closely examined.
OBJECTIVE: We sought to compare the predictive performance of claims-based models using diagnoses, drug claims, and combined data to predict health care costs.
SUBJECTS: Our sample consisted of more than 1 million commercially insured, nonelderly individuals in a national (MEDSTAT MarketScan) research database.
MEASURES: We used 1997 and 1998 drug and diagnostic profiles to predict costs in 1998 and 1999, respectively. To assess model performance, we compared R2 values and predictive ratios (predicted costs/actual costs) for important subgroups.
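The two performance measures named above can be illustrated with a minimal sketch. It assumes that R2 is the usual coefficient of determination and that a subgroup's predictive ratio is the sum of its predicted costs divided by the sum of its actual costs; the abstract states only "predicted costs/actual costs", so the aggregation is an assumption. The function names and the toy cost figures are hypothetical and are not taken from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def predictive_ratio(actual, predicted, in_subgroup=None):
    """Total predicted cost over total actual cost.

    in_subgroup is an optional list of booleans selecting a subgroup;
    summing costs within the subgroup is an assumed aggregation.
    """
    if in_subgroup is None:
        in_subgroup = [True] * len(actual)
    total_pred = sum(p for p, keep in zip(predicted, in_subgroup) if keep)
    total_act = sum(a for a, keep in zip(actual, in_subgroup) if keep)
    return total_pred / total_act


# Toy (made-up) next-year costs for four individuals.
actual_costs = [100.0, 200.0, 300.0, 400.0]
predicted_costs = [150.0, 150.0, 350.0, 350.0]
# Hypothetical flag for a subgroup, e.g. people with a given condition.
has_condition = [False, False, True, True]

print(r_squared(actual_costs, predicted_costs))
print(predictive_ratio(actual_costs, predicted_costs))
print(predictive_ratio(actual_costs, predicted_costs, has_condition))
```

A predictive ratio near 1.0 means the model neither over- nor under-predicts the subgroup's costs in aggregate, even when individual-level R2 is modest.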
RESULTS: Models using both drug and diagnostic data best predicted subsequent-year total health care costs (highest R2 = 0.168 versus 0.116 and 0.146 for models based on drug or diagnostic data alone, respectively), with highly accurate predictive ratios (0.95-1.05) for subgroups of patients with major medical conditions. Models predicting pharmacy costs had substantially higher R2 values than models predicting other medical costs (highest R2 = 0.493 versus 0.124). Drug-based models predicted future pharmacy costs better than diagnosis-based models (highest R2 = 0.482 versus 0.243), whereas diagnosis-based models predicted total costs (highest R2 = 0.146 versus 0.116) and nonpharmacy costs (highest R2 = 0.116 versus 0.071) more effectively than drug-based models. Newer models had markedly higher R2 values than older ones, largely because of richer data rather than model refinements.
CONCLUSIONS: Combined drug and diagnostic data predict total health care costs better than either type of data alone. Pharmacy spending is particularly predictable from drug data, whereas diagnoses are more useful than drugs for predicting other medical costs and total costs. Using even slightly more recent data can substantially boost model performance measures; thus, model comparisons should be conducted on the same dataset.
Available at: http://works.bepress.com/arlene_ash/121/