
R and S-PLUS produced different classification trees for predicting patient mortality

Peter C. Austin, Institute for Clinical Evaluative Sciences

Abstract

Objective: There is growing interest in using classification and regression trees in biomedical research. R and S-PLUS are two statistical programming languages that share similar syntax and functionality, and both allow users to fit classification and regression trees. The objective was to compare classification trees grown using R with those grown using S-PLUS.

Study Design and Setting: Using data on 9,484 patients hospitalized with an acute myocardial infarction, we compared the classification trees for predicting mortality that were grown using R and S-PLUS. We also used repeated split-sample validation to compare the predictive accuracy of classification trees grown using R and S-PLUS.

Results: The classification tree grown using R was substantially more parsimonious than the one grown using S-PLUS. The pruned classification tree grown using R was identical to the tree obtained by removing six subtrees from the pruned classification tree grown using S-PLUS. Repeated split-sample validation showed that classification trees grown using S-PLUS had greater discrimination and accuracy than those grown using R.

Conclusions: R and S-PLUS can produce different classification trees from the same data.

Suggested Citation

Peter C. Austin. "R and S-PLUS produced different classification trees for predicting patient mortality" Journal of Clinical Epidemiology 61 (2008): 1222-1226.