Curriculum-based measurement of oral reading (R-CBM): A diagnostic test accuracy meta-analysis of evidence supporting use in universal screening
Journal of School Psychology (2014)
  • Stephen P. Kilgus, East Carolina University
  • Scott A. Methe, University of Massachusetts Boston
  • Daniel M. Maggin, University of Illinois at Chicago
  • Jessica L. Tomasula, East Carolina University

A great deal of research over the past decade has examined the appropriateness of curriculum-based measurement of oral reading (R-CBM) in universal screening. Multiple researchers have meta-analyzed available correlational evidence, yielding support for the interpretation of R-CBM as an indicator of general reading proficiency. In contrast, researchers have yet to synthesize diagnostic accuracy evidence, which pertains to the defensibility of the use of R-CBM for screening purposes. The purpose of this research was therefore to conduct the first meta-analysis of R-CBM diagnostic accuracy research. A systematic search of the literature identified 34 studies, including 20 peer-reviewed articles, 7 dissertations, and 7 technical reports. Bivariate hierarchical linear models yielded generalized estimates of diagnostic accuracy statistics, which predominantly exceeded standards for acceptable universal screener performance. For instance, when predicting criterion outcomes within a school year (≤ 9 months), R-CBM sensitivity ranged between .80 and .83 and specificity ranged between .71 and .73. Multiple moderators of R-CBM diagnostic accuracy were identified, including the (a) R-CBM cut score used to define risk, (b) lag in time between R-CBM and criterion test administration, and (c) percentile rank corresponding to the criterion test cut score through which students were identified as either truly at risk or not at risk. Follow-up analyses revealed substantial variability of extracted cut scores within grade and time of year (i.e., fall, winter, and spring). This result called into question the inflexible application of a single cut score across contexts and suggested the potential necessity of local cut scores. Implications for practice, directions for future research, and limitations are discussed.
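To make the diagnostic accuracy statistics above concrete, the following is a minimal sketch of how sensitivity and specificity might be computed for a single R-CBM cut score. It is an illustration only, not the bivariate hierarchical model used in the meta-analysis. The function name, the rule that a student is flagged when scoring below the cut score, and the ten-student data set are all hypothetical assumptions introduced here.

```python
from typing import Sequence, Tuple


def screening_accuracy(
    rcbm_scores: Sequence[float],
    truly_at_risk: Sequence[bool],
    cut_score: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one R-CBM cut score.

    Assumption: a student is flagged as at risk when the R-CBM score
    falls below the cut score. `truly_at_risk` marks students who fell
    below the criterion test cut score.
    """
    tp = fn = tn = fp = 0
    for score, at_risk in zip(rcbm_scores, truly_at_risk):
        flagged = score < cut_score
        if at_risk and flagged:
            tp += 1  # at-risk student correctly flagged
        elif at_risk:
            fn += 1  # at-risk student missed by the screener
        elif flagged:
            fp += 1  # not-at-risk student incorrectly flagged
        else:
            tn += 1  # not-at-risk student correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one at-risk and one not-at-risk student")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical words-correct-per-minute scores and criterion outcomes.
scores = [35, 42, 58, 61, 70, 74, 88, 95, 102, 110]
at_risk = [True, True, True, False, True, False, False, False, False, False]

low_cut = screening_accuracy(scores, at_risk, cut_score=65)
high_cut = screening_accuracy(scores, at_risk, cut_score=75)
```

With these made-up data, raising the cut score from 65 to 75 increases sensitivity while lowering specificity, which illustrates why the cut score acts as a moderator of diagnostic accuracy and why a single cut score may not transfer across contexts.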

