Heaton, Grant, and Matthews (1991) presented an alternate scoring system for the Halstead–Reitan Neuropsychological Battery (HRNB) using age-, education-, and gender-corrected T scores to replace the raw score system traditionally employed with the test. This study addressed the impact of using this system on the score patterns generated by the HRNB in 64 clients with localized injuries to one of four quadrants of the brain. Results showed that the raw scores indicate more overall impairment than do the T scores, but the systems are statistically equivalent in their ability to separate the localized groups. The patterns of differences produced by the systems also differ. Although these findings suggest overall equivalence of the systems, they also show that each may have strengths that emerge in different analyses. Using both scoring systems, each for its own unique information, is advocated as most appropriate until additional research addresses the questions raised in this study and elsewhere.
The Halstead–Reitan Neuropsychological Battery (HRNB) has long been recognized as one of the major influences in modern American neuropsychology (Golden, Zillmer, & Spiers, 1992; Reitan, 1955). Traditional scoring of the test has been based on the raw scores from a combination of central tests developed by Halstead (Category Test, Tactual Performance Test [TPT], Speech Sounds, Seashore Rhythm Test, Finger Oscillation Test [FOT], Impairment Index) combined with additional tests added by Reitan (Trail Making, Sensory-Perceptual Examination, Wechsler Adult Intelligence Scale, Reitan–Indiana Aphasia Screening Test; Golden, Osmon, Moses, & Berg, 1980).
Interpretation of the raw test scores of the HRNB is based on several principles. The most basic is level of performance, which employs cutoff scores designated by Halstead and Reitan to indicate normal or impaired performance (Reitan & Wolfson, 1993). Other techniques include the identification of differences between the right and left sides of the body, differences in patterns of scores that discriminate between the right and left hemisphere or between anterior and posterior brain areas, and the identification of pathognomonic signs (symptoms rarely seen in normal individuals; Reitan & Wolfson, 1993).
Since the test came into widespread use, however, there has been dissatisfaction with the scoring system employed, resulting in several attempts to better quantify the results of the test. The earliest major alternative was the system developed by Russell, Neuringer, and Goldstein (1970), which rated scores on a scale ranging from 0 (normal) to 4 (severely impaired) using cutoffs similar to (but not identical to) Reitan’s cutoffs. This system generated a new measure called the Average Impairment Index, which supplemented the Halstead Impairment Index. This system allowed for a more detailed comparison of which scores were better or worse than other scores among the HRNB subtests, but it yielded very similar results to Reitan’s original approach.
Others have speculated on the impact of age and education on the scoring of the battery (Golden et al., 1992; Reitan & Wolfson, 1995, 1996). Some local facilities have developed norms for use with specific lower-education populations, whereas others have suggested norms based on age, education, and gender (e.g., Heaton, Grant, & Matthews, 1986). There are indeed many revisions of Reitan’s basic scoring around the country, although the authors know of no central compilation of these norms or attempts to compare their relative usefulness and efficacy.
Because of increasing interest regarding age, education, and gender corrections, Heaton, Grant, and Matthews (1991) introduced a new scoring system for the test. Raw scores are first transformed into normalized scale scores with a mean of 10 (SD = 3). These scale scores are then transformed into normalized T scores based on age, education, and gender. The authors argued in their introduction that such a system allows for a better comparison of the person’s scores with those expected for that person’s demographics, improves the comparison of scores across tests, and facilitates the identification of group weaknesses and strengths. These scores are very much in the tradition of age-based norms on intelligence tests (e.g., Wechsler, 1955).
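The relationship between the two score metrics can be sketched as follows. This is only an illustration of how a scale score (mean 10, SD 3) maps onto the conventional T-score metric (mean 50, SD 10, an assumption not stated in this passage) through a z score. It deliberately omits the age, education, and gender corrections, so it is not the Heaton et al. conversion itself, and the function names are hypothetical.

```python
def scale_to_z(scale_score: float, mean: float = 10.0, sd: float = 3.0) -> float:
    """Express a scale score as a z score (distance from the mean in SD units)."""
    return (scale_score - mean) / sd


def z_to_t(z: float, mean: float = 50.0, sd: float = 10.0) -> float:
    """Place a z score on the conventional T-score metric (assumed mean 50, SD 10)."""
    return mean + sd * z


def scale_to_uncorrected_t(scale_score: float) -> float:
    """Metric change only: no demographic correction is applied here."""
    return z_to_t(scale_to_z(scale_score))


for s in (7, 10, 13):
    print(s, scale_to_uncorrected_t(s))
```

A demographically corrected T score would differ from this uncorrected value by whatever adjustment the published norms apply for the person’s age, education, and gender.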
Despite this tradition, the Heaton norms, as they have come to be called, have generated extensive controversy in the field, with some welcoming the use of age- and education-corrected scores (e.g., Scott, Tremont, & Hoffman, 1997) and others (e.g., Reitan & Wolfson, 1995, 1996) opposing it. The argument has been made that there are many pitfalls in such a system. Reitan and Wolfson (1995, 1996) argued that age and education corrections, although important for normals, are much less important in brain-injured clients whose scores are much less correlated with age and education. Thus, the use of age and education corrections derived from samples of normal controls may overstate the corrections necessary for these factors and make it more difficult to discern the performance patterns specific to brain injury.
Scott et al. (1997) argued that Reitan’s arguments are incorrect and that the failure to find age and education correlates in brain injury is simply due to a restriction of the range of scores in the brain-injured group. Vanderploeg, Axelrod, Sherer, and Scott (1997) found similar effects for age and education in brain-damaged and normal participants, whereas Shuttleworth-Jordan (1997) presented a strong theoretical argument against Reitan and Wolfson’s concerns in this area.
A second issue is that the conversion of raw scores into T scores prevents the neuropsychologist from directly comparing scores based on Reitan’s methods of pattern analysis. For example, Reitan and Wolfson (1993) argued that there should be certain relationships between performance on the left and right sides of the body. These relationships are evaluated by calculating the ratio of left-sided to right-sided performance. Such ratios cannot be calculated using T scores or scaled scores, and conversion to normalized scale scores may distort these relationships by grouping together wide ranges of performance into a single score.
A simple example of this is a 50-year-old man with a 10th-grade education who has a score of 45 on the right hand of the FOT and 36 on the left hand. This represents a 20% difference, a significant difference at the raw score level. In the Heaton et al. system, these scores become scale scores of 8 and 7, respectively (Appendix C in Heaton et al., 1991). These scale scores become age-, gender-, and education-corrected T scores of 42 and 37, respectively. It is unclear whether this represents a similarly significant difference.
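The arithmetic in this example can be checked with a short sketch. The raw and converted scores are those quoted above. The function names are hypothetical, and the T-score standard deviation of 10 is the conventional T-score metric rather than a value stated in this passage. Expressing each gap in standard deviation units is only one possible way to compare the metrics, not a criterion proposed by Reitan or by Heaton et al.

```python
def percent_difference(right: float, left: float) -> float:
    """Percentage by which the left-hand score falls below the right-hand score."""
    return 100.0 * (right - left) / right


def gap_in_sd_units(score_a: float, score_b: float, sd: float) -> float:
    """Difference between two standardized scores, expressed in SD units."""
    return (score_a - score_b) / sd


# Values quoted in the example (FOT, right hand vs. left hand).
raw_gap = percent_difference(45, 36)        # raw scores
scale_gap = gap_in_sd_units(8, 7, sd=3.0)   # scale scores, SD = 3
t_gap = gap_in_sd_units(42, 37, sd=10.0)    # T scores, SD = 10 (assumed convention)

print(f"raw difference: {raw_gap:.0f}%")
print(f"scale-score gap: {scale_gap:.2f} SD")
print(f"T-score gap: {t_gap:.2f} SD")
```

Under these assumptions, the 20% raw difference corresponds to about one third of an SD in scale scores and half an SD in T scores, so a percentage rule defined on raw scores has no direct counterpart on either standardized metric.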
Thus, although it is clearly possible to compare T scores by subtracting one from another, it is unclear what differences in T scores will correspond to Reitan’s cutoffs and whether such differences are consistent across age groups. Similarly, it is unclear whether the comparison of certain tests (e.g., Trails A to Trails B, or Speech Sounds to Rhythm) occurs in the same manner or with the same results using the Heaton system. Applying traditional approaches to these differences (such as 10% differences between right and left body sides) to T scores is clearly inappropriate. The absence of agreed-on T-score difference criteria appears to limit the use of the system.
In addition, age and education corrections may diminish the effectiveness of the test by correcting for problems that arise from brain dysfunction. Thus, if we correct for cognitive changes arising from aging, we may be correcting for the effects of brain changes. Low educational attainment may reflect a failure to complete schooling because of neuropsychologically based learning problems rather than a lack of educational opportunity. These corrections could make the individual appear more normal and lead to possible misdiagnosis, especially in cases with more subtle dysfunction.
Normalizing of scores may have the effect of decreasing real differences among brain-injured clients by assigning them the same or nearly the same scale scores. Although such differences are unimportant in discriminating brain-injured participants from normals, they may be essential in finely discriminating different types of brain injury. For example, on the dominant hand of the TPT, a score of 3 min per block is given the same scaled score and T score as a score of 9 min per block, even though the latter performance is three times worse. A patient improving from 10 min per block on the dominant hand to 4 min per block on the nondominant hand (a significant improvement using Reitan’s rules) gets a scaled score of 2 for both, suggesting no improvement. This clumping of scores arises from forcing an essentially skewed distribution for brain-injured clients into a normalized distribution based on the performance of normals (which is also often skewed prior to normalization).
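The clumping effect can be illustrated with a minimal sketch. The band limits below are entirely hypothetical and are not taken from the Heaton et al. tables. They are chosen only so that, as in the example above, very slow TPT times fall into a single lowest band. The names `UPPER_LIMITS`, `SCALE_SCORES`, and `to_scale_score` are likewise illustrative.

```python
from bisect import bisect_left

# HYPOTHETICAL upper limits (minutes per block) for each scale-score band.
# These are illustrative values, not the published conversion table.
UPPER_LIMITS = [0.3, 0.5, 0.8, 1.2, 2.0, 2.9]
SCALE_SCORES = [12, 10, 8, 6, 4, 3, 2]  # last entry covers every time above 2.9


def to_scale_score(minutes_per_block: float) -> int:
    """Look up the band containing the raw time and return its scale score."""
    return SCALE_SCORES[bisect_left(UPPER_LIMITS, minutes_per_block)]


for raw in (1.0, 3.0, 9.0):
    print(f"{raw:4.1f} min/block -> scale score {to_scale_score(raw)}")
```

With these assumed bands, times of 3 and 9 min per block receive the same scale score although the raw times differ by a factor of three, which is the loss of resolution described above.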
Finally, the adoption of a scoring system that makes such systematic changes may change the meaning of previous research and clinical experience on which the test’s clinical usefulness is based. Although a new system can be learned and investigated, the use of such a system should add to our ability to use the test. For a new system to be considered as a reasonable alternative, it must improve accuracy and effectiveness.
These issues are addressed in part by studies comparing the effectiveness of these alternate scoring systems. The present study is an attempt to examine the effectiveness of three scoring systems (raw scores, normalized scale scores, demographics-adjusted normalized T scores) in discriminating among four groups of patients with localized injuries, separated by laterality and anterior-posterior differences. It is believed that such a study is more important than a study that examines differences between normals and brain-injured clients, because the advantage of the HRNB lies in its ability to make discriminations among clients with different injuries or localized disorders. In addition, such a procedure represents a much finer level of interpretation that would be more sensitive to differences between the systems. This study is not necessarily intended to determine whether one system is better than the others but rather to determine the degree to which the use of these systems may lead to different findings and therefore necessitate different approaches to test interpretation.
Available at: http://works.bepress.com/charles-golden/150/