The accuracy of four machine learning methods in predicting narrative macrostructure scores was compared to scores obtained by human raters using a criterion-referenced progress-monitoring rubric. The methods explored included both approaches that rely on hand-engineered features and approaches that learn directly from raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-aged children (5;0-9;11) who were administered a standardized measure of narrative proficiency. Performance was measured using Quadratic Weighted Kappa, a metric of inter-rater reliability. The results indicated that one model, BERT, not only achieved significantly higher scoring accuracy than the other methods but also produced scores consistent with those obtained by human raters using a valid and reliable rubric. These findings suggest that BERT shows promise as a way to automate the scoring of narrative macrostructure for potential use in clinical practice.
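To illustrate the evaluation metric, Quadratic Weighted Kappa can be computed directly from two raters' integer scores, as in the minimal sketch below. The rubric range (0-4) and the example scores are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two lists of integer scores.

    Scores are assumed to lie in [min_rating, max_rating].
    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement.
    """
    num_ratings = max_rating - min_rating + 1
    n = len(rater_a)

    # Observed matrix O[i][j]: counts of (rater_a, rater_b) score pairs.
    observed = [[0] * num_ratings for _ in range(num_ratings)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms, used to build the chance-expected matrix.
    hist_a = Counter(a - min_rating for a in rater_a)
    hist_b = Counter(b - min_rating for b in rater_b)

    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            # Quadratic disagreement weight: 0 on the diagonal,
            # growing with the squared distance between score levels.
            weight = ((i - j) ** 2) / ((num_ratings - 1) ** 2)
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    return 1.0 - numerator / denominator

# Hypothetical example: a human rater and a model scoring ten
# narratives on a 0-4 macrostructure rubric.
human = [0, 1, 2, 3, 4, 2, 3, 1, 0, 4]
model = [0, 1, 2, 3, 4, 2, 3, 1, 1, 4]
print(quadratic_weighted_kappa(human, model, 0, 4))
```

Because disagreements are weighted by their squared distance, the metric penalizes a model that misses a score by two rubric points far more than one that misses by a single point, which suits ordinal rubric scales.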