"Automatic CEFR Level Prediction for Estonian Learner Text" by Sowmya Vajjala

Presentation

Automatic CEFR Level Prediction for Estonian Learner Text

Proceedings of the third workshop on NLP for computer-assisted language learning (2014)

Sowmya Vajjala, University of Tübingen
Kaidi Lõo, University of Alberta

Abstract

This paper reports on approaches for automatically predicting a learner’s language proficiency in Estonian according to the European CEFR scale. We used morphological and POS tag information extracted from texts written by learners, and compared classification and regression modeling for this task. Our models achieve a classification accuracy of 79% and, when the task is modeled as regression, a correlation of 0.85. Comparing the two, we concluded that classification is more effective than regression in terms of both exact error and the direction of error. We also investigated the most predictive features for both multiclass classification and binary classification between groups, and explored the nature of the correlations between highly predictive features. Our results show a considerable improvement in classification accuracy over previously reported results and take us a step closer to the automated assessment of Estonian learner text.
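
The two evaluation measures named in the abstract, classification accuracy and correlation for regression output, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' setup: the integer encoding of the six CEFR levels, the helper names (`accuracy`, `pearson`, `round_to_level`, `signed_errors`), and the toy labels and scores, which are invented and are not results from the paper.

```python
from math import sqrt

# Assumed ordinal encoding of the six CEFR levels (illustrative only).
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_TO_INT = {level: i for i, level in enumerate(CEFR_LEVELS)}


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def round_to_level(score):
    """Map a continuous regression score to the nearest valid level index."""
    return min(max(round(score), 0), len(CEFR_LEVELS) - 1)


def signed_errors(gold_ints, pred_ints):
    """Positive values are over-predictions, negative values under-predictions."""
    return [p - g for g, p in zip(gold_ints, pred_ints)]


# Invented toy data: gold levels, classifier output, and regressor output.
gold = ["A2", "B1", "B1", "B2", "C1"]
class_pred = ["A2", "B1", "B2", "B2", "C1"]
reg_scores = [1.2, 2.1, 2.6, 2.9, 3.8]

gold_ints = [LEVEL_TO_INT[g] for g in gold]
class_ints = [LEVEL_TO_INT[p] for p in class_pred]
reg_ints = [round_to_level(s) for s in reg_scores]

class_accuracy = accuracy(gold, class_pred)
reg_correlation = pearson(gold_ints, reg_scores)
class_direction = signed_errors(gold_ints, class_ints)
reg_direction = signed_errors(gold_ints, reg_ints)

print("classification accuracy:", class_accuracy)
print("regression correlation:", round(reg_correlation, 3))
print("signed errors (classification):", class_direction)
print("signed errors (regression, rounded):", reg_direction)
```

Rounding the regression scores to the nearest level is one possible way to put the two model types on the same footing when comparing exact error and direction of error; the abstract does not state that the comparison was made this way.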

Keywords

Estonian,
Proficiency Classification,
CEFR,
Morphological Features,
Machine Learning

Publication Date

2014

Comments

Sowmya Vajjala and Kaidi Lõo 2014. Automatic CEFR level prediction for Estonian learner text. Proceedings of the third workshop on NLP for computer-assisted language learning. NEALT Proceedings Series 22 / Linköping Electronic Conference Proceedings 107: 113–127.

Citation Information

Sowmya Vajjala and Kaidi Lõo. "Automatic CEFR Level Prediction for Estonian Learner Text." Proceedings of the third workshop on NLP for computer-assisted language learning (2014).
Available at: http://works.bepress.com/sowmya-vajjala/1/

Creative Commons License

This work is licensed under a Creative Commons CC BY-NC International License.