Presentation
Automatic CEFR Level Prediction for Estonian Learner Text
Proceedings of the third workshop on NLP for computer-assisted language learning (2014)
  • Sowmya Vajjala, University of Tübingen
  • Kaidi Lõo, University of Alberta
Abstract
This paper reports on approaches for automatically predicting a learner’s language proficiency in Estonian according to the European CEFR scale. We used the morphological and POS tag information extracted from the texts written by learners. We compared classification and regression modeling for this task. Our models achieve a classification accuracy of 79% and a correlation of 0.85 when modeled as regression. After a comparison between them, we concluded that classification is more effective than regression in terms of exact error and the direction of error. Apart from this, we investigated the most predictive features for both multiclass and binary classification between groups and also explored the nature of the correlations between highly predictive features. Our results show considerable improvement in classification accuracy over previously reported results and take us a step closer towards the automated assessment of Estonian learner text.
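To make the two evaluation measures named in the abstract concrete, the following minimal sketch computes exact-match accuracy for predicted CEFR labels, Pearson correlation for numeric (regression-style) predictions, and the signed direction of each error. The numeric encoding of the levels, the function names, and the toy data are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt

# Hypothetical numeric encoding of CEFR levels (an assumption for illustration).
LEVELS = {"A2": 1, "B1": 2, "B2": 3, "C1": 4}


def accuracy(gold, predicted):
    """Fraction of texts whose predicted level exactly matches the gold level."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def signed_errors(gold, predicted):
    """Per-text error direction: positive means the level was over-estimated,
    negative means it was under-estimated, zero means an exact match."""
    return [LEVELS[p] - LEVELS[g] for g, p in zip(gold, predicted)]


# Toy data, invented purely for demonstration.
gold = ["A2", "B1", "B2", "C1", "B1"]
pred = ["A2", "B1", "B1", "C1", "B2"]

print("accuracy:", accuracy(gold, pred))
print("signed errors:", signed_errors(gold, pred))
print("correlation:", pearson([LEVELS[g] for g in gold], [LEVELS[p] for p in pred]))
```

Accuracy only counts exact matches, whereas correlation rewards predictions that move in the right direction even when they miss the exact level, which is one reason the two numbers are not directly comparable.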
Keywords
  • Estonian
  • Proficiency Classification
  • CEFR
  • Morphological Features
  • Machine Learning
Publication Date
2014
Comments
Sowmya Vajjala and Kaidi Lõo 2014. Automatic CEFR level prediction for Estonian learner text. Proceedings of the third workshop on NLP for computer-assisted language learning. NEALT Proceedings Series 22 / Linköping Electronic Conference Proceedings 107: 113–127. 
Citation Information
Sowmya Vajjala and Kaidi Lõo. "Automatic CEFR Level Prediction for Estonian Learner Text" Proceedings of the third workshop on NLP for computer-assisted language learning (2014)
Available at: http://works.bepress.com/sowmya-vajjala/1/
Creative Commons License
This work is licensed under a Creative Commons CC BY-NC International License.