"Combining Shallow and Linguistically Motivated Features in Native Language Identification" by Serhiy Bykh

Selected Works of Sowmya Vajjala

Presentation

Combining Shallow and Linguistically Motivated Features in Native Language Identification

Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (2013)

Serhiy Bykh, Universität Tübingen
Sowmya Vajjala, Universität Tübingen
Julia Krivanek, Universität Tübingen
Detmar Meurers, Universität Tübingen

Abstract

We explore a range of features and ensembles for the task of Native Language Identification as part of the NLI Shared Task (Tetreault et al., 2013). Starting with recurring word-based n-grams (Bykh and Meurers, 2012), we tested different linguistic abstractions such as part-of-speech, dependencies, and syntactic trees as features for NLI. We also experimented with features encoding morphological properties, the nature of the realizations of particular lemmas, and several measures of complexity developed for proficiency and readability classification (Vajjala and Meurers, 2012). Employing an ensemble classifier incorporating all of our features we achieved an accuracy of 82.2% (rank 5) in the closed task and 83.5% (rank 1) in the open-2 task. In the open-1 task, the word-based recurring n-grams outperformed the ensemble, yielding 38.5% (rank 2). Overall, across all three tasks, our best accuracy of 83.5% for the standard TOEFL11 test set came in second place.

Publication Date

June, 2013

Location

Atlanta, GA

Citation Information

Serhiy Bykh, Sowmya Vajjala, Julia Krivanek and Detmar Meurers. "Combining Shallow and Linguistically Motivated Features in Native Language Identification" Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (2013)
Available at: http://works.bepress.com/sowmya-vajjala/11/