"Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector" by Nacer Eddine Benzebouchi

Selected Works of Monther Aldwairi


Article


16th International Multi-Conference on Systems, Signals and Devices, SSD 2019

Nacer Eddine Benzebouchi, Université Badji Mokhtar - Annaba
Nabiha Azizi, Université Badji Mokhtar - Annaba
Nacer Eddine Hammami, Jouf University
Didier Schwab, Universite Grenoble Alpes
Mohammed Chiheb Eddine Khelaifia, Université Badji Mokhtar - Annaba
Monther Aldwairi, Zayed University


Document Type

Conference Proceeding

Publication Date

3-1-2019

Abstract

© 2019 IEEE. Text mining is one of the main and most typical tasks of machine learning (ML). Authorship identification (AI) is a standard research subject in text mining and natural language processing (NLP) that has evolved remarkably in recent years. The task is to determine the actual author of an anonymous text on the basis of a set of writing samples. Standard text classification often relies on many handcrafted features, such as dictionaries, knowledge bases, and various stylometric characteristics, which often leads to high dimensionality. Unlike traditional approaches, this paper proposes an authorship identification approach based on automatic feature engineering with word2vec word embeddings that takes each author's writing style into account. The system includes two learning phases. The first stage generates a semantic representation of each author, using word2vec to learn and extract the most relevant characteristics from the raw documents. The second stage applies a multilayer perceptron (MLP) classifier, trained with the backpropagation algorithm, to learn the classification rules. Experiments show that the MLP classifier with the word2vec model achieves an accuracy of 95.83% on an English corpus, suggesting that word2vec word embeddings can improve identification accuracy compared with classical models such as n-gram frequencies and bag of words.
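The two-stage pipeline in the abstract (embedding-based document representation, then an MLP trained by backpropagation) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say how word vectors are aggregated or how the MLP is configured. Here we assume the common choice of averaging word vectors into one document vector, and we use a single tanh hidden layer with a softmax output trained by plain SGD. The `EMBEDDINGS` table, the author labels `author_a`/`author_b`, and the training sentences are all hypothetical toy stand-ins for vectors a trained word2vec model would provide and for a real corpus.

```python
import math
import random

# Hypothetical toy vectors standing in for word2vec embeddings (dimension 3).
# A real system would take these from a word2vec model trained on the corpus.
EMBEDDINGS = {
    "whale": [0.9, 0.1, 0.0], "sea": [0.8, 0.2, 0.1], "ship": [0.7, 0.3, 0.0],
    "ball": [0.1, 0.9, 0.2], "dance": [0.0, 0.8, 0.3], "marriage": [0.2, 0.7, 0.1],
}
DIM = 3
AUTHORS = ["author_a", "author_b"]  # hypothetical author labels

def doc_vector(text):
    """Stage 1 (assumed): average the vectors of in-vocabulary tokens."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

class MLP:
    """Stage 2: one tanh hidden layer, softmax output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hj for w, hj in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Cross-entropy gradient w.r.t. the output logits: p - one_hot(y).
        d_out = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through tanh (derivative 1 - h^2) using the pre-update w2.
        d_hid = [(1 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k in range(len(self.w2)):
            for j in range(len(h)):
                self.w2[k][j] -= lr * d_out[k] * h[j]
            self.b2[k] -= lr * d_out[k]
        for j in range(len(self.w1)):
            for i in range(len(x)):
                self.w1[j][i] -= lr * d_hid[j] * x[i]
            self.b1[j] -= lr * d_hid[j]

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

# Hypothetical labelled writing samples: (text, author index).
TRAIN = [
    ("the whale and the sea", 0), ("a ship on the sea", 0), ("whale ship sea", 0),
    ("a ball and a dance", 1), ("marriage after the dance", 1), ("ball marriage", 1),
]

def train(data, epochs=300, lr=0.3, seed=0):
    model = MLP(DIM, 4, len(AUTHORS), seed=seed)
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for text, label in data:
            model.train_step(doc_vector(text), label, lr)
    return model

if __name__ == "__main__":
    model = train(TRAIN)
    print(AUTHORS[model.predict(doc_vector("the sea and a whale"))])
```

Averaging is only one way to build a fixed-length text vector from word embeddings; the paper's per-author semantic representation may be constructed differently, so treat the `doc_vector` step as a placeholder for whatever aggregation the full paper specifies.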

DOI Link

10.1109/ssd.2019.8894872

ISBN

9781728118208

Publisher

Institute of Electrical and Electronics Engineers Inc.


Keywords

Authorship Identification,
MLP classifier,
Natural Language Processing,
Text Mining,
Word2Vec

Scopus ID

85075632764

Indexed in Scopus

Yes

Open Access

https://doi.org/10.1109/SSD.2019.8894872

Citation Information

Nacer Eddine Benzebouchi, Nabiha Azizi, Nacer Eddine Hammami, Didier Schwab, et al. "Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector." 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019 (2019), pp. 371-376.
Available at: http://works.bepress.com/monther-aldwairi/33/