Article
Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector
16th International Multi-Conference on Systems, Signals and Devices, SSD 2019
  • Nacer Eddine Benzebouchi, Université Badji Mokhtar - Annaba
  • Nabiha Azizi, Université Badji Mokhtar - Annaba
  • Nacer Eddine Hammami, Jouf University
  • Didier Schwab, Université Grenoble Alpes
  • Mohammed Chiheb Eddine Khelaifia, Université Badji Mokhtar - Annaba
  • Monther Aldwairi, Zayed University
Document Type
Conference Proceeding
Publication Date
3-1-2019
Abstract

© 2019 IEEE. Text mining is one of the main and typical tasks of machine learning (ML). Authorship identification (AI) is a standard research subject in text mining and natural language processing (NLP) that has evolved considerably in recent years. The task is to determine the actual author of an anonymous text on the basis of a set of writing samples. Standard text classification often relies on many handcrafted features such as dictionaries, knowledge bases, and various stylometric characteristics, which often leads to high dimensionality. Unlike traditional approaches, this paper proposes an authorship identification approach based on automatic feature engineering using word2vec word embeddings, taking into account each author's writing style. The system includes two learning phases. The first phase generates the semantic representation of each author by using word2vec to learn and extract the most relevant characteristics of the raw document. The second phase applies a multilayer perceptron (MLP) classifier, trained with the backpropagation learning algorithm, to learn the classification rules. Experiments show that the MLP classifier with the word2vec model achieves an accuracy of 95.83% on an English corpus, suggesting that the word2vec word embedding model can improve identification accuracy compared to classical models such as n-gram frequencies and bag of words.
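
The two phases described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how word vectors are combined into a document representation, so averaging is an assumption made here, and a small hand-made table (`TOY_EMBEDDINGS`) stands in for vectors that word2vec would learn from a corpus. The class `TinyMLP`, the toy documents, and all hyperparameters are hypothetical and chosen only to keep the example short.

```python
import math
import random

# Hypothetical toy embeddings standing in for vectors learned by word2vec.
TOY_EMBEDDINGS = {
    "whilst": [1.0, 0.0, 0.0],
    "hence":  [0.8, 0.2, 0.0],
    "gonna":  [0.0, 1.0, 0.0],
    "kinda":  [0.2, 0.8, 0.0],
    "the":    [0.0, 0.0, 1.0],
}

def doc_vector(tokens, embeddings):
    """Phase one (simplified): average the embeddings of the known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in document")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

class TinyMLP:
    """Phase two (simplified): one tanh hidden layer and a softmax output."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, label, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        h, p = self.forward(x)
        # Gradient w.r.t. the output logits: p - one_hot(label).
        d_out = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through the output weights and the tanh derivative.
        d_hid = [(1.0 - hj * hj)
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        _, p = self.forward(x)
        return p.index(max(p))

# Four toy "documents" by two hypothetical authors (labels 0 and 1).
DOCS = [
    (["whilst", "hence", "the"], 0),
    (["hence", "whilst", "whilst"], 0),
    (["gonna", "kinda", "the"], 1),
    (["kinda", "gonna", "gonna"], 1),
]
X = [doc_vector(tokens, TOY_EMBEDDINGS) for tokens, _ in DOCS]
y = [label for _, label in DOCS]

mlp = TinyMLP(n_in=3, n_hidden=4, n_out=2, seed=0)
for _ in range(300):  # epochs
    for x, label in zip(X, y):
        mlp.train_step(x, label, lr=0.5)

predictions = [mlp.predict(x) for x in X]
```

In a realistic setting the embedding table would come from a word2vec model trained on the authors' texts, and the classifier would be evaluated on held-out documents rather than on its own training set.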

ISBN
9781728118208
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
  • Authorship Identification
  • MLP classifier
  • Natural Language Processing
  • Text Mining
  • Word2Vec
Scopus ID
85075632764
Indexed in Scopus
Yes
Open Access
No
https://doi.org/10.1109/SSD.2019.8894872
Citation Information
Nacer Eddine Benzebouchi, Nabiha Azizi, Nacer Eddine Hammami, Didier Schwab, et al. "Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector" 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019 (2019) pp. 371-376
Available at: http://works.bepress.com/monther-aldwairi/33/