Article
Stopword detection for streaming content
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
  • Hossein Fani, University of New Brunswick
  • Masoud Bashari, Ryerson University
  • Fattane Zarrinkalam, Ryerson University
  • Ebrahim Bagheri, Ryerson University
  • Feras Al-Obeidat, Zayed University
ORCID Identifiers

0000-0002-6033-6564

Document Type
Conference Proceeding
Publication Date
1-1-2018
Abstract

© Springer International Publishing AG, part of Springer Nature 2018. The removal of stopwords is an important preprocessing step in many natural language processing tasks, and it can improve task performance and reduce execution time. Many existing methods either rely on a predefined list of stopwords or compute word significance based on metrics such as tf-idf. The objective of our work in this paper is to identify stopwords, in an unsupervised way, for streaming textual corpora such as Twitter, which have a temporal nature. We propose to model the dynamics of each word within the streaming corpus in order to identify the words that are less likely to be informative or discriminative. Our work is based on the discrete wavelet transform (DWT) of word signals, from which we extract two features, namely scale and energy. We show that our proposed approach is effective in identifying stopwords and improves the quality of topics in the task of topic detection.
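
As a rough illustration of the idea in the abstract, the sketch below treats a word's per-interval occurrence counts as a signal, applies a Haar DWT, and derives two numbers from the detail coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which wavelet is used or how scale and energy are defined, so the Haar wavelet, the function names `haar_dwt` and `wavelet_features`, "energy" as the total squared detail coefficients, and "scale" as the decomposition level carrying the most detail energy are all assumptions made here for illustration. The rule that turns these features into a stopword decision is not given in the abstract and is not shown.

```python
import math


def haar_dwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns (details, approximation), where details[0] holds the finest-level
    detail coefficients and details[-1] the coarsest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Orthonormal Haar step: scaled differences and scaled sums of pairs.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx[0]


def wavelet_features(signal):
    """Return (dominant_scale, energy) for a word signal.

    ASSUMPTION: energy is the sum of squared detail coefficients, and the
    dominant scale is the 1-based level with the largest detail energy
    (0 for a perfectly flat signal). The paper may define both differently.
    """
    details, _ = haar_dwt(signal)
    level_energy = [sum(c * c for c in level) for level in details]
    total = sum(level_energy)
    if total == 0:
        return 0, 0.0
    dominant = max(range(len(level_energy)), key=level_energy.__getitem__)
    return dominant + 1, total


# Hypothetical word signals: counts of a word in 8 consecutive time windows.
flat_word = [5, 5, 5, 5, 5, 5, 5, 5]      # used at a constant rate
bursty_word = [0, 0, 0, 8, 0, 0, 0, 0]    # appears in a single burst

for name, sig in [("flat", flat_word), ("bursty", bursty_word)]:
    scale, energy = wavelet_features(sig)
    print(name, "scale:", scale, "energy:", round(energy, 3))
```

In this toy setup a word used at a constant rate has zero detail energy, while a bursty word concentrates its energy at the finest scale. Whether low energy by itself marks a stopword is an assumption that would need to be checked against the paper.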

ISBN
9783319769400
Publisher
Springer Verlag
Keywords
  • Discrete wavelet transforms
  • Information retrieval
  • Natural language processing systems
  • Execution time
  • Pre-processing step
  • Topic detection
  • Word signals
  • Linguistics
Scopus ID
85044446453
Indexed in Scopus
Yes
Open Access
No
https://doi.org/10.1007/978-3-319-76941-7_70
Citation Information
Hossein Fani, Masoud Bashari, Fattane Zarrinkalam, Ebrahim Bagheri, et al. "Stopword detection for streaming content" Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10772 LNCS (2018) pp. 737-743. ISSN: 0302-9743
Available at: http://works.bepress.com/feras-al-obeidat/46/