Gun Violence News Information Retrieval using BERT as Sequence Tagging Task
Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
  • Hung Yeh Lin, San Jose State University
  • Teng Sheng Moh, San Jose State University
  • Bryce Westlake, San Jose State University
Publication Date
1-1-2021
Document Type
Conference Proceeding
DOI
10.1109/BigData52589.2021.9671919
Abstract

The growth in both the frequency and severity of gun violence in the United States has necessitated increased research into prevention, despite a lack of funding. The Gun Violence Database (GVDB), comprising more than 60,000 gun violence media articles with a total size of 520 MB, was developed to assist natural language processing researchers in developing and testing prevention methods. The original research based on the GVDB used a span-selection model to extract shooter and victim information, but that approach may trim out important span candidates. We propose a new approach that improves identification accuracy and recognizes every token in a sentence by using a sequence tagging technique. We implemented a BIO sequence tagging model at the token level using BERT, then further classified each token using LSTM, BiLSTM, and CRF. We found that using BERT as an embedding layer and decoding word representations as a sequence tagging task improved shooter/victim identification compared with a span-selection model. We believe that if this improved model is combined with gun violence related keywords, automated techniques could be implemented to identify precursors of and risks for gun violence on social media, allowing law enforcement or community agencies to intervene before situations escalate to deaths.
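
To illustrate the BIO scheme mentioned in the abstract, the following is a minimal sketch of how span annotations can be turned into one tag per token and recovered again. It is not the authors' implementation: the label names SHOOTER and VICTIM, the helper names spans_to_bio and bio_to_spans, and the example sentence are all assumptions made for illustration, and the BERT, BiLSTM, and CRF layers that would predict the tags are omitted.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); labels are hypothetical
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Assign a B-/I-/O tag to every token given labeled token spans."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover labeled spans from a BIO tag sequence.

    An I- tag that does not continue an open span with the same label is
    dropped; this is one possible convention, chosen here for simplicity.
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and label is not None and name == label
        if not continues:
            if label is not None:           # close the span that was open
                spans.append((start, i, label))
                start, label = None, None
            if prefix == "B":               # open a new span
                start, label = i, name
    if label is not None:                   # span running to the last token
        spans.append((start, len(tags), label))
    return spans


# Invented example sentence, not taken from the GVDB.
tokens = "Police said John Smith shot Mary Jones on Friday .".split()
gold_spans = [(2, 4, "SHOOTER"), (5, 7, "VICTIM")]
bio_tags = spans_to_bio(len(tokens), gold_spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Because every token receives a tag, including those outside any entity, a tagger trained on this representation scores all tokens in the sentence instead of choosing among a pruned set of candidate spans.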

Keywords
  • BERT
  • BiLSTM
  • CRF
  • gun violence
  • natural language processing
  • NLP
  • sequence tagging
  • transformer
Citation Information
Hung Yeh Lin, Teng Sheng Moh, and Bryce Westlake. "Gun Violence News Information Retrieval using BERT as Sequence Tagging Task." Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 (2021), pp. 2525-2531.
Available at: http://works.bepress.com/bryce_westlake/56/