MultiSpanQA: A Dataset for Multi-Span Question Answering
Natural Language Processing Faculty Publications
  • Haonan Li, School of Computing and Information Systems, The University of Melbourne, Australia
  • Maria Vasardani, Department of Geospatial Science, RMIT University, Australia
  • Martin Tomko, Department of Infrastructure Engineering, The University of Melbourne, Australia
  • Timothy Baldwin, School of Computing and Information Systems, The University of Melbourne, Australia & Mohamed bin Zayed University of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

Most existing reading comprehension datasets focus on single-span answers, which can be extracted as a single contiguous span from a given text passage. Multi-span questions, i.e., questions whose answer is a series of multiple discontiguous spans in the text, are common in real life but have been less studied. In this paper, we present MultiSpanQA, a new dataset that focuses on questions with multi-span answers. Raw questions and contexts are extracted from the Natural Questions (Kwiatkowski et al., 2019) dataset. After multi-span re-annotation, MultiSpanQA comprises over 6,000 multi-span questions in its basic version, and over 19,000 examples, covering unanswerable questions and questions with single- and multi-span answers, in its expanded version. We introduce new metrics for evaluating multi-span question answering, and establish several baselines using advanced models. Finally, we propose a new model that beats all baselines and achieves state-of-the-art results on our dataset. © 2022 Association for Computational Linguistics.
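The evaluation metrics are defined in the paper itself; as a rough illustration of what span-level scoring for multi-span answers involves (this is not the official MultiSpanQA scorer, and the normalization and matching choices below are assumptions), the following Python sketch computes exact-match precision, recall, and F1 over sets of predicted and gold spans:

    # Illustrative sketch only: span-level exact-match F1 for multi-span answers.
    # The MultiSpanQA paper also defines partial-credit scoring for overlapping
    # spans, which this toy function does not implement.
    from collections import Counter

    def multi_span_exact_f1(predicted_spans, gold_spans):
        """Treat each answer span as one item and score exact matches."""
        pred = Counter(s.strip().lower() for s in predicted_spans)
        gold = Counter(s.strip().lower() for s in gold_spans)
        num_same = sum((pred & gold).values())  # spans matched exactly
        if num_same == 0:
            return 0.0, 0.0, 0.0
        precision = num_same / sum(pred.values())
        recall = num_same / sum(gold.values())
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Example: two of three gold spans recovered, one spurious prediction.
    print(multi_span_exact_f1(["Paris", "Lyon", "Nice"],
                              ["Paris", "Lyon", "Marseille"]))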

DOI
10.18653/v1/2022.naacl-main.90
Publication Date
7-1-2022
Keywords
  • Advanced modeling
  • Multi-spans
  • Question answering
  • Question-answering evaluation
  • Reading comprehension
  • State of the art
Comments

IR Deposit conditions: non-described

Citation Information
H. Li, M. Tomko, M. Vasardani, and T. Baldwin, "MultiSpanQA: A Dataset for Multi-Span Question Answering," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022), Jul. 2022, pp. 1250–1260, doi: 10.18653/v1/2022.naacl-main.90.