MultiSpanQA: A Dataset for Multi-Span Question Answering
Natural Language Processing Faculty Publications
  • Haonan Li, School of Computing and Information Systems, The University of Melbourne, Australia
  • Maria Vasardani, Department of Geospatial Science, RMIT University, Australia
  • Martin Tomko, Department of Infrastructure Engineering, The University of Melbourne, Australia
  • Timothy Baldwin, School of Computing and Information Systems, The University of Melbourne, Australia & Mohamed bin Zayed University of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

Most existing reading comprehension datasets focus on single-span answers, which can be extracted as a single contiguous span from a given text passage. Multi-span questions, i.e., questions whose answer is a series of multiple discontiguous spans in the text, are common in real life but have been less studied. In this paper, we present MultiSpanQA, a new dataset that focuses on questions with multi-span answers. Raw questions and contexts are extracted from the Natural Questions (Kwiatkowski et al., 2019) dataset. After multi-span re-annotation, MultiSpanQA comprises over 6,000 multi-span questions in its basic version, and over 19,000 examples, covering unanswerable questions and questions with single- and multi-span answers, in its expanded version. We introduce new metrics for evaluating multi-span question answering, and establish several baselines using advanced models. Finally, we propose a new model that beats all baselines and achieves state-of-the-art results on our dataset. © 2022 Association for Computational Linguistics.
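The evaluation metrics are defined in the paper itself; as a rough illustration of what span-level scoring for multi-span answers involves (this is not the official MultiSpanQA scorer, and the normalization and matching choices below are assumptions), the following Python sketch computes exact-match precision, recall, and F1 over sets of predicted and gold spans:

    # Illustrative sketch only: span-level exact-match F1 for multi-span answers.
    # The MultiSpanQA paper also defines partial-credit scoring for overlapping
    # spans, which this toy function does not implement.
    from collections import Counter

    def multi_span_exact_f1(predicted_spans, gold_spans):
        """Treat each answer span as one item and score exact matches."""
        pred = Counter(s.strip().lower() for s in predicted_spans)
        gold = Counter(s.strip().lower() for s in gold_spans)
        num_same = sum((pred & gold).values())  # spans matched exactly
        if num_same == 0:
            return 0.0, 0.0, 0.0
        precision = num_same / sum(pred.values())
        recall = num_same / sum(gold.values())
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Example: two of three gold spans recovered, one spurious prediction.
    print(multi_span_exact_f1(["Paris", "Lyon", "Nice"],
                              ["Paris", "Lyon", "Marseille"]))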

DOI
10.18653/v1/2022.naacl-main.90
Publication Date
7-1-2022
Keywords
  • Advanced modeling
  • Multi-spans
  • Question answering
  • Question-answering evaluation
  • Reading comprehension
  • State of the art
Comments

IR Deposit conditions: non-described

Citation Information
H. Li, M. Tomko, M. Vasardani, and T. Baldwin, "MultiSpanQA: A Dataset for Multi-Span Question Answering," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022), Jul. 2022, pp. 1250–1260, doi: 10.18653/v1/2022.naacl-main.90.