Collective Human Opinions in Semantic Textual Similarity
Transactions of the Association for Computational Linguistics
  • Yuxia Wang, University of Melbourne
  • Shimin Tao, Huawei TSC
  • Ning Xie, Huawei TSC
  • Hao Yang, Huawei TSC
  • Timothy Baldwin, Mohamed Bin Zayed University of Artificial Intelligence
  • Karin Verspoor, RMIT University
Document Type
Article
Abstract

Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.
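To make the abstract's claim concrete, the sketch below illustrates one way a set of per-pair annotator ratings can resist summary by a scalar or a single Gaussian: for a bimodal set of judgments, a two-component Gaussian mixture explains the observations far better than a single Gaussian, and the mean collapses the disagreement entirely. The ratings are hypothetical and the script uses SciPy/scikit-learn; it is an illustration of the idea, not the paper's code or the USTS data.

```python
# Minimal sketch (hypothetical ratings, not from USTS): compare how well a
# scalar, a single Gaussian, and a two-component Gaussian mixture describe
# one sentence pair's set of annotator similarity judgments.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Made-up bimodal judgments from several annotators for a single sentence pair.
ratings = np.array([1.0, 1.2, 1.5, 1.1, 1.3, 3.8, 4.0, 4.2, 4.1, 3.9])

# Scalar "gold" label: averaging hides the two clusters of opinion.
scalar_gold = ratings.mean()

# Single Gaussian fit (MLE: sample mean and standard deviation).
mu, sigma = ratings.mean(), ratings.std()
ll_gaussian = norm.logpdf(ratings, loc=mu, scale=sigma).sum()

# Two-component Gaussian mixture fit over the same judgments.
X = ratings.reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
ll_mixture = gmm.score_samples(X).sum()

print(f"scalar gold label:          {scalar_gold:.2f}")
print(f"single-Gaussian log-lik:    {ll_gaussian:.2f}")
print(f"2-component mixture log-lik: {ll_mixture:.2f}")  # markedly higher here
```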

DOI
10.1162/tacl_a_00584
Publication Date
1-1-2023
Comments

Open Access

Archived thanks to MIT Press Direct

License: CC BY 4.0

Uploaded: 22 March 2024

Citation Information
Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, and Karin Verspoor. "Collective Human Opinions in Semantic Textual Similarity." Transactions of the Association for Computational Linguistics, Vol. 11 (2023), pp. 997-1013.
Available at: http://works.bepress.com/timothy-baldwin/31/