Negation is a common linguistic feature that is crucial in many language understanding tasks, yet it remains a hard problem because of the diversity of its expression across different types of text. Recent work has shown that state-of-the-art NLP models underperform on samples containing negation in various tasks, and that negation detection models do not transfer well across domains. We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking, to better incorporate negation information into language models. Extensive experiments on common benchmarks show that our proposed approach improves negation detection performance and generalizability over the strong baseline NegBERT (Khandelwal and Sawant, 2020). © 2022, CC BY.
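The abstract names negation masking as one ingredient of the pre-training strategy but does not spell it out. The Python sketch below illustrates only the general idea under stated assumptions: a masked-language-modelling setup in which negation cues are masked with a higher probability than ordinary tokens. The cue list, the function name `negation_mask`, and the parameters `cue_prob` and `other_prob` are hypothetical choices for illustration, not the authors' implementation.

```python
import random

# Hypothetical, illustrative cue list (assumption); the paper's actual cue
# inventory is not given in this abstract. "n't" only matches if the
# tokenizer splits contractions that way.
NEGATION_CUES = {"not", "no", "never", "none", "nothing", "nor", "without", "n't"}
MASK = "[MASK]"


def negation_mask(tokens, cue_prob=0.5, other_prob=0.15, rng=None):
    """Mask tokens for MLM, favouring negation cues (illustrative sketch).

    Returns (masked_tokens, labels): labels holds the original token at
    masked positions and None elsewhere, as an MLM loss would need.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        # Negation cues get the higher masking probability (assumed values).
        p = cue_prob if tok.lower() in NEGATION_CUES else other_prob
        if rng.random() < p:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    sent = "the drug did not reduce symptoms".split()
    # With cue_prob=1.0 and other_prob=0.0, only the cue "not" is masked.
    print(negation_mask(sent, cue_prob=1.0, other_prob=0.0))
```

Setting `cue_prob` above `other_prob` biases the model toward predicting negation cues from context; with the extreme values in the usage example, the masking becomes deterministic and only cues are hidden.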
- Computational linguistics
- Data augmentation
- Detection models
- Detection performance
- Hard problems
- Language model
- Language understanding
- Linguistic features
- Pre-training
- State of the art
- Training strategy
- Benchmarking
- Computation and Language (cs.CL)
Preprint: arXiv
Preprint License: CC BY 4.0
Uploaded 01 July 2022