Article
Neural embedding-based specificity metrics for pre-retrieval query performance prediction
Information Processing and Management
  • Ebrahim Bagheri, Ryerson University
  • Negar Arabzadeh, Ryerson University
  • Fattane Zarrinkalam, Ryerson University
  • Jelena Jovanovic, University of Belgrade
  • Feras Al-Obeidat, Zayed University
Document Type
Article
Publication Date
7-1-2020
Abstract

© 2020 Elsevier Ltd

In information retrieval, the task of query performance prediction (QPP) is concerned with estimating, in advance, how well a given query will perform under a given retrieval model. QPP plays an important role in handling queries of varying difficulty appropriately. According to the existing literature, query specificity is an important indicator of query performance and is typically estimated with corpus-specific, frequency-based specificity metrics. However, such metrics do not consider term semantics or inter-term associations. Our work distinguishes itself by proposing a host of corpus-independent specificity metrics that are based on pre-trained neural embeddings and leverage geometric relations between terms in the embedding space to capture the semantics of terms and their interdependencies. Specifically, we propose three classes of specificity metrics based on pre-trained neural embeddings: neighborhood-based, graph-based, and cluster-based metrics. Through two extensive and complementary sets of experiments, we show that the proposed specificity metrics (1) are suitable specificity indicators, as measured against gold standards derived from knowledge hierarchies (the Wikipedia category hierarchy and the DMOZ taxonomy), and (2) perform better than or competitively with state-of-the-art QPP metrics on the TREC ad hoc collections Robust’04, Gov2, and ClueWeb’09 as well as on the ANTIQUE question answering collection. The proposed graph-based specificity metrics, especially those that capture a larger number of inter-term associations, proved to be the most effective for both query specificity estimation and QPP. We have also publicly released two test collections (i.e., specificity gold standards) that we built from the Wikipedia and DMOZ knowledge hierarchies.
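The abstract names neighborhood-based and graph-based specificity metrics without defining them on this page. The Python sketch below shows one plausible way to compute such signals from a table of pre-trained term vectors. Everything concrete in it is an assumption for illustration and is not taken from the paper: the scoring rules (mean cosine similarity to a term's k nearest neighbors, and the edge density of a thresholded ego network around the term), the parameters `k` and `threshold`, the averaging over query terms, the helper names, and the toy vectors. Whether a higher or lower score indicates a more specific term is deliberately left open here.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Embeddings = Dict[str, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(term: str, emb: Embeddings, k: int) -> List[Tuple[str, float]]:
    """Top-k vocabulary terms most similar to `term`, excluding the term itself.

    Uses a linear scan over the whole vocabulary, which is fine for a toy table.
    """
    sims = [(other, cosine(emb[term], vec)) for other, vec in emb.items() if other != term]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def neighborhood_score(term: str, emb: Embeddings, k: int = 3) -> float:
    """Hypothetical neighborhood-based signal: mean similarity to the k nearest neighbors."""
    neighbors = nearest_neighbors(term, emb, k)
    if not neighbors:
        return 0.0
    return sum(sim for _, sim in neighbors) / len(neighbors)


def ego_graph_density(term: str, emb: Embeddings, k: int = 3, threshold: float = 0.8) -> float:
    """Hypothetical graph-based signal: edge density of the term's ego network.

    Nodes are the term plus its k nearest neighbors; an undirected edge joins
    any two nodes whose cosine similarity is at least `threshold`.
    """
    nodes = [term] + [other for other, _ in nearest_neighbors(term, emb, k)]
    edges = sum(1 for a, b in combinations(nodes, 2) if cosine(emb[a], emb[b]) >= threshold)
    possible = len(nodes) * (len(nodes) - 1) / 2
    return edges / possible if possible else 0.0


def query_score(
    query: List[str], emb: Embeddings, metric: Callable[[str, Embeddings], float]
) -> float:
    """Average a per-term metric over the query terms present in the vocabulary."""
    scores = [metric(t, emb) for t in query if t in emb]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors; a real setup would load pre-trained embeddings.
    toy = {
        "a": [1.0, 0.0, 0.0],
        "b": [1.0, 0.0, 0.0],
        "c": [0.0, 1.0, 0.0],
        "d": [0.0, 0.0, 1.0],
    }
    print(round(neighborhood_score("a", toy, k=3), 3))  # prints 0.333
    print(round(ego_graph_density("a", toy), 3))  # prints 0.167 (1 edge out of 6)
    print(query_score(["a", "unknown"], toy, neighborhood_score))
```

Both sketched signals read only the embedding table, not document frequencies, which is the sense in which the abstract calls such metrics corpus-independent. A practical version would likely replace the linear neighbor scan with an approximate nearest-neighbor index over a large pre-trained vocabulary.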

Publisher
Elsevier Ltd
Keywords
  • Ad hoc retrieval
  • Neural embeddings
  • Performance prediction
Scopus ID
85082826883
Indexed in Scopus
Yes
Open Access
No
https://doi.org/10.1016/j.ipm.2020.102248
Citation Information
Ebrahim Bagheri, Negar Arabzadeh, Fattane Zarrinkalam, Jelena Jovanovic, et al. "Neural embedding-based specificity metrics for pre-retrieval query performance prediction" Information Processing and Management Vol. 57 Iss. 4 (2020) p. 102248 ISSN: 0306-4573
Available at: http://works.bepress.com/feras-al-obeidat/39/