"Neural embedding-based specificity metrics for pre-retrieval query performance prediction" by Ebrahim Bagheri

Selected Works of Feras Al-Obeidat

Article

Neural embedding-based specificity metrics for pre-retrieval query performance prediction

Information Processing and Management

Ebrahim Bagheri, Ryerson University
Negar Arabzadeh, Ryerson University
Fattane Zarrinkalam, Ryerson University
Jelena Jovanovic, University of Belgrade
Feras Al-Obeidat, Zayed University

Document Type

Article

Publication Date

7-1-2020

Abstract

© 2020 Elsevier Ltd. In information retrieval, the task of query performance prediction (QPP) is concerned with determining in advance the performance of a given query within the context of a retrieval model. QPP has an important role in ensuring proper handling of queries with varying levels of difficulty. Based on the extant literature, query specificity is an important indicator of query performance and is typically estimated using corpus-specific, frequency-based specificity metrics. However, such metrics do not consider term semantics and inter-term associations. Our work presented in this paper distinguishes itself by proposing a host of corpus-independent specificity metrics that are based on pre-trained neural embeddings and leverage geometric relations between terms in the embedding space in order to capture the semantics of terms and their interdependencies. Specifically, we propose three classes of specificity metrics based on pre-trained neural embeddings: neighborhood-based, graph-based, and cluster-based metrics. Through two extensive and complementary sets of experiments, we show that the proposed specificity metrics (1) are suitable specificity indicators, based on the gold standards derived from knowledge hierarchies (the Wikipedia category hierarchy and the DMOZ taxonomy), and (2) have better or competitive performance compared to state-of-the-art QPP metrics, based on both the TREC ad hoc collections (Robust’04, Gov2, and ClueWeb’09) and the ANTIQUE question answering collection. The proposed graph-based specificity metrics, especially those that capture a larger number of inter-term associations, proved to be the most effective in both query specificity estimation and QPP. We have also publicly released two test collections (i.e., specificity gold standards) that we built from the Wikipedia and DMOZ knowledge hierarchies.
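To make the idea of a neighborhood-based metric more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny dictionary of made-up 3-dimensional vectors standing in for pre-trained embeddings, a cosine-similarity threshold of 0.85 chosen only for illustration, and two hypothetical indicator functions (`neighborhood_size` and `mean_neighbor_similarity`). How such indicators relate to actual term specificity is the empirical question the paper studies; the sketch makes no claim about that.

```python
import math

# Made-up 3-dimensional vectors standing in for pre-trained embeddings.
# These values are illustrative assumptions, not taken from the paper.
TOY_EMBEDDINGS = {
    "animal": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "cat": [0.8, 0.2, 0.2],
    "poodle": [0.6, 0.7, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood(term, embeddings, threshold=0.85):
    """Map each other term to its similarity with `term`, keeping only
    those whose similarity is at least `threshold`."""
    target = embeddings[term]
    neighbors = {}
    for other, vector in embeddings.items():
        if other == term:
            continue
        similarity = cosine(target, vector)
        if similarity >= threshold:
            neighbors[other] = similarity
    return neighbors


def neighborhood_size(term, embeddings, threshold=0.85):
    """Hypothetical indicator: number of terms in the neighborhood."""
    return len(neighborhood(term, embeddings, threshold))


def mean_neighbor_similarity(term, embeddings, threshold=0.85):
    """Hypothetical indicator: average similarity to the neighbors,
    or 0.0 when the neighborhood is empty."""
    neighbors = neighborhood(term, embeddings, threshold)
    if not neighbors:
        return 0.0
    return sum(neighbors.values()) / len(neighbors)


if __name__ == "__main__":
    for term in TOY_EMBEDDINGS:
        size = neighborhood_size(term, TOY_EMBEDDINGS)
        mean_sim = mean_neighbor_similarity(term, TOY_EMBEDDINGS)
        print(f"{term}: size={size}, mean similarity={mean_sim:.3f}")
```

With real pre-trained embeddings, the dictionary would be replaced by a lookup into the embedding model, and a query-level score could be formed by aggregating the per-term indicators over the query terms; the choice of aggregation is likewise an assumption left open here.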

DOI Link

10.1016/j.ipm.2020.102248

Publisher

Elsevier Ltd

Disciplines

Computer Sciences

Keywords

Ad hoc retrieval
Neural embeddings
Performance prediction

Scopus ID

85082826883

Indexed in Scopus

Yes

Open Access

https://doi.org/10.1016/j.ipm.2020.102248

Citation Information

Ebrahim Bagheri, Negar Arabzadeh, Fattane Zarrinkalam, Jelena Jovanovic, et al. "Neural embedding-based specificity metrics for pre-retrieval query performance prediction" Information Processing and Management Vol. 57 Iss. 4 (2020) p. 102248 ISSN: 0306-4573
Available at: http://works.bepress.com/feras-al-obeidat/39/