Article
Impact of document representation on neural ad hoc retrieval
International Conference on Information and Knowledge Management, Proceedings
  • Ebrahim Bagheri, Ryerson University
  • Faezeh Ensan, Ferdowsi University of Mashhad
  • Feras Al-Obeidat, Zayed University
Document Type
Conference Proceeding
Publication Date
10-17-2018
Abstract

© 2018 Association for Computing Machinery. Neural embeddings have been effectively integrated into information retrieval tasks, including ad hoc retrieval. One benefit of neural embeddings is that they allow the similarity between queries and documents to be calculated through vector similarity methods. While such methods have been effective for document matching, they are inherently biased towards documents that are of relatively similar size. Therefore, the difference between query and document lengths, referred to as the query-document size imbalance problem, becomes an issue when incorporating neural embeddings and their associated similarity calculation models into the ad hoc document retrieval process. In this paper, we propose that document representation methods be used to address the size imbalance problem and empirically show their impact on the performance of neural embedding-based ad hoc retrieval. In addition, we explore several types of document representation methods and investigate their impact on the retrieval process. We conduct our experiments on three widely used standard corpora, namely Clueweb09B, Clueweb12B, and Robust04, and their associated topics. In summary, we find that document representation methods effectively address the query-document size imbalance problem and significantly improve the performance of neural ad hoc retrieval. In addition, we find that a document representation method based on simple term frequency performs significantly better than more sophisticated representation methods such as neural composition and aspect-based methods.
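
The size imbalance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional embeddings, the averaging of term vectors, the cosine measure, and the helper names (`mean_vector`, `cosine`, `tf_representation`) are all assumptions made for illustration. The sketch compares a short query against a full document and against a reduced representation that keeps only the document's most frequent terms.

```python
import math
from collections import Counter

# Toy 2-d embeddings; purely illustrative and not taken from the paper.
EMBEDDINGS = {
    "neural": (1.0, 0.0),
    "retrieval": (0.8, 0.6),
    "weather": (0.0, 1.0),
    "rain": (-0.6, 0.8),
}


def mean_vector(terms):
    """Average the embeddings of known terms (an assumed composition method)."""
    vecs = [EMBEDDINGS[t] for t in terms if t in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tf_representation(doc_terms, k):
    """Keep the k most frequent terms as a reduced document representation."""
    return [term for term, _ in Counter(doc_terms).most_common(k)]


query = ["neural", "retrieval"]
document = ["neural", "retrieval", "neural", "retrieval",
            "neural", "weather", "rain"]

# Score the query against the full document and against its reduced form.
full_score = cosine(mean_vector(query), mean_vector(document))
reduced = tf_representation(document, k=2)
reduced_score = cosine(mean_vector(query), mean_vector(reduced))

print(f"full document score:    {full_score:.3f}")
print(f"reduced document score: {reduced_score:.3f}")
```

In this toy setting the off-topic terms dilute the averaged vector of the full document, so the reduced representation scores higher against the query. Whether that holds on real corpora depends on the embeddings and the cutoff `k`, both of which are hypothetical here.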

ISBN
9781450360142
Publisher
Association for Computing Machinery
Keywords
  • Calculations
  • Knowledge management
  • Associated topics
  • Document matching
  • Document Representation
  • Document Retrieval
  • Imbalance problem
  • Representation method
  • Retrieval process
  • Similarity calculation
  • Information retrieval
Scopus ID
85058034898
Indexed in Scopus
Yes
Open Access
No
https://doi.org/10.1145/3269206.3269314
Citation Information
Ebrahim Bagheri, Faezeh Ensan, and Feras Al-Obeidat. "Impact of document representation on neural ad hoc retrieval." International Conference on Information and Knowledge Management, Proceedings (2018), pp. 1635-1638
Available at: http://works.bepress.com/feras-al-obeidat/35/