
Presentation
Minimal Test Collections for Retrieval Evaluation
29th Annual International ACM SIGIR Conference
(2006)
Abstract
Accurate estimation of information retrieval evaluation metrics such as average precision requires large sets of relevance judgments. Building sets large enough for evaluation of real-world implementations is at best inefficient, at worst infeasible. In this work we link evaluation with test collection construction to gain an understanding of the minimal judging effort needed to have high confidence in the outcome of an evaluation. A new way of looking at average precision leads to a natural algorithm for selecting documents to judge, and it allows us to estimate the degree of confidence by defining a distribution over possible document judgments. A study with annotators shows that this method can be used by a small group of researchers to rank a set of systems in under three hours with 95% confidence.
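The sketch below is only an illustration of the kind of setup the abstract describes: average precision computed from partial relevance judgments, plus a naive rule for choosing the next document to judge. The function names, the toy data, and the "highest unjudged rank" heuristic are assumptions for illustration; the paper's actual algorithm selects documents by their weight in the difference in average precision between systems.

```python
# Illustrative sketch only, not the paper's algorithm.

def average_precision(ranking, judgments):
    """AP computed over documents with known judgments (1 = relevant)."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranking, start=1):
        if judgments.get(doc) == 1:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

def next_doc_to_judge(run_a, run_b, judgments):
    """Naive selection rule: the unjudged document ranked highest by either system."""
    best_doc, best_rank = None, float("inf")
    for run in (run_a, run_b):
        for rank, doc in enumerate(run, start=1):
            if doc not in judgments and rank < best_rank:
                best_doc, best_rank = doc, rank
    return best_doc

# Toy usage: two system rankings over a small document pool.
run_a = ["d1", "d2", "d3", "d4"]
run_b = ["d2", "d4", "d1", "d3"]
judgments = {"d1": 1}                               # judgments collected so far
print(next_doc_to_judge(run_a, run_b, judgments))   # -> "d2"
print(average_precision(run_a, judgments))          # AP over judged documents only
```

In this spirit, judging stops once the distribution over the remaining possible judgments makes the sign of the difference in average precision sufficiently certain; the sketch omits that confidence computation.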
Keywords
- information retrieval,
- evaluation,
- test collections,
- algorithms,
- theory
Publication Date
2006
Citation Information
Ben Carterette, James Allan, and Ramesh Sitaraman. "Minimal Test Collections for Retrieval Evaluation." 29th Annual International ACM SIGIR Conference (2006). Available at: http://works.bepress.com/ramesh_sitaraman/14/