Unpublished Paper
Modeling Score Distributions for Meta Search
(2002)
  • R. Manmatha, University of Massachusetts - Amherst
  • T. Rath
  • F. Feng
Abstract

This paper models the score distributions of a number of text search engines. It is shown empirically that, on a per-query basis, the score distributions may be modeled using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for a wide variety of search engines, including INQUERY (a probabilistic search engine), SMART (a vector space engine), and search engines based on latent semantic indexing and language modeling. The model also works when search engines index other languages, such as Chinese.

It is then shown that, given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arises given certain assumptions about word distributions in documents. We hypothesize that all good text search engines operating on any language have similar characteristics.
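
As a rough illustration of the mixture idea, the sketch below fits an exponential-plus-normal mixture to a list of scores with a plain EM loop and then maps a score to a probability of relevance using Bayes' rule. This is not taken from the paper: the function names (`fit_mixture`, `posterior_relevant`), the starting values, the iteration count, and the assumption that scores are non-negative are all illustrative choices.

```python
import math
import random


def exp_pdf(x, lam):
    """Exponential density (assumed model for non-relevant scores)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def norm_pdf(x, mu, sigma):
    """Normal density (assumed model for relevant scores)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_relevant(score, params):
    """P(relevant | score) by Bayes' rule under the fitted mixture."""
    pi, lam, mu, sigma = params
    rel = pi * norm_pdf(score, mu, sigma)
    non = (1.0 - pi) * exp_pdf(score, lam)
    total = rel + non
    return rel / total if total > 0 else 0.5


def fit_mixture(scores, iters=200):
    """Fit (pi, lam, mu, sigma) by EM; assumes positive, non-constant scores."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    # Hypothetical starting point: equal mixing weight, normal component
    # centred on the highest score, exponential rate from the overall mean.
    params = (0.5, 1.0 / mean, max(scores), max(math.sqrt(var), 1e-6))
    for _ in range(iters):
        # E-step: responsibility of the "relevant" component for each score.
        resp = [posterior_relevant(s, params) for s in scores]
        w = min(max(sum(resp), 1e-9), n - 1e-9)
        # M-step: weighted maximum-likelihood updates for each component.
        mu = sum(r * s for r, s in zip(resp, scores)) / w
        var_rel = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / w
        sigma = max(math.sqrt(var_rel), 1e-6)
        non_mass = max(sum((1.0 - r) * s for r, s in zip(resp, scores)), 1e-12)
        lam = (n - w) / non_mass
        params = (w / n, lam, mu, sigma)
    return params


if __name__ == "__main__":
    # Synthetic scores only; these are not TREC data.
    random.seed(0)
    demo = [random.expovariate(5.0) for _ in range(900)]
    demo += [random.gauss(2.0, 0.3) for _ in range(100)]
    fitted = fit_mixture(demo)
    print("fitted (pi, lam, mu, sigma):", fitted)
    print("P(relevant | score=2.0):", posterior_relevant(2.0, fitted))
```

EM is sensitive to its starting point, so a real system would likely need a more careful initialization than the one assumed here.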

This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques.
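
The averaging strategy mentioned above could look roughly like the following. It is a minimal sketch, not the paper's implementation: the name `combine_by_average` is hypothetical, each engine is assumed to supply a mapping from document id to probability of relevance, and a document missing from an engine's list is assumed to have probability 0 for that engine.

```python
def combine_by_average(engine_probs):
    """Rank documents by the mean of per-engine relevance probabilities.

    engine_probs: list of dicts mapping doc id -> P(relevant | score).
    A document an engine did not return counts as probability 0.0
    (an assumption of this sketch).
    """
    docs = set().union(*engine_probs)
    combined = {
        d: sum(p.get(d, 0.0) for p in engine_probs) / len(engine_probs)
        for d in docs
    }
    # Highest combined probability first; ties broken by doc id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    engine_a = {"d1": 0.9, "d2": 0.2}
    engine_b = {"d2": 0.8, "d3": 0.3}
    for doc, prob in combine_by_average([engine_a, engine_b]):
        print(doc, prob)
```

Because both engines report probabilities rather than raw scores, the values are on a common scale and can be averaged directly without further normalization.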

Publication Date
2002
Comments
This is the pre-published version harvested from CIIR.
Citation Information
R. Manmatha, T. Rath and F. Feng. "Modeling Score Distributions for Meta Search" (2002)
Available at: http://works.bepress.com/r_manmatha/3/