Unpublished Paper
An Exploration of Entity Models, Collective Classification and Relation Description
  • Hema Raghavan
  • James Allan
  • Andrew McCallum, University of Massachusetts - Amherst
Traditional information retrieval typically represents data using a bag of words; data mining typically uses a highly structured database ontology. This paper explores a middle ground that we term entity models, in which questions about structured data may be posed and answered, but the complexities and task-specific restrictions of ontologies are avoided. An entity model is a language model, or word distribution, associated with an entity such as a person, place or organization. Using these per-entity language models, entities may be clustered, links may be detected or described with a short summary, entities may be collectively classified, and question answering may be performed. On a corpus of entities extracted from newswire and the Web, we group entities by profession with 90% accuracy, further improve accuracy on the task of classifying politicians as liberal or conservative using collective classification and conditional random fields, and answer questions about "who a person is" with a mean reciprocal rank (MRR) of 0.52.
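To make the two central notions in the abstract concrete, the following minimal sketch builds a per-entity word distribution and computes mean reciprocal rank. The abstract does not say how the models are estimated or compared, so everything specific here is an assumption made for illustration: whitespace tokenization, add-alpha smoothing, KL divergence as the comparison, the function names, and the toy snippets.

```python
import math
from collections import Counter


def entity_model(snippets, vocab, alpha=1.0):
    """Unigram word distribution for one entity, estimated from text
    snippets that mention it. Add-alpha smoothing is an assumption."""
    counts = Counter(w for s in snippets for w in s.lower().split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two smoothed distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank of the first correct answer per question.
    Ranks are 1-based; None means no correct answer was returned."""
    scores = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
    return sum(scores) / len(scores)


# Hypothetical toy snippets, not data from the paper.
SNIPPETS = {
    "senator_a": ["senator voted on the tax bill", "the senator backed the bill"],
    "senator_b": ["senator proposed a tax bill"],
    "striker": ["striker scored a goal", "the striker scored again"],
}
VOCAB = sorted({w for texts in SNIPPETS.values() for s in texts for w in s.split()})
MODELS = {name: entity_model(texts, VOCAB) for name, texts in SNIPPETS.items()}
```

Under these assumptions, `kl_divergence(MODELS["senator_a"], MODELS["senator_b"])` comes out smaller than the divergence between `senator_a` and `striker`, which is the kind of signal a clustering of entities by profession could rely on. For the evaluation measure, `mean_reciprocal_rank([1, 2, None, 4])` averages 1, 0.5, 0 and 0.25.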
Publication Date
This is the pre-published version harvested from CIIR.
Citation Information
Hema Raghavan, James Allan and Andrew McCallum. "An Exploration of Entity Models, Collective Classification and Relation Description" (2004)
Available at: