Unpublished Paper
Improving Author Coreference by Resource-bounded Information Gathering from the Web
(2007)
  • Pallika Kanani
  • Andrew McCallum, University of Massachusetts - Amherst
  • Chris Pal
Abstract
Accurate entity resolution is sometimes impossible simply because of insufficient information. For example, in research paper author name resolution, even clever use of venue, title, and co-authorship relations is often not enough to make a confident coreference decision. This paper presents several methods for increasing accuracy by gathering and integrating additional evidence from the web. We formulate the coreference problem as one of graph partitioning with discriminatively-trained edge weights, and then incorporate web information either as additional features or as additional nodes in the graph. Because the web is too large for all of its data to be incorporated, we need an efficient procedure for selecting a subset of web queries and data. We formally describe the problem of resource-bounded information gathering in each of these contexts, and show significant accuracy improvements at low cost.
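The sketch below illustrates only the general shape of the "web information as additional features" setting described in the abstract, under loudly stated assumptions. The mention records, feature names, weights (`WEIGHTS`, standing in for a discriminatively trained pairwise classifier), the simulated web oracle `SIMULATED_WEB`, and the helper names are all hypothetical. The greedy agglomerative partitioner and the rule "spend the query budget on the pairs whose score is closest to zero" are swapped-in illustrative choices, not necessarily the paper's own partitioning or selection method.

```python
import itertools

# Hypothetical author mentions (name, co-authors, venue); not data from the paper.
MENTIONS = [
    {"name": "A. McCallum", "coauthors": {"Pal"}, "venue": "ICML"},
    {"name": "Andrew McCallum", "coauthors": {"Pal", "Kanani"}, "venue": "KDD"},
    {"name": "A. McCallum", "coauthors": {"Jones"}, "venue": "ISMB"},
]

# Hypothetical weights standing in for a discriminatively trained pairwise classifier.
# A positive edge weight favours "same author"; a negative one favours "different authors".
WEIGHTS = {"bias": -1.0, "same_last": 1.5, "shared_coauthor": 2.0,
           "same_venue": 1.0, "web": 3.0}

# Simulated (hypothetical) web oracle: +1 if web pages link the two mentions,
# -1 if pages show they are different people. A real system would issue queries here.
SIMULATED_WEB = {(0, 2): -1.0, (1, 2): -1.0}


def pair_features(a, b):
    """Cheap features computed from the citation records alone."""
    return {
        "bias": 1.0,
        "same_last": float(a["name"].split()[-1] == b["name"].split()[-1]),
        "shared_coauthor": float(bool(a["coauthors"] & b["coauthors"])),
        "same_venue": float(a["venue"] == b["venue"]),
    }


def score(features):
    """Edge weight: a linear score (log-odds) that the pair is coreferent."""
    return sum(WEIGHTS[k] * v for k, v in features.items())


def edge_weights(mentions, budget):
    """Score every pair, then spend at most `budget` web queries on the least certain pairs."""
    feats = {(i, j): pair_features(mentions[i], mentions[j])
             for i, j in itertools.combinations(range(len(mentions)), 2)}
    weights = {p: score(f) for p, f in feats.items()}
    # Assumed selection rule: query the pairs whose score is closest to the decision boundary.
    for pair in sorted(weights, key=lambda p: abs(weights[p]))[:budget]:
        feats[pair]["web"] = SIMULATED_WEB.get(pair, 0.0)  # web evidence as an extra feature
        weights[pair] = score(feats[pair])
    return weights


def partition(n, weights):
    """Greedy agglomerative partitioning: repeatedly merge the two clusters with the
    largest positive total inter-cluster edge weight; stop when no merge has positive gain."""
    clusters = [{i} for i in range(n)]
    while True:
        best, best_gain = None, 0.0
        for a, b in itertools.combinations(range(len(clusters)), 2):
            gain = sum(weights[tuple(sorted((i, j)))]
                       for i in clusters[a] for j in clusters[b])
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:
            return sorted(sorted(c) for c in clusters)
        a, b = best  # a < b, so popping b leaves index a valid
        clusters[a] |= clusters.pop(b)


for budget in (0, 1):
    print(budget, partition(len(MENTIONS), edge_weights(MENTIONS, budget)))
```

With no queries the weak "same last name" evidence chains all three mentions together (`0 [[0, 1, 2]]`); a single web query on the most uncertain pair pushes that edge negative, and the third mention is split off (`1 [[0, 1], [2]]`). The alternative setting in the abstract, adding web pages as extra nodes in the graph, is not shown here.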
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Pallika Kanani, Andrew McCallum and Chris Pal. "Improving Author Coreference by Resource-bounded Information Gathering from the Web" (2007)
Available at: http://works.bepress.com/andrew_mccallum/119/