Unpublished Paper
Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes
(2007)
  • Pallika Kanani
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Entity resolution in the research paper domain is an important but difficult problem. It suffers from insufficient contextual information; hence, using information from the web significantly improves performance. We formulate the author coreference problem as one of graph partitioning with discriminatively trained edge weights. Building on our previous work, we present improved and more comprehensive results for the method in which we incorporate web documents as additional nodes in the graph. We also propose efficient strategies to select a subset of nodes to add to the graph and to select a subset of queries to gather additional nodes, without significant loss of performance gain. We extend the classic set cover problem to develop a node selection criterion, opening up interesting theoretical possibilities. Finally, we propose a hybrid approach that achieves 74.3% of the total improvement using only 18.3% of all additional mentions.
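For readers unfamiliar with the set cover problem mentioned in the abstract, the following is a minimal sketch of the classic greedy approximation only. It is not the authors' extended node selection criterion, which the abstract does not spell out. The function name `greedy_set_cover`, the page names, and the mention identifiers are hypothetical, chosen purely for illustration: each candidate web page is assumed to "cover" the set of author mentions it relates to.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable],
    candidates: Dict[Hashable, Set[Hashable]],
) -> List[Hashable]:
    """Classic greedy set cover: repeatedly pick the candidate that
    covers the most still-uncovered elements.

    Illustrative only; not the paper's node selection method.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        # Candidate with the largest number of newly covered elements.
        best = max(
            candidates,
            key=lambda name: len(candidates[name] & uncovered),
            default=None,
        )
        if best is None:
            break  # no candidates at all
        gain = candidates[best] & uncovered
        if not gain:
            break  # remaining elements cannot be covered
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical data: four author mentions and four candidate web pages.
mentions = {"m1", "m2", "m3", "m4"}
pages = {
    "page_a": {"m1", "m2"},
    "page_b": {"m2", "m3"},
    "page_c": {"m3", "m4"},
    "page_d": {"m4"},
}
selected = greedy_set_cover(mentions, pages)
print(selected)
```

In this toy setting, the greedy rule selects a small subset of pages that together touch every mention, which conveys the general idea of adding only some web nodes to the graph rather than all of them.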
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Pallika Kanani and Andrew McCallum. "Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes" (2007)
Available at: http://works.bepress.com/andrew_mccallum/103/