Unpublished Paper
A Unified Approach for Schema Matching, Coreference and Canonicalization
(2008)
  • Michael Wick
  • Khashayar Rohanimanesh
  • Karl Schultz
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively-trained model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or isolated approach. Our experiments show that a joint model is able to improve substantially over systems that solve each task either in isolation or with the conventional cascade. We demonstrate nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.
Keywords
  • Data Integration
  • Coreference
  • Schema Matching
  • Canonicalization
  • Conditional Random Field
  • Weighted Logic
  • Database Management
  • Information Systems
Publication Date
2008
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Michael Wick, Khashayar Rohanimanesh, Karl Schultz and Andrew McCallum. "A Unified Approach for Schema Matching, Coreference and Canonicalization" (2008)
Available at: http://works.bepress.com/andrew_mccallum/91/