"A Unified Approach for Schema Matching, Coreference and Canonicalization" by Michael Wick

Selected Works of Andrew McCallum

Unpublished Paper

A Unified Approach for Schema Matching, Coreference and Canonicalization

(2008)

Michael Wick
Khashayar Rohanimanesh
Karl Schultz
Andrew McCallum, University of Massachusetts - Amherst

Abstract

The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively-trained model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or isolated approach. Our experiments show that a joint model is able to improve substantially over systems that either solve each task in isolation or with the conventional cascade. We demonstrate nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.

Keywords

Data Integration, Coreference, Schema Matching, Canonicalization, Conditional Random Field, Weighted Logic, Database Management, Information Systems

Disciplines

Computer Sciences

Publication Date

2008

Comments

This is the pre-published version harvested from CIIR.

Citation Information

Michael Wick, Khashayar Rohanimanesh, Karl Schultz and Andrew McCallum. "A Unified Approach for Schema Matching, Coreference and Canonicalization" (2008)
Available at: http://works.bepress.com/andrew_mccallum/91/