Success in database schema integration depends on the ability to capture the real-world semantics of the schema objects and to reason about those semantics. Earlier schema integration approaches relied mainly on heuristics and human reasoning. In this paper, we discuss an approach to automate a significant part of the schema integration process.
Our approach consists of three phases. In the first phase, an attribute hierarchy is generated. This involves identifying relationships (equality, disjointness and inclusion) among attributes; we discuss a strategy for doing so based on user-specified semantic clustering. In the second phase, a classification algorithm based on the semantics of class subsumption is applied to the class definitions and the attribute hierarchy to automatically generate a class taxonomy. This class taxonomy represents a partially integrated schema. In the third phase, the user may employ a set of well-defined comparison operators, in conjunction with a set of restructuring operators, to further modify the schema. These operators, as well as the automatic reasoning during the second phase, are based on subsumption.
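The second phase can be illustrated with a minimal sketch. It is not the CANDIDE classifier: it assumes that a class is defined only by a set of attribute names (constraints are ignored), that the attribute hierarchy from the first phase is given as acyclic inclusion links, and that a class subsumes another when each of its attributes includes some attribute of the other. The names `ATTRIBUTE_PARENTS`, `CLASSES`, `subsumes` and `classify`, and the example attributes and classes, are hypothetical.

```python
# Hypothetical output of phase 1: attribute -> attributes that directly include it.
# Assumed to be acyclic.
ATTRIBUTE_PARENTS = {
    "salary": {"income"},
    "stipend": {"income"},
    "emp_name": {"name"},
    "student_name": {"name"},
}

# Hypothetical class definitions drawn from the schemas being integrated.
CLASSES = {
    "Person": {"name"},
    "Earner": {"name", "income"},
    "Employee": {"emp_name", "salary"},
    "Student": {"student_name", "stipend"},
}


def attribute_included_in(a, b, parents=ATTRIBUTE_PARENTS):
    """True if attribute a equals b or is transitively included in b."""
    if a == b:
        return True
    return any(attribute_included_in(p, b, parents) for p in parents.get(a, ()))


def subsumes(general, specific, parents=ATTRIBUTE_PARENTS):
    """Simplified subsumption test: every attribute of `general` must
    include at least one attribute of `specific`."""
    return all(
        any(attribute_included_in(s, g, parents) for s in specific)
        for g in general
    )


def classify(classes, parents=ATTRIBUTE_PARENTS):
    """Return {class name: set of direct (most specific) subsumers}."""
    # Strict subsumers only; mutually subsuming (equivalent) classes are skipped.
    ancestors = {
        c: {
            d for d in classes
            if d != c
            and subsumes(classes[d], classes[c], parents)
            and not subsumes(classes[c], classes[d], parents)
        }
        for c in classes
    }
    # Keep an ancestor only if no other ancestor of c lies below it.
    return {
        c: {d for d in anc if not any(d in ancestors[e] for e in anc if e != d)}
        for c, anc in ancestors.items()
    }


taxonomy = classify(CLASSES)
```

Under these assumptions, `Employee` and `Student` are both placed directly under `Earner`, which is placed under `Person`. The resulting taxonomy corresponds to the partially integrated schema that the user would refine in the third phase.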
The formal semantics and automatic reasoning utilized in the second phase are based on a terminological logic as adapted in the CANDIDE data model. Classes are completely defined in terms of attributes and constraints. Our observation is that the inability to completely define attributes, and thus to completely capture their real-world semantics, imposes a fundamental limitation on the possibility of automatically reasoning about attribute definitions. This necessitates human reasoning during the first phase of the integration approach.
Available at: http://works.bepress.com/amit_sheth/461/