Processing Queries and Merging Schemas in Support of Data Integration

April 6, 2004
3:00pm - 4:00pm
Halligan 111

Abstract

The goal of data integration is to provide a uniform interface, called a mediated schema, to a set of autonomous data sources, so that users can query those sources without knowing their underlying schemas. This talk describes two aspects of data integration: answering queries posed over a mediated schema, and creating the mediated schema itself. First, we present the MiniCon algorithm for answering queries in a data integration system and explain why MiniCon outperforms previous algorithms by up to several orders of magnitude. Second, given the relational schemas of two data sources, we propose using conjunctive queries to describe mappings between them. We analyze the formal semantics of these mappings, show how to derive a mediated schema from them, and show how to translate user queries over the mediated schema into queries over the local schemas. Finally, we show how to implement the derivation of mediated schemas using a generic Merge operator that manipulates schemas and mappings. We conclude by motivating Merge as one of the operators of the Model Management framework for metadata management.
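As a rough illustration of what it means for a conjunctive query to describe a mapping between two schemas, the Python sketch below evaluates conjunctive queries (select-project-join queries written as a head plus a body of relational atoms) over small in-memory relations. The relations Emp, Dept, and Location, their data, and the evaluate helper are hypothetical and are not taken from the talk. This is naive conjunctive-query evaluation, not the MiniCon algorithm or the Merge operator.

```python
# Hypothetical sketch: naive evaluation of conjunctive queries over in-memory relations.
# Query terms that start with an uppercase letter are variables; other terms are constants.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def evaluate(head, body, db):
    """Return the set of head tuples produced by every consistent variable binding."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            # All atoms matched: project the binding onto the head terms.
            results.add(tuple(binding[t] if is_var(t) else t for t in head))
            return
        rel, args = body[i]
        for row in db.get(rel, ()):
            if len(row) != len(args):
                continue
            new = dict(binding)
            ok = True
            for arg, val in zip(args, row):
                if is_var(arg):
                    # A variable must bind to the same value everywhere it occurs (join).
                    if new.setdefault(arg, val) != val:
                        ok = False
                        break
                elif arg != val:
                    # A constant in the query acts as a selection.
                    ok = False
                    break
            if ok:
                extend(i + 1, new)

    extend(0, {})
    return results

# Hypothetical source 1: Emp(name, dept) and Dept(dept, building).
db1 = {
    "Emp": {("ann", "sales"), ("bob", "eng")},
    "Dept": {("sales", "H111"), ("eng", "H222")},
}
# Hypothetical source 2: Location(name, building).
db2 = {"Location": {("ann", "H111"), ("bob", "H222")}}

# Q1(N, B) :- Emp(N, D), Dept(D, B)   over source 1
q1_answers = evaluate(["N", "B"], [("Emp", ["N", "D"]), ("Dept", ["D", "B"])], db1)
# Q2(N, B) :- Location(N, B)          over source 2
q2_answers = evaluate(["N", "B"], [("Location", ["N", "B"])], db2)

print(q1_answers == q2_answers)
```

Here the join of Emp and Dept in the first source yields the same (name, building) pairs as Location in the second, which is the kind of correspondence that a mapping stated as a pair of conjunctive queries is meant to capture.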