A (research) consumer's view of data integration research
Data integration technology aims to transform data from the providers' form to a form consumers can use, and also to merge data from multiple providers. Relevant theory and algorithms appeared as early as the 1980s, but until very recently the transition to products has been highly disappointing. The disconnect has reduced the power and elegance of systems, and at the same time has denied researchers a source of inspiration, funding, and reality checks.
After an introduction to the challenges of data integration, we examine research areas whose results were difficult to transfer into products. From these, we identify generic roadblocks that can discourage a product planner or a developer from exploiting available theory. Concerns of product planners, for instance, suggest a general principle: attack "downstream" problem stages first. Developer concerns suggest expansions of the research agenda. Developers are rarely asked for a perfect solution to a simplified problem (which represents a tiny market); they are asked to implement decent solutions to messy problems. After a researcher has obtained powerful results on a simplified problem, they can help greatly by also devising techniques that let their algorithm contribute to the messy whole. Research challenges include devising problem-partitioning techniques that leave a smaller residual problem, and finding a tractable problem "close" to what the user asked for.
Bio: Arnie Rosenthal works in many areas of data management research, and on their connections to enterprise systems and artificial intelligence. His work has included research and consulting on data administration and integration, database security, distributed objects, migration of legacy systems, data mining, query processing, and discrete algorithms. After completing his PhD at Berkeley, he served on the faculty at Michigan and did research at Sperry Corporate Research, the Computer Corporation of America, and MITRE. He has been a visiting researcher at IBM Almaden Research and ETH Zurich.