Orchestra: Facilitating Collaborative Data Sharing
Postprint version. Copyright ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in Proceedings of the 2007 ACM SIGMOD International Conference on the Management of Data (SIGMOD/PODS 2007), June 2007, 4 pages.
Publisher URL: http://sigmod07.riit.tsinghua.edu.cn/acceptedPaperForSIGMOD.shtml
One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.
Todd J. Green, Grigoris Karvounarakis, Nicholas E. Taylor, Olivier Biton, Zachary G. Ives, and Val Tannen. "Orchestra: Facilitating Collaborative Data Sharing." Database Research Group (CIS) (2007).
Available at: http://works.bepress.com/zives/13