<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Zachary G Ives</title>
<copyright>Copyright (c) 2009. All rights reserved.</copyright>
<link>http://works.bepress.com/zives</link>
<description>Recent documents in Zachary G Ives</description>
<language>en-us</language>
<lastBuildDate>Sun, 31 May 2009 13:12:55 PDT</lastBuildDate>
<ttl>3600</ttl>





<item>
<title>Integrating Network-Bound XML Data</title>
<link>http://works.bepress.com/zives/22</link>
<guid isPermaLink="true">http://works.bepress.com/zives/22</guid>
<pubDate>Wed, 16 Apr 2008 07:31:04 PDT</pubDate>
<description>Although XML was originally envisioned as a replacement for HTML on the web, to this point it has instead
been used primarily as a format for on-demand interchange of data between applications and enterprises. The web
is rather sparsely populated with static XML documents, but nearly every data management application today can
export XML data. There is great interest in integrating such exported data across applications and administrative
boundaries, and as a result, efficient techniques for integrating XML data across local- and wide-area networks are an
important research focus.
In this paper, we provide an overview of the Tukwila data integration system, which is based on the first XML
query engine designed specifically for processing network-bound XML data sources. In contrast to previous approaches,
which must read, parse, and often store XML data before querying it, the Tukwila XML engine can return
query results even as the data is streaming into the system. Tukwila features a new system architecture that extends
relational query processing techniques, such as pipelining and adaptive query processing, into the XML realm. We
compare the focus of the Tukwila project to that of other XML research systems, and then we present our system
architecture and novel query operators, such as the x-scan operator. We conclude with a description of our current
research directions in extending XML-based adaptive query processing.</description>

<author>Zachary G. Ives</author>


<category>Adaptive query processing</category>

<category>Data integration and exchange</category>

<category>XML processing</category>

</item>


<item>
<title>Adaptive Query Processing for Internet Applications</title>
<link>http://works.bepress.com/zives/21</link>
<guid isPermaLink="true">http://works.bepress.com/zives/21</guid>
<pubDate>Wed, 16 Apr 2008 07:31:01 PDT</pubDate>
<description>As the area of data management for the Internet has gained in popularity, recent work has focused on effectively
dealing with unpredictable, dynamic data volumes and transfer rates using adaptive query processing techniques.
Important requirements of the Internet domain include: (1) the ability to process XML data as it streams in from the
network, in addition to working on locally stored data; (2) dynamic scheduling of operators to adjust to I/O delays
and flow rates; (3) sharing and re-use of data across multiple queries, where possible; (4) the ability to output results
and later update them. An equally important consideration is the high degree of variability in performance needs for
different query processing domains: perhaps an ad-hoc query application should optimize for display of incomplete
and partial incremental results, whereas a corporate data integration application may need the best time-to-completion
and may have very strict data &#34;freshness&#34; guarantees. The goal of the Tukwila project at the University of Washington
is to design a query processing system that supports a range of adaptive techniques that are configurable for different
query processing contexts.</description>

<author>Zachary G. Ives</author>


<category>Adaptive query processing</category>

<category>Data integration and exchange</category>

</item>


<item>
<title>Interviewing During a Tight Job Market</title>
<link>http://works.bepress.com/zives/20</link>
<guid isPermaLink="true">http://works.bepress.com/zives/20</guid>
<pubDate>Wed, 16 Apr 2008 07:30:58 PDT</pubDate>
<description>Various tips are discussed for PhD graduates interviewing for an academic position at a research university in Asia or North America. It is suggested that finishing the dissertation before interviewing relieves a great deal of pressure. Being practical about the job research package and keeping a close eye on applications are found to increase one's confidence. It is also observed that questions asked during the talk provide an opportunity to clarify and strengthen the talk, and to demonstrate this ability during the interview.</description>

<author>Zachary G. Ives</author>


</item>


<item>
<title>An Adaptive Query Execution System for Data Integration</title>
<link>http://works.bepress.com/zives/19</link>
<guid isPermaLink="true">http://works.bepress.com/zives/19</guid>
<pubDate>Wed, 16 Apr 2008 07:30:54 PDT</pubDate>
<description>Query processing in data integration occurs over network-bound, autonomous data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. This paper presents the Tukwila data integration system, designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses adaptive query operators such as the double pipelined hash join, which produces answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution re-optimization, and choose nodes), and we present experimental evidence that our techniques result in behavior desirable for a data integration system.</description>

<author>Zachary G. Ives</author>


<category>Adaptive query processing</category>

<category>Data integration and exchange</category>

</item>


<item>
<title>Schema Mediation in Peer Data Management Systems</title>
<link>http://works.bepress.com/zives/18</link>
<guid isPermaLink="true">http://works.bepress.com/zives/18</guid>
<pubDate>Wed, 16 Apr 2008 07:30:52 PDT</pubDate>
<description>Intuitively, data management and data integration tools should
be well-suited for exchanging information in a semantically meaningful
way. Unfortunately, they suffer from two significant problems:
they typically require a comprehensive schema design before
they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may
break backwards compatibility. As a result, many small-scale data
sharing tasks are more easily facilitated by non-database-oriented
tools that have little support for semantics.
The goal of the peer data management system (PDMS) is to
address this need: we propose the use of a decentralized, easily
extensible data management architecture in which any user
can contribute new data, schema information, or even mappings
between other peers' schemas. PDMSs represent a natural step
beyond data integration systems, replacing their single logical
schema with an interlinked collection of semantic mappings between
peers' individual schemas.
This paper considers the problem of schema mediation in a
PDMS. Our first contribution is a flexible language for mediating
between peer schemas, which extends known data integration
formalisms to our more complex architecture. We precisely
characterize the complexity of query answering for our language.
Next, we describe a reformulation algorithm for our language that
generalizes both global-as-view and local-as-view query answering
algorithms. Finally, we describe several methods for optimizing
the reformulation algorithm, and an initial set of experiments
studying its performance.</description>

<author>Alon Halevy</author>


<category>Data integration and exchange</category>

<category>Peer-to-peer architectures</category>

</item>


<item>
<title>Reconciling while Tolerating Disagreement in Collaborative Data Sharing</title>
<link>http://works.bepress.com/zives/17</link>
<guid isPermaLink="true">http://works.bepress.com/zives/17</guid>
<pubDate>Wed, 16 Apr 2008 07:30:48 PDT</pubDate>
<description>In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others, but they must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users.
In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others.  Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance.  We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in
settings involving the sharing of curated data.</description>

<author>Nicholas E. Taylor</author>


<category>Data integration and exchange</category>

<category>Peer-to-peer architectures</category>

</item>


<item>
<title>Adapting to Source Properties in Processing Data Integration Queries</title>
<link>http://works.bepress.com/zives/16</link>
<guid isPermaLink="true">http://works.bepress.com/zives/16</guid>
<pubDate>Wed, 16 Apr 2008 07:30:45 PDT</pubDate>
<description>An effective query optimizer finds a query plan that exploits the
characteristics of the source data. In data integration, little is known
in advance about sources' properties, which necessitates the use of
adaptive query processing techniques to adjust query processing
on-the-fly. Prior work in adaptive query processing has focused on
compensating for delays and adjusting for mis-estimated cardinality
or selectivity values. In this paper, we present a generalized architecture
for adaptive query processing and introduce a new technique,
called adaptive data partitioning (ADP), which is based on
the idea of dividing the source data into regions, each executed by
different, complementary plans. We show how this model can be
applied in novel ways not only to correct for underestimated selectivity
and cardinality values, but also to discover and exploit order
in the source data, and to detect and exploit source data that can be
effectively pre-aggregated. We experimentally compare a number
of alternative strategies and show that our approach is effective.</description>

<author>Zachary G. Ives</author>


<category>Adaptive query processing</category>

<category>Data integration and exchange</category>

</item>


<item>
<title>The Piazza peer data management system</title>
<link>http://works.bepress.com/zives/15</link>
<guid isPermaLink="true">http://works.bepress.com/zives/15</guid>
<pubDate>Wed, 16 Apr 2008 07:30:42 PDT</pubDate>
<description>Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically
meaningful way. Unfortunately, they suffer from two significant problems: They typically require a comprehensive schema design
before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and
may break backward compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by non-database-oriented
tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: We
propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data,
schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration
systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas.
This paper describes several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and
optimization algorithms, and the relevance of PDMSs to the Semantic Web.</description>

<author>Alon Y. Halevy</author>


<category>Data integration and exchange</category>

<category>Peer-to-peer architectures</category>

</item>


<item>
<title>Update Exchange with Mappings and Provenance</title>
<link>http://works.bepress.com/zives/14</link>
<guid isPermaLink="true">http://works.bepress.com/zives/14</guid>
<pubDate>Wed, 16 Apr 2008 07:30:39 PDT</pubDate>
<description>We consider systems for data sharing among heterogeneous peers related by a network of schema
mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries
over related data from other peers as well. To achieve this, every peer's updates propagate along the
mappings to the other peers. However, this update exchange is filtered by trust conditions
(expressing what data and sources a peer judges to be authoritative), which may cause a peer to reject
another's updates. In order to support such filtering, updates carry provenance information. These
systems target scientific data sharing applications, and their general principles and architecture have
been described in [21].

In this paper we present methods for realizing such systems. Specifically, we extend techniques
from data integration, data exchange, and incremental view maintenance to propagate updates along
mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates
based on trust conditions over this provenance; we discuss strategies for implementing our techniques in
conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the
Orchestra prototype system.

This technical report supersedes the version that appeared in VLDB 2007 [17] and
corrects certain technical claims regarding the semantics of our system (see errata in Sections
3.1 and 4.1.1).</description>

<author>Todd J. Green</author>


<category>Data integration and exchange</category>

<category>Peer-to-peer architectures</category>

</item>


<item>
<title>Orchestra: Facilitating Collaborative Data Sharing</title>
<link>http://works.bepress.com/zives/13</link>
<guid isPermaLink="true">http://works.bepress.com/zives/13</guid>
<pubDate>Wed, 16 Apr 2008 07:30:36 PDT</pubDate>
<description>One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.</description>

<author>Todd J. Green</author>


<category>Data integration and exchange</category>

<category>Peer-to-peer architectures</category>

</item>



</channel>
</rss>

