On Leveraging Crowdsourcing Techniques for Schema Matching Networks

File version

Accepted Manuscript (AM)

Author(s)
Nguyen, Quoc Viet Hung
Nguyen, Thanh Tam
Miklos, Zoltan
Aberer, Karl
Date
2013
Abstract

As the number of publicly available datasets is likely to grow, the demand for establishing links between these datasets is also rising. To create such links, we need to match their schemas. Moreover, to use these datasets in meaningful ways, one often needs to match not just two but several schemas. This matching process establishes a (potentially large) set of attribute correspondences between multiple schemas, which together constitute a schema matching network. Various commercial and academic schema matching tools have been developed to support this task. However, as matching is inherently uncertain, the heuristic techniques adopted by these tools produce results that are not completely correct. Thus, in practice, post-matching effort by a human expert is needed to obtain a correct set of attribute correspondences.

To address this problem, our paper demonstrates how to leverage crowdsourcing techniques to validate the generated correspondences. We design validation questions with contextual information that can effectively guide the crowd workers. We analyze how to reduce the overall human effort needed for this validation task. Through theoretical and empirical results, we show that by harnessing natural constraints defined on top of the schema matching network, one can significantly reduce the necessary human work.
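
To illustrate how a constraint can save validation questions, the following minimal Python sketch assumes a one-to-one constraint: an attribute may correspond to at most one attribute of any other schema. The abstract does not name the constraints the paper uses, so this constraint, the `schema.attribute` naming, and the helper names (`corr`, `conflicts`, `prune_after_approval`) are all assumptions made for illustration, not the authors' method.

```python
from typing import FrozenSet, List

Correspondence = FrozenSet[str]  # an unordered pair of "schema.attribute" names


def corr(a: str, b: str) -> Correspondence:
    """Build an undirected attribute correspondence (hypothetical helper)."""
    return frozenset((a, b))


def schema_of(attribute: str) -> str:
    """Return the schema part of a 'schema.attribute' name."""
    return attribute.split(".", 1)[0]


def conflicts(a: Correspondence, b: Correspondence) -> bool:
    """Assumed one-to-one constraint: a and b conflict when they share one
    attribute and map it to two different attributes of the same schema."""
    if len(a & b) != 1:
        return False
    (other_a,) = a - b
    (other_b,) = b - a
    return schema_of(other_a) == schema_of(other_b)


def prune_after_approval(
    candidates: List[Correspondence], approved: Correspondence
) -> List[Correspondence]:
    """Drop the approved candidate and every candidate it rules out,
    so the crowd is never asked about them."""
    return [c for c in candidates if c != approved and not conflicts(c, approved)]


# Illustrative candidates produced by some automatic matcher (made-up data).
candidates = [
    corr("A.name", "B.title"),
    corr("A.name", "B.label"),    # competes with A.name ~ B.title
    corr("B.title", "C.heading"),
    corr("A.date", "B.created"),
]

# Suppose the crowd approves A.name ~ B.title.
remaining = prune_after_approval(candidates, corr("A.name", "B.title"))
print(len(remaining))  # number of candidates still to be validated
```

In this toy run, one crowd answer settles two candidates: the approved correspondence and the competing one it excludes, leaving two of the original four to validate.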

Journal Title

Lecture Notes in Computer Science

Volume

7826

Rights Statement

© 2013 Springer International Publishing AG. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com.

Subject

Database systems

Information and computing sciences
