Result selection and summarization for Web Table search
File version
Accepted Manuscript (AM)
Author(s)
Nguyen, Quoc Viet Hung
Weidlich, Matthias
Aberer, Karl
Editor(s)
Jeong-Hyon Hwang, Yang-Sae Moon
Location
Seoul, South Korea
Abstract
The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables have emerged as a model that enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
Conference Title
2015 IEEE 31st International Conference on Data Engineering (ICDE)
Rights Statement
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Database systems