Result selection and summarization for Web Table search
MetadataShow full item record
The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model, which enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningful. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
Proceedings of the 2015 IEEE 31st International Conference on Data Engineering
Copyright 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.