TwigBuffer: Avoiding Useless Intermediate Solutions Completely in Twig Joins
MetadataShow full item record
Twig pattern matching plays a crucial role in XML data processing. TwigStack  is a holistic twig join algorithm that solves the problem in two steps: (1) finding potentially useful intermediate path solutions, (2) merging the intermediate solutions. The algorithm is optimal when the twig pattern has only //-edges, in the sense that no useless partial solutions are generated in the first step (thus expediting the second step and boosting the overall performance). However, when /-edges are present, a large set of useless partial solutions may be produced, which directly downgrades the overall performance. Recently, some improved versions of the algorithm (e.g., TwigStackList and iTwigJoin) have been proposed in an attempt to reduce the number of useless partial solutions when /-edges are involved. However, none of the algorithms can avoid useless partial solutions completely. In this paper, we propose a new algorithm, TwigBuffer, that is guaranteed to completely avoid the useless partial solutions. Our algorithm is based on a novel strategy to buffer and manipulate elements in stacks and lists. Experiments show that TwigBuffer significantly outperforms previous algorithms when arbitrary /-edges are present.
Lecture Notes in Computer science
Copyright 2008 Springer. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com