A Bag Reconstruction Method for Multiple Instance Classification and Group Record Linkage
MetadataShow full item record
Record linking is the task of detecting records in several databases that refer to the same entity. This task aims at exploring the relationship between entities, which normally lack common identifiers in heterogeneous datasets. When entities contain multiple relational records, linking them across datasets can be more accurate by treating the records as groups, which leads to group linking methods. Even so, individual record links may still be needed for the final group linking step. This problem can be solved by multiple instance learning, in which group links are modelled as bags, and record links are considered as instances. In this paper, we propose a novel method for instance classification and group record linkage via bag reconstruction from instances. The bag reconstruction is based on the modeling of the distribution of negative instances in the training bags via kernel density estimation. We evaluate this approach on both synthetic and real-world data. Our results show that the proposed method can outperform several baseline methods.
Advanced Data Mining and Applications: 8th International Conference, ADMA 2012
© 2012 Springer Berlin/Heidelberg. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com
Pattern Recognition and Data Mining