Automation of duplicate record detection for systematic reviews: Deduplicator
File version
Version of Record (VoR)
Author(s)
Greenwood, Hannah
Carter, Matt
Clark, Justin
Abstract
Background: To describe the algorithm and investigate the efficacy of the Deduplicator, a novel systematic review automation tool that removes duplicate records from the results of multi-database systematic review searches.
Methods: We built the Deduplicator and tested its efficacy on the search results of 10 previous Cochrane systematic reviews, comparing the Deduplicator’s ‘balanced’ algorithm with a semi-manual EndNote method. Two researchers each deduplicated all 10 libraries of search results. For five libraries, one researcher used the Deduplicator while the other performed semi-manual deduplication in EndNote; they then switched methods for the remaining five. In addition, the three Deduplicator algorithms (‘balanced’, ‘focused’ and ‘relaxed’) were compared on two datasets of previously deduplicated search results.
Results: Before deduplication, the mean library size across the 10 systematic reviews was 1962 records. The mean time to deduplicate was 5 min per 1000 records with the Deduplicator, compared with 15 min per 1000 records with EndNote. The mean error rate was 1.8 errors per 1000 records with the Deduplicator, compared with 3.1 errors per 1000 records with EndNote. Among the Deduplicator algorithms, ‘balanced’ had the highest mean F1 score (0.9647), ‘focused’ had the highest mean accuracy (0.9798) and the highest recall (0.9757), and ‘relaxed’ had the highest mean precision (0.9896).
Conclusions: These results demonstrate that using the Deduplicator for duplicate record detection reduces the time taken to deduplicate while maintaining or improving accuracy compared with a semi-manual EndNote method. However, further research comparing more deduplication methods is needed to establish the Deduplicator’s performance relative to other approaches.
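The Results report accuracy, precision, recall and F1 for the Deduplicator algorithms, plus time and error rates normalised per 1000 records. The following Python sketch shows the standard definitions of these quantities. It assumes that a record flagged as a duplicate is the positive class; the abstract does not state whether the authors counted at the level of records or record pairs. The `ConfusionCounts`, `metrics` and `per_thousand` names and the example counts are hypothetical illustrations, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one deduplication run.

    Assumption: 'record flagged as a duplicate' is the positive class.
    """
    tp: int  # true duplicates correctly flagged
    fp: int  # unique records wrongly flagged as duplicates
    fn: int  # true duplicates that were missed
    tn: int  # unique records correctly kept


def metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall and F1 using the standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def per_thousand(value: float, n_records: int) -> float:
    """Normalise a count (e.g. errors) or a duration (e.g. minutes) to a rate per 1000 records."""
    return value * 1000 / n_records


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    example = ConfusionCounts(tp=90, fp=10, fn=10, tn=890)
    print(metrics(example))
    # Hypothetical: 6 errors in a 2000-record library is 3.0 errors per 1000 records.
    print(per_thousand(6, 2000))
```

In general, an algorithm with higher recall (such as ‘focused’ in the Results) catches more true duplicates, while one with higher precision (such as ‘relaxed’) wrongly removes fewer unique records. F1 summarises the balance between the two.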
Journal Title
Systematic Reviews
Volume
13
Rights Statement
© The Author(s) 2024. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Citation
Forbes, C; Greenwood, H; Carter, M; Clark, J, Automation of duplicate record detection for systematic reviews: Deduplicator, Systematic Reviews, 2024, 13, pp. 206