Article
Citation enrichment improves deduplication of primary evidence
Lecture Notes in Computer Science
  • Miew Keen Choong, Macquarie University, Sydney, Australia
  • Sarah Thorning, Bond University
  • Guy Tsafnat, Macquarie University, Sydney, Australia
Date of this Version
11-26-2015
Document Type
Journal Article
Publication Details

Citation only

Choong, M.K., Thorning, S., Tsafnat, G. (2015). Citation enrichment improves deduplication of primary evidence. Lecture Notes in Computer Science, 9441, 237-244. doi: 10.1007/978-3-319-25660-3_20.


© Copyright, Springer International Publishing Switzerland, 2015

Abstract

Objective:

To automatically detect duplicate citations in a bibliographical database.

Background:

Citations retrieved from multiple search databases take different forms, making both manual and automatic detection of duplicates difficult. Existing methods rely on fuzzy similarity measures, which are error-prone.

Methods:

We analysed four pairs of original search results from MEDLINE and EMBASE that had been used to create systematic reviews. An automatic tool deduplicated the citations by first enriching each one with its Digital Object Identifier (DOI) and/or other unique identifiers. Duplicate records were then identified by comparing these unique identifiers. We compared our method with the duplicate-detection function of a popular desktop citation management application, run in several configurations.
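The enrich-then-compare idea can be illustrated with a minimal sketch. It is not the authors' tool: the `lookup` dictionary is a hypothetical stand-in for an online identifier resolver, and matching on a normalised title plus year to obtain a DOI is an assumption made only for illustration. Records that share any unique identifier (DOI or PMID) are grouped together with a small union-find, so duplication is transitive across databases.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Citation:
    record_id: str
    title: str
    year: int
    doi: Optional[str] = None
    pmid: Optional[str] = None


def normalize_title(title: str) -> str:
    # Lower-case and keep only alphanumerics so punctuation/case differences vanish.
    return "".join(ch for ch in title.lower() if ch.isalnum())


def enrich(citations: List[Citation], lookup: Dict[Tuple[str, int], str]) -> List[Citation]:
    """Fill in missing DOIs. `lookup` is a hypothetical stand-in for an online resolver."""
    for c in citations:
        if c.doi is None:
            c.doi = lookup.get((normalize_title(c.title), c.year))
    return citations


def find_duplicates(citations: List[Citation]) -> List[List[str]]:
    """Group records that share any unique identifier (DOI or PMID)."""
    parent = list(range(len(citations)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: Dict[Tuple[str, str], int] = {}
    for i, c in enumerate(citations):
        keys = []
        if c.doi:
            keys.append(("doi", c.doi.strip().lower()))  # DOIs are case-insensitive
        if c.pmid:
            keys.append(("pmid", c.pmid.strip()))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups: Dict[int, List[str]] = {}
    for i, c in enumerate(citations):
        groups.setdefault(find(i), []).append(c.record_id)
    return [sorted(g) for g in groups.values() if len(g) > 1]


records = [
    Citation("medline-1", "Citation enrichment improves deduplication", 2015, pmid="123"),
    Citation("embase-7", "Citation Enrichment Improves Deduplication.", 2015, doi="10.1000/XYZ"),
    Citation("embase-9", "An unrelated trial", 2014),
]
lookup = {(normalize_title("Citation enrichment improves deduplication"), 2015): "10.1000/xyz"}
enrich(records, lookup)
print(find_duplicates(records))  # [['embase-7', 'medline-1']]
```

Without the enrichment step the MEDLINE and EMBASE records above share no identifier and would not be matched; the DOI filled in by the (hypothetical) lookup is what links them.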

Results:

Citation enrichment identified 93% (range 86%–100%) of the duplicates indexed online and erroneously marked 3% (range 0%–6%) of documents as duplicates. Using its default settings, the citation management application found 68% (range 64%–72%) of duplicates without error. When configured for maximum deduplication, it found 94% of duplicates (range 77%–100%) with a 4% error rate (range 0%–8%).

Conclusion:

Citation enrichment using unique identifiers enhances automatic deduplication. On its own, the approach appears slightly superior to tools that compare citations without enrichment. Methods that combine citation enrichment with existing fuzzy matching may substantially reduce the resource requirements of evidence synthesis.
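One way such a combination might look is sketched below. This is an illustration under stated assumptions, not a method evaluated in the paper: shared identifiers decide first, differing identifiers of the same kind rule a match out, and only when identifiers cannot decide does the code fall back to fuzzy title matching. The `difflib.SequenceMatcher` ratio, the exact-year requirement and the 0.9 threshold are all hypothetical choices, not the similarity measure used by any particular citation manager.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Set, Tuple

Record = Dict[str, Optional[object]]


def ids_of(rec: Record) -> Set[Tuple[str, str]]:
    """Collect the normalised unique identifiers present on a record."""
    ids = set()
    if rec.get("doi"):
        ids.add(("doi", str(rec["doi"]).strip().lower()))
    if rec.get("pmid"):
        ids.add(("pmid", str(rec["pmid"]).strip()))
    return ids


def title_similarity(a: str, b: str) -> float:
    """Fuzzy similarity of two titles after lower-casing and collapsing whitespace."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def is_duplicate(r1: Record, r2: Record, threshold: float = 0.9) -> bool:
    """Identifier match first; fuzzy title match only when identifiers cannot decide."""
    ids1, ids2 = ids_of(r1), ids_of(r2)
    if ids1 & ids2:
        return True  # a shared unique identifier: confident duplicate
    if {k for k, _ in ids1} & {k for k, _ in ids2}:
        return False  # same identifier type on both records, different values
    # Hypothetical fallback: same year and highly similar titles.
    return r1.get("year") == r2.get("year") and title_similarity(
        str(r1["title"]), str(r2["title"])
    ) >= threshold
```

The design keeps identifier evidence authoritative, so the error-prone fuzzy step is consulted only for records the enrichment could not resolve.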

Citation Information
Miew Keen Choong, Sarah Thorning and Guy Tsafnat. "Citation enrichment improves deduplication of primary evidence" Lecture Notes in Computer Science Vol. 9441 (2015) pp. 237–244 ISSN: 0302-9743
Available at: http://works.bepress.com/sarah_thorning/8/