Introduction
DedupEndNote is a tool for removing duplicate records from EndNote or Zotero databases
exported in RIS format.
It is more flexible than the built-in deduplication features in EndNote or Zotero,
and identifies many more duplicates than those programs.
You can:
- Deduplicate a single file (this page)
- Compare a new file against an existing file (see Deduplicate 2 files)
The program has been tested on EndNote and Zotero databases with records from:
CINAHL (EBSCOHost), ClinicalTrials.gov, Cochrane Library, EMBASE (OVID and Embase.com), Medline
(OVID), PsycINFO (OVID), PubMed, Scopus, Web of Science (very few tests with conference papers).
Steps
Deduplicate one file:
- Export an EndNote / Zotero database into a file in RIS format
- Upload this file in DedupEndNote
- Save the results file with deduplicated records
- Import this results file into a new EndNote / Zotero database
Deduplicate a new file against an existing file / EndNote database: see Deduplicate 2
files
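To check how many records an export contains before uploading it, and how many remain in the results file, a small RIS reader can help. The following is a minimal sketch, not part of DedupEndNote: the function name `parse_ris` is hypothetical, it assumes the common RIS layout of a two-character tag, two spaces, a hyphen and a space, and it ignores continuation lines that do not start with a tag.

```python
import re
from typing import Dict, Iterable, List

# A RIS line looks like "AU  - Smith, J." (tag, two spaces, hyphen, space, value).
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Collect RIS records as dicts mapping each tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for raw in lines:
        match = TAG_LINE.match(raw.rstrip("\r\n"))
        if not match:
            continue  # blank lines and continuation lines are skipped in this sketch
        tag, value = match.groups()
        if tag == "ER":  # "ER" closes the current record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records


sample = [
    "TY  - JOUR", "AU  - Smith, J.", "AU  - Jones, A.", "TI  - A title", "ER  - ",
    "",
    "TY  - JOUR", "TI  - Another title", "ER  - ",
]
records = parse_ris(sample)
print(len(records), "records")
```

Running the same function on the exported file and on the results file gives a quick before-and-after record count.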
How it works
DedupEndNote compares each pair of records in up to five stages, stopping early if a mismatch is
found:
- Publication Year: Matches if the years are the same or differ by at most one year.
- Starting Page or DOI: Compares page numbers and DOIs with preprocessing to handle
variations.
- Authors: Uses Jaro-Winkler similarity on up to the first 40 authors, with name
normalization.
- Title: Compares normalized or reversed titles.
- ISBN / ISSN / Journal: Matches exact ISBNs or ISSNs or compares normalized journal
titles.
If a pair scores YES in all applicable comparisons, the two records are considered duplicates.
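The staged comparison with early exit can be sketched as follows. This is an illustration under stated assumptions, not DedupEndNote's actual code: the record layout, the function names, the threshold `AUTHOR_THRESHOLD`, and the way a missing field counts as "not applicable" are all hypothetical. The title stage only compares normalized titles (the reversed-title comparison is left out), and the ISBN / ISSN / Journal stage is omitted for brevity.

```python
import re
from typing import Optional

AUTHOR_THRESHOLD = 0.9  # hypothetical cut-off, not the tool's real value


def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


# Each stage returns True (YES), False (NO) or None (not applicable: data missing).
def same_year(r1: dict, r2: dict) -> Optional[bool]:
    if r1.get("year") is None or r2.get("year") is None:
        return None
    return abs(r1["year"] - r2["year"]) <= 1


def same_doi_or_start_page(r1: dict, r2: dict) -> Optional[bool]:
    verdicts = []
    for key in ("doi", "start_page"):
        v1, v2 = r1.get(key), r2.get(key)
        if v1 and v2:
            verdicts.append(v1.strip().lower() == v2.strip().lower())
    return any(verdicts) if verdicts else None


def similar_authors(r1: dict, r2: dict) -> Optional[bool]:
    a1, a2 = r1.get("authors"), r2.get("authors")
    if not a1 or not a2:
        return None
    s1, s2 = "; ".join(a1[:40]).lower(), "; ".join(a2[:40]).lower()
    return jaro_winkler(s1, s2) >= AUTHOR_THRESHOLD


def same_title(r1: dict, r2: dict) -> Optional[bool]:
    t1, t2 = r1.get("title"), r2.get("title")
    if not t1 or not t2:
        return None
    def norm(t: str) -> str:
        return re.sub(r"[^a-z0-9]", "", t.lower())
    return norm(t1) == norm(t2)


STAGES = [same_year, same_doi_or_start_page, similar_authors, same_title]


def is_duplicate(r1: dict, r2: dict) -> bool:
    any_yes = False
    for stage in STAGES:
        verdict = stage(r1, r2)
        if verdict is False:
            return False  # stop at the first mismatch
        any_yes = any_yes or verdict is True
    return any_yes  # at least one stage must have been applicable


a = {"year": 2020, "doi": "10.1000/xyz", "authors": ["Smith, J.", "Jones, A."],
     "title": "Deep learning for triage: a review"}
b = {"year": 2021, "doi": "10.1000/XYZ", "authors": ["Smith, J", "Jones, A"],
     "title": "Deep Learning for Triage - A Review."}
print(is_duplicate(a, b))
print(is_duplicate(a, {**b, "year": 2023}))
```

Ordering the cheap checks (year, DOI or page) before the string-similarity checks means most non-duplicate pairs are rejected before any expensive comparison runs.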
Output
By default, only the first record in each duplicate set is kept. The output file:
- Preserves all unique records
- Enriches them with missing data from duplicates (e.g., DOI, journal name, publication year,
pages)
- Normalizes certain fields (e.g., DOI format, page ranges)
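As a rough illustration of keeping the first record and enriching it, the sketch below merges one duplicate set. The field names, the helper names `normalize_doi` and `merge_duplicate_set`, and the specific DOI normalization (dropping a resolver or `doi:` prefix and lowercasing) are assumptions made for this example; the rules DedupEndNote actually applies may differ.

```python
import re
from typing import Dict, List

# Fields the kept record may borrow from its duplicates (hypothetical key names).
ENRICHABLE = ("doi", "journal", "year", "pages")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def merge_duplicate_set(records: List[Dict]) -> Dict:
    """Keep the first record, filling its empty fields from later duplicates."""
    kept = dict(records[0])  # copy so the input is left untouched
    for other in records[1:]:
        for field in ENRICHABLE:
            if not kept.get(field) and other.get(field):
                kept[field] = other[field]
    if kept.get("doi"):
        kept["doi"] = normalize_doi(kept["doi"])
    return kept


duplicate_set = [
    {"id": "1", "title": "A title", "year": 2020},
    {"id": "2", "title": "A title", "year": 2020,
     "doi": "https://doi.org/10.1000/ABC", "pages": "12-19"},
]
merged = merge_duplicate_set(duplicate_set)
print(merged)
```

The kept record retains its own values wherever it has them; only empty fields are filled from the duplicates.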
Mark mode
Instead of removing duplicates, Mark mode labels them for manual review:
- The ID of the first record in each duplicate set is copied to the Label (LB) field of
all duplicates.
- The original Label content is overwritten.
- No enrichment or normalization is performed.
This mode is useful if you want to merge records manually in EndNote.
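The labeling step can be pictured with the short sketch below. It is not the tool's code: records are represented as plain dicts keyed by RIS-style tags (`ID`, `LB`), the function name `mark_duplicates` is hypothetical, and labeling the first record of the set along with the others is an assumption of this example.

```python
from typing import Dict, List


def mark_duplicates(duplicate_sets: List[List[Dict[str, str]]]) -> List[List[Dict[str, str]]]:
    """Write the ID of each set's first record into the LB field of its members."""
    for records in duplicate_sets:
        first_id = records[0]["ID"]
        for record in records:
            record["LB"] = first_id  # any existing Label content is overwritten
    return duplicate_sets


sets = [
    [{"ID": "12", "LB": "my note"}, {"ID": "57"}, {"ID": "301", "LB": "other"}],
]
mark_duplicates(sets)
for record in sets[0]:
    print(record["ID"], record["LB"])
```

Because every member of a set ends up with the same Label value, sorting on the Label field groups the duplicates together for manual merging.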
How to cite
If you use DedupEndNote, please cite:
Lobbestael, G. (2025). DedupEndNote (Version 1.1.1) [Computer software].
https://github.com/globbestael/DedupEndNote
Issues and feature requests
If you have any questions about the tool or run into a problem when using it,
please raise an issue on the GitHub repository (https://github.com/globbestael/DedupEndNote).