DedupEndNote (version 1.1.2 2025-12-30)

Introduction

DedupEndNote is a tool for removing duplicate records from EndNote or Zotero databases exported in RIS format. It is more flexible than the built-in deduplication features of EndNote and Zotero, and it identifies many more duplicates than those programs do.

You can:

Deduplicate a single file (this page)
Compare a new file against an existing file (see Deduplicate 2 files)

The program has been tested on EndNote and Zotero databases with records from: CINAHL (EBSCOHost), ClinicalTrials.gov, Cochrane Library, EMBASE (OVID and Embase.com), Medline (OVID), PsycINFO (OVID), PubMed, Scopus, Web of Science (very few tests with conference papers).

Steps

Deduplicate one file:

Export an EndNote / Zotero database into a file in RIS format
Upload this file in DedupEndNote
Save the results file with deduplicated records
Import this results file into a new EndNote / Zotero database
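
Before uploading, it can help to check that the export really is RIS and holds the expected number of records. The following is a minimal sketch, not part of DedupEndNote: it assumes the usual RIS layout in which each line is a two-character tag followed by two spaces and a hyphen, and each record ends with an ER line. The function name parse_ris is hypothetical.

```python
import re

# A RIS line: two-character tag, two spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a list of (tag, value) pairs.

    Hypothetical helper for a quick sanity check. Lines that do not
    start with a tag (blank lines, wrapped values) are skipped.
    """
    records = []
    current = []
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            # ER closes the current record.
            if current:
                records.append(current)
            current = []
        else:
            current.append((tag, value))
    return records
```

Calling len(parse_ris(text)) on the contents of the exported file gives the record count, which can be compared with the number of records in the EndNote / Zotero database.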

Deduplicate a new file against an existing file / EndNote database: see Deduplicate 2 files

The page "Details, lots of details ..." has more detailed information on the following subjects.

How it works

DedupEndNote compares each pair of records in up to five stages, stopping early if a mismatch is found:

Publication Year: Matches if the years are the same or differ by at most one year.
Starting Page or DOI: Compares page numbers and DOIs with preprocessing to handle variations.
Authors: Uses Jaro-Winkler similarity on up to the first 40 authors, with name normalization.
Title: Compares normalized or reversed titles.
ISBN / ISSN / Journal: Matches exact ISBNs or ISSNs or compares normalized journal titles.

If a pair scores YES in all applicable comparisons, the two records are considered duplicates.
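
The staged comparison with early exit can be sketched as follows. This is an illustration under stated assumptions, not DedupEndNote's actual code: the record field names, the 0.9 author threshold, and the simplified normalization are all hypothetical, and the reversed-title comparison is left out. Only the overall shape (five stages, stop at the first mismatch, Jaro-Winkler on up to 40 authors, years at most one apart) comes from the description above.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


def normalize(text):
    """Assumed normalization: lowercase, keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


AUTHOR_THRESHOLD = 0.9  # assumed value, not taken from DedupEndNote


def same_year(a, b):
    return abs(a["year"] - b["year"]) <= 1


def same_page_or_doi(a, b):
    doi_a, doi_b = a.get("doi", "").lower(), b.get("doi", "").lower()
    if doi_a and doi_a == doi_b:
        return True
    page_a, page_b = a.get("start_page", ""), b.get("start_page", "")
    return bool(page_a) and page_a == page_b


def same_authors(a, b):
    left = " ".join(normalize(name) for name in a["authors"][:40])
    right = " ".join(normalize(name) for name in b["authors"][:40])
    return jaro_winkler(left, right) >= AUTHOR_THRESHOLD


def same_title(a, b):
    return normalize(a["title"]) == normalize(b["title"])


def same_journal(a, b):
    if a.get("issn") and a.get("issn") == b.get("issn"):
        return True
    return normalize(a.get("journal", "")) == normalize(b.get("journal", ""))


STAGES = [same_year, same_page_or_doi, same_authors, same_title, same_journal]


def is_duplicate(a, b):
    # all() stops at the first stage that returns False (the early exit).
    return all(stage(a, b) for stage in STAGES)
```

Ordering the stages from cheap to expensive means most non-duplicate pairs are rejected by the year check before any string similarity is computed.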

Output

By default, only the first record in each duplicate set is kept. The output file:

Preserves all unique records
Enriches them with missing data from duplicates (e.g., DOI, journal name, publication year, pages)
Normalizes certain fields (e.g., DOI format, page ranges)
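
As an illustration of enrichment and normalization, the sketch below keeps the first record of a duplicate set, fills its empty fields from the other records, and then tidies the DOI and page range. The field names and the two normalization rules (strip a doi.org prefix and lowercase the DOI; expand an abbreviated end page such as 1234-7) are assumptions chosen for the example, not a description of what DedupEndNote actually does.

```python
import re

# Fields that may be copied from duplicates (assumed names).
ENRICHABLE = ("doi", "journal", "year", "pages")


def normalize_doi(doi):
    """Assumed rule: lowercase and drop a leading URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def expand_pages(pages):
    """Assumed rule: expand an abbreviated end page, e.g. 1234-7 to 1234-1237."""
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", pages.strip())
    if not match:
        return pages.strip()
    start, end = match.groups()
    if len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


def merge_duplicate_set(records):
    """Keep the first record, enriched with data missing from it."""
    kept = dict(records[0])
    for other in records[1:]:
        for field in ENRICHABLE:
            if not kept.get(field) and other.get(field):
                kept[field] = other[field]
    if kept.get("doi"):
        kept["doi"] = normalize_doi(kept["doi"])
    if kept.get("pages"):
        kept["pages"] = expand_pages(kept["pages"])
    return kept
```

For example, merging a record that has only pages "1234-7" with a duplicate that carries the DOI "https://doi.org/10.1000/ABC" would yield one record with pages "1234-1237" and DOI "10.1000/abc".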

Mark mode

Instead of removing duplicates, Mark Mode labels them for manual review:

The ID of the first record in each duplicate set is copied to the Label (LB) field of all duplicates.
The original Label content is overwritten.
No enrichment or normalization is performed.

This mode is useful if you want to merge records manually in EndNote.
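
The labelling step of Mark Mode can be pictured with the short sketch below. It is a hypothetical illustration: records are plain dictionaries with an "id" key, and the sketch labels every record in a set, including the first one, which is an assumption about how "all duplicates" should be read.

```python
def mark_duplicates(duplicate_sets):
    """Return copies of all records with LB set to the first record's ID.

    duplicate_sets is a list of lists of record dicts (assumed shape).
    Any existing LB value is overwritten; no other field is changed.
    """
    marked = []
    for group in duplicate_sets:
        first_id = group[0]["id"]
        for record in group:
            labelled = dict(record)  # leave the input records untouched
            labelled["LB"] = first_id
            marked.append(labelled)
    return marked
```

After importing a file marked this way, sorting the EndNote library on the Label field brings each duplicate set together for manual merging.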

How to cite

If you use DedupEndNote, please cite:

Lobbestael, G. (2025). DedupEndNote (Version 1.1.1) [Computer software]. https://github.com/globbestael/DedupEndNote

Issues and feature requests

If you have any questions about the tool or come across a problem when trying to use it, please raise an issue on the GitHub Repository.