DedupEndNote (version 1.1.2 2025-12-30)

Introduction

DedupEndNote is a tool for removing duplicate records from EndNote or Zotero databases exported in RIS format. It is more flexible than the built-in deduplication features in EndNote and Zotero, and identifies many more duplicates than those programs.

You can:

  • Deduplicate a single file (this page)
  • Compare a new file against an existing file (see Deduplicate 2 files)

The program has been tested on EndNote and Zotero databases with records from: CINAHL (EBSCOHost), ClinicalTrials.gov, Cochrane Library, EMBASE (OVID and Embase.com), Medline (OVID), PsycINFO (OVID), PubMed, Scopus, Web of Science (very few tests with conference papers).

Steps

Deduplicate one file:

  • Export an EndNote / Zotero database into a file in RIS format
  • Upload this file to DedupEndNote
  • Save the results file with deduplicated records
  • Import this results file into a new EndNote / Zotero database
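A quick way to sanity-check an export before uploading it, or a results file before importing it, is to count its records. The following is a minimal sketch, not part of DedupEndNote: it assumes the usual RIS layout in which each line starts with a two-character tag followed by two spaces and a hyphen, and each record ends with an `ER` line. The function name `parse_ris` is hypothetical, and continuation lines without a tag are ignored.

```python
import re
from typing import Dict, List

# RIS tag line: two-character tag, two spaces, a hyphen, then an optional value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Split RIS text into records; each record maps a tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # blank or untagged line; ignored in this sketch
        tag = match.group(1)
        value = (match.group(2) or "").strip()
        if tag == "ER":
            # End of record: store it and start a new one.
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)  # tolerate a missing final ER line
    return records


sample = "\n".join([
    "TY  - JOUR",
    "AU  - Smith, J.",
    "AU  - Jones, K.",
    "TI  - An example title",
    "PY  - 2020",
    "ER  - ",
    "",
    "TY  - JOUR",
    "TI  - Another title",
    "ER  - ",
])

records = parse_ris(sample)
print(len(records), "records")
```

Running the same count on the exported file and on the results file shows how many records were removed as duplicates.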

Deduplicate a new file against an existing file / EndNote database: see Deduplicate 2 files

How it works

DedupEndNote compares each pair of records in up to five stages, stopping early if a mismatch is found:

  1. Publication Year: Matches if the years are the same or differ by at most one year.
  2. Starting Page or DOI: Compares page numbers and DOIs with preprocessing to handle variations.
  3. Authors: Uses Jaro-Winkler similarity on up to the first 40 authors, with name normalization.
  4. Title: Compares normalized or reversed titles.
  5. ISBN / ISSN / Journal: Matches exact ISBNs or ISSNs or compares normalized journal titles.

If a pair scores YES in every applicable comparison, the two records are considered duplicates.
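The staged comparison with early exit can be sketched as follows. This is an illustration of the idea only, not DedupEndNote's actual code: the record layout (a plain dict with keys such as `year`, `doi`, `start_page`, `authors`, `title`, `issn`, `journal`), the function names, and the 0.9 threshold are all assumptions. `difflib.SequenceMatcher` is used as a stand-in for Jaro-Winkler, which it is not, and the reversed-title and ISBN comparisons are left out.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Assumed similarity cut-off; the real thresholds are not given in this text.
THRESHOLD = 0.9


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits, separated by single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and drop a leading resolver URL or 'doi:' prefix."""
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi.strip().lower())


def similar(a: str, b: str) -> bool:
    # Stand-in for Jaro-Winkler: difflib's ratio is a different measure.
    return SequenceMatcher(None, a, b).ratio() >= THRESHOLD


# Each stage returns True (YES), False (NO) or None (not applicable).
def same_year(a: dict, b: dict) -> Optional[bool]:
    if a.get("year") is None or b.get("year") is None:
        return None
    return abs(a["year"] - b["year"]) <= 1


def same_start_page_or_doi(a: dict, b: dict) -> Optional[bool]:
    doi_a, doi_b = normalize_doi(a.get("doi", "")), normalize_doi(b.get("doi", ""))
    page_a, page_b = a.get("start_page"), b.get("start_page")
    if not (doi_a and doi_b) and not (page_a and page_b):
        return None
    return bool(doi_a and doi_a == doi_b) or bool(page_a and page_a == page_b)


def same_authors(a: dict, b: dict) -> Optional[bool]:
    # Only the first 40 authors are compared.
    names_a = normalize(" ".join(a.get("authors", [])[:40]))
    names_b = normalize(" ".join(b.get("authors", [])[:40]))
    if not names_a or not names_b:
        return None
    return similar(names_a, names_b)


def same_title(a: dict, b: dict) -> Optional[bool]:
    title_a, title_b = normalize(a.get("title", "")), normalize(b.get("title", ""))
    if not title_a or not title_b:
        return None
    return similar(title_a, title_b)


def same_issn_or_journal(a: dict, b: dict) -> Optional[bool]:
    issn_a, issn_b = a.get("issn"), b.get("issn")
    journal_a, journal_b = normalize(a.get("journal", "")), normalize(b.get("journal", ""))
    if not (issn_a and issn_b) and not (journal_a and journal_b):
        return None
    return bool(issn_a and issn_a == issn_b) or bool(journal_a and journal_a == journal_b)


STAGES = [same_year, same_start_page_or_doi, same_authors, same_title, same_issn_or_journal]


def is_duplicate(a: dict, b: dict) -> bool:
    """Run the stages in order and stop at the first NO."""
    applicable = 0
    for stage in STAGES:
        verdict = stage(a, b)
        if verdict is False:
            return False  # early exit on the first mismatch
        if verdict is True:
            applicable += 1
    # Assumption: at least one stage must have been applicable.
    return applicable > 0


rec_a = {"year": 2020, "doi": "https://doi.org/10.1000/XYZ123",
         "authors": ["Smith, J.", "Jones, K."],
         "title": "Deduplication of records: a review.",
         "journal": "Journal of Examples"}
rec_b = {"year": 2021, "doi": "10.1000/xyz123",
         "authors": ["Smith, J", "Jones, K"],
         "title": "Deduplication of Records - A Review",
         "journal": "Journal of examples"}
rec_c = dict(rec_b, year=2015)

print(is_duplicate(rec_a, rec_b))  # years differ by one, everything else matches
print(is_duplicate(rec_a, rec_c))  # rejected at the year stage
```

Ordering the cheap checks (year, page, DOI) before the string-similarity checks means most non-duplicate pairs are rejected without any expensive comparison.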

Output

By default, only the first record in each duplicate set is kept. The output file:

  • Preserves all unique records
  • Enriches them with missing data from duplicates (e.g., DOI, journal name, publication year, pages)
  • Normalizes certain fields (e.g., DOI format, page ranges)
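The enrichment step can be pictured as filling the empty fields of the kept record from the other members of its duplicate set. The sketch below is an assumption-laden illustration, not the tool's implementation: the dict-based record layout, the field names in `ENRICHABLE`, and the function `merge_duplicate_set` are hypothetical, and field normalization is not shown because the target formats are not described here.

```python
from typing import Dict, List

# Hypothetical field names, following the examples listed above.
ENRICHABLE = ("doi", "journal", "year", "pages")


def merge_duplicate_set(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Keep the first record and fill its empty fields from later duplicates."""
    kept = dict(records[0])  # copy, so the input records stay untouched
    for duplicate in records[1:]:
        for field in ENRICHABLE:
            # Only fill gaps; existing values in the kept record win.
            if not kept.get(field) and duplicate.get(field):
                kept[field] = duplicate[field]
    return kept


first = {"id": "1", "title": "A study", "year": "2020", "doi": "", "pages": ""}
second = {"id": "2", "title": "A study", "year": "2021",
          "doi": "10.1000/xyz123", "pages": "11-19",
          "journal": "Journal of Examples"}

merged = merge_duplicate_set([first, second])
print(merged)
```

In this sketch the kept record retains its own ID, title, and year, and gains only the DOI, pages, and journal name it was missing.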

Mark mode

Instead of removing duplicates, Mark mode labels them for manual review:

  • The ID of the first record in each duplicate set is copied to the Label (LB) field of all duplicates.
  • The original Label content is overwritten.
  • No enrichment or normalization is performed.

This mode is useful if you want to merge records manually in EndNote.
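The labelling can be sketched in a few lines. This is a hypothetical illustration: records are plain dicts, the key `id` stands for the record ID, and the sketch assumes the label is written to every member of a duplicate set, including the first record, so that the whole set can be found by searching on one label value.

```python
from typing import Dict, List


def mark_duplicates(duplicate_sets: List[List[Dict[str, str]]]) -> None:
    """Write the ID of the first record of each set into the LB field of its members."""
    for group in duplicate_sets:
        if len(group) < 2:
            continue  # a record without duplicates keeps its own label
        first_id = group[0]["id"]
        for record in group:
            record["LB"] = first_id  # overwrites any existing label


group = [
    {"id": "101", "LB": "screened"},
    {"id": "205", "LB": "old label"},
    {"id": "318"},
]
single = [{"id": "400", "LB": "my note"}]

mark_duplicates([group, single])
print([record["LB"] for record in group])
print(single[0]["LB"])
```

Because existing labels are overwritten, any information stored in the Label field should be copied elsewhere before running this mode.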

How to cite

If you use DedupEndNote, please cite:

Lobbestael, G. (2025). DedupEndNote (Version 1.1.2) [Computer software]. https://github.com/globbestael/DedupEndNote

Issues and feature requests

If you have questions about the tool or run into a problem while using it, please raise an issue on the GitHub repository: https://github.com/globbestael/DedupEndNote