Currently, there is no way to identify 100% of duplicate records in EndNote during a systematic review. Do we need to identify and remove 100% of the duplicates? If yes, is it worth the effort? If no, why bother reading this post :D
Anyway, if you have ever been questioned by fellow review team members about missing a few duplicates among the search results, this post is for you.
Missing a few duplicates is not a big deal and will not invalidate your systematic review.
But why is it that EndNote cannot find all the duplicates no matter how you tweak the duplication filter and combine the fields?
1. To Dash or not to Dash or to double-Dash!
In typography, there is more than one kind of dash or hyphen, and they come in different widths. We have the en dash (–, about the width of the letter n) and the em dash (—, about the width of the letter m). Changing the encoding (ANSI vs Unicode/UTF-8) can make such characters unreadable, and the system may replace them with a black rhombus containing a white question mark (the Unicode replacement character, �).
To avoid the surprise conversions that plague ANSI encoding, the databases started converting en and em dashes into plain characters, but not consistently: MEDLINE decided to replace them with double hyphens (--), and Embase went for "space single hyphen space" ( - ). You can imagine the rest! EndNote cannot ignore the dashes during the Find Duplicates function and thinks the two records are different.
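One way around this is to normalize the dashes yourself before (or instead of) relying on EndNote's comparison. The sketch below is a hypothetical pre-processing step, not an EndNote feature: it collapses the MEDLINE-style double hyphen, the Embase-style spaced hyphen, and literal en/em dashes into a single plain hyphen so the two title variants compare as equal.

```python
import re

# Match MEDLINE's "--", Embase's " - ", a bare en dash (–), em dash (—),
# or plain hyphen, with any surrounding whitespace.
DASH_PATTERN = re.compile(r"\s*(?:--|–|—|-)\s*")

def normalize_dashes(title: str) -> str:
    """Collapse all dash variants into a single plain hyphen."""
    return DASH_PATTERN.sub("-", title)

print(normalize_dashes("Diabetes--a review"))    # MEDLINE style -> "Diabetes-a review"
print(normalize_dashes("Diabetes - a review"))   # Embase style  -> "Diabetes-a review"
print(normalize_dashes("Diabetes – a review"))   # en dash       -> "Diabetes-a review"
```

After this normalization, all three variants become identical strings, so any exact-match deduplication on the title field would catch them.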
2. Other Encoding During Import
It is important to save your txt files in UTF-8 or Unicode format if you want to import them into EndNote. This is especially important if your records contain non-English characters, whether in author names or in the title. If you use ANSI, don't be surprised to see all the accented characters converted into unreadable ones.
3. Symbol to Character
The registered trademark symbol (®) is another culprit. Some databases convert it to R, some remove it, and others keep it as it is.
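As with the dashes, the safest move is to strip these symbols from titles before comparing records. The snippet below is a hypothetical clean-up step covering the variants I can safely match (®, ™, and the spelled-out "(R)"/"(TM)" forms); a bare appended "R" is deliberately left alone, since removing it would mangle ordinary words.

```python
import re

# Strip ®, ™, and their parenthesized ASCII substitutes, along with any
# whitespace immediately before them. A bare "R" is NOT matched on purpose.
TRADEMARK_PATTERN = re.compile(r"\s*(?:®|™|\(R\)|\(TM\))", re.IGNORECASE)

def strip_trademarks(title: str) -> str:
    """Remove trademark-symbol variants so titles from different databases match."""
    return TRADEMARK_PATTERN.sub("", title)

print(strip_trademarks("Lipitor® in primary prevention"))
print(strip_trademarks("Lipitor(R) in primary prevention"))
```

Both calls yield the same cleaned title, so the two records would be recognized as duplicates by an exact title comparison.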