Why Does the Reproducibility of Search Strategies and Search Results in Systematic Reviews Reduce Over Time? What Can We Do About It?

Farhad Shokraneh
Dec 6, 2023

Reproducibility and Replicability. Source: Turing Way: https://the-turing-way.netlify.app/

There is no excuse; we MUST do our best to report the search methods as reproducibly as possible. But what is reproducibility?

In my previous paper titled “Reproducibility and Replicability of Systematic Reviews”, I used the following definition, which still seems relevant:

Reproducibility is re-conducting the same study, using the same methods and data by a different researcher or team, and replicability is re-doing the same study to gather new data or recollect the data (Patil et al. 2016).

Are Systematic Reviews Research Studies or Only Literature Reviews?

Some methodologists argue that a systematic review is not a research study but a review, just like any narrative literature review, and that we should distinguish review from research. Unlike narrative literature reviews, systematic reviews follow a detailed research question and a pre-set protocol involving at least three people. However, one might claim that even narrative reviews can have those properties. What differentiates a systematic review from a narrative review is reproducibility.

If a systematic review is reported in a reproducible format so that the reader can follow the methods, repeat the study, and reproduce the same or very similar results, we can call it a research study. Otherwise, we should only call it a narrative review. So reproducibility is even more important for systematic reviews than other research studies. Obviously, all stages of systematic review should be reproducible. I have discussed how using two people or automation at most stages makes the reviews reproducible, but I focus on searching in this post.

Reproducibility in a systematic review search context means the ability to re-run the search strategies in the corresponding sources and reproduce the same or very similar search results in terms of content and number of results.

How do you test if the search strategies reported in a systematic review are reproducible?

Quantitative Reproducibility

Re-run the searches and limit to the last search date. If the number of results for that database is the same or very similar to what was reported in the review, then we can say the search strategy has Quantitative Reproducibility.
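As a minimal sketch of this check, the comparison can be written as a small Python function. The function name and the 5% tolerance are my assumptions for illustration; the post only asks for "the same or very similar" numbers, so pick a threshold that suits your review.

```python
def quantitative_reproducibility(reported_count: int, rerun_count: int,
                                 tolerance: float = 0.05) -> bool:
    """Return True if the re-run count is within `tolerance` of the reported count.

    `tolerance` is a relative difference (0.05 = 5%). This threshold is an
    assumption for illustration, not a published standard.
    """
    if reported_count < 0 or rerun_count < 0:
        raise ValueError("counts must be non-negative")
    if reported_count == 0:
        # Avoid dividing by zero: zero reported results only matches zero re-run results.
        return rerun_count == 0
    relative_difference = abs(rerun_count - reported_count) / reported_count
    return relative_difference <= tolerance


# Hypothetical numbers: the review reported 1200 records for one database.
print(quantitative_reproducibility(1200, 1212))  # 1% difference
print(quantitative_reproducibility(1200, 900))   # 25% difference
```

Run it once per database, since the review should report the number of results per source.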

Content Reproducibility

Re-run the searches and limit to the last search date. If the search results contain all or almost all the records from the previous search, then the search strategies have Content Reproducibility. One way to check this is to run de-duplication: there should be almost 100% duplication between the previous search and the re-run search. This approach may be time-consuming, so as a minimum, the search results from re-running the searches should contain all or almost all the ‘included studies’ from the last version of the review.
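A rough sketch of both checks (full overlap and the minimum "included studies" check), assuming each record can be reduced to a stable identifier such as a PMID or DOI. The function name and the identifiers below are made up for illustration.

```python
def content_reproducibility(original_ids, rerun_ids, included_ids=()):
    """Compare the previous search results with the re-run results.

    Returns the share of original records found again, the original records
    missing from the re-run, and the included studies missing from the re-run.
    """
    original = set(original_ids)
    rerun = set(rerun_ids)
    included = set(included_ids)
    # With no original records there is nothing to lose, so overlap is 100%.
    overlap = 1.0 if not original else len(original & rerun) / len(original)
    return {
        "overlap": overlap,
        "missing_from_rerun": sorted(original - rerun),
        "missing_included": sorted(included - rerun),
    }


# Hypothetical identifiers, not real records.
report = content_reproducibility(
    original_ids=["pmid:1", "pmid:2", "pmid:3", "pmid:4"],
    rerun_ids=["pmid:1", "pmid:2", "pmid:3", "pmid:5"],
    included_ids=["pmid:2", "pmid:4"],
)
print(report)
```

An empty `missing_included` list is the minimum bar described above; an `overlap` close to 1.0 is the stricter one.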

I won’t discuss the Validity and Reliability of Search Strategies here, as I have covered them in a workshop at BCS and will write about them separately if I can find any free time. Seriously, someone should pay me to do nothing but write.

Why Does the Reproducibility of Search Strategies and Search Results in Systematic Reviews Reduce Over Time?

In the short term, anyone with access to the searched sources and strategies should be able to repeat the searches, and if they limit the search to the original search date, they get the same or very similar results in terms of content and the number of results. But what about the longer term?

In the long term, however, things change, literally!

A. Change in database’s content

Despite what you may think, databases delete content as well as add it. If you think they are responsible for informing you, think again. At the end of the day, these are business people, not academics or researchers. Not that the researchers are any better or worse!

B. Change in database’s search fields

Databases can add, change, or remove their search fields (metadata) or their content (value). If you are lucky, you will find out in some way or another. Either they will inform you, or you will notice when re-running the searches.

C. Change in database’s search syntax

The databases may change the way that we search them. While the fields may be the same, the syntax we use to search these fields might change.

D. Change in database’s controlled vocabulary

An obvious one. MeSH, Emtree, CINAHL Headings, APA Thesaurus, etc., get updated not only to cover and control new topics but also to change the coverage or hierarchy of existing topics. Each year the search strategy gets older, and changes in controlled vocabularies reduce the chance of reproducibility for the existing search strategies. We may need to change the existing strategies to keep them valid.

It is worth noting that the databases may change their indexing policy or depth. For example, instead of indexing based on title, abstract, and author-assigned keywords, they may proceed to full-text indexing. Depending on how the database uses automation/semi-automation, the search results will change.

E. Change in database’s retrieval algorithm

While many bibliographic databases try to stay loyal to old-fashioned Boolean logic, some sources do not. Actually, they don’t tell us how they retrieve records for us, or they censor the records, showing us only some of them. You want examples? Google and Google Scholar.

One example is a change in how phrase searching handles punctuation: stopping the phrase at punctuation versus ignoring the punctuation. For example, we searched for ‘Muscle Fatigue’ in a bibliographic database in 2012 and only got papers mentioning ‘Muscle Fatigue’, but in 2022 the same search returned more papers. So I ran some testing, as I explained before, to identify the new unique papers, and we found papers mentioning ‘Muscle, Fatigue’, which were irrelevant to our review.
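To illustrate the ‘Muscle, Fatigue’ effect, here is a toy sketch of two phrase matchers: one that respects punctuation and one that ignores it. This is not how any particular database works (they do not publish their retrieval code); both functions are hypothetical and only show why the same query can return different records after such a change.

```python
import re


def strict_phrase_match(phrase: str, text: str) -> bool:
    """Match the phrase only when its words are separated by whitespace."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def punctuation_blind_phrase_match(phrase: str, text: str) -> bool:
    """Match the phrase after replacing every punctuation mark with a space."""
    text_without_punctuation = re.sub(r"[^\w\s]", " ", text)
    return strict_phrase_match(phrase, text_without_punctuation)


relevant = "Muscle fatigue after exercise"
irrelevant = "Outcomes: muscle, fatigue, and pain"

print(strict_phrase_match("Muscle Fatigue", relevant))               # True
print(strict_phrase_match("Muscle Fatigue", irrelevant))             # False
print(punctuation_blind_phrase_match("Muscle Fatigue", irrelevant))  # True
```

The second matcher retrieves the keyword list ‘muscle, fatigue’ as if it were the phrase, which is the kind of extra, irrelevant record we saw.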

F. Change in database’s query parser

The search interfaces sometimes use Query Expansion (e.g. Automatic Query Expansion or Automatic Term Mapping (ATM)), and this can change. I have witnessed such changes in both MEDLINE and Embase interfaces. So you search for something, but the search interface interprets (parses) it more broadly than you wanted. It is supposed to be helpful, but it is not always. If you are a good searcher (or Seeker if you are a Harry Potter fan), you’d know how to tell this feature to shut up! Otherwise, this feature will have its way with your query, and that way can change over time.

G. Change in database’s existence

If they don’t get adopted, databases die or disappear. Yeah, try repeating that search, baby.

H. Missing search strategies

When no search strategy is reported, you can’t do much, can you?

I. Missing search results

Even if you have the search strategies but not the search results, you may be able to repeat the searches; however, if the search strategies don’t report the numbers per database, you won’t be able to check Quantitative Reproducibility. You can’t check Content Reproducibility if the search results are not there.

One of the reasons that we avoid sharing the search results is that many results contain copyrighted abstracts or other database-specific data (such as controlled vocabulary or unique IDs). What many don’t know is that the abstract can easily be excluded from sharing, either by manipulating the tags when importing to EndNote or by creating an export filter in EndNote that excludes abstracts. We have no excuse not to share the search results; we are just busy and lazy. Truth!
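If you prefer a script to the EndNote route, the same idea can be sketched in Python: drop the abstract fields from a RIS export before sharing it. This is an alternative I am adding for illustration, not the EndNote method itself. It assumes abstracts sit under the `AB` tag (and `N2` in some exports) and that lines without a tag are continuations of the previous field; check a sample of your own export first. The record below is made up.

```python
import re

# A RIS field starts with a two-character tag, two spaces and a hyphen.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -")


def strip_ris_fields(ris_text: str, drop_tags=("AB", "N2")) -> str:
    """Return the RIS text without the fields listed in `drop_tags`.

    Continuation lines (lines without a tag) are dropped together with the
    field they belong to.
    """
    kept_lines = []
    dropping = False
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            # A new field starts here: decide whether to drop or keep it.
            dropping = match.group(1) in drop_tags
        if not dropping:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Made-up record for illustration only.
sample = (
    "TY  - JOUR\n"
    "TI  - Muscle fatigue in adults\n"
    "AB  - First line of the abstract\n"
    "second line of the abstract\n"
    "DO  - 10.1000/example\n"
    "ER  - \n"
)
print(strip_ris_fields(sample))
```

Add other tags to `drop_tags` if your export carries further database-specific fields you are not allowed to share.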

J. Reporting the search strategies in an irreproducible format

Tony and I discussed this in detail in our piece at Weave: Journal of Library User Experience. In short, word-processing tools such as Google Docs or Microsoft Word are not suitable formats for reporting the search strategies. PDF is even worse.

  • Auto-correction can undermine truncation and corrupt truncated formats.
  • Spell-checking can obfuscate important differences between British and American English and create unwanted duplicates.
  • Auto-conversion of straight quotations into curly quotations (“”) can cause errors in some platforms, such as Ovid SP.
  • Copying and pasting text fragments between different word-processing tools can lead to losing non-printing characters (such as spaces).
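Some of these problems can be caught before a strategy is pasted into a database. Below is a minimal sketch, using only Python’s standard library, that flags curly quotes, no-break spaces and invisible formatting characters in a search line and then normalises them. The function names and the example search line are mine, for illustration only.

```python
import unicodedata

# Curly quotes mapped to their straight equivalents.
QUOTE_MAP = {
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
}
NO_BREAK_SPACE = "\u00a0"


def is_invisible(char: str) -> bool:
    """True for Unicode 'format' characters such as zero-width spaces."""
    return unicodedata.category(char) == "Cf"


def find_risky_characters(strategy: str):
    """Return (position, Unicode name) for each character likely to break a search."""
    findings = []
    for index, char in enumerate(strategy):
        if char in QUOTE_MAP or char == NO_BREAK_SPACE or is_invisible(char):
            findings.append((index, unicodedata.name(char, "UNKNOWN")))
    return findings


def normalise_strategy(strategy: str) -> str:
    """Straighten quotes, turn no-break spaces into spaces, drop invisible characters."""
    cleaned = []
    for char in strategy:
        if char in QUOTE_MAP:
            cleaned.append(QUOTE_MAP[char])
        elif char == NO_BREAK_SPACE:
            cleaned.append(" ")
        elif not is_invisible(char):
            cleaned.append(char)
    return "".join(cleaned)


# Example line as a word processor might have left it (illustrative only).
line = "\u201cmuscle\u00a0fatigue\u201d.ti,ab."
print(find_risky_characters(line))
print(normalise_strategy(line))
```

This does not fix auto-corrected truncation or spelling changes; those still need a human eye on the original strategy.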

PDF, the enemy of open data, is even worse: it breaks the lines of a paragraph into multiple paragraphs, and when text is copied and pasted, some characters are lost or corrupted depending on the font used. I am still seeking an alternative. “Runnable” Notepad (txt) files, XML, or HTML could potentially be alternatives if supported by the databases as a standard format for search strategies. Currently, many bibliographic databases accept the RIS (RefMan, Reference Manager) format as a standard export format, and citation managers (EndNote, Mendeley, Zotero, etc.) and screening programs (Rayyan, Covidence, PICO Portal, etc.) recognise it as a standard import format. We don’t have such a standard format for search strategies.

K. Not reporting the search results in a re-usable format

As mentioned above, many bibliographic databases and citation and screening managers accept RIS as the standard format. However, are these search results available for every systematic review? No. Actually, I have yet to see the first systematic review that reports the search results. If you haven’t noticed, we do not share the search results for systematic reviews, and it is not mandated by any guidelines. Many think that because the search strategies are ‘indefinitely’ reproducible, we don’t need to share the search results; however, because of all the things I said above, this is not true. Even if I’m wrong, the only way to know is to access the previous search results and compare/de-duplicate them against the new results. But what can we do when the results are not there, or are shared in a format that cannot be used?
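When the previous results are available as RIS, the comparison is easy to script. The sketch below is a simplification under my own assumptions: each record ends with an `ER` line, the DOI is in `DO` and the title in `TI` (or `T1`), and records are matched on DOI, falling back to a normalised title. Real de-duplication in a citation manager is more careful than this. The two exports are made up.

```python
import re

# Two-character tag, two spaces, hyphen, optional space, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def ris_record_keys(ris_text: str) -> set:
    """Return one matching key per RIS record: the DOI if present, else the title."""
    keys = set()
    record = {}
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue  # continuation or blank line: not needed for matching
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            doi = record.get("DO", "").lower()
            raw_title = record.get("TI", record.get("T1", "")).lower()
            title = re.sub(r"[^a-z0-9]+", " ", raw_title).strip()
            if doi:
                keys.add("doi:" + doi)
            elif title:
                keys.add("title:" + title)
            record = {}
        else:
            record.setdefault(tag, value)  # keep the first value of each tag
    return keys


def compare_ris_exports(previous_ris: str, new_ris: str) -> dict:
    """Split records into those in both exports and those unique to each."""
    previous, new = ris_record_keys(previous_ris), ris_record_keys(new_ris)
    return {
        "in_both": previous & new,
        "only_previous": previous - new,
        "only_new": new - previous,
    }


# Made-up exports for illustration only.
previous_export = (
    "TY  - JOUR\nTI  - Muscle Fatigue in Adults\nDO  - 10.1000/A1\nER  - \n"
    "TY  - JOUR\nTI  - Old record, later removed\nER  - \n"
)
new_export = (
    "TY  - JOUR\nTI  - Muscle fatigue in adults\nDO  - 10.1000/a1\nER  - \n"
    "TY  - JOUR\nTI  - A newly added record\nER  - \n"
)
print(compare_ris_exports(previous_export, new_export))
```

Records in `only_previous` are the ones the re-run search no longer finds, which is exactly what we cannot see when the previous results were never shared.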

Conclusion

Reproducible Systematic Reviews are Research Studies. Irreproducible ones are as good as narrative reviews.

A reproducible search should have both Quantitative Reproducibility and Content Reproducibility. An alternative is that if you find all the included studies in the last version of the systematic review by running the searches, you can claim the search was reproducible (Content Reproducibility).

Now you know 11 reasons why the reproducibility of search strategies reduces over time, and our bad practice does not help. So, if you can, try a better practice. Don’t worry; we are not in trouble. As long as we know what we are doing or not doing with our and other people’s searches, we will be fine regarding reproducibility.

Keep an open mind about the limitations of reproducing or replicating the search results. I discussed some of these in my previous post, but you had a lot more here. Now, you know what to consider when using someone else’s searches, or you want to pick on someone else’s searches — oh, sorry, I meant peer-reviewing electronic search strategies (PRESS).

Current standards do not recommend a good format in which we should share the search strategies and do not mandate reporting of search results. So, they do not favour reproducibility, but we can gradually change this by our best practice. Or you can be a messenger who preaches water and drinks wine.

If you have an extra reason to add to the above 11 reasons, let me know.

If you liked this blog post, please support me by pressing the green Follow button and signing up so I can write more. Thank you :D

Written by Farhad Shokraneh

Evidence Synthesis Manager, Oxford Uni; Post-Doc Research Associate, Cambridge Uni; Senior Research Associate, Bristol Uni; Director, Systematic Review Consultants