Performance of Artificial Intelligence in Evidence Synthesis and Systematic Reviews is Affected by Multiplicity and Duplication

Systematic Review Consultants LTD
6 min read · Sep 19, 2024


Document Multiplicity and Data Duplication. Created by: https://deepai.org/machine-learning-model/text2img

Not new and not good news!

Since duplication and multiplicity are connected, I put them in one post. They are so connected that I have seen people exclude multiple but different reports of one Study, calling them duplicates! They are not duplicates!

Duplicate Records in Screening and AI

I have babbled about duplicates and deduplication at length, from the typology of duplicates at the record and data levels to why EndNote and other programs cannot find them all. Here, I summarize:

Finding and removing duplicates was already difficult for human reviewers and machines. Now we know that duplicates can also skew the relevance weighting of records and bias the machine. So, we must remove almost all duplicates before asking AI to screen the records! I am firmly against perfectionism in deduplication, and in another post I explained that we don't need to remove 100% of duplicate records; but in light of this new information, we had better do our best to remove all of them if we plan to use AI! Ironically, AI could be the best help in detecting duplicate records, yet many tools still ignore this important step: deduplication.
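To make the idea concrete, here is a minimal sketch of the kind of record-level matching a deduplication step performs. The field names (doi, title, year) and the 0.95 title-similarity cut-off are illustrative assumptions, not a validated algorithm; real deduplication also weighs authors, journal, pages, and abstracts.

```python
# A minimal sketch of record-level duplicate detection, standard library only.
# Fields and thresholds are illustrative assumptions, not a validated method.
import re
from difflib import SequenceMatcher

def normalise(text: str) -> str:
    """Lower-case and strip punctuation so formatting noise does not hide duplicates."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def is_duplicate(a: dict, b: dict, title_cutoff: float = 0.95) -> bool:
    # Identical, non-empty DOIs are the strongest signal.
    if a.get("doi") and a.get("doi", "").lower() == b.get("doi", "").lower():
        return True
    # Otherwise fall back to near-identical titles from the same year.
    same_year = a.get("year") == b.get("year")
    title_sim = SequenceMatcher(None, normalise(a["title"]), normalise(b["title"])).ratio()
    return same_year and title_sim >= title_cutoff

records = [
    {"id": "A", "title": "Drug X vs placebo: a randomised trial.", "year": 2023, "doi": "10.1000/xyz"},
    {"id": "B", "title": "Drug X vs Placebo - A Randomised Trial",  "year": 2023, "doi": ""},
]
print(is_duplicate(records[0], records[1]))  # True: same record, different formatting
```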

We have yet to see an AI system that detects 100% of duplicate records. If AI cannot do this straightforward step, I, for one, don't expect it to do anything else 100%. We need humans to double-check everything.

The main issue with duplicate records in screening is that when reviewers see the same record twice (A and B), they include A and exclude B, so the machine gets confused about whether the record is relevant or irrelevant. Now imagine two reviewers who each handle these duplicate records differently: one includes A and excludes B, and the other does the opposite. Poor machine! In such cases, the best practice is to mark the spare record as Duplicate rather than Exclude; however, that only works if the tool you use has an option to mark a record as a duplicate during screening.
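Here is a minimal sketch of that best practice, assuming the screening tool can export per-record decisions and that the duplicate pairs are already known (for example, from a matching step like the one above). The field names and the "Duplicate" label are illustrative; the point is that the spare copy is relabelled instead of feeding the machine contradictory Include/Exclude examples.

```python
# Illustrative screening decisions: record id -> decision.
decisions = {
    "A": "Include",    # reviewer 1 saw this copy and included it
    "B": "Exclude",    # reviewer 2 saw the duplicate copy and excluded it
    "C": "Exclude",    # an unrelated, genuinely irrelevant record
}
duplicate_groups = [["A", "B"]]   # records known to be the same report

def resolve(decisions: dict, duplicate_groups: list) -> dict:
    """Keep one decision per duplicate group (Include wins over Exclude) and
    mark the spare copies as 'Duplicate', not 'Exclude'."""
    resolved = dict(decisions)
    for group in duplicate_groups:
        keep = min(group, key=lambda r: 0 if decisions[r] == "Include" else 1)
        for record in group:
            if record != keep:
                resolved[record] = "Duplicate"
    return resolved

print(resolve(decisions, duplicate_groups))
# {'A': 'Include', 'B': 'Duplicate', 'C': 'Exclude'}
```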

Let’s move to the next concept: Multiplicity (multiple reports, not multiple comparisons).

Overcounting and Undercounting in Meta-Analysis

You may already know about the problem of overcounting and undercounting in meta-analysis and relevance checks. If not, let me explain briefly:

Imagine you are conducting a single Study. You register it as a Trial Registry Record. You publish the Protocol as the first paper. You report the initial findings in the second paper, present them as the first Conference Abstract, publish some of the outcomes in the third paper, and publish other outcomes in the fourth paper. After presenting the results at conferences twice more, you publish a Follow-Up Paper. Since there are interesting developments in the field, you conduct subgroup and post hoc analyses and publish two more papers. So, let's sum up:

1 Clinical Trial Registration

7 Published Papers

3 Conference Abstracts/Slides/Posters

So, you conducted 1 Study with 11 Reports (full texts). When we find these 11 in bibliographic databases as search results, we call them Records because they carry only the title, abstract, and bibliographic information (see here to learn about Study, Report, and Record).

During a systematic review, if you keep only the "Main Paper" (as some people put it, whatever that is), you risk missing important Records and Reports, which means undercounting the data for certain outcomes in the meta-analysis and missing information for the risk of bias assessment.

In the opposite case, if you don't bring all these reports together as 1 Study, you risk including separate Reports more than once as if they were unique studies, overcounting 1 Study as multiple studies!

The solution at hand is "Studification": study-based or study-centered analysis. In a systematic review, the Study is the unit of currency, not the records or reports. The only way to do it right is to list all the records/reports of one Study under the Study name and use and cite them all: Perry et al. 2025 [3–13]. It took me a PhD to do this for all randomized controlled trials of a single condition :D Crazy, I know, but then again, one must be a bit crazy to do a PhD.
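For illustration only, here is a minimal sketch of what studification looks like as bookkeeping, assuming each record can be linked to its Study by a trial registration number. In reality the linking is the hard part and is largely manual (matching authors, sites, sample sizes, baseline tables), and a Study is often cited by a first-author-plus-year name rather than a registry ID.

```python
from collections import defaultdict

# Illustrative records: reference number in the review + the Study they belong to.
records = [
    {"ref": 3,  "type": "trial registration", "study": "NCT00000001"},
    {"ref": 4,  "type": "protocol paper",     "study": "NCT00000001"},
    {"ref": 5,  "type": "main results paper", "study": "NCT00000001"},
    {"ref": 6,  "type": "follow-up paper",    "study": "NCT00000001"},
    {"ref": 14, "type": "main results paper", "study": "NCT00000002"},
]

# Studification: every report is filed under its Study, and the Study is cited
# with all of its reports, not just a "main paper".
studies = defaultdict(list)
for record in records:
    studies[record["study"]].append(record["ref"])

for study_id, refs in sorted(studies.items()):
    print(f"{study_id}: 1 study, {len(refs)} report(s), cite references {refs}")
# NCT00000001: 1 study, 4 report(s), cite references [3, 4, 5, 6]
# NCT00000002: 1 study, 1 report(s), cite references [14]
```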

So, what about AI? Think about it yourself before reading my thoughts: can AI do this?

Multiplicity in Data Extraction and AI

  • Can AI find all reports of the same Study?
  • Can AI take all the reports into account in the eligibility check when including or excluding a study?
  • Can AI consider the prospective registration of a trial when assessing selective reporting bias [not just report a registration number]?
  • Can AI collect all outcome data for multiple time points from multiple reports without overcounting or undercounting (see the sketch after this list)?
  • Can AI collect, highlight, and show you the multiple parts of one report that relate to one of the biases and propose a bias level based on these separate sentences? How about multiple sections from multiple reports of one Study?
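On the fourth question, the bookkeeping that prevents overcounting is at least easy to describe. Below is a minimal sketch, assuming extracted outcome data arrive as rows keyed by Study, outcome, and time point (the field names and numbers are made up): each (study, outcome, time point) cell should enter the meta-analysis once, however many reports repeat it, and genuine conflicts between reports should be flagged for a human.

```python
# Illustrative rows extracted from several reports of the same Study.
extracted_rows = [
    {"study": "Perry 2025", "report": "main results paper",  "outcome": "pain", "timepoint": "12 wk", "mean": 3.1},
    {"study": "Perry 2025", "report": "conference abstract", "outcome": "pain", "timepoint": "12 wk", "mean": 3.1},
    {"study": "Perry 2025", "report": "follow-up paper",     "outcome": "pain", "timepoint": "52 wk", "mean": 2.4},
]

unique, conflicts = {}, []
for row in extracted_rows:
    key = (row["study"], row["outcome"], row["timepoint"])
    if key not in unique:
        unique[key] = row                          # first time this data point is seen
    elif unique[key]["mean"] != row["mean"]:
        conflicts.append((key, unique[key], row))  # same cell, different numbers: send to a human

print(f"{len(extracted_rows)} extracted rows -> {len(unique)} unique data points, {len(conflicts)} conflicts")
# 3 extracted rows -> 2 unique data points, 0 conflicts
# (the conference abstract repeated the 12-week result, so it is not counted twice)
```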

The simple answer is NO. Sorry, but it's true. Unless you carefully train your dragon (read: AI), instruct it, engineer your input, output, and prompts, and teach it all the tricky examples; then yes. But isn't that too much? How long would all of this take? Maybe it is faster and cheaper not to use AI!

Some AI-based tools, with proper Machine Teachers (prompt engineers with years of experience in evidence synthesis), may be able to take on a Study as a single file (combined Reports) or as multiple files (Reports + appendices) and do the job.

So what?

  1. Many AI tool designers are not experts in Evidence Synthesis (ES); it is important that the AI and ES communities collaborate to create the best tools. Problems such as multiplicity and duplication can only be raised by the ES community.
  2. Duplicate records may affect the performance of AI negatively; deduplicate before screening. Don't be lazy.
  3. AI still cannot find 100% of duplicate records; it is naive to think AI can do anything else perfectly. Always check.
  4. We cannot expect all users to be Prompt Engineers. Maybe the next generation, but not our generation. AI-based tools should prepare appropriate default prompts and instructions for the users to help them use AI responsibly.
  5. Users should consider AI-based tools built specifically for evidence synthesis rather than general LLMs, because such tools have adjusted/tuned/prompted the LLMs for specific tasks and can perform in a more targeted way, even when dealing with duplicate records.
  6. Remember, the Invisibility Cloak came with instructions: Use it well! It is very important to use the tools well, and to do that, you need to know the tools, what's behind them, how AI works, how to write prompts, how to train AI to serve you, and how to double-check the output.

AI tools serve experts better than they serve beginners and amateurs, because:

A. Expert Machine Teachers can train the machines best and create the best machines, simply because they are experts. If the input is expert, the output is expert.

B. The only way to know whether AI's output is appropriate is for the human who assesses the output to be an expert who can tell high-quality output from low- or medium-quality output. Again, because they are experts.

So, to use AI well, you had better first be an expert in your field or have an expert on your team. What do you think?

I will have two more posts coming related to this one. Stay tuned.

If you liked this blog post, please support me by pressing the green Follow button and signing up so I can write more. (The email subscription function is currently dysfunctional.) Thank you :D


Systematic Review Consultants LTD

Evidence Synthesis, Systematic Review, and GRADE Services for Clinical Practice Guidelines, HEOR, and HTA https://systematicreview.info/