Generative Artificial Intelligence Accurate in Screening Patients with Heart Failure for Clinical Trial Eligibility
Large language models (LLMs) featuring generative artificial intelligence (AI) can rapidly and accurately screen patients with heart failure (HF) for eligibility in clinical trials, which could make it cheaper and faster to evaluate new treatments and bring successful ones to patients, according to a study published in the New England Journal of Medicine AI.1
Screening patients for clinical trials is an essential step in the trial process. The process is traditionally manual and relies on the diligence and judgement of study staff, which makes it prone to human error and requires substantial time and resources.1
LLMs such as generative pretrained transformer 4 (GPT-4) have shown promise in optimizing medical applications. The investigators sought to evaluate GPT-4 within a specialized Retrieval-Augmented Generation (RAG)-based framework that would allow a clinical trial screening application to be implemented in real-world scenarios.1
Calling their framework the RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review (RECTIFIER), the investigators assessed its efficacy in identifying eligible study participants, especially in scenarios where data may be unstructured, inaccurate, or incomplete.1
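For readers unfamiliar with RAG, the general idea can be sketched in a few lines: retrieve the note passages most relevant to an eligibility question, then place only those passages in the prompt sent to the model. The sketch below is illustrative and is not the RECTIFIER implementation. It assumes a simple bag-of-words cosine similarity as a stand-in for the retrieval step, and the function names, sample notes, and prompt wording are all hypothetical. No model is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k note chunks most similar to the question.

    Bag-of-words similarity is an assumption made for this sketch; a real
    system would typically use a learned embedding model instead.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt from retrieved chunks (hypothetical wording)."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Clinical notes:\n{context}\n\nQuestion: {question}\nAnswer yes or no."


# Hypothetical note excerpts, invented for illustration only.
notes = [
    "Patient reports dyspnea on exertion and leg swelling consistent with heart failure symptoms.",
    "Follow-up for seasonal allergies; no new complaints.",
    "Echocardiogram shows reduced ejection fraction; heart failure therapy reviewed.",
]
question = "Does the patient have symptomatic heart failure?"
top = retrieve(question, notes, k=2)
prompt = build_prompt(question, notes, k=2)
```

Here the unrelated allergy note is left out of the prompt, which shows how retrieval keeps the model's input focused even when a patient has many notes on file.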
This framework was evaluated in the Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) study, which was designed to investigate the comparative effectiveness of 2 remote-care strategies to optimize guideline-directed therapy in patients with HF. To determine patient eligibility, trained study staff manually reviewed electronic health records (EHRs) and recorded their assessment of inclusion and exclusion criteria.1
Of the criteria used for the study, the investigators identified criteria that could not be reliably determined by using EHR data and used these to assess RECTIFIER’s ability to screen patients. To compare RECTIFIER to the study staff, a blinded expert clinician reviewed the patients and answered target criteria questions to establish “gold standard” answers.1
The investigators refined their prompts for the 13 selected criteria over multiple phases of patient data, then tested the final prompts on a dataset of 1894 patients with an average of 120 notes per patient. The results were then compared to those of the study staff.1
The Matthews correlation coefficient (MCC) was chosen for the final statistical analysis of performance in the test set because it remains a robust metric when labels are rare, according to the study authors.1
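To see why rare labels matter, consider the standard definition of the MCC in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN): MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The short sketch below compares accuracy with the MCC for two screeners. The counts are hypothetical and are not taken from the study, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all answers that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical example: an exclusion criterion met by 20 of 1000 patients.
# A screener that always answers "no" still looks accurate.
always_no = dict(tp=0, tn=980, fp=0, fn=20)
# A screener that finds 15 of the 20 cases, with 5 false alarms.
screener = dict(tp=15, tn=975, fp=5, fn=5)

acc_always_no = accuracy(**always_no)           # 0.98
mcc_always_no = matthews_corrcoef(**always_no)  # 0.0
acc_screener = accuracy(**screener)             # 0.99
mcc_screener = matthews_corrcoef(**screener)    # about 0.745
```

The two screeners differ by only 1 percentage point in accuracy, but the MCC separates them clearly, because it gives no credit to a screener that never identifies the rare cases.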
Based on alignment with the expert clinician’s “gold standard” responses, the AI process had accuracy between 97.9% and 100% (MCC range, 0.837 to 1), while the study staff assessing the same medical records were slightly less accurate, with accuracy between 91.7% and 100% (MCC range, 0.644 to 1).2 RECTIFIER and the study staff performed similarly for all target criteria except “symptomatic heart failure,” for which the new framework performed better (accuracy of 97.9% versus 91.7% and an MCC of 0.924 versus 0.721, respectively).1
“We saw that large language models hold the potential to fundamentally improve clinical trial screening. Now the difficult work begins to determine how to integrate this capability into real-world trial workflows in a manner that simultaneously delivers improved effectiveness, safety, and equity,” Samuel Aronson, ALM, MA, a co-senior author of the study, said in a news release.2
Importantly, the investigators estimated that screening with the RECTIFIER framework costs about $0.11 per patient.2 Compared with approximately $34.75 per patient for the traditional screening model, the AI-enabled screening process was substantially less expensive.1
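The scale of that difference is easier to see with a quick calculation. The sketch below applies the 2 reported per-patient figures to a cohort the size of the study's test set. Using 1894 patients as the cohort is an assumption made for illustration; the totals are not figures reported by the investigators, and the 2 estimates may not cover the same cost components.

```python
from decimal import Decimal

# Per-patient screening costs in US dollars, as reported in the article.
RECTIFIER_COST = Decimal("0.11")
MANUAL_COST = Decimal("34.75")


def screening_cost(n_patients, cost_per_patient):
    """Total screening cost for a cohort at a flat per-patient rate."""
    return n_patients * cost_per_patient


# Assumption: a cohort the size of the test set, chosen only for illustration.
n_patients = 1894

ai_total = screening_cost(n_patients, RECTIFIER_COST)    # Decimal('208.34')
manual_total = screening_cost(n_patients, MANUAL_COST)   # Decimal('65816.50')
cost_ratio = MANUAL_COST / RECTIFIER_COST                # roughly 316 to 1
```

Under these assumptions, screening the cohort would cost about $208 with the AI-enabled process versus about $65,817 with manual review.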
Despite the effectiveness of the new framework and its associated cost savings, the investigators noted the potential hazards of using an automated screening process. These include the loss of specific patient context, overlooked clinical details, and potential inequity.1
RECTIFIER’s application to the medical field is not limited to clinical trials, according to the study authors. The framework could help address gaps in quality of care for diseases such as HF, support guideline-directed medication use, and aid population health management.1
“If we can accelerate the clinical trial process, and make trials cheaper and more equitable without sacrificing safety, we can get drugs to patients faster and ensure they are helping a broad population,” Alexander Blood, MD, a co-senior author of the study, said in the news release.2