Researchers from the University of Bonn and the University Hospital Bonn (UKB) have demonstrated that local large language models (LLMs) can assist in structuring radiological findings in a way that protects patient privacy, with all data staying within the hospital walls. The study was published in the journal Radiology.
In medicine, organization is essential—not just in the operating room or office but also when it comes to data. Structured reports, for instance, are valuable for doctors and serve as a foundation for research databases. These structured datasets can later be used to train AI models for tasks like image-based diagnosis. However, many reports are still written in free-text form, making them harder to repurpose for these advanced applications. This is precisely where AI, and specifically large language models (LLMs), can play a significant role.
Open vs. Closed Models
LLMs are generally categorized into two types: closed-weight and open-weight models. Closed-weight models, such as those behind commercial tools like ChatGPT, are widely recognized, but their weights cannot be downloaded, so they can only be used through the provider's servers. Open-weight models, such as Meta's Llama family, offer a flexible alternative: they can run on local clinic servers and can even be further trained to meet specific needs. A major advantage of open models is therefore data security.
“The problem with commercial, closed models is that to use them, you have to transfer the data to external servers, which are often located outside the EU. This is not recommended for patient data,” emphasized Julian Luetkens, Professor and Comm. Director, Clinic for Diagnostic and Interventional Radiology, UKB.
But are all LLMs equally suited to understanding and structuring the medical content of radiological reports? To find out which LLMs are suitable for a clinic, we tested various open and closed models. We were also interested in whether open LLMs can be effectively developed further on site in the clinic using just a few reports that have already been structured.
Dr. Sebastian Nowak, postdoctoral researcher and first and corresponding author of the study, Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn
To address these questions, the research team analyzed 17 open LLMs and four closed models. Each model was tasked with extracting information from thousands of free-text radiology reports. The study used publicly available English-language radiology reports that are not subject to data protection requirements, alongside German-language, data-protected reports from UKB.
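To make the task concrete: in a setup of this kind, a locally hosted open-weight model is prompted with the free-text report and asked to return the findings in a fixed format. The following minimal sketch illustrates this with the Hugging Face transformers library; the model name, the example report and the list of findings are illustrative assumptions, not the authors' published code.

# Minimal sketch: zero-shot extraction of chest radiograph findings
# with a locally hosted open-weight LLM (illustrative, not the study code).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # any locally stored open-weight model
    device_map="auto",                         # keep inference on the clinic's own hardware
)

report = (
    "Portable chest radiograph: new left basal opacity and a small "
    "pleural effusion. No pneumothorax."
)
prompt = (
    "Read the radiology report and state, for each finding, whether it is present.\n"
    f"Report: {report}\n"
    "Findings: pleural effusion, pneumothorax, pulmonary opacity.\n"
    "Answer with one 'finding: yes/no' line per finding."
)

output = generator(prompt, max_new_tokens=64, do_sample=False)
print(output[0]["generated_text"])

Because the model weights and the reports never leave the local server, no patient data has to be sent to an external provider.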
Training Makes the Difference
The results showed that, for reports without data protection requirements, closed models offered no clear advantage over certain open LLMs. When used directly without additional training, larger open LLMs outperformed smaller ones. However, using pre-structured reports as training data significantly improved the information extraction of open LLMs, even when only a small number of manually prepared reports were available, and training narrowed the performance gap between larger and smaller open models.
After training with more than 3,500 structured reports, there was no longer any relevant difference between the largest open LLM and a language model 1,200 times smaller. Overall, it can be concluded that open LLMs can keep up with closed ones and have the advantage that they can be developed locally in a privacy-preserving way.
Dr. Sebastian Nowak, postdoctoral researcher and first and corresponding author of the study, Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn
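For readers who want a sense of what training on a few pre-structured reports can look like in practice, the snippet below sketches how already-structured reports might be turned into supervised prompt/completion pairs for fine-tuning a local model. The helper function, field names and example report are hypothetical, not the authors' published pipeline.

# Minimal sketch: building fine-tuning pairs from reports that radiologists
# have already structured manually (field names are illustrative assumptions).
import json

def build_example(report_text, findings):
    # findings: already-structured labels, e.g. {"pleural_effusion": "yes", "pneumothorax": "no"}
    prompt = (
        "Extract the findings from the radiology report and answer as JSON.\n"
        f"Report: {report_text}\n"
        "JSON:"
    )
    completion = " " + json.dumps(findings)
    return {"prompt": prompt, "completion": completion}

training_pairs = [
    build_example(
        "New left basal opacity and a small pleural effusion. No pneumothorax.",
        {"opacity": "yes", "pleural_effusion": "yes", "pneumothorax": "no"},
    ),
    # ... a few hundred to a few thousand such pairs, depending on how many
    # structured reports the clinic already has
]

# These pairs can then be fed into any standard causal-language-model
# fine-tuning loop (for example the Hugging Face Trainer, optionally with
# parameter-efficient methods such as LoRA), running entirely on local hardware.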
These findings could open up clinical databases for large epidemiological studies and for advances in diagnostic AI research.
Ultimately, this will benefit the patient, all while strictly observing data protection. We want to enable other clinics to use our research directly and have therefore published the code and methods for LLM use and training under an open license.
Dr. Sebastian Nowak, postdoctoral researcher and first and corresponding author of the study, Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn
Journal Reference:
Nowak, S., et al. (2024). Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports. Radiology. https://doi.org/10.1148/radiol.240895