Using smaller LLMs for imaging leads to better sustainability

Smaller, “fine-tuned” large language models (LLMs) used for imaging applications are more sustainable than large general-purpose LLMs, consuming less energy without sacrificing accuracy, researchers reported.

A team led by Florence Doo, MD, of the University of Maryland Medical Intelligent Imaging (UM2ii) Center in Baltimore found that a small, task-specific LLM with seven billion parameters consumed 0.13 kilowatt-hours (kWh), while a general-purpose LLM consumed 0.59 kWh—a difference of 78%. Their results were published August 27 in Radiology.

“Radiologists can make a difference by choosing the ‘optimal’ AI model for a task – or as one mentor put it, you don’t need a sledgehammer for a nail,” Doo told AuntMinnie.com.

The energy consumption of LLMs for medical applications, including imaging, contributes to the overall carbon footprint of the healthcare system, according to Doo and colleagues. The size of an LLM is defined by the number of its “parameters”; these are “similar to the weighted neurons in the human brain,” Doo and colleagues explained, noting that “the size of an LLM relates to its complexity and learning ability, so more parameters mean that the model can potentially detect more sophisticated patterns in the data, which could lead to higher accuracy in tasks such as diagnosing disease from X-ray images.”

Because the energy consumption of LLMs had not been directly measured, Doo’s team investigated the tradeoff between accuracy and energy consumption across different LLM types for medical imaging applications, particularly chest X-rays. Their study included five open-source LLMs of different billion (B) parameter sizes: Meta’s Llama 2 7B, 13B, and 70B, all general-purpose models, and LMSYS Org’s Vicuna v1.5 7B and 13B, which Doo’s group described as “specialized, fine-tuned models.” The study used 3,665 chest X-rays selected from the Indiana University National Library of Medicine X-ray collection.

The researchers tested the models using local “compute clusters” with graphics processing units (GPUs). A single-task prompt instructed each model to confirm the presence or absence of 13 CheXpert disease labels. (CheXpert is a large dataset of chest X-rays and a competition for automatic chest X-ray interpretation developed in 2019 by Jeremy Irvin, a Stanford University graduate student, and his colleagues.) They measured each LLM’s energy consumption in kilowatt-hours and assessed its accuracy against the 13 CheXpert disease labels for diagnostic findings on chest X-rays (overall accuracy was the mean of each label’s individual accuracy). The researchers also calculated the LLMs’ efficiency ratios (i.e., accuracy per kWh; higher values mean higher efficiency).

They reported the following:

Comparison of LLMs on efficiency and accuracy of chest X-ray interpretation

Measure                            Llama 2 7B   Llama 2 13B   Llama 2 70B   Vicuna v1.5 7B   Vicuna v1.5 13B
Efficiency (accuracy % per kWh)    13.39        40.9          22.3          737.2            331.4
Overall labeling accuracy          7.9%         74%           92.7%         93.8%            93%
GPU energy consumed (kWh)          0.59         1.81          4.16          0.13             0.28
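The efficiency ratio the researchers describe (overall accuracy divided by energy consumed) can be sketched with the figures reported above. This is an illustrative calculation, not code from the study; the function and variable names are hypothetical.

```python
# Illustrative sketch of the efficiency ratio described in the study:
# efficiency = overall labeling accuracy (%) per kWh of GPU energy.
# The figures are the article's reported values, not new measurements.

models = {
    # model name: (overall accuracy %, GPU energy consumed in kWh)
    "Llama 2 7B":      (7.9,  0.59),
    "Llama 2 13B":     (74.0, 1.81),
    "Llama 2 70B":     (92.7, 4.16),
    "Vicuna v1.5 7B":  (93.8, 0.13),
    "Vicuna v1.5 13B": (93.0, 0.28),
}

def efficiency(accuracy_pct: float, energy_kwh: float) -> float:
    """Accuracy points per kilowatt-hour; higher means more efficient."""
    return accuracy_pct / energy_kwh

for name, (acc, kwh) in models.items():
    print(f"{name}: {efficiency(acc, kwh):.1f} accuracy points per kWh")
```

Dividing 7.9% by 0.59 kWh reproduces the reported 13.39 for Llama 2 7B, and 74% / 1.81 kWh gives the 40.9 reported for Llama 2 13B, matching the table.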

The team highlighted that Vicuna v1.5 7B had the highest efficiency at 737.2, compared with 13.39 for Llama 2 7B, the lowest-efficiency model. It also reported that the Llama 2 70B model consumed more than seven times as much energy as its 7B counterpart (4.16 kWh versus 0.59 kWh) and had lower overall accuracy than the Vicuna models.

“We were surprised at how much more energy the larger models consumed, even though the accuracy was only slightly better,” said Doo.

According to Doo, bigger is not always better.

“We don’t always need the biggest, flashiest AI models to get great results,” she told AuntMinnie.com. “When selecting an LLM or other AI tools, we can consider sustainability and make smart decisions that benefit both our patients and the planet.”

You can find the complete study here.

By Olivia
