
Radiology AI has been dominated by narrow, task-specific models trained on single imaging modalities for single findings. Merlin, published in Nature Medicine in 2024 by researchers at Mass General Brigham and Harvard Medical School, is a 3D radiology foundation model trained on 110,000 CT volumes. The premise is the standard foundation-model bet: learn general anatomical representations first, then adapt them to any downstream task with far less labeled data than a task-specific model would require.
What Makes Merlin Different
Most radiology AI models operate on 2D slices. Merlin processes full 3D CT volumes at native resolution, learning anatomical relationships across the axial, coronal, and sagittal planes simultaneously. The pretraining objective combines reconstruction of masked anatomical regions with contrastive learning that aligns imaging with radiology report text. On downstream tasks, Merlin matched or exceeded task-specific models while requiring approximately 6x fewer labeled fine-tuning examples.
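The two pretraining objectives described above are standard ingredients: a symmetric CLIP-style contrastive loss between image and report embeddings, and a masked-autoencoder-style reconstruction loss on hidden patches. The sketch below shows the generic form of each, not Merlin's actual implementation; all function names, the temperature value, and tensor shapes are illustrative assumptions.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss between a batch of image
    embeddings and their paired report embeddings. Matched pairs sit on
    the diagonal of the similarity matrix. Illustrative sketch only."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) cosine similarities
    labels = np.arange(logits.shape[0])         # correct match = same index

    def xent(l):
        # numerically stable cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

def masked_recon_loss(pred_patches, true_patches, mask):
    """MSE over masked 3D patches only, as in masked-autoencoder
    pretraining. mask is 1 where a patch was hidden from the encoder."""
    diff = (pred_patches - true_patches) ** 2   # (B, n_patches, patch_dim)
    return (diff * mask[..., None]).sum() / (mask.sum() * diff.shape[-1])

def pretrain_loss(img_emb, txt_emb, pred, true, mask, lam=1.0):
    """Combined objective; the weighting lam is an assumption."""
    return masked_recon_loss(pred, true, mask) + lam * info_nce(img_emb, txt_emb)
```

The key property the contrastive term enforces is that a CT volume's embedding lands closer to its own report than to any other report in the batch; the reconstruction term forces the encoder to retain fine anatomical detail rather than only report-level semantics.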
The Annotation Bottleneck
Expert radiology annotations are expensive: annotating a single CT volume for complex segmentation can take 30 to 90 minutes of specialist time. By cutting the labeled-data requirement for new tasks, Merlin-class foundation models attack this bottleneck directly.
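To make the savings concrete, here is a back-of-envelope calculation. The 30-to-90-minute range comes from the paragraph above (60 minutes is taken as a midpoint), the 6x reduction comes from the benchmark discussion earlier, and the 3,000-volume task size is a hypothetical assumption.

```python
def annotation_hours(n_volumes, minutes_per_volume):
    """Total expert hours to annotate a dataset of CT volumes."""
    return n_volumes * minutes_per_volume / 60

# Hypothetical task: 3,000 labeled volumes for a task-specific model,
# versus roughly 6x fewer (500) when fine-tuning a foundation model.
baseline = annotation_hours(3000, 60)    # 3,000 radiologist-hours
foundation = annotation_hours(500, 60)   # 500 radiologist-hours
saved = baseline - foundation            # 2,500 radiologist-hours saved
```

At these assumed numbers, the reduction is measured in thousands of radiologist-hours per task, which is where the economic case for foundation models rests.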
What Foundation Models Still Cannot Do
Merlin benchmarks reflect performance on tasks included in the training distribution. The model has not been evaluated on rare pathology types, pediatric populations, or imaging protocols significantly different from Mass General Brigham standards. Distribution shift remains an unresolved problem for all radiology foundation models.
What Happens Next
The trajectory is toward multimodal foundation models that process CT, MRI, PET, and radiographs simultaneously. The regulatory pathway for foundation model-derived radiology tools under the FDA's Predetermined Change Control Plan (PCCP) framework is an active area of policy development.
Related coverage: AI in Radiology: Three Phases and What the Clinical Evidence Shows | FDA Clearance for AI Medical Devices: What 510(k), De Novo, and PMA Mean | Poisoning the Medical Brain: RAG Attacks and Security in Clinical AI
Primary source: Blankemeier L et al., Nature Medicine 2024.