Poisoning the Medical Brain: RAG Attacks and Security in Clinical AI Systems

Poisoning the Medical Brain: RAG Attacks and Security in Clinical AI Systems
Poisoning the Medical Brain: RAG Attacks and Security in Clinical AI Systems

Clinical AI systems built on retrieval-augmented generation face a security threat that does not require compromising model weights. Poisoning the knowledge base redirects outputs at inference time without touching the model itself.

How RAG Poisoning Works

A RAG system retrieves documents from a knowledge base based on semantic similarity, then conditions the language model’s generation on those documents. An attacker who can insert documents into the knowledge base can craft content that retrieves reliably for target queries and steers the model output. The attack surface expands with every data source the RAG system ingests: clinical guidelines, drug databases, published literature, EHR notes.

Why Clinical RAG Is Particularly Exposed

Drug interaction databases update frequently. Published literature enters the knowledge base automatically. EHR notes are written by clinicians without security review. The 94.4% prompt injection success rate from JAMA Network Open (2024) applies to direct injection. Indirect injection through retrieved documents is harder to defend because the poisoned content does not appear in the user input.

The Defense Gap

Content provenance tracking, retrieval result filtering, and adversarial retrieval detection are not routinely deployed in clinical AI systems as of early 2026. The regulatory framework for clinical AI does not currently require adversarial testing of retrieval pipelines.

Related coverage: Prompt Injection Succeeds 94% of the Time Against Clinical LLMs | FDA Clearance for AI Medical Devices

Primary sources: Patel SB and Lam K, JAMA Network Open 2024. Zou et al., arXiv 2402.07927.

Discover more from My Written Word

Subscribe now to keep reading and get access to the full archive.

Continue reading