
Clinical AI systems built on retrieval-augmented generation face a security threat that does not require compromising model weights. Poisoning the knowledge base redirects outputs at inference time without touching the model itself.
How RAG Poisoning Works
A RAG system retrieves documents from a knowledge base based on semantic similarity, then conditions the language model’s generation on those documents. An attacker who can insert documents into the knowledge base can craft content that retrieves reliably for target queries and steers the model output. The attack surface expands with every data source the RAG system ingests: clinical guidelines, drug databases, published literature, EHR notes.
Why Clinical RAG Is Particularly Exposed
Drug interaction databases update frequently. Published literature enters the knowledge base automatically. EHR notes are written by clinicians without security review. The 94.4% prompt injection success rate from JAMA Network Open (2024) applies to direct injection. Indirect injection through retrieved documents is harder to defend because the poisoned content does not appear in the user input.
The Defense Gap
Content provenance tracking, retrieval result filtering, and adversarial retrieval detection are not routinely deployed in clinical AI systems as of early 2026. The regulatory framework for clinical AI does not currently require adversarial testing of retrieval pipelines.
Related coverage: Prompt Injection Succeeds 94% of the Time Against Clinical LLMs | FDA Clearance for AI Medical Devices
Primary sources: Patel SB and Lam K, JAMA Network Open 2024. Zou et al., arXiv 2402.07927.