OWASP LLM Top 10 for 2025: The Mechanism Behind Each Vulnerability

OWASP LLM Top 10 for 2025: The Mechanism Behind Each Vulnerability
OWASP LLM Top 10 for 2025: The Mechanism Behind Each Vulnerability

The OWASP Top 10 for Large Language Model Applications is the most widely referenced security framework for LLM deployments. Released initially in 2023 and updated for 2025, it documents the 10 most critical vulnerability classes that developers building with LLMs need to understand. The 2025 edition reflects a materially different understanding of the threat landscape than its predecessor: two new entries address RAG systems and system prompt architecture, several existing entries have been substantially reworked, and the ordering reflects real-world incident patterns rather than theoretical severity.

What most OWASP coverage misses is the mechanism. Knowing that prompt injection is ranked first is less useful than understanding why the attack is structurally unavoidable given current LLM architecture. Each item in the list has an architectural or engineering root cause that is worth understanding precisely. This piece covers all 10, with specific attention to what changed in 2025 and where empirical evidence now exists for the risks involved.

LLM01: Prompt Injection

Prompt injection holds the top spot for the second consecutive edition. The root cause is the absence of privilege separation in the LLM context window: developer instructions, user inputs, retrieved documents, and tool results all arrive as tokens in the same sequence, processed by the same attention mechanism with the same weights. The model has no hardware or software primitive that distinguishes a trusted instruction from an untrusted input. It follows instructions based on learned behavior, not enforced boundaries.

The 2025 edition distinguishes direct prompt injection (a user directly crafts an input to alter model behavior) from indirect prompt injection (an attacker embeds instructions in external content the model processes, such as documents, web pages, or tool results). Indirect injection is the more dangerous variant for agents because the attacker does not need any interaction with the LLM application at all: they need only influence data the application processes.

The Gandalf the Red ICML 2025 paper (Pfister, Volhejn, Knott, Bazinska et al., arXiv:2501.07927) provides the largest published empirical dataset for prompt injection defenses: 279,000 real attacks with outcome-based labels showing which attacks succeeded and against which defense configurations. The paper’s core finding is that adaptive attackers (those who refine based on model feedback) succeed at substantially higher rates than static attacker baselines, and that no current defense eliminates injection risk. The detailed mechanism and defense analysis for indirect injection covers the architectural root cause and current best-practice mitigations.

LLM02: Sensitive Information Disclosure

Sensitive Information Disclosure jumped from sixth place in the 2023 list to second in 2025, reflecting an increase in documented production incidents. The risk covers two distinct mechanisms that are often conflated.

The first is training data memorization. LLMs trained on large text corpora memorize fragments of their training data, including personally identifiable information, source code, proprietary documents, and credentials that appeared in training sources. Researchers have demonstrated that targeted queries can extract memorized content from frontier models (Carlini et al., “Extracting Training Data from Large Language Models,” USENIX Security 2021). The risk scales with training data size and model capacity: larger models memorize more, and more capable models are more susceptible to extraction through carefully crafted prompts.

The second mechanism is context disclosure: sensitive information present in the current context (system prompts, tool results, retrieved documents, conversation history) being leaked to users who should not have access to it. A RAG system that retrieves documents across access control boundaries and passes them all to the LLM may have the model synthesize an output that reveals content from a document the user was not authorized to read. The LLM is not performing access control. It summarizes what it was given. Passing it privileged documents means privileged information can appear in outputs.

The 2025 OWASP guidance is explicit: sensitive data should not be in the context unless the user is authorized to see it. The model cannot be relied upon to redact information it has been given. Access control must be enforced before retrieval, not after.

LLM03: Supply Chain

Supply chain vulnerabilities entered the LLM security picture because most production LLM applications do not train their own models. They use pre-trained models (from Hugging Face, model providers, or open-weight releases), fine-tune using third-party datasets, integrate with libraries that include model components, and use plugins and tools built by third parties. Each integration point is a potential supply chain attack vector.

The specific risks the 2025 edition highlights include model weight poisoning (a model published on a public repository with a backdoor or modified behavior), dataset poisoning (training or fine-tuning data that introduces malicious behavior), and plugin compromise (a third-party tool or API integration that returns malicious content or exfiltrates data). The “PoisonGPT” research demonstrated that ROME-based surgical editing of model weights can introduce targeted false beliefs that evade most capability evaluations while performing normally on most tasks. A model that appears healthy during testing may behave maliciously on a specific trigger.

The OWASP guidance recommends verifying model provenance, maintaining inventories of all third-party components, and treating model artifacts with the same rigor as third-party code libraries in traditional software supply chain security. Model cards on Hugging Face provide no cryptographic guarantees: a model card can be copied from a legitimate model and applied to a compromised one.

LLM04: Data and Model Poisoning

Data and Model Poisoning is conceptually adjacent to Supply Chain but focuses on the training pipeline rather than the distribution pipeline. The attack modifies training data or model weights to introduce behavior the attacker controls: a backdoor that activates on a specific trigger, a systematic bias toward certain outputs in certain contexts, or degraded performance on specific task types.

For fine-tuning specifically, poisoning attacks are particularly tractable. A relatively small number of poisoned examples in a fine-tuning dataset can introduce consistent behavior changes that persist across the fine-tuned model’s use. The attacker does not need to compromise the base model; they need to compromise only a small fraction of the fine-tuning data, which is often sourced from less-verified sources than pre-training data.

The connection to parameter-efficient fine-tuning techniques like LoRA and QLoRA is relevant here: the accessibility of fine-tuning that these methods provide also makes it easier for attackers to test and deploy poisoned fine-tunes. An open-source LoRA adapter published on a model hub can be malicious in ways that are difficult to detect without running the adapted model and evaluating its behavior on trigger inputs specifically.

LLM05: Improper Output Handling

Improper Output Handling is the LLM-specific instance of the classic injection vulnerability class: using LLM outputs in downstream systems without treating them as untrusted input. When an LLM output is passed to a SQL query without sanitization, the output could contain SQL injection payloads. When it is rendered as HTML without escaping, it could contain XSS payloads. When it is passed to a shell command as an argument, it could contain command injection.

The distinctive risk for LLMs is that the model may produce injection payloads not because a user asked for them, but because an indirect prompt injection (LLM01) instructed the model to do so. An attacker who successfully injects instructions via a document or tool result can direct the model to output SQL injection or XSS payloads that are then executed by the downstream system. The attack chain runs: poisoned input causes indirect injection, which causes the model to produce a malicious output, which is executed by a downstream system with the LLM application’s privileges.

OWASP’s 2025 guidance recommends treating all LLM output as untrusted user input for the purposes of downstream system security. The model is not a sanitization layer; it is a generation layer. Output encoding, parameterized queries, and input validation must be applied to model outputs exactly as they are applied to raw user inputs.

LLM06: Excessive Agency

Excessive Agency was the most substantially expanded entry in the 2025 edition, reflecting the growth of agentic AI deployments. The vulnerability arises when an LLM agent is granted more capabilities, permissions, or autonomy than its task requires. OWASP breaks the root cause into three components: excessive functionality (the agent can access tools it does not need), excessive permissions (the tools the agent does access operate with broader privileges than the task requires), and excessive autonomy (the agent can take high-impact actions without human approval).

The b3 benchmark (Bazinska, Mathys, Casucci, et al., 2025), released by Lakera and the UK AI Safety Institute, tested 31 LLMs across 10 agent threat scenarios and found that agents with broader tool access consistently showed higher attack success rates against injection attempts, because successful injections have more capabilities available to redirect. An agent that can only read documents cannot be made to send emails by a document-embedded injection. An agent that can both read documents and send emails is vulnerable to exactly this attack.

The principle of least privilege, applied to LLM agents, means scoping both the agent’s tool access and the permissions those tools use to the minimum required for the specific task. An agent that processes documents should not also have email-sending capabilities. An agent with email-sending capability should use the specific user’s delegated OAuth credentials, not a service account with access to all users’ mailboxes. The detailed treatment of Excessive Agency covers the specific attack patterns and architectural mitigations.

LLM07: System Prompt Leakage (New in 2025)

System Prompt Leakage is new to the 2025 edition and addresses a widespread architectural failure: developers treating system prompts as security controls and placing sensitive information in them (API keys, credentials, business logic, access control rules) on the assumption that the model will not reveal them.

OWASP’s 2025 guidance is direct on this: system prompts are not security controls. LLMs are stochastic systems. There is no mathematical guarantee that a given system prompt instruction will be followed in all adversarial contexts. Researchers have demonstrated systematic extraction of system prompt contents through direct questions, roleplay framings, and encoding tricks. The Gandalf the Red paper’s finding that system prompt-based defenses produce measurable utility penalties even when they do not block requests is the same observation in a different context: the system prompt changes model behavior globally and cannot be fully controlled.

The correct architectural principle is that secrets should never be in the system prompt. API keys and credentials belong in application code, retrieved from secrets management systems at execution time and passed to the relevant APIs without going through the model. Access control rules should be enforced in deterministic code, not LLM instructions. Business logic that must not be disclosed should be in the application logic, not the prompt. The system prompt should contain only information that could be disclosed without security consequence.

LLM08: Vector and Embedding Weaknesses (New in 2025)

Vector and Embedding Weaknesses is new to the 2025 edition, added in direct response to the rapid growth of RAG architectures. The vulnerability class covers four related attack surfaces in vector database systems.

Embedding poisoning involves injecting adversarial documents into a vector database that are retrieved in response to legitimate queries and contain indirect prompt injection payloads. The attacker does not need to compromise the vector database directly; they need only have a document ingested into it. When a user’s query retrieves the poisoned document, the agent processes its content and executes the injection.

Similarity attacks involve crafting queries that retrieve unintended content from the vector store by exploiting the embedding model’s similarity function. Documents that are semantically dissimilar to the attacker’s goal but numerically close in embedding space can be retrieved by queries specifically crafted to produce that retrieval.

Vector database access control failures arise when the same vector store is used across multiple users or tenant boundaries without enforcing per-user or per-tenant access controls on retrieval. A query from user A should not be able to retrieve documents owned by user B. Many RAG implementations implement this correctly for the database layer but not for the vector retrieval layer.

Embedding inversion attacks attempt to reconstruct the original text from its embedding vector. Research has shown that substantial information about the original text is recoverable from embedding vectors for many embedding models, which means stored embeddings may disclose sensitive content even if the original documents are not directly accessible. The MWW analysis of RAG poisoning in clinical systems covers the intersection of these vulnerabilities with healthcare data specifically.

LLM09: Misinformation

The 2023 list called this entry “Overreliance.” The 2025 renaming to “Misinformation” reflects a sharpening of the concern: the problem is not only that users over-rely on model outputs, but that the model generates and propagates false information confidently and fluently.

LLMs hallucinate: they produce outputs that are factually incorrect, internally consistent, and delivered with the same confidence as accurate outputs. The mechanism is that the model generates text that is statistically consistent with its training distribution. A confident-sounding wrong answer is statistically plausible. The model’s confidence does not correlate reliably with accuracy, and users have no signal from the output itself about which claims are accurate and which are confabulated.

For agentic systems, misinformation risk extends beyond user harm. An agent that uses LLM reasoning to make decisions may make decisions based on hallucinated facts. An agent processing a document may produce a summary that includes fabricated details not present in the source. An agent retrieving information to produce a report may cite non-existent sources that sound plausible. OWASP’s guidance recommends retrieval-augmented generation to ground outputs in verifiable sources, but RAG does not eliminate hallucination: it reduces it for claims that would be grounded by retrieved content and has no effect on claims for which no relevant document exists in the retrieval corpus.

LLM10: Unbounded Consumption

Unbounded Consumption replaces the 2023 “Denial of Service” entry with a broader framing that includes not only service disruption but economic harm and model theft. The 2025 scope covers three related threats.

Resource exhaustion through adversarial inputs remains: crafting inputs that cause the model to generate extremely long outputs, trigger expensive reasoning chains, or cause the application to make many recursive tool calls can exhaust compute budgets and degrade service availability for legitimate users.

Denial of wallet attacks target pay-per-use API billing. An attacker with access to an LLM application backed by a metered API can trigger large numbers of expensive queries, producing bills that are financially damaging to the application operator even if the queries do not disclose sensitive data. In multi-tenant cloud deployments, this can also reduce available capacity for other users by consuming shared compute resources.

Model extraction or theft attempts to reconstruct proprietary model weights or capabilities through black-box queries, building a model that replicates the target’s behavior without paying for training or licensing. Outputs from repeated targeted queries, systematically collected and used to fine-tune an open-weight model, can produce a reasonable approximation of the target model’s capabilities for specific task types.

What Changed From 2023 and Why It Matters

The two new 2025 entries (System Prompt Leakage, Vector and Embedding Weaknesses) directly reflect the dominant deployment architectures of 2024-2025. System prompts became the primary mechanism developers use to configure LLM applications, and the widespread misconception that they provide security guarantees made System Prompt Leakage a high-frequency real-world incident category. Vector databases and RAG architectures became the dominant approach for grounding LLM outputs in company knowledge, and the associated attack surfaces (embedding poisoning, access control failures) became production risks rather than theoretical concerns.

The jump of Sensitive Information Disclosure from sixth to second reflects real-world incidents rather than theoretical reassessment. Production LLM applications in healthcare, finance, and legal sectors disclosed confidential data through training data memorization and context leakage. The reordering reflects what actually happened in production, not what security researchers predicted.

The substantial expansion of Excessive Agency (LLM06) reflects the transition from chatbot deployments to agentic deployments. A chatbot that generates text has limited consequence if its outputs are manipulated. An agent that calls APIs, modifies databases, sends communications, and executes code has consequence proportional to its capabilities. As those capabilities expanded across the industry, the severity of excessive agency as a vulnerability class grew proportionally.

Cross-Vulnerability Interactions

Reading the OWASP list as 10 independent items misses the most dangerous attack patterns, which chain multiple vulnerabilities. The highest-impact attacks documented in production combine LLM01 (Prompt Injection) with LLM06 (Excessive Agency) and LLM05 (Improper Output Handling): an indirect injection via a processed document (LLM01) directs the agent to take an action using its excessive capabilities (LLM06), and the action produces output that is passed to a downstream system without sanitization (LLM05). The chain results in remote code execution or data exfiltration that no single vulnerability in isolation would enable.

Similarly, LLM07 (System Prompt Leakage) and LLM01 (Prompt Injection) interact: an attacker who extracts the system prompt (LLM07) through an injection attempt has a precise description of the application’s defenses, which they can then use to craft more effective injections. The system prompt often contains information about what the model has been told not to do, which is exactly the information an attacker needs to design circumventions.

For teams prioritizing security investment, the intersection of LLM01, LLM06, and LLM07 is where the highest-impact vulnerabilities concentrate. Reducing excessive agency (LLM06) limits the damage of all injection attacks simultaneously. Removing secrets from system prompts (LLM07) removes a reconnaissance capability from attackers. And implementing adaptive defenses as documented in the Gandalf the Red analysis addresses LLM01 at the session level rather than the prompt level, which is where adaptive attackers operate.

A Prioritization Framework: Where to Start

The OWASP list documents 10 vulnerability classes, but security investment is finite. For teams that need to sequence remediation, the empirical evidence from the Gandalf the Red paper, AgentDojo, and the b3 benchmark points to a consistent prioritization order based on blast radius and implementation cost.

First: LLM06 (Excessive Agency). Reducing what an agent can do autonomously is the single highest-ROI security investment for agentic deployments. It requires no model changes, no new tooling, and no ongoing maintenance. It limits the damage of every other vulnerability simultaneously: a successful prompt injection against an agent with minimal tool access can accomplish far less than the same injection against an over-provisioned agent. Implement least-privilege tool access, per-user OAuth delegation instead of service accounts, and human-in-the-loop checkpoints for high-impact actions. Do this before any other security investment.

Second: LLM07 (System Prompt Leakage). Removing secrets from system prompts is a zero-cost architectural decision that eliminates an entire reconnaissance capability from attackers. API keys, credentials, and access control logic should never be in the system prompt. This is a one-time architectural fix with no ongoing operational cost. The Gandalf the Red paper empirically confirmed that system prompts cannot be relied upon as security controls; the OWASP 2025 list formalizes this as a named vulnerability class.

Third: LLM01 (Prompt Injection): Session-Level Defenses. Prompt injection at the architectural level is not fully solvable, but the Gandalf the Red adaptive defense finding is directly actionable: build session-level detection (flagging users after a threshold of suspicious prompts within a session) rather than only per-turn detection. This is the highest empirically-supported ROI within the injection defense space, and it operates on top of whatever system prompt hardening and output checkers are already in place.

Fourth: LLM08 (Vector and Embedding Weaknesses) for RAG deployments. If the application uses retrieval-augmented generation, access controls on the vector store must be enforced before the application goes to production. Retrofitting per-user access controls into a multi-tenant RAG system after deployment is substantially more expensive than building them correctly from the start. The embedding poisoning and cross-tenant retrieval failure modes covered under LLM08 are structurally prevented by correct access control design at ingestion time.

LLM02, LLM03, LLM04, LLM05, LLM09, and LLM10 are real risks but generally require either model-level mitigations (LLM02, LLM03, LLM04) that are provider-dependent, or application-level output handling practices (LLM05) that should be part of standard secure development regardless of LLM involvement. They do not have the same immediate ROI as the four priorities above for teams making their first security investments in LLM applications.

The OWASP LLM Top 10 is available in full at genai.owasp.org and as a PDF at owasp.org. The 2025 PDF includes scenario-specific mitigation guidance for each entry and cross-references to related frameworks including MITRE ATLAS and NIST AI RMF. For teams building production LLM applications, it is the starting point for threat modeling, not the end point. The list documents what has gone wrong. The mechanism of each item is what determines which mitigations actually work.

Discover more from My Written Word

Subscribe now to keep reading and get access to the full archive.

Continue reading