LLM Excessive Agency: Why Every Tool Your Agent Has Is a Risk

LLM Excessive Agency: Why Every Tool Your Agent Has Is a Risk
LLM Excessive Agency: Why Every Tool Your Agent Has Is a Risk

Every tool you give an LLM agent is an attack surface. Every permission that tool carries is a consequence an attacker can cause. Every action the agent takes autonomously without human review is an opportunity for a successful injection to do real damage before anyone notices.

Excessive agency is the OWASP LLM06:2025 vulnerability, the most substantially expanded entry in the 2025 Top 10 update, and the reason that the security posture of an agentic AI system is inseparable from its capability design decisions. The problem is not the model. The problem is the combination of the model with the tools it can call, the permissions those tools carry, and the circumstances under which it is allowed to call them without human approval.

The backbone breaker benchmark (b3) from Lakera and the UK AI Safety Institute, released in October 2025, tested 31 LLMs across 10 agent threat scenarios and produced concrete data on how excessive agency affects attack success rates. The findings show that the specific configuration of an agent’s capabilities matters more than which backbone model it uses. A capable model with minimal tool access is significantly harder to exploit than a less capable model with broad tool access.

The Three Root Causes OWASP Identifies

OWASP breaks excessive agency into three components that produce distinct attack scenarios when exploited. Understanding each separately helps map them to specific architectural decisions in agent design.

Excessive functionality. An agent has access to tools or data sources beyond what its assigned task requires. An agent that is supposed to answer questions about company policy but has file-system write access, an email-sending tool, and a code execution environment is functionally over-provisioned. Each extra tool expands the damage an attacker can cause by successfully injecting instructions into the agent. The agent that can only answer policy questions cannot be made to delete files or send phishing emails, regardless of how sophisticated the injection attempt is. The agent with all three capabilities can.

The principle is simple: an agent should have access to exactly the tools its task requires and no others. A document-reading agent reads documents. A scheduling agent reads and writes calendar events. A customer support agent queries the knowledge base and creates support tickets. Each of these agents has a defined, bounded set of tool interactions. The temptation in practice is to build general-purpose agents that can do many things, because that makes them more flexible and reduces the number of specialized agents needed. That flexibility is the attack surface.

Excessive permissions. The tools an agent does have access to operate with more privilege than the specific task requires. An email-reading agent that uses a service account with access to all users’ mailboxes has excessive permissions relative to an agent that uses OAuth delegation to access only the authenticated user’s mailbox. An agent that writes to a database using a database admin credential has excessive permissions relative to one using a read-write credential scoped to a specific table.

The distinction between excessive functionality and excessive permissions matters because they require different mitigations. Excessive functionality is fixed by removing tools. Excessive permissions is fixed by scoping the credentials those tools use. An attacker who can redirect an agent to send emails using a broad-access service account can exfiltrate data from many users simultaneously. The same attacker redirecting an agent that uses per-user OAuth delegation can exfiltrate only the current user’s data. The tool is the same. The permission scope determines the blast radius.

Excessive autonomy. The agent can take high-impact, irreversible actions without requiring human approval. An agent that sends emails, makes purchases, deploys code, or deletes records without a confirmation step can act on injected instructions before any human has a chance to review what the agent is doing. Excessive autonomy is the mechanism that converts a successfully exploited injection from a theoretical security finding into a production incident with real-world consequences.

The 2025 OWASP guidance is specific: human-in-the-loop checkpoints should exist for any action that is high-impact (significant business consequence), irreversible (cannot be undone without significant cost), or cross-domain (affects systems or users outside the scope of the user’s original request). The checkpoint does not need to interrupt every agent action. It needs to interrupt the actions that matter.

What the b3 Benchmark Found

The b3 benchmark tests backbone LLMs in isolation from their scaffolding, focusing on how the backbone model responds to adversarial inputs at specific decision points (threat snapshots) within agentic workflows. The 10 agent scenarios in b3 were designed to cover different tool configurations and action types: chat-based interaction, document processing, tool invocation, memory manipulation, code execution, and file processing.

Julia Bazinska and her co-authors (Bazinska, Mathys, Casucci, Rojas-Carulla, Davies, Souly, and Pfister, 2025) found three patterns across the 31 models tested that are directly relevant to the excessive agency problem.

First, attack success rates correlated with the capability level of the scenario, not just the capability level of the model. Scenarios where the agent had access to multi-step tool chains (calling one tool based on the result of another) showed higher attack success rates than scenarios with single-tool access, even when tested against the same backbone model. The attacker’s ability to chain tool calls amplifies the impact of a successful injection, and the benchmark data shows that more capable agents are more exploitable in this dimension.

Second, reasoning-capable models (those fine-tuned with chain-of-thought or step-by-step reasoning) showed meaningfully better resistance to injection attempts across scenarios. The mechanism is plausible: a model that reasons explicitly about what it is doing may be more likely to notice that a tool call it is about to make is inconsistent with the user’s original intent. The reasoning step provides a checkpoint that purely reactive models lack. This finding is directly actionable: for agents where security is a priority, reasoning-capable backbone models outperform base models even when they have the same tool access.

Third, model size did not predict security performance. Larger models were not consistently more resistant to injection-caused excessive agency than smaller models. The training methodology (specifically, reasoning-capability training) mattered more than the parameter count. For teams evaluating backbone model selection for agentic deployments, this finding suggests that security-focused evaluation of candidate models is more informative than parameter count comparisons.

The Confused Deputy Problem

Excessive agency in LLM agents is a new instance of a computer security concept called the confused deputy problem, first formally described by Norm Hardy in 1988. The confused deputy problem occurs when a program that has legitimate access to a resource is tricked into using that access on behalf of an attacker who does not have that access directly.

In the classic formulation, a compiler has permission to write to a billing log. A user tricks the compiler into writing to the billing log by naming their output file the same as the billing log’s path. The compiler acts as the attacker’s deputy, using its legitimate permission to take an action the attacker could not take directly.

An LLM agent with email-sending capability is a confused deputy in exactly this sense. The attacker cannot send emails from the user’s account directly. But the attacker can inject instructions into a document the agent reads, causing the agent to send emails using its legitimate email-sending capability. The agent acts as the attacker’s deputy. The solution in both cases is the same: the program (agent) should use credentials scoped to the specific operation it is performing for the specific user who requested it, not general-purpose credentials that cover operations and users beyond the current task scope.

The Least Privilege Implementation for Agents

Applying least privilege to LLM agents requires decisions at three levels: what tools the agent has, what credentials those tools use, and when the agent can act without human approval.

At the tool level, the right question is: what is the minimum set of tools that allows this specific agent to complete its specific task? Not what tools might be useful in some future scenario, not what tools would make the agent more general-purpose, but what the agent needs for the task it is actually going to do. This question has a concrete answer for well-specified tasks, and the answer is usually a small, bounded set. Agents that resist specification resist least-privilege design, which is itself a warning signal.

At the credential level, the right model is OAuth delegation with per-user, per-scope authorization rather than service accounts. When an agent acts on behalf of a user, it should use credentials that carry exactly the user’s permissions for exactly the scope of the action, issued through a standard authorization flow. Service accounts that carry broad standing permissions are convenient but create the confused deputy vulnerability described above. OAuth delegation is more complex to implement but eliminates the class of cross-user exfiltration attacks that broad service accounts enable.

At the autonomy level, the right model is explicit checkpointing for high-impact actions, not blanket human review of every agent step (which defeats the purpose of automation). The decision about which actions require checkpoints should be made upfront, as part of agent design, not reactively after an incident. Actions that are irreversible (file deletion, email sending, financial transactions), cross-domain (affecting systems or users outside the original task scope), or high-consequence (significant data exposure, financial cost, or compliance risk) should require human approval. Actions that are reversible, scoped, and low-consequence can proceed autonomously.

Excessive Agency in MCP-Connected Agents

The Model Context Protocol (MCP) introduced by Anthropic in 2024 makes the excessive agency problem structurally explicit. Every MCP server exposes a set of tools with defined schemas, and every tool the agent can call is a capability that expands its potential action space. MCP’s design includes several mechanisms for limiting this expansion, but they require intentional use.

MCP tool annotations allow server authors to declare whether a tool is read-only or has destructive side effects. An MCP server can annotate a file-reading tool as safe and a file-deletion tool as requiring explicit user confirmation. Host implementations that respect these annotations can enforce a confirmation checkpoint at the protocol level before the backbone LLM’s decision reaches execution. This is the MCP-native implementation of the action authorization principle: the host, not the model, enforces the checkpoint for high-impact operations.

Tool scoping in MCP server definitions also directly implements least-privilege. A well-designed MCP server for email management exposes read_email and create_draft but not send_email and delete_all_emails in the same server. The separation means an agent connected to the read-only server cannot be redirected to send emails regardless of how effectively an attacker injects instructions. The capability simply does not exist in that agent’s tool space.

The practical challenge is that MCP server definitions are often written by developers who may not be thinking about security at the time. A server that exposes everything the underlying API supports, rather than the minimum a specific agent needs, is the MCP-native version of excessive functionality. Auditing MCP server tool lists for scope minimization is a concrete, low-effort security action for teams deploying MCP-connected agents.

The b3 benchmark (Bazinska et al., 2025) tested backbone LLM behavior in agent scenarios that include tool-heavy configurations equivalent to over-provisioned MCP deployments. The consistent finding that attack success rates scale with scenario capability level applies directly: an agent with a broad MCP tool surface is a more exploitable target than one with a narrow tool surface, independently of which backbone model is used.

What Agent Breaker Found About Real Attack Patterns

The Gandalf: Agent Breaker game generated 194,331 unique attack attempts across its 10 agentic scenarios before the b3 benchmark dataset was extracted. The scenarios were designed to simulate real-world agent deployments: a customer service agent with CRM access, a document analysis agent with file system access, an email management agent with send and delete capabilities, a code assistant with execution capability, and others.

The attack distribution across scenarios was not uniform. Scenarios with higher-capability agents (more tools, more permissions, more autonomy) attracted higher volumes of attack attempts and showed higher success rates among successful attacks. This reflects what practitioners in physical security call the “target selection” dynamic: more capable agents are more valuable attack targets because successful exploitation has larger consequences.

The attacks that succeeded against the Agent Breaker scenarios disproportionately targeted transitions between tool calls: the moments when the agent has received the result of one tool call and is deciding what to do next. These transition points are where the model’s reasoning is most influenced by tool results (which may contain injections) and least constrained by the original user instruction (which may be temporally distant in the context). Designing agent architectures to re-anchor the model’s instruction context at each transition point (by including the original user request in the prompt at each reasoning step) reduces success rates for this class of attack.

Practical Architecture Decisions

The specific architectural decisions that reduce excessive agency risk in production deployments follow from the analysis above.

Define agent scope at design time. Before implementing an agent, write down exactly what it is supposed to do and exactly what tools it needs to do that. If the scope cannot be precisely defined, the agent is not ready to be built with production-level tool access. The scope definition is not a bureaucratic exercise; it is the technical specification that determines what tool access is appropriate.

Implement tool whitelisting, not blacklisting. Give the agent access to a specific list of tools and nothing else, rather than giving it broad tool access and trying to block harmful uses of specific tools. Blacklists are always incomplete. Whitelists are inherently bounded. The set of tools an agent legitimately needs is smaller than the set of tools it could potentially misuse.

Use per-request credential issuance where possible. For each agent task execution, issue credentials scoped to that task and revoke them when the task completes. Standing credentials accumulate risk over time as the attack surface for credential theft expands. Short-lived, task-scoped credentials reduce the window of exposure for any single credential compromise.

Log all tool calls with the reasoning that preceded them. When an agent makes a tool call, log what the model’s reasoning was before making that call. This enables post-hoc detection of injection-caused behavior: a tool call that is inconsistent with the preceding reasoning, or a reasoning step that references content from an external source without acknowledging it, is a signal worth investigating. The Gandalf the Red D-SEC framework’s session-level analysis applies here: anomalous patterns within a session are detectable even when individual actions appear benign in isolation.

The Security-Capability Tension

Every mitigation for excessive agency reduces agent capability in some dimension. Fewer tools means the agent can do fewer things. Scoped credentials mean the agent has access to fewer resources. Human checkpoints mean the agent can act less autonomously. These are real costs, not incidental side effects of security measures, and they need to be weighed honestly against the security benefit.

The productive framing is not “how do we make agents secure” but “what is the right capability-security trade-off for this specific use case.” A document summarization agent with read-only file access and no external communication tools can be secured with relatively low cost: restrict it to read, ground it in the current task, and it has low excessive agency risk. A fully autonomous agent with email, calendar, file system, web browsing, and code execution capabilities is managing an enormous attack surface. That combination may be the right product decision for some deployments. It should be made with full awareness of what it means for security, not by default.

The b3 findings that reasoning models are more secure and that model size does not determine security suggest that there is room to improve backbone model security without sacrificing capability, through targeted training rather than capability restriction. But no amount of backbone model improvement will compensate for an agent architecture that violates least privilege principles. The tool access, credential scope, and autonomy design are determined by the application architect, not the model provider. Security is an architectural property before it is a model property.

Connection to the Broader Security Cluster

Excessive agency is the vulnerability that makes indirect prompt injection (LLM01) most dangerous in agentic settings. An agent with minimal tool access that suffers a successful indirect injection can do little harm: the attacker has control of an agent that cannot do much. An agent with broad tool access that suffers the same injection is a powerful tool in the attacker’s hands. Reducing excessive agency is one of the few mitigations that reduces the impact of all injection attacks simultaneously, without requiring any improvement in injection detection. It is also the first-priority recommendation in the OWASP LLM Top 10 for 2025 security investment framework, precisely because it constrains the blast radius of every other vulnerability class.

The indirect prompt injection analysis covers the attack mechanism that most commonly triggers excessive agency exploitation. The Gandalf the Red analysis covers the empirical evidence on how adaptive attackers evolve to bypass any single defensive layer. And the profile of Julia Bazinska’s research program at Lakera documents the measurement methodology behind the b3 benchmark findings cited throughout this piece.

The three mitigations that OWASP’s 2025 guidance and the b3 empirical evidence both support are: scope tool access to task requirements, use per-user delegated credentials rather than service accounts, and checkpoint high-consequence autonomous actions. These three decisions, made at agent design time, do more to limit the exploitability of excessive agency than any downstream detection or response measure applied after the agent has already acted.

Discover more from My Written Word

Subscribe now to keep reading and get access to the full archive.

Continue reading