
AI Security Research — March 2026
Claude Mythos “Capybara” Leaked.
Cybersecurity Gets a Step Change.
A March 2026 leak exposed “Capybara,” Anthropic’s internal codename for a specialized Claude variant tuned for cybersecurity research.
Internal Anthropic documents leaked in March 2026 revealed that the company has been developing a specialized model variant codenamed Capybara under its Mythos program, designed for cybersecurity research applications. The leaked materials, authenticated by Bloomberg and two other independent outlets, describe a Claude model tuned to assist with vulnerability analysis, attack surface mapping, and red team workflow support for U.S. government agencies and cleared defense contractors. Anthropic confirmed the program exists but declined to provide details on deployment scope or specific capabilities.
What Was Actually Exposed
The leak stemmed from a content management system (CMS) misconfiguration that exposed approximately 3,000 internal Anthropic assets, including draft blog posts, internal documentation pages, and media files. The most significant was a draft announcement for a new model called Claude Mythos, internally codenamed Capybara, positioned in a new capability tier above the existing Opus tier. The draft described Mythos as scoring “dramatically higher” than Opus 4.6 on coding benchmarks, academic reasoning tasks, and cybersecurity evaluations. The specific language that drew attention: the draft characterized Mythos as “currently far ahead of any other AI model in cyber capabilities” and warned of “unprecedented cybersecurity risks.”
Anthropic confirmed the leak was real within hours. The company stated that the CMS misconfiguration was identified and patched, that the exposed assets were draft materials not intended for publication, and that the Mythos model description was an internal working document. Anthropic did not deny the existence of the model or the capability claims.
What the Leaked Documents Describe
Offensive analysis (red team support): Per the leaked documents, Capybara assists with analyzing known CVEs and their exploitation pathways, mapping attack surfaces from provided network documentation, and generating proof-of-concept exploit outlines for patched vulnerabilities in controlled research environments. These capabilities are available only to users with verified government or cleared contractor credentials.
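To make the first of those capabilities concrete, here is a rough sketch of how a red team analyst might start a CVE-analysis workflow: pull structured vulnerability data, then hand it to the model to reason over. This is an illustration written for this article, not anything from the leaked documents; the NVD 2.0 API is public, and the CVE ID is a well-known example used as a placeholder.

```python
# Illustrative sketch only: fetch structured CVE data that an analyst could
# then pass to a model for exploitation-pathway analysis. The NVD 2.0 API is
# public; the CVE ID below is a placeholder, not from the leaked materials.
import requests

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve(cve_id: str) -> dict:
    """Return the NVD record for a single CVE."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    return resp.json()

record = fetch_cve("CVE-2021-44228")  # Log4Shell, used only as an example
vuln = record["vulnerabilities"][0]["cve"]
description = next(d["value"] for d in vuln["descriptions"] if d["lang"] == "en")
print(description)
```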
Defensive analysis (threat intelligence): The documents describe Capybara assisting with malware reverse engineering support, YARA rule generation, threat actor TTP analysis, and incident response triage.
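To picture what “YARA rule generation” produces: the output is plain-text detection rules that analysts compile and run against samples. The rule below is a generic placeholder written for this article, not a rule from the leaked materials; it is compiled with the open-source yara-python package.

```python
# A model-drafted YARA rule would be ordinary rule text like this placeholder,
# compiled and run here with the open-source yara-python package.
import yara

RULE_SOURCE = r"""
rule suspicious_dropper_placeholder
{
    meta:
        description = "Placeholder: PE file carrying a hardcoded C2 string"
    strings:
        $mz = { 4D 5A }                          // PE magic bytes
        $c2 = "http://example-c2.invalid" ascii  // fake indicator
    condition:
        $mz at 0 and $c2
}
"""

rules = yara.compile(source=RULE_SOURCE)
matches = rules.match("sample.bin")  # path to a file under analysis
print([m.rule for m in matches])
```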
Access control mechanism: The Mythos program uses a separate API endpoint with verified credential requirements. The capability limits are enforced at the model level through fine-tuning, not only through system prompts, making them more resistant to jailbreak attempts than policy-only restrictions.
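From the client's side, that access pattern could look something like the hypothetical sketch below, assuming a bearer token plus an agency-issued client certificate. The endpoint URL, header scheme, and model name are all invented for illustration; none of this is a documented Anthropic interface.

```python
# Hypothetical client-side sketch of a credential-gated endpoint. Every
# identifier here (URL, token, certificate paths, model name) is an assumption
# made for illustration; none of it comes from Anthropic documentation.
import requests

GATED_ENDPOINT = "https://gated-api.example.invalid/v1/messages"  # placeholder

resp = requests.post(
    GATED_ENDPOINT,
    headers={"Authorization": "Bearer <verified-credential-token>"},
    # Mutual TLS with an agency-issued client certificate (assumed scheme)
    cert=("/path/to/client.crt", "/path/to/client.key"),
    json={
        "model": "capybara-restricted",  # placeholder model identifier
        "messages": [
            {"role": "user", "content": "Map the attack surface in this network documentation: ..."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
```

In the documents' description, a credential check like this would only be the outer layer: even a caller holding valid credentials would face limits baked into the model's weights through fine-tuning.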
Why the Cybersecurity Language Matters
An AI company describing its own model as an “unprecedented cybersecurity risk” in an internal document is remarkable. Anthropic’s brand is built on safety-first development. The Responsible Scaling Policy (RSP) commits Anthropic to pausing deployment if a model exceeds certain capability thresholds without adequate safeguards. The Mythos draft’s cybersecurity language suggests the model may be at or near an RSP threshold, which would trigger additional safety evaluations before deployment.
The practical implication: if Mythos’s cyber capabilities are as described, the model could autonomously discover software vulnerabilities, write exploit code, and potentially conduct offensive cyber operations with less human guidance than any previous model. This capability has dual-use implications. Defensive cybersecurity teams want AI that can find vulnerabilities before attackers do. Offensive actors want the same capability for the opposite purpose. The distinction between offensive and defensive use cannot be enforced at the model level because the underlying capability is identical in both cases.
Why This Represents a Step Change
AI-assisted cybersecurity tools have existed for years (Darktrace, Vectra, CrowdStrike’s AI features). What the Capybara documents describe is different: a general-purpose language model with frontier reasoning capabilities applied to cybersecurity-specific contexts, with the full breadth of Claude’s knowledge available alongside specialized security training. A human analyst using Capybara for malware analysis can ask follow-up questions, request explanations of specific code patterns, and iterate on hypotheses in natural language, a workflow that purpose-built security tools do not support.
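That iterative loop is easy to picture with the public Anthropic Python SDK, shown below as a minimal sketch. The model ID is a placeholder; the Capybara variant is not publicly available, and nothing here reflects its actual interface.

```python
# Minimal sketch of the iterative analyst workflow using the public Anthropic
# Python SDK. The model ID is a placeholder, not the leaked variant.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []

def ask(question: str) -> str:
    """Add a turn to the running conversation and return the model's reply."""
    history.append({"role": "user", "content": question})
    response = client.messages.create(
        model="claude-opus-4-5",  # placeholder model ID
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

# The analyst iterates on hypotheses turn by turn:
print(ask("Here is a disassembled function: ... What does it appear to do?"))
print(ask("Why might it resolve imports by hash rather than by name?"))
```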
Restricting offensive cybersecurity capabilities to verified government users is the right first step. It is not a complete solution. Credential verification systems can be compromised. Cleared contractors can misuse access. The model itself, once deployed on government infrastructure, creates a new attack surface: if an adversary can access Capybara through a compromised credential, they have a frontier AI assistant for offensive operations.
The Irony of the Leak Mechanism
The company that has built its entire reputation on AI safety and careful capability management leaked its most sensitive product information through a CMS misconfiguration, one of the most basic web security failures. That Anthropic, a company with world-class security researchers on staff, suffered this type of exposure is a reminder that organizational security is not determined by the sophistication of your AI models. It is determined by the hygiene of your infrastructure.
The competitive implications are significant. OpenAI and Google DeepMind now know Anthropic has a model in development that exceeds Opus 4.6, along with specific capability claims they can benchmark against. The leak eliminated Anthropic's element of surprise for the Mythos launch. Competitors can now prepare responses, accelerate their own model releases, or preemptively position against Mythos's claimed capabilities.
The Capybara leak puts Anthropic in an uncomfortable position: it describes itself as an AI safety company while developing specialized offensive cybersecurity capabilities for government clients. These positions are not necessarily contradictory, but explaining them requires more transparency than Anthropic has provided.
Sources: Leaked Anthropic Mythos program documents (authenticated by Bloomberg, TechCrunch, Wired, March 2026); Anthropic statement on Capybara; cybersecurity researcher analysis of leaked capability descriptions.