MITRE ATLAS: The ATT&CK Framework for AI Systems

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is the threat intelligence framework that gives security teams a shared vocabulary for talking about attacks on machine learning systems. Where the MITRE ATT&CK framework documents adversary tactics and techniques against traditional IT systems, ATLAS documents the analogous tactics and techniques against AI systems: how attackers conduct reconnaissance against ML pipelines, how they gain initial access to training data and models, how they establish persistence through backdoors and poisoning, and how they exfiltrate model intellectual property or training data.

The framework matters because it converts academic security research into operational threat intelligence. A jailbreak paper or a backdoor attack paper documents a vulnerability. ATLAS classifies that vulnerability into a tactic-technique pair that security teams can reference in detection rules, incident response playbooks, and risk assessments. The classification is what allows an organization to say “we have controls in place for AML.T0018 (Backdoor ML Model)” rather than vaguely “we have controls against model poisoning.”

The Framework’s Structure

ATLAS adopts the same matrix structure as ATT&CK: tactics across the top (the adversary’s objectives) and techniques in columns (the methods adversaries use to accomplish each objective). The current version of ATLAS organizes attacks into 14 tactic categories that span the AI attack lifecycle from initial reconnaissance through final impact.

Reconnaissance covers the techniques adversaries use to gather information about target AI systems before launching attacks. This includes searching public model repositories (Hugging Face, GitHub) for organization-specific models, analyzing publicly available papers or blog posts that describe an organization’s AI architecture, and probing deployed AI APIs to characterize their behavior and identify vulnerabilities.

Resource Development covers techniques for acquiring or building the infrastructure adversaries need to mount attacks. This includes acquiring training datasets that match the target’s known training distribution, developing adversarial example generation infrastructure, and obtaining or training substitute models that can be used to craft black-box attacks.

Initial Access covers the techniques adversaries use to first gain access to the target AI system or its supporting infrastructure. ML Supply Chain Compromise (AML.T0010) covers the supply chain attacks documented in the LLM supply chain analysis: PoisonGPT-style weight editing, malicious adapter publishing, and compromised model marketplace entries.

ML Model Access, which ATLAS treats as a tactic in its own right, covers the access required to mount inference-time attacks: API access, hosted model access, or full access to the model weights (Full ML Model Access, AML.T0044), depending on the attack type. The required access level determines which subsequent techniques are available to the adversary.
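The matrix structure described in this section can be sketched as a small lookup table. This is a minimal illustration, not the ATLAS data model: it contains only the tactic-technique pairings this article states explicitly, and the names `Technique`, `MATRIX`, and `tactics_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    technique_id: str
    name: str


# Partial, illustrative matrix: tactic name -> techniques listed under it.
# Only pairings named in this article are included; the real matrix is larger.
MATRIX = {
    "Initial Access": [Technique("AML.T0010", "ML Supply Chain Compromise")],
    "Persistence": [Technique("AML.T0018", "Backdoor ML Model")],
    "Impact": [Technique("AML.T0048", "External Harms")],
}


def tactics_for(technique_id):
    """Return every tactic column in which the given technique ID appears."""
    return [
        tactic
        for tactic, techniques in MATRIX.items()
        if any(t.technique_id == technique_id for t in techniques)
    ]
```

Keying the table by tactic mirrors how the matrix is read: pick an adversary objective, then scan the techniques listed under it.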

The Core Technical Techniques

Several ATLAS techniques are particularly relevant to LLM security and map directly to the attack analyses in this cluster.

AML.T0018 (Backdoor ML Model) covers neural backdoor attacks as documented in the neural backdoor attack analysis: training poisoning, weight poisoning, and post-training backdoor insertion. ATLAS classifies this technique under the Persistence tactic, reflecting the fact that backdoors remain in the model after deployment and survive standard evaluation procedures.

AML.T0020 (Poison Training Data) covers training data poisoning attacks that do not necessarily install backdoors but degrade model performance, bias model outputs, or install vulnerabilities. The distinction from AML.T0018 is that AML.T0018 specifies the backdoor pattern (normal behavior except under trigger), while AML.T0020 covers the broader category of training data manipulation including bias injection and capability degradation.

AML.T0043 (Craft Adversarial Data) covers inference-time evasion attacks: prompt injection, jailbreaks, adversarial examples, and the broader category of carefully crafted inputs designed to elicit unintended model behavior. This technique encompasses the indirect prompt injection (IPI) attacks documented in the IPI mechanism analysis and the jailbreak techniques distinguished in the jailbreaking vs prompt injection analysis. ATLAS also lists dedicated LLM techniques, LLM Prompt Injection (AML.T0051) and LLM Jailbreak (AML.T0054), which are the more specific IDs to cite where they apply.

AML.T0048 (External Harms) covers attacks that use AI systems to cause harm to entities outside the AI system itself: using a compromised model to send phishing emails, generate disinformation, write malicious code, or take actions on external services. It sits under the Impact tactic and is the technique most relevant to LLM agent compromise: the harm is not just to the model but to the systems and users the model interacts with.

How ATLAS Differs from OWASP LLM Top 10

ATLAS and OWASP LLM Top 10 cover overlapping subject matter but serve different operational purposes. OWASP’s framing is vulnerability-class oriented: it identifies categories of weaknesses that LLM applications have (Prompt Injection, Sensitive Information Disclosure, Supply Chain, Excessive Agency) and provides guidance for application developers to design defenses against each class.

ATLAS’s framing is adversary-tactic oriented: it identifies what attackers do (Reconnaissance, Initial Access, Persistence, Impact) and provides guidance for security teams to detect and respond to adversary behavior at each stage of the attack lifecycle. The OWASP framing supports secure development; the ATLAS framing supports security operations.

The two frameworks crosswalk effectively. OWASP LLM01 (Prompt Injection) maps to ATLAS AML.T0043 (Craft Adversarial Data). OWASP LLM03 (Supply Chain) maps to ATLAS AML.T0010 (ML Supply Chain Compromise) and AML.T0018 (Backdoor ML Model). OWASP LLM06 (Excessive Agency) does not have a direct ATLAS counterpart because it is a vulnerability class (the system has more capability than necessary) rather than an adversary technique, but it shapes the impact severity of AML.T0048 (External Harms) when other techniques succeed against an agent system.
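The crosswalk in the paragraph above can be held as plain data and inverted when a SOC analyst needs to go from an ATLAS ID back to the OWASP category. This is a minimal sketch that encodes only the three mappings stated here; `OWASP_TO_ATLAS` and `atlas_to_owasp` are illustrative names, not part of either framework.

```python
# OWASP LLM Top 10 category -> ATLAS technique IDs, as stated in this article.
OWASP_TO_ATLAS = {
    "LLM01": ["AML.T0043"],               # Prompt Injection
    "LLM03": ["AML.T0010", "AML.T0018"],  # Supply Chain
    "LLM06": [],                          # Excessive Agency: no direct technique
}


def atlas_to_owasp(mapping):
    """Invert the crosswalk: ATLAS technique ID -> list of OWASP categories."""
    reverse = {}
    for owasp_id, atlas_ids in mapping.items():
        for atlas_id in atlas_ids:
            reverse.setdefault(atlas_id, []).append(owasp_id)
    return reverse
```

Keeping LLM06 in the table with an empty list makes the missing counterpart explicit instead of silently dropping the category.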

Case Studies in ATLAS

ATLAS maintains a case studies section that documents real-world AI security incidents and maps them to the framework’s tactics and techniques. The case studies are the empirical grounding for the technique definitions: each documented incident validates that the technique has been used in practice against real systems.

Documented case studies include attacks against chatbots (Tay in 2016, where Microsoft’s chatbot was manipulated through user interactions to produce inappropriate content), attacks against autonomous vehicle vision systems (physical adversarial examples that caused traffic sign misclassification), and attacks against healthcare ML systems (adversarial perturbations to medical imaging that caused diagnostic errors).

The case studies are particularly valuable for organizations performing threat modeling because they establish concrete attack scenarios that have actually occurred. A risk assessment that includes “adversarial examples against image classifiers” is more defensible when it can cite specific documented case studies from ATLAS rather than relying on academic vulnerability papers that may or may not have been demonstrated against production systems.

NIST AI RMF Crosswalk

The NIST AI Risk Management Framework (AI RMF), published in 2023 and updated through 2024, provides a higher-level risk management taxonomy that complements ATLAS’s technique-level detail. Where ATLAS catalogs adversary techniques, NIST AI RMF organizes the risk management functions (Govern, Map, Measure, Manage) that organizations should implement to address those techniques.

The crosswalk between the two frameworks operates at the level of risk categories: NIST AI RMF identifies risk categories (such as “adversarial manipulation” or “data integrity compromise”), and ATLAS provides the specific techniques that constitute those risk categories. An organization using both frameworks treats NIST AI RMF as the governance and risk management structure and ATLAS as the technical threat catalog within that structure.

This separation of governance from threat catalog is methodologically important. NIST AI RMF describes what organizations should do (govern, measure, manage risk). ATLAS describes what adversaries actually do (specific techniques). Conflating the two leads to risk frameworks that prescribe controls without grounding in adversary behavior, or threat catalogs that document techniques without operational risk management.

Operational Use of ATLAS

For security teams adopting ATLAS in production, the framework supports several operational use cases.

Detection rule mapping: each ATLAS technique can be mapped to specific detection rules in the security operations center (SOC). For AML.T0043 (Craft Adversarial Data), detection rules might include classifier-based detection of prompt injection patterns, anomaly detection on user query patterns, or alerts on system prompt extraction attempts. The mapping creates a structured coverage analysis: for each technique, what detection coverage does the organization have?
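The coverage question above can be answered mechanically once rules are tagged with technique IDs. The following is a sketch under stated assumptions: the rule names and the in-scope technique list are hypothetical, and a real SOC would load both from its own rule repository.

```python
# Hypothetical rule inventory: detection rule name -> ATLAS technique IDs it covers.
DETECTION_RULES = {
    "prompt_injection_classifier": ["AML.T0043"],
    "query_pattern_anomaly": ["AML.T0043"],
    "system_prompt_extraction_alert": ["AML.T0043"],
}

# Techniques the organization has decided are in scope (assumed for illustration).
TECHNIQUES_IN_SCOPE = ["AML.T0010", "AML.T0018", "AML.T0020", "AML.T0043", "AML.T0048"]


def coverage_report(rules, techniques):
    """Map each in-scope technique to the rules that claim to detect it."""
    report = {technique: [] for technique in techniques}
    for rule_name, technique_ids in rules.items():
        for technique_id in technique_ids:
            if technique_id in report:
                report[technique_id].append(rule_name)
    return report


def uncovered(report):
    """Return the techniques that have no detection rule at all."""
    return [technique for technique, rules in report.items() if not rules]
```

With this example inventory, `uncovered(coverage_report(DETECTION_RULES, TECHNIQUES_IN_SCOPE))` returns every in-scope technique except AML.T0043, which is the gap list a coverage review would start from.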

Incident classification: when an AI security incident occurs, classifying it using ATLAS technique IDs creates structured threat intelligence that can be shared across organizations and tracked over time. An incident classified as AML.T0018 + AML.T0048 (backdoor model with external harm) is immediately understood by any organization familiar with the framework, without requiring a detailed narrative explanation.
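A structured incident record of the kind described above might look like the following sketch. The field names, the `IncidentRecord` class, and the ID pattern are assumptions made for illustration; ATLAS does not define an incident schema in this article, and the regular expression only checks the general shape of a technique ID.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Assumed shape of a technique ID: "AML.T" plus four digits, optional ".NNN" suffix.
TECHNIQUE_ID = re.compile(r"^AML\.T\d{4}(\.\d{3})?$")


@dataclass
class IncidentRecord:
    incident_id: str
    summary: str
    technique_ids: list = field(default_factory=list)

    def __post_init__(self):
        # Reject malformed IDs early so shared records stay machine-readable.
        bad = [t for t in self.technique_ids if not TECHNIQUE_ID.match(t)]
        if bad:
            raise ValueError(f"malformed technique IDs: {bad}")

    def label(self):
        """Short classification string, e.g. 'AML.T0018 + AML.T0048'."""
        return " + ".join(self.technique_ids)

    def to_json(self):
        """Serialize for sharing; sorted keys keep the output stable."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record built with `["AML.T0018", "AML.T0048"]` yields the label `AML.T0018 + AML.T0048`, the same shorthand used in the paragraph above.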

Threat modeling: when designing new AI systems, walking through the ATLAS framework provides a structured threat modeling approach. For each tactic in the framework, the design team can ask: what techniques in this tactic apply to our system, what controls do we have in place, and what is the residual risk? This produces threat models that are exhaustive across the documented attack surface rather than ad hoc.

Red team planning: red teams use ATLAS to structure their testing campaigns. Rather than running unstructured red-teaming exercises, ATLAS-aligned red teams target specific techniques and document their results in terms of which techniques succeeded and which failed against the target system. The structured output supports comparison across red-teaming engagements and tracking of defensive posture improvements over time.
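Comparing engagements becomes a simple diff when results are recorded per technique. This sketch assumes each engagement is stored as a mapping from technique ID to a boolean (True meaning the attack succeeded); the function name and the status labels are hypothetical.

```python
def posture_change(previous, current):
    """Compare two engagements.

    Each argument maps an ATLAS technique ID to True if the red team's
    attack succeeded against the target, False if it was stopped.
    """
    changes = {}
    for technique_id in sorted(set(previous) | set(current)):
        before = previous.get(technique_id)
        after = current.get(technique_id)
        if before is None:
            changes[technique_id] = "newly tested"
        elif after is None:
            changes[technique_id] = "not retested"
        elif before and not after:
            changes[technique_id] = "improved"
        elif after and not before:
            changes[technique_id] = "regressed"
        else:
            changes[technique_id] = "unchanged"
    return changes
```

Recording "not retested" separately from "unchanged" matters: a technique that was skipped in the later engagement says nothing about whether the defense still holds.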

Limitations and Active Development

ATLAS has limitations that practitioners should understand. The framework is incomplete: new attack techniques are continuously being developed in academic research, and ATLAS lags by months to years in incorporating them. Some attacks that appear regularly in academic literature have not yet been incorporated into ATLAS techniques, which means the framework provides less coverage of bleeding-edge attacks than it does of established attack categories.

The framework is also primarily descriptive rather than prescriptive. ATLAS catalogs what adversaries do, but does not prescribe specific defensive controls. Organizations using ATLAS need to map techniques to controls themselves, drawing on resources like the OWASP LLM Top 10 (which provides defensive guidance) and security tool vendor documentation (which provides specific detection and prevention capabilities).

The technique IDs themselves are subject to revision as the framework evolves. Organizations that have integrated ATLAS IDs into their detection rules and risk assessments need processes to track ATLAS framework updates and revise their integrations accordingly. This is the same versioning challenge that ATT&CK adopters have managed for years, and the same processes apply: track the framework’s release schedule, allocate time for periodic revision of integrations, and prioritize updates based on which techniques the organization considers most relevant.
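One way to make that periodic revision concrete is to check every ID an integration references against the ID set of the release being adopted. This is a sketch only: how the release ID set is obtained is left out, and the integration names and the `AML.T9999` ID in the usage note are made-up placeholders.

```python
def stale_references(integrations, release_ids):
    """Find technique IDs that integrations reference but the release lacks.

    integrations: mapping of integration name -> list of technique IDs it uses.
    release_ids:  set of technique IDs present in the framework release
                  being adopted (how this set is loaded is out of scope).
    """
    stale = {}
    for name, technique_ids in integrations.items():
        missing = [t for t in technique_ids if t not in release_ids]
        if missing:
            stale[name] = missing
    return stale
```

For example, calling it with `{"soc_rules": ["AML.T0043", "AML.T9999"]}` and a release set containing only `AML.T0043` returns `{"soc_rules": ["AML.T9999"]}`, flagging the one reference that needs review.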

The Broader Value of the Framework

The value of ATLAS, beyond its specific technical content, is the shared vocabulary it creates for AI security across organizations. Before ATLAS, security teams discussing adversarial machine learning had to define their terms from scratch in each conversation: what does “model poisoning” mean, what counts as “prompt injection,” how is “data integrity compromise” distinguished from “data poisoning.” After ATLAS, the discussion happens in terms of specific technique IDs with documented definitions, examples, and case studies. The communication overhead drops, and the precision of risk analysis improves.

This shared vocabulary effect is what made ATT&CK successful for traditional IT security. Before ATT&CK, vendors and security teams used different terms for the same techniques, and risk analyses were difficult to compare across organizations. After ATT&CK, a technique like T1059 (Command and Scripting Interpreter) means the same thing in detection rules, threat reports, and risk assessments across the industry. ATLAS aims to provide the same convergence for AI security, with the same potential for accelerating defensive maturity across the industry.

For organizations adopting ATLAS, the appropriate starting point is mapping the techniques in the framework to their existing AI deployment portfolio: which techniques apply to each deployed model, what controls are in place for each technique, and what gaps exist in coverage. This mapping is the foundation for ongoing security operations and risk management aligned with the framework. The companion frameworks (OWASP LLM Top 10, NIST AI RMF, ISO/IEC 23894) provide complementary perspectives that, together with ATLAS, give security teams the full vocabulary they need to discuss AI security with the same precision and operational discipline that traditional cybersecurity has developed over decades. For the application-level vulnerability taxonomy, see the OWASP LLM Top 10 for 2025 analysis; for the empirical testing methodology that operationalizes ATLAS techniques in production red-teaming, see the red-teaming methodology.