
A developer at Celesto AI published a benchmark this week that should end a debate most AI teams are still having. On an AMD Ryzen 7 7800X3D running Ubuntu, SmolVM, a Firecracker-backed microVM sandbox built specifically for AI agent workflows, boots a fully isolated virtual machine in approximately 500 milliseconds. That is slower than a Docker container start by about 400ms. It is also the difference between an agent that can exfiltrate your AWS credentials through a prompt-injected shell command and one that cannot, because it is running in a hardware-isolated guest with its own kernel, its own network namespace, and a hypervisor boundary that a container escape CVE cannot cross.
SmolVM, published April 21, 2026 under the Apache 2.0 license by Aniket Maurya and the Celesto AI team, addresses a problem that has been growing quietly alongside every coding agent, app builder, and workflow automation tool deployed in the past two years. LLM-generated code is untrusted input. The tooling that most teams reach for when they need to execute it, Docker containers and Python subprocess calls, was not built with that threat model in mind. SmolVM was.
The tool has already drawn comparison to E2B, the hosted sandbox service, and to OpenSandbox, Alibaba’s open-source alternative. A head-to-head benchmark thread on r/LangChain in April 2026 put SmolVM first on five of six criteria. Understanding why requires understanding what Firecracker actually does and where Docker falls short.
Why Docker Is Wrong for AI-Generated Code
The security model of Linux containers rests on a single load-bearing assumption: the host kernel is trusted. A container is a Linux process with namespaces and cgroups layered around it. The process shares the host kernel’s system calls, memory management, and scheduler. When code runs inside a container, it is running on your kernel. If that code finds a privilege escalation path through the kernel, a container escape becomes a full host compromise.
This is not a theoretical concern. The runc runtime that underlies Docker and most container stacks receives CVEs at a steady pace. CVE-2019-5736 allowed a container to overwrite the runc binary on the host. CVE-2024-21626 was a working container escape. The attack surface is enormous precisely because the design deliberately shares kernel resources for performance. For packaging and deployment, that tradeoff is fine. For executing code that a language model generated based on potentially hostile input, it is the wrong tradeoff.
The right mental model, as Maurya puts it in the Celesto blog post announcing SmolVM, is to treat every piece of LLM-generated code as if it came from a random person on the internet. That mental model demands a different isolation primitive. Even a well-aligned model is susceptible to prompt injection through tool outputs, web pages fetched as context, or pasted document content. A prompt-injected command that reads ~/.aws/credentials and sends it to an attacker-controlled domain executes just as easily as a legitimate file operation. Inside a container, that command touches your host filesystem and your network. Inside a microVM, it touches a guest environment that is about to be thrown away.
What Firecracker Provides That Containers Do Not
Firecracker is an open-source microVM monitor developed at Amazon, originally built to power AWS Lambda and Fargate. Its design goal was to get VM boot times well under one second while maintaining the security properties of full hardware virtualization. The key mechanism is a stripped-down VMM (virtual machine monitor) that emulates only the handful of devices workloads actually need, discarding the decades of legacy device-emulation code that provides most of the attack surface in traditional hypervisors such as QEMU.
What Firecracker gives you, and what Docker cannot give you, is a hardware virtualization boundary. Intel VT-x, AMD-V, and Arm virtualization extensions create a mode separation at the processor level between the host and the guest (root and non-root operation, in Intel’s VMX terms). Code running in the guest cannot directly access host memory, devices, or kernel interfaces. A kernel exploit inside the guest reaches the guest kernel, not the host kernel. The blast radius of a container escape, which is full host compromise, compresses to a guest compromise, where the guest is ephemeral and about to be destroyed.
SmolVM wraps Firecracker (on Linux) and QEMU (on macOS) in a Python API that hides the infrastructure complexity: TAP network devices, rootfs image management, vsock communication channels, and Firecracker’s REST control API. The developer-facing interface is three lines of Python. The VM starts, runs the command in a hardware-isolated guest, and destroys itself when the context manager exits. Nothing touches the host.
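The lifecycle that a context-manager interface of this kind enforces, boot, run, destroy, can be sketched with a toy class. Everything below is illustrative: `ToySandbox` and its methods are stand-ins for the pattern, not SmolVM’s actual API.

```python
# Conceptual sketch of the create-run-destroy sandbox lifecycle.
# ToySandbox is a hypothetical stand-in, not SmolVM's real interface.

class ToySandbox:
    """Models an ephemeral, isolated execution environment."""

    def __init__(self):
        self.alive = False
        self.results = []

    def __enter__(self):
        self.alive = True          # real sandbox: boot the guest VM here
        return self

    def run(self, command):
        if not self.alive:
            raise RuntimeError("sandbox already destroyed")
        # Real sandbox: send `command` into the guest and collect output.
        result = f"ran: {command}"
        self.results.append(result)
        return result

    def __exit__(self, *exc):
        self.alive = False         # destroy the guest; no state survives
        self.results.clear()
        return False


with ToySandbox() as vm:
    out = vm.run("python script.py")

# The returned result outlives the sandbox, but the sandbox itself is gone.
print(out)
print(vm.alive)
```

The design point the sketch makes concrete: the caller keeps only the command’s output, while every side effect of execution is confined to state that the context manager tears down on exit.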
The Snapshot and Fork Pattern: Why This Matters for Production Agents
Boot time is where the Firecracker tradeoff gets interesting. At 500ms p50 on a modern workstation, creating a fresh VM per agent turn is technically feasible but adds latency that compounds in multi-turn workflows. SmolVM addresses this through a snapshot and fork pattern that reduces effective VM creation cost to near zero for repeated operations.
The pattern works in two stages. First, the developer creates a VM, installs dependencies, and configures the environment, then takes a snapshot of the VM’s full state: memory, filesystem, CPU registers. That snapshot is a frozen baseline. Second, for each agent turn or parallel agent run, the developer forks a new VM from the snapshot rather than booting from scratch. The fork operation restores the saved state in milliseconds, far below the 500ms cold boot time, because it is loading a memory snapshot rather than running a boot sequence.
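The two stages can be modeled with a toy in-memory VM, where a dict stands in for the full saved state (memory, filesystem, registers). All names here are hypothetical illustrations of the pattern, not SmolVM’s actual API.

```python
# Toy model of the snapshot-and-fork pattern. A dict stands in for full
# VM state; names are illustrative, not SmolVM's real interface.
import copy

class ToyVM:
    def __init__(self, state=None):
        self.state = state if state is not None else {"packages": [], "files": {}}

    def install(self, pkg):
        self.state["packages"].append(pkg)

    def snapshot(self):
        # Stage 1: freeze a deep copy of the full VM state as the baseline.
        return copy.deepcopy(self.state)

    @classmethod
    def fork(cls, snap):
        # Stage 2: restore the saved baseline instead of booting from scratch.
        return cls(copy.deepcopy(snap))


# Warm a base VM once: install dependencies, then snapshot.
base = ToyVM()
for pkg in ("pandas", "numpy"):
    base.install(pkg)
snap = base.snapshot()

# Each agent turn (or parallel run) forks from the frozen baseline.
turn_a = ToyVM.fork(snap)
turn_b = ToyVM.fork(snap)
turn_a.state["files"]["out.csv"] = "a's data"   # forks do not share state

print(turn_b.state["packages"])        # dependencies already present
print("out.csv" in turn_b.state["files"])   # b is unaffected by a
```

The deep copy in `fork` is the essential property: each fork gets a private copy of the baseline, so parallel runs cannot observe each other’s writes.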
The practical consequence for production agent deployments is significant. An agent that needs Python with pandas, numpy, and a set of domain-specific libraries installed can pre-warm a snapshot after environment setup. Every subsequent code execution turn forks from that snapshot: environment already configured, dependencies already installed, ready in under 100ms. For parallel agent runs, ten simultaneous forks from the same snapshot each get their own isolated VM without interfering with each other’s state. This is the feature that puts SmolVM ahead of alternatives on the r/LangChain comparison: E2B supports snapshots but lacks the fork/clone pattern; OpenSandbox offers equally strong isolation, but its snapshot ergonomics require more setup.
Egress Filtering: Closing the Exfiltration Path
The second attack vector in AI-generated code, after local filesystem damage, is exfiltration. A prompt-injected command that reads ~/.ssh/id_rsa and posts it to a remote URL is a complete attack in two lines. Docker containers have full internet access by default. Restricting that access requires configuring iptables rules, network policies in Kubernetes, or third-party network plugins, none of which are trivial to maintain and all of which are shared infrastructure that other containers on the same host depend on.
SmolVM ships domain allowlisting as a built-in feature. The developer passes an allow_hosts list when creating the VM, and the VM can reach only the specified domains; every other outbound connection is silently blocked. The allowlist is per-VM, so every sandbox can have a different egress policy without touching shared infrastructure. A coding agent that only needs to call an API and install packages from PyPI can be restricted to exactly those two domains. Enforcement happens on the host side of the VM’s network path, outside the guest, which means code inside the sandbox cannot rewrite the rules the way an escaped container process could tamper with shared iptables state.
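The decision logic behind an allowlist of this kind is simple to model. The sketch below is only the policy check, under the assumption that matching covers a host and its subdomains; the real enforcement point sits in the host’s network path, and the function and domain names here are illustrative.

```python
# Toy model of per-VM egress allowlisting: the host-side decision applied
# to each outbound connection attempt. Names and domains are illustrative.
from urllib.parse import urlparse

def egress_allowed(url: str, allow_hosts: list[str]) -> bool:
    host = urlparse(url).hostname or ""
    # Allow an exact match or any subdomain of an allowed host.
    return any(host == h or host.endswith("." + h) for h in allow_hosts)

# An agent that only needs PyPI and one API endpoint gets exactly those.
allow = ["pypi.org", "api.example.com"]

print(egress_allowed("https://pypi.org/simple/pandas/", allow))
print(egress_allowed("https://api.example.com/v1/run", allow))
print(egress_allowed("https://attacker.example.net/exfil", allow))
```

Note the suffix check is anchored with a leading dot, so a lookalike domain such as `evilpypi.org` does not pass as a subdomain of `pypi.org`.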
Browser Agents and Repository Access
Two use cases beyond code execution make SmolVM particularly relevant for the current generation of agent products. The first is browser agents. Products like Claude in Chrome, Cursor’s web browsing features, and autonomous research agents all need to drive a browser session. Running Chrome on the developer’s host machine creates obvious problems: the browser runs as the user, has access to saved passwords and logged-in sessions, and any malicious redirect the agent follows has full access to the user’s browsing context.
SmolVM can start a full Chrome session inside a sandbox, with an exposed debugging port that Playwright or Puppeteer connects to from the host. The agent gets a real browser. The user’s host cookies, credentials, and extensions are never in scope. When the session ends, the VM is destroyed along with every trace of its browsing state.
The second use case is read-only repository access for coding agents. Cursor-style products that understand a codebase need to read project files. SmolVM supports read-only host directory mounts so the agent can explore the codebase without the ability to modify files. Even if a prompt injection redirects the agent to attempt a write, the mount permission denies it at the filesystem level, not at the application level.
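The property being described, denial at the filesystem layer rather than in application logic, can be sketched with a toy mount object. `ReadOnlyMount` is a hypothetical illustration of the guarantee, not how SmolVM implements mounts.

```python
# Toy illustration of a read-only mount: reads succeed, every write path
# raises at the "filesystem" layer. ReadOnlyMount is hypothetical, not
# SmolVM's actual mount implementation.

class ReadOnlyMount:
    def __init__(self, files: dict[str, str]):
        self._files = dict(files)   # private copy: the host tree is frozen

    def read(self, path: str) -> str:
        return self._files[path]    # exploration is unrestricted

    def write(self, path: str, data: str):
        # Writes are refused regardless of what the caller intended,
        # mirroring a mount-level (not application-level) denial.
        raise PermissionError(f"read-only mount: cannot write {path}")


repo = ReadOnlyMount({"main.py": "print('hello')\n"})
print(repo.read("main.py"))

try:
    repo.write("injected.py", "import os; os.system('...')")
    blocked = False
except PermissionError:
    blocked = True
print(blocked)
```

The point of the sketch is where the refusal lives: even if a prompt injection convinces the agent to attempt a write, the attempt fails below the agent’s code, not because the agent chose to decline.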
SmolVM vs E2B vs OpenSandbox: What the Comparison Reveals
The r/LangChain benchmark that sparked community discussion in April 2026 compared SmolVM, E2B, OpenSandbox (Alibaba), and Microsandbox across six criteria. SmolVM led on five: snapshotting ergonomics, fork and clone support, pause and resume, computer-use support, and self-hosted deployment. E2B led on one: SDK ecosystem maturity, with the deepest support across Python, TypeScript, Go, and Ruby.
The distinction that matters most for team selection is hosted versus self-hosted. E2B is a managed cloud service. You pay per sandbox, get a polished SDK, and do not manage infrastructure. SmolVM runs on your own machines: a developer laptop, an on-premises GPU server, or a cloud VM you control. For teams in regulated industries, air-gapped environments, or with cost structures where per-sandbox billing becomes expensive at scale, self-hosted Firecracker is the only viable path. SmolVM is the shortest path to that option.
OpenSandbox from Alibaba is the most direct feature competitor. It supports gVisor, Kata Containers, and Firecracker backends, has multi-language SDKs, and is listed in the CNCF Landscape. Its MCP server integration is also noteworthy: opensandbox-mcp exposes sandbox creation and command execution to MCP-capable clients including Claude Code and Cursor. For teams that want the full isolation level of Firecracker without using a vendor’s own sandbox tool, OpenSandbox is the most credible alternative. The tradeoff is setup complexity: OpenSandbox’s Kubernetes runtime requires more infrastructure work than SmolVM’s single pip install smolvm path.
What SmolVM Does Not Cover
SmolVM is a runtime, not a complete agent security solution. The tool does not validate or sanitize code before executing it. It isolates the execution environment but does not inspect what the model is attempting to do before the attempt is made. Teams that need pre-execution analysis, behavioral anomaly detection at the model layer, or SIEM integration need to layer those controls on top of the sandbox.
The macOS support uses QEMU rather than Firecracker, because Firecracker requires Linux KVM. QEMU provides hardware virtualization on macOS via the Hypervisor Framework, but the performance characteristics and attack surface profile differ from the Linux Firecracker path. Teams relying on macOS for local development should test sandbox behavior on both platforms rather than assuming parity.
SmolVM’s SECURITY.md is explicit that sandbox network ports should not be exposed publicly without additional controls. The automatic trust of new sandboxes on first connection, noted in the documentation, is designed for local development only. Teams shipping SmolVM in production services need to review the trust model documentation rather than using the default configuration.
Why This Problem Is Larger Than Any One Tool
The agent sandbox question is not going away. The entire category of coding agents, app builders, and autonomous workflow tools is built on the premise that the model can write code that runs. That premise requires a security boundary that almost no team had thought about before 2024. Hugging Face’s smolagents documentation explicitly warns that its built-in LocalPythonExecutor is not a security boundary and recommends E2B, Modal, or Docker as alternatives, with the Docker caveat about container escape risk left largely unaddressed.
SmolVM adds a self-hosted Firecracker option to that list. Research into MCP client-side validation gaps, which found that five of seven major MCP clients accept tool metadata without static analysis, means prompt injection via tool outputs is a documented, unmitigated vector in most production agent deployments. An agent that receives injected instructions through a tool response and executes code based on those instructions is precisely the scenario SmolVM’s egress filtering and hardware isolation are designed to contain. The defense is not at the prompt layer. It is at the execution layer. A hypervisor boundary is harder to cross than a container namespace, and SmolVM is the first tool to make that boundary installable in one command.
Apache 2.0 license, self-hosted, sub-second boot times, and a Python API that hides Firecracker’s infrastructure complexity. For teams running agents that touch code execution today without hardware isolation, the question is not whether SmolVM is production-ready. The question is what they are running in its place.