Claude Can Now Use Your Computer While You Sleep. Here Is the Architecture Behind It.


AI Agent Architecture — March 2026

Claude Can Now Open Apps,
Navigate Browsers, Fill Spreadsheets.

Anthropic shipped computer use for Claude Cowork in March 2026. The architecture separates the orchestration layer from the execution layer. Dispatch lets you assign tasks from your phone while Claude works on your desktop.

Anthropic launched computer use for Claude Cowork in March 2026, giving Claude the ability to open applications, navigate browsers, fill in spreadsheets, and interact with software interfaces on a user’s desktop. The launch came alongside Dispatch, a feature that lets users assign tasks to Claude from their phone while the desktop agent executes them independently. Both ship as part of Claude Cowork, available to Pro and Max subscribers on macOS first.

Anthropic was candid in its product notes: computer use is “still early compared to Claude’s ability to code or interact with text.” That distinction matters. The company is shipping a capability that is genuinely useful on simple, well-defined workflows while being transparent about where it fails.

The Two-Layer Architecture

Layer 1: Orchestration (Claude reasoning). Claude understands your goal, breaks it into steps, decides which app to open, what to click, what to type. This layer runs in Anthropic’s cloud.

Layer 2: Execution (OS control). A local agent on your Mac translates Claude’s instructions into actual OS actions: simulating mouse clicks, keyboard input, reading screen state via accessibility APIs. This layer runs locally.

Safety gate between layers. Before accessing a new application, Claude requests permission. The user approves. This creates a human-in-the-loop checkpoint for each new surface Claude touches.
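One way to picture the gate is as a small check that sits in front of the execution layer. The sketch below is illustrative only: `PermissionGate`, `execute`, and the `ask_user` callback are hypothetical names, not part of Anthropic’s product, and the real approval flow is not documented in this article.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionGate:
    """Hypothetical per-application gate between orchestration and execution."""
    approved_apps: set[str] = field(default_factory=set)

    def request(self, app: str, ask_user) -> bool:
        # Apps the user has already approved pass without a second prompt.
        if app in self.approved_apps:
            return True
        # First touch of a new app: defer to the human.
        if ask_user(app):
            self.approved_apps.add(app)
            return True
        return False


def execute(gate: PermissionGate, action: dict, ask_user) -> str:
    """Stand-in for the local execution layer: it acts only on approved apps."""
    if not gate.request(action["app"], ask_user):
        return f"blocked: {action['app']}"
    return f"ran {action['kind']} in {action['app']}"
```

The design point is that the user is prompted once per new application rather than once per click, which matches the "each new surface" wording above.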

What Dispatch Does

Dispatch is the mobile interface to Cowork. A user on their phone can describe a task and Dispatch routes the instruction to the desktop Cowork agent, which executes it. The conversation continues over the phone. The practical use case: long-running research and data tasks that take 20 to 40 minutes. A user starts the task during their commute, Claude works on the desktop while they travel, and the output is ready when they arrive.

How the Permission Architecture Actually Works

The permission model is easier to follow as three layers, a finer-grained view than the two-layer split above. The first is the connector layer: Claude connects to your Mac via a local agent that handles screen capture, mouse movement, and keyboard input. The agent runs as a macOS accessibility service, which means the operating system’s standard permission model governs what Claude can access. Each application must be individually approved through System Settings (called System Preferences on older macOS versions).

The second layer is the action model. Claude does not execute raw system commands. It operates through a vision-language loop: capture a screenshot, identify UI elements, decide which element to interact with, execute the interaction, capture the result, and repeat. This is fundamentally different from traditional automation (AppleScript, Automator, shell scripts), which operates on application APIs. Claude operates on pixels. The advantage is universality: any application with a visual interface can be controlled. The disadvantage is fragility: if a UI element moves, changes color, or renders differently, the action model can fail.
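The capture, decide, act cycle can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not Anthropic’s implementation: `run_loop`, `capture`, `decide`, and `act` are hypothetical callables, and the "screenshot" here is just a string so the example runs without any screen access.

```python
from typing import Callable, Optional


def run_loop(
    capture: Callable[[], str],
    decide: Callable[[str, str], Optional[dict]],
    act: Callable[[dict], None],
    goal: str,
    max_steps: int = 20,
) -> int:
    """Run capture -> decide -> act until decide() returns None or steps run out.

    Returns the number of actions executed.
    """
    steps = 0
    for _ in range(max_steps):
        screen = capture()             # capture the current screen state
        action = decide(goal, screen)  # identify elements and pick one to use
        if action is None:             # the decider judges the goal complete
            break
        act(action)                    # execute the interaction
        steps += 1                     # the next iteration re-captures the result
    return steps


# Toy usage: a fake "screen" listing empty form fields, filled one per step.
fields = {"name": "", "email": ""}


def capture() -> str:
    return ",".join(k for k, v in fields.items() if not v)


def decide(goal: str, screen: str) -> Optional[dict]:
    if not screen:
        return None
    return {"type": "fill", "target": screen.split(",")[0]}


def act(action: dict) -> None:
    fields[action["target"]] = "done"


print(run_loop(capture, decide, act, "fill the form"))  # 2
```

The `max_steps` cap is an assumption added for safety in the sketch: a loop that re-reads the screen after every action needs some bound so that a misread UI cannot keep it running indefinitely.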

The third layer is Dispatch, the mobile trigger system. Users can initiate computer use tasks from their phone while away from their Mac. Dispatch queues the task, the local agent picks it up, Claude executes the workflow, and the result is available when the user returns.
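The queue-then-execute flow described here resembles a simple task queue. The sketch below assumes an in-memory queue and hypothetical names (`DispatchQueue`, `Task`, `submit`, `drain`, `status`); how Dispatch actually relays tasks between phone and desktop is not described in the sources.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    instruction: str
    status: str = "queued"
    result: str = ""


class DispatchQueue:
    """Hypothetical stand-in for the relay between phone and desktop agent."""

    def __init__(self) -> None:
        self._pending: queue.Queue = queue.Queue()
        self._tasks: dict[int, Task] = {}
        self._next_id = 1

    def submit(self, instruction: str) -> int:
        """Phone side: enqueue a task and return its id immediately."""
        task = Task(self._next_id, instruction)
        self._tasks[task.task_id] = task
        self._pending.put(task)
        self._next_id += 1
        return task.task_id

    def drain(self, run) -> None:
        """Desktop side: run every queued task in submission order."""
        while True:
            try:
                task = self._pending.get_nowait()
            except queue.Empty:
                return
            task.status = "running"
            try:
                task.result = run(task.instruction)
                task.status = "done"
            except Exception as exc:  # keep the failure so the user can review it
                task.result = str(exc)
                task.status = "failed"

    def status(self, task_id: int) -> Task:
        """Phone side, later: look up a task's state and result."""
        return self._tasks[task_id]
```

Recording a `failed` status rather than dropping the task matters for asynchronous use: the user is away from the machine, so a stalled workflow has to be visible when they check back.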

Where It Fails and Why That Matters

Anthropic’s own documentation lists specific failure modes. Multi-monitor setups cause coordinate mapping errors. Applications with custom rendering engines (Electron apps with non-standard UI elements, games, CAD software) produce unreliable element identification. Dynamic content (streaming video, rapidly updating dashboards) creates timing mismatches between screenshot capture and action execution. Password prompts and two-factor authentication dialogs interrupt workflows with no automated recovery path.
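To see why multiple monitors can cause coordinate mapping errors, consider one plausible mechanism: a position found in a screenshot has to be translated into global desktop coordinates using that display’s origin and scale. The sketch below is an assumption about how such a mapping might look, with hypothetical names (`Display`, `to_global`) and made-up display geometry; it is not taken from Anthropic’s documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    origin_x: int   # top-left of this display in global desktop coordinates
    origin_y: int
    scale: float    # screenshot pixels per desktop point (e.g. 2.0 on HiDPI)


def to_global(display: Display, px: int, py: int) -> tuple[float, float]:
    """Map a pixel in a display's screenshot to global desktop coordinates."""
    return (display.origin_x + px / display.scale,
            display.origin_y + py / display.scale)


primary = Display(0, 0, 2.0)
secondary = Display(1728, 0, 1.0)   # hypothetical external monitor to the right

# The same screenshot pixel maps to very different desktop positions.
print(to_global(primary, 400, 300))     # (200.0, 150.0)
print(to_global(secondary, 400, 300))   # (2128.0, 300.0)
```

If the agent applies the wrong display’s origin or scale, the click lands far from the intended element even though the element was identified correctly.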

The reliability data Anthropic has shared shows approximately 85% task completion on structured workflows (filling forms, copying data between applications, navigating web pages with standard UI). For unstructured tasks, completion drops significantly. The 85% figure is good enough for batch processing tasks where a 15% failure rate can be handled by human review. It is not good enough for mission-critical workflows where every failure has a cost.
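A quick calculation makes the trade-off concrete. The batch size and chain length below are illustrative, and the chained figure assumes each task succeeds independently at the same 85% rate, which is a simplification not stated in the source.

```python
def expected_review_load(tasks: int, completion_rate: float = 0.85) -> float:
    """Expected number of failed tasks that need human review in a batch."""
    return tasks * (1 - completion_rate)


def chained_success(steps: int, completion_rate: float = 0.85) -> float:
    """Chance that `steps` dependent tasks all succeed, assuming independence."""
    return completion_rate ** steps


print(round(expected_review_load(200)))   # 30
print(round(chained_success(3), 3))       # 0.614
```

A batch of 200 independent tasks leaves roughly 30 for a person to check, which is manageable. Chaining three tasks where each depends on the one before drops the end-to-end success rate to about 61% under the independence assumption, which is why the same figure is a poor fit for mission-critical workflows.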

How It Compares to OpenAI Operator and Google Mariner

The comparison to OpenAI’s Operator and Google’s Project Mariner is instructive. All three use vision-language models to interact with screen interfaces. None have solved the reliability problem for unstructured tasks. The differentiation is in the permission architecture (Anthropic’s per-app gates are more granular than Operator’s blanket session permissions) and the asynchronous execution model (Dispatch has no equivalent in competing products as of March 2026).

OpenAI’s Operator launched in January 2025 with browser-only computer use: it can navigate websites and fill forms but cannot interact with desktop applications. Google’s Project Mariner, announced at Google I/O, takes a similar browser-first approach through Chrome extensions. Of the three, Anthropic’s Cowork is the only one that operates at the full desktop level, controlling native applications through the accessibility layer rather than limiting itself to browser tabs. This broader surface area creates both more capability and more failure modes.

The architectural difference that matters most is the interface inversion thesis. Traditional software automation requires APIs: if an application does not expose an API, you cannot automate it. Computer use inverts this by operating on the visual layer that was designed for humans, turning it into the programmatic layer that AI agents work through. Every SaaS application, every desktop tool, every web portal becomes something Claude can call like an API. Companies that built walled gardens with no API are suddenly accessible.

For developers evaluating which computer use platform to build on, the decision comes down to scope versus reliability. Operator is narrower (browser only) but more reliable within its scope. Cowork is broader (full desktop) but less reliable on edge cases. Mariner is still in preview with limited availability. None of them are production-ready for unsupervised autonomous operation. The winner will be determined not by which approach sounds best in a demo but by which one fails least often in the unpredictable chaos of real desktop environments.

Sources: Anthropic official product announcements; Claude Cowork documentation; OpenAI Operator launch blog; Google Project Mariner announcement; Anthropic model card; March 2026.
