Google Cloud Next 2026: The Agent Infrastructure Stack Explained

Google’s biggest AI infrastructure announcements at Cloud Next 2026 on April 22 were not about new models. They were about the compute and orchestration layer that runs agents, and specifically about why existing infrastructure, designed for training and serving language models, is wrong for the new workload that agents create. Understanding what Google announced requires understanding what that workload actually looks like and why the architectures teams are using today will not scale to it.

The central problem Google described is that agentic AI creates a fundamentally different compute pattern than either model training or model serving. When an agent system processes a single user intent, that intent decomposes into a chain of subtasks distributed across specialized agents that collaborate, maintain state between steps, use tools, and sometimes run for hours. Google's infrastructure team described this as a chain reaction. It produces a compute topology in which the orchestration work around the primary model is CPU-bound, the specialized subagents doing inference are GPU-bound, and the coordination layer between them has requirements that neither GPU clusters nor standard CPU instances were designed for.

Google's answer for this workload pairs new hardware with a protocol layer: the Axion-powered N4A CPU instance family, plus A2A protocol support built natively into the Agent Development Kit (ADK).

Why Agent Runtimes Need a Different Compute Layer

The distinction between model inference and agent runtime compute is not obvious until you look at what agents actually do between inference calls. An agent that orchestrates a multi-step workflow spends a significant fraction of its execution time not generating tokens. It parses tool call outputs, routes requests to the right subagent, evaluates partial results, handles errors and retries, maintains task state, enforces permission boundaries, and logs each action for the audit trail. This is logic, branching, state management, and I/O coordination. It runs on CPU, not GPU.
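
The following is a minimal Python sketch of what an orchestrator does with a plan after the model has produced it. It makes the CPU-side work concrete. The JSON plan format, the `orchestrate` function, the tool names, and the retry policy are hypothetical illustrations, not ADK APIs. The point is that every line runs on CPU while no tokens are being generated.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskState:
    """State the orchestrator keeps between steps (hypothetical shape)."""
    results: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)


def orchestrate(plan_json: str,
                tools: dict[str, Callable[[dict], dict]],
                allowed: set[str],
                max_retries: int = 2) -> TaskState:
    """Execute a model-produced plan. No tokens are generated here: parsing,
    permission checks, routing, retries, state, and audit logging are all
    ordinary CPU-side logic."""
    state = TaskState()
    plan = json.loads(plan_json)                    # parse the model's tool-call output
    for step in plan["steps"]:
        name, args = step["tool"], step.get("args", {})
        if name not in allowed:                     # enforce the permission boundary
            state.audit_log.append(("denied", name))
            continue
        for attempt in range(max_retries + 1):      # retry transient failures
            try:
                state.results[name] = tools[name](args)   # route to the matching tool
                state.audit_log.append(("ok", name, attempt))
                break
            except Exception as exc:                # record every failure for the audit trail
                state.audit_log.append(("error", name, attempt, type(exc).__name__))
    return state
```

Replacing the hard-coded plan with a real model call changes where the plan comes from. It does not change the CPU-bound work wrapped around it.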

On standard GPU instances, this orchestration work runs as a sidecar process competing with inference for CPU time, or on the host CPU of a machine that is primarily optimized for the GPU workload it runs. Neither configuration is efficient. The GPU sits idle during the orchestration steps. The CPU is under-provisioned for the orchestration load. The result is latency bottlenecks and cost inefficiency that compound at scale.
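
A back-of-the-envelope model makes the idle-GPU cost visible. This Python sketch compares two layouts for one agent's lifecycle:
- the whole lifecycle runs on a GPU instance;
- orchestration moves onto a CPU instance and the GPU handles only inference.

The function, the rates, and the assumption that GPU capacity can be released between inference calls are illustrative. None of it reflects Google pricing.

```python
from dataclasses import dataclass


@dataclass
class LifecycleCost:
    colocated: float        # whole lifecycle on one GPU instance
    split: float            # CPU instance for orchestration + GPU billed for inference only
    idle_gpu_hours: float   # GPU hours paid for but spent on orchestration


def compare_layouts(wall_hours: float, inference_fraction: float,
                    gpu_rate: float, cpu_rate: float) -> LifecycleCost:
    """Compare a colocated layout with a split layout.

    Assumes the split layout can release GPU capacity whenever the agent is
    not running inference (for example, a shared, autoscaled inference pool).
    All rates are hypothetical $/hour inputs.
    """
    if not 0.0 <= inference_fraction <= 1.0:
        raise ValueError("inference_fraction must be between 0 and 1")
    colocated = wall_hours * gpu_rate
    split = wall_hours * cpu_rate + wall_hours * inference_fraction * gpu_rate
    idle = wall_hours * (1.0 - inference_fraction)
    return LifecycleCost(colocated, split, idle)
```

For example, take a 10-hour run that spends a quarter of its time in inference, at hypothetical rates of $4.00/hour for GPU and $0.50/hour for CPU. That gives $40 colocated versus $15 split, with 7.5 GPU hours idle in the colocated layout. The gap shrinks as `inference_fraction` approaches 1, which is why the orchestration-to-inference ratio matters.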

Google's argument for the N4A instances is that they offer the right balance for agent runtime workloads: enough CPU throughput to handle orchestration, tool dispatch, state management, and coordination at scale, without paying for GPU capacity those workloads do not use. Google claims 30% better price-performance for GKE Agent Sandbox on N4A compared with equivalent agent workloads on other hyperscalers. That claim applies specifically to this class of CPU-bound orchestration work, not to model inference. The inference still runs on GPU or TPU. The agent runtime runs on N4A.

This compute separation is the architectural pattern Google is pushing for production agent deployments: inference on accelerated hardware, orchestration on purpose-built CPU instances, with the A2A protocol handling coordination between agent components that may run on different hardware or even in different cloud regions.

GKE Agent Sandbox: The Execution Layer for Agent Code

The GKE Agent Sandbox is Google’s answer to the agent code execution problem that SmolVM, E2B, and OpenSandbox address from the open-source side. When an agent generates code that needs to run, or when an agent needs to execute tool calls in an isolated environment without affecting the host system, the GKE Agent Sandbox provides a managed execution container backed by gVisor isolation.

gVisor is an application kernel that intercepts system calls and re-implements them in a userspace process instead of passing them directly to the host kernel. This is weaker isolation than a hardware-virtualized microVM boundary (as in Firecracker), but stronger than standard container isolation. The sandboxed application's system calls never reach the host kernel directly, which shrinks the host kernel attack surface an exploit can target. The tradeoff is performance: gVisor adds per-syscall overhead compared to bare containers, but avoids the boot-time overhead of full microVM instantiation. For agent tool execution, where individual operations are short and syscall volume is moderate, gVisor's isolation profile is a reasonable balance.

The integration with N4A instances means the sandbox orchestration layer runs on CPU-optimized compute. Heavy tool calls that require specialized hardware, such as those invoking TPU-backed models or GPU-accelerated inference, dispatch to the appropriate hardware class through the GKE scheduling layer. The agent runtime coordinates from N4A. The compute-intensive subtasks execute on the hardware class they require. Billing follows actual resource utilization instead of charging for GPU capacity across the full agent lifecycle.
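
One way to express this split on GKE today is with ordinary Kubernetes scheduling hints. The Python sketch below emits a Pod manifest for each subtask class:
- Arm nodes for orchestration;
- gVisor, via GKE Sandbox's `gvisor` RuntimeClass, for untrusted tool code;
- an accelerator node selector for GPU inference.

The sketch does not use the GKE Agent Sandbox API, whose interface the announcement does not describe. The class names, images, resource sizes, and the `n4a` machine-family label value are assumptions.

```python
import json

# Hypothetical mapping from subtask class to scheduling hints.
# "kubernetes.io/arch" and "cloud.google.com/gke-accelerator" are standard
# node labels; the "cloud.google.com/machine-family" value "n4a" is an assumption.
HARDWARE_CLASSES = {
    "orchestration": {
        "node_selector": {"kubernetes.io/arch": "arm64",
                          "cloud.google.com/machine-family": "n4a"},
        "resources": {"cpu": "2", "memory": "4Gi"},
        "sandboxed": False,
    },
    "tool_exec": {
        "node_selector": {"kubernetes.io/arch": "arm64"},
        "resources": {"cpu": "1", "memory": "1Gi"},
        "sandboxed": True,    # agent-generated code runs under gVisor
    },
    "gpu_inference": {
        "node_selector": {"cloud.google.com/gke-accelerator": "nvidia-l4"},
        "resources": {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
        "sandboxed": False,
    },
}


def pod_manifest(name: str, image: str, subtask_class: str) -> dict:
    """Return a plain Kubernetes Pod manifest for one subtask."""
    hw = HARDWARE_CLASSES[subtask_class]
    spec = {
        "nodeSelector": dict(hw["node_selector"]),
        "containers": [{
            "name": name,
            "image": image,
            # Requests equal limits, giving the Pod Guaranteed QoS.
            "resources": {"requests": dict(hw["resources"]),
                          "limits": dict(hw["resources"])},
        }],
    }
    if hw["sandboxed"]:
        spec["runtimeClassName"] = "gvisor"   # GKE Sandbox's gVisor RuntimeClass
    return {"apiVersion": "v1", "kind": "Pod",
            "metadata": {"name": name, "labels": {"subtask-class": subtask_class}},
            "spec": spec}


if __name__ == "__main__":
    print(json.dumps(pod_manifest("tool-run-1", "example.com/agent-tool:latest", "tool_exec"), indent=2))
```

A real deployment would more likely hand this mapping to a scheduler or compute class than build Pods by hand. The sketch only shows that the hardware choice travels with each subtask.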

A2A Native Support in ADK: What the Integration Means

The second major announcement for agent infrastructure at Cloud Next 2026 was A2A protocol support in Google’s Agent Development Kit. The A2A v1.0 specification, now governed by the Linux Foundation, defines how agents discover each other via Agent Cards, exchange tasks asynchronously, and communicate results through a typed message format. ADK’s native A2A support means developers using ADK can make their agents A2A-compliant with minimal additional code, and can discover and call other A2A-compatible agents regardless of which framework those agents were built on.
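
An Agent Card is a JSON document describing who an agent is, where to reach it, and which skills it offers. Discovery is then a query over those skills. The sketch below builds a minimal card and a tag-based lookup.
- Field names follow the AgentCard schema from earlier public A2A releases as we understand it. Check them against v1.0.
- `InMemoryRegistry` is a hypothetical stand-in for a real discovery registry, not an ADK class.

```python
def make_agent_card(name: str, url: str, skills: list[dict]) -> dict:
    """Build a minimal Agent Card. Field names follow the public A2A AgentCard
    schema as we understand it; verify them against the v1.0 specification."""
    return {
        "name": name,
        "description": f"{name} agent",
        "url": url,
        "version": "1.0.0",
        "capabilities": {"streaming": True, "pushNotifications": False},
        "defaultInputModes": ["text/plain"],
        "defaultOutputModes": ["application/json"],
        "skills": skills,
    }


class InMemoryRegistry:
    """Hypothetical discovery registry keyed by agent URL."""

    def __init__(self) -> None:
        self._cards: dict[str, dict] = {}

    def register(self, card: dict) -> None:
        self._cards[card["url"]] = card          # re-registering replaces the old card

    def find_by_tag(self, tag: str) -> list[dict]:
        """Return every registered card with at least one skill carrying `tag`."""
        return [card for card in self._cards.values()
                if any(tag in skill.get("tags", []) for skill in card["skills"])]
```

Because discovery works only on the published card, the caller never needs to know which framework the remote agent was built with.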

The specific capabilities ADK adds for A2A are agent registration, which publishes the agent’s Agent Card to a discovery registry; agent discovery, which allows the agent to query registries for agents with specific skills; task delegation, which creates A2A Tasks directed at remote agents and handles the full lifecycle including streaming updates and push notifications; and the Signed Agent Card verification introduced in A2A v1.0, which validates the cryptographic signature on received cards before establishing communication.
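
Delegation means tracking a remote Task through its lifecycle as streamed or pushed status updates arrive. The sketch below is a hypothetical client-side tracker, not ADK's implementation.
- The state strings match the lowercase values in the pre-1.0 A2A JSON schema as we understand it. v1.0 may spell them differently.
- The event shape `{"taskId": ..., "status": {"state": ...}}` is an assumption.

```python
TERMINAL_STATES = {"completed", "canceled", "failed", "rejected"}
KNOWN_STATES = TERMINAL_STATES | {"submitted", "working", "input-required",
                                  "auth-required", "unknown"}


class TaskTracker:
    """Client-side view of one delegated A2A Task (hypothetical design)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = "submitted"
        self.history = ["submitted"]

    @property
    def done(self) -> bool:
        return self.state in TERMINAL_STATES

    def apply_update(self, event: dict) -> bool:
        """Apply one streamed or pushed status update; return True once terminal."""
        if event["taskId"] != self.task_id:
            raise ValueError("update belongs to a different task")
        new_state = event["status"]["state"]
        if new_state not in KNOWN_STATES:
            raise ValueError(f"unrecognized state {new_state!r}")
        if self.done:                     # terminal states are final
            raise RuntimeError(f"task {self.task_id} is already {self.state}")
        self.state = new_state
        self.history.append(new_state)
        return self.done
```

Treating terminal states as final keeps a late or duplicated notification from reopening a task that the delegating agent has already acted on.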

The practical consequence is that a multi-agent system built on ADK can include agents built on LangGraph, CrewAI, Microsoft Semantic Kernel, or any other A2A-compatible framework without custom integration code for each pairing. The agent communicates through the A2A protocol layer. The internal implementation is opaque. This is the interoperability goal that A2A’s design specifies: agents collaborate without needing to share internal memory, tools, or proprietary logic.

For organizations running agent workflows on Google Cloud infrastructure, the ADK-to-Agent Engine integration provides a full-stack path:
- model inference on TPU infrastructure;
- A2A-coordinated multi-agent collaboration on N4A CPU instances;
- Agent Engine deployment that handles scaling, monitoring, and the governance layer that enterprise deployments require.

Each component in that stack is either generally available now or announced to become generally available in the coming weeks.

The Tyson Foods and Gordon Food Service Case: A2A in Production Supply Chains

Google provided one concrete production deployment example at Cloud Next that illustrates what A2A coordination between organizations actually looks like. Tyson Foods and Gordon Food Service are using A2A to build collaborative agent systems for supply chain operations. The specific workflow: agents on the Tyson side share product data and leads with agents on the Gordon Food Service side to improve the sales process and reduce supply chain friction between the two companies.

This is a case where MCP alone cannot solve the coordination problem. Tyson’s agents and Gordon’s agents are built and operated by different organizations, on different infrastructure, possibly using different frameworks. They need to communicate without either party exposing their internal systems, data models, or proprietary logic to the other. A2A’s opacity principle, that agents collaborate without sharing internal state, is exactly the property this deployment requires. The agents exchange tasks and results through the A2A protocol. Neither organization’s internal architecture is visible to the other.

The Signed Agent Card mechanism in A2A v1.0 matters here. Tyson's agents can verify that an Agent Card they receive from Gordon's agents was actually issued by Gordon Food Service's domain, not by an attacker who has intercepted the discovery request. Here the mechanism is at work in a supply chain context rather than a financial services one.
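
The verification step can be sketched in two moves: canonicalize the card, then check the signature over the canonical bytes. As we understand it, A2A's signed cards use JSON Web Signatures verified against the issuer's public key. To stay within the Python standard library, this sketch makes two simplifications:
- it substitutes HMAC-SHA256 with a shared demo key for the public-key signature;
- it approximates canonicalization with sorted-key JSON.

Binding the key to Gordon Food Service's domain is a separate trust step and is not shown.

```python
import base64
import hashlib
import hmac
import json


def canonical_bytes(card: dict) -> bytes:
    """Serialize the card without its signatures in a stable form.
    Sorted-key JSON approximates a real canonicalization scheme for simple data."""
    body = {k: v for k, v in card.items() if k != "signatures"}
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign_card(card: dict, key: bytes) -> dict:
    """Attach a demo signature (HMAC stand-in for a public-key JWS)."""
    mac = hmac.new(key, canonical_bytes(card), hashlib.sha256).digest()
    signed = dict(card)
    signed["signatures"] = [{"alg": "HS256-demo",
                             "sig": base64.urlsafe_b64encode(mac).decode("ascii")}]
    return signed


def verify_card(card: dict, key: bytes) -> bool:
    """Accept the card only if some attached signature matches its current contents."""
    expected = hmac.new(key, canonical_bytes(card), hashlib.sha256).digest()
    for entry in card.get("signatures", []):
        try:
            got = base64.urlsafe_b64decode(entry["sig"])
        except (KeyError, ValueError):    # malformed entry: skip it
            continue
        if hmac.compare_digest(got, expected):
            return True
    return False
```

The case that matters is tampering. Changing the card's `url` to an attacker's endpoint invalidates the signature, so the receiving agent can refuse to connect.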

AI Hypercomputer: The Infrastructure Layer Beneath the Agent Stack

The AI Hypercomputer is Google’s term for the full-stack infrastructure that runs both model training and serving, including the hardware, networking, and software components that make large-scale AI workloads possible. At Cloud Next 2026, Google announced expansions to the AI Hypercomputer portfolio relevant to production agent deployments.

The fourth-generation Compute Engine VM families, powered by the latest Intel and AMD x86 processors, fill the general-purpose CPU compute tier below the N4A Axion instances. For agent orchestration workloads that do not need Axion's specific performance profile, these instances provide a cost-effective option. NVIDIA-based infrastructure rounds out the available compute tiers. It targets workloads that require GPU compute at every step, including agents running continuous model inference as part of their tool chain.

Thinking Machines Lab uses Google's infrastructure to power Tinker, its open platform for reinforcement learning and fine-tuning of frontier models, and achieved over 2x faster training on AI Hypercomputer. That result represents the performance category Google is competing for at the infrastructure layer. AI Hypercomputer is designed to handle these compute workloads at scale:
- agent training;
- fine-tuning specialized agent components;
- running RL-based optimization loops for agent behavior.

What Was Not Announced: The Gaps That Still Need to Close

Google’s Cloud Next agent infrastructure announcements are substantial. They are also incomplete in ways that matter for production deployments.

Agent observability is the most notable gap. The infrastructure handles compute, networking, scheduling, and protocol coordination. It does not yet provide end-to-end visibility into agent behavior, which Salt Security's H1 2026 report found is missing in 48.9% of organizations.

Two kinds of telemetry are involved:
- Infrastructure-level telemetry tells you that an agent ran, how long it ran, and what resources it used.
- Application-level telemetry tells you what the agent did, what decisions it made, what tool sequences it executed, and whether its behavior was within expected parameters. It requires specific instrumentation.

None of the Cloud Next announcements addressed the application layer.
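
Application-level telemetry of this kind can start as structured events plus a policy check. In this hypothetical Python sketch, each decision and tool call becomes a JSON-serializable event. The trace then flags tool transitions that an allowed-next policy does not permit. `AgentTrace`, the event fields, and the policy format are illustrative assumptions, not part of any Google product.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AgentEvent:
    agent: str
    step: int
    kind: str                 # "decision" or "tool_call"
    detail: dict
    ts: float = field(default_factory=time.time)


class AgentTrace:
    """Application-level trace for one agent run (hypothetical design)."""

    def __init__(self, agent: str, allowed_next: dict[str, set[str]]) -> None:
        self.agent = agent
        self.allowed_next = allowed_next   # policy: previous tool -> tools allowed next
        self.events: list[AgentEvent] = []

    def record(self, kind: str, **detail) -> AgentEvent:
        event = AgentEvent(self.agent, len(self.events), kind, detail)
        self.events.append(event)
        return event

    def tool_sequence(self) -> list[str]:
        return [e.detail["tool"] for e in self.events if e.kind == "tool_call"]

    def violations(self) -> list[tuple[str, str]]:
        """Return (previous, current) tool pairs that the policy does not allow."""
        found, prev = [], "<start>"
        for tool in self.tool_sequence():
            if tool not in self.allowed_next.get(prev, set()):
                found.append((prev, tool))
            prev = tool
        return found

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to ship to a log pipeline."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)
```

A transition policy is a deliberately simple notion of "expected parameters". Richer checks, such as argument validation or rate limits, can hang off the same event stream.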

Agent identity and accountability standards are also absent from the infrastructure announcements. Google’s Agentspace provides governance controls for agents published to the Agentspace platform. Agents running directly on GKE Agent Sandbox or Agent Engine outside the Agentspace distribution channel do not automatically inherit those governance controls. The KYA Framework from MetaComp and Singapore’s IMDA governance standard address this layer from the regulatory side. Google’s infrastructure layer does not yet provide the identity registry, permission scoping, or behavioral monitoring that regulated enterprise deployments require.
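
The missing identity layer can be pictured as a registry consulted before any action. It ties each agent to an accountable owner and a set of scopes. The sketch below is a hypothetical illustration of that idea. The class names and the `resource:action` scope strings are assumptions, not Agentspace, KYA, or IMDA APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: str                  # accountable team or organization
    scopes: frozenset[str]      # e.g. {"inventory:read"}


class IdentityRegistry:
    """Hypothetical registry mapping agent IDs to owners and permission scopes."""

    def __init__(self) -> None:
        self._ids: dict[str, AgentIdentity] = {}

    def register(self, ident: AgentIdentity) -> None:
        if ident.agent_id in self._ids:
            raise ValueError(f"duplicate agent id {ident.agent_id!r}")
        self._ids[ident.agent_id] = ident

    def authorize(self, agent_id: str, required_scope: str) -> tuple[bool, str]:
        """Return (allowed, reason); the reason names the owner for the audit trail."""
        ident = self._ids.get(agent_id)
        if ident is None:
            return False, "unregistered agent"
        if required_scope not in ident.scopes:
            return False, f"{agent_id} lacks {required_scope}"
        return True, f"allowed; accountable owner: {ident.owner}"
```

Unregistered agents are denied by default. Agents deployed outside a governed distribution channel then fail closed instead of silently running without controls.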

The announced 30% price-performance advantage for GKE Agent Sandbox on N4A also needs independent validation. The claim is Google’s own benchmark, measured on Google’s own configuration. Production agent workloads vary significantly in their orchestration-to-inference ratio, tool call patterns, and state management requirements. Teams evaluating the N4A instances for agent runtime workloads should run their actual agent task profiles on N4A instances and compare directly to their current configuration rather than accepting the benchmark claim as representative of their specific case.
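
Running that comparison means reducing each configuration to the same metric. This Python sketch computes completed agent tasks per dollar and the relative gain of a candidate over a baseline. The class, field names, and any numbers you feed it are yours, not Google's benchmark methodology.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    label: str
    tasks_completed: int    # agent tasks finished during the test window
    window_hours: float     # wall-clock length of the window
    hourly_price: float     # what you actually pay per hour for this configuration

    @property
    def tasks_per_dollar(self) -> float:
        return self.tasks_completed / (self.window_hours * self.hourly_price)


def price_performance_gain(candidate: Measurement, baseline: Measurement) -> float:
    """Fractional gain of candidate over baseline; 0.30 means 30% better."""
    return candidate.tasks_per_dollar / baseline.tasks_per_dollar - 1.0
```

Consider hypothetical figures:
- current setup: 1,000 tasks in 2 hours at $5.00/hour, or 100 tasks per dollar;
- trial configuration: 1,200 tasks in 2 hours at $4.00/hour, or 150 tasks per dollar.

That is a 0.5 (50%) gain. Keep the task mix identical across runs and track tail latency alongside cost. A cheaper configuration that misses latency targets is not a win.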

How This Connects to the Broader Agent Infrastructure Picture

Google Cloud Next 2026's agent infrastructure announcements sit alongside the Stateful Runtime Environment from OpenAI and AWS, and Amazon Bedrock AgentCore, as the three major hyperscaler responses to the same infrastructure challenge. Production-grade agent systems need compute infrastructure, protocol coordination, execution isolation, and state management, and none of these were available as integrated platforms before 2026. All three hyperscalers have now announced these capabilities. The differentiation is in the details: compute architecture, pricing, protocol support, governance tooling, and how well each stack integrates with the organization's existing cloud investment.

Teams building new agent infrastructure today face the first genuinely multi-vendor choice at the infrastructure layer since the early containerization era. The protocol layer has standardized around MCP for tools and A2A for agents. The compute and runtime layer is still differentiating. The decisions teams make in 2026 about which agent runtime infrastructure to build on will shape their vendor dependencies for years. The infrastructure announcements from Google, AWS, and Microsoft in the same four-week window signal that this decision window is open now and will close as teams commit to production architectures.
