Apple’s AI Reckoning: Why Siri Runs on Google’s Gemini Now

On January 12, 2026, Apple and Google issued a joint statement announcing a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google’s Gemini models and cloud technology. The deal is worth $1 billion annually to Google, with the total partnership valued at up to $5 billion over the contract term.

The technology story is that Siri is getting smarter. The strategic story is more significant. Apple, the company that built its own chips, its own operating system, its own developer tools, and its own privacy infrastructure, concluded that it could not build a competitive foundation model and should not try. It evaluated Anthropic as a potential partner. It chose Google instead. It then committed to publicly crediting a competitor with supplying the capability it lacks.

That is a different company than Apple was two years ago.

What Apple Actually Agreed To

The partnership gives Apple complete access to Google’s Gemini model in Apple’s own data centers. Apple can use this access for distillation, a training technique in which a smaller model is trained to reproduce the behavior of a larger model by learning from its outputs and reasoning traces. According to reporting by The Information, Apple can ask the main Gemini model to perform tasks, collect both the answers and the reasoning process, and feed that data to smaller models trained to run on Apple devices.
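
To make the mechanics concrete, here is a minimal Python sketch of the data-collection half of distillation. It is illustrative only: teacher_generate is a hypothetical stand-in for a call to the large teacher model, and the record format is an assumption, not Apple’s actual pipeline.

```python
# Toy sketch of distillation data collection. teacher_generate() is a
# hypothetical stand-in for the large "teacher" model (here standing in
# for Gemini); names and the record format are illustrative.
import json

def teacher_generate(prompt: str) -> dict:
    # A real pipeline would return the teacher model's answer plus the
    # intermediate reasoning trace it produced along the way.
    return {
        "reasoning": f"Step-by-step reasoning for: {prompt}",
        "answer": f"Final answer for: {prompt}",
    }

prompts = [
    "Summarize this email thread.",
    "Set a reminder for the next free slot after my 3pm meeting.",
]

# Each record pairs the prompt with the teacher's reasoning and answer;
# the small on-device "student" model is later trained to reproduce both.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        out = teacher_generate(prompt)
        record = {"prompt": prompt, "trace": out["reasoning"], "target": out["answer"]}
        f.write(json.dumps(record) + "\n")
```

The point is the shape of the data: each record captures not just the teacher’s answer but its reasoning trace, which is what lets the student learn how the teacher reaches an answer rather than only what it says.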

The result is that Apple can build on-device models that behave like Gemini without Gemini running on device. The underlying reasoning capability comes from Gemini. The deployment infrastructure, privacy enforcement, and user interface come from Apple. Google’s servers handle the distillation training workloads. Apple’s A18 chips handle the production inference.
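
The student side of that split is ordinary supervised training on the collected records. Here is a toy PyTorch sketch, assuming the teacher’s outputs have already been tokenized; the tiny model below is a placeholder and bears no resemblance to Apple’s on-device architecture.

```python
# Minimal sketch of the student side of distillation: a small model
# trained with plain cross-entropy to match the teacher's outputs.
# The tiny vocabulary and model are placeholders, not Apple's design.
import torch
import torch.nn as nn

vocab_size, hidden = 1000, 64

student = nn.Sequential(
    nn.Embedding(vocab_size, hidden),
    nn.Linear(hidden, vocab_size),  # predict the next token
)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Fake tokenized (input, teacher-target) pairs standing in for the
# distillation dataset collected from the teacher model.
inputs = torch.randint(0, vocab_size, (32,))
targets = torch.randint(0, vocab_size, (32,))

for step in range(100):
    logits = student(inputs)          # shape: (32, vocab_size)
    loss = loss_fn(logits, targets)   # push predictions toward teacher outputs
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice this training runs on Google’s cloud at a scale nothing like the toy loop above, but the objective is the same: push the small model’s predictions toward the teacher’s outputs until it can stand in for the teacher on device.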

This architecture matters for privacy. Apple’s Private Cloud Compute system routes requests to server-side processing only when on-device models cannot handle them, and those requests go to Apple’s own servers running Apple-controlled software, not to Google’s servers. Gemini technically never processes end-user queries in production. It generates training signals. User queries stay within Apple’s infrastructure during live operation.
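
The routing decision itself is simple to sketch, even though Apple’s actual orchestration is not public. Here is a hedged Python illustration, with both model calls stubbed out and all names hypothetical.

```python
# Hedged sketch of the routing idea behind Private Cloud Compute:
# try the on-device model first and fall back to Apple-operated
# servers only when it cannot handle the request. All names are
# illustrative; Apple's real orchestration logic is not public.
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    served_by: str  # "on-device" or "private-cloud-compute"

def on_device_model(query: str) -> str | None:
    # Returns None when the request exceeds on-device capability
    # (here crudely approximated by query length).
    return f"local answer: {query}" if len(query) < 80 else None

def private_cloud_compute(query: str) -> str:
    # Stand-in for Apple-controlled server-side inference; in the
    # architecture described above, requests never reach Google here.
    return f"server answer: {query}"

def handle(query: str) -> Response:
    local = on_device_model(query)
    if local is not None:
        return Response(local, "on-device")
    return Response(private_cloud_compute(query), "private-cloud-compute")

print(handle("What's on my calendar today?"))
```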

Whether this fully satisfies privacy-conscious users is a legitimate question. The distillation process runs Apple’s prompts through a Google-built model, and the distilled models learn their behavior from Gemini’s outputs. The architecture is more private than sending live user queries to Google. It is less private than a model trained entirely on first-party data on Apple-controlled infrastructure. These are different points on a spectrum, not a binary choice.

The Deployment Timeline and Current State

iOS 26.4, which shipped on March 23, 2026, did not include Gemini-powered Siri features. Internal testing revealed quality problems: Siri cut users off mid-sentence when they spoke quickly, struggled with complex multi-step requests, and exhibited slow response times. Apple engineers were redirected to target iOS 26.5 in May and iOS 27 in September for the features originally planned for the March release.

The delay is significant. Apple announced this partnership in January and targeted a March release. Bloomberg’s Mark Gurman reported slippage in February. By March 23, the iOS 26.4 release notes listed Apple Music Playlist Playground and new emoji characters, but no Siri enhancements. The three-layer privacy architecture (on-device processing, Apple Private Cloud Compute, Gemini on Apple-controlled servers) creates technical complexity that a straightforward API integration would not. That complexity appears to be contributing to the latency and reliability problems observed in testing.

In independent testing during the iOS 26.4 beta period, Siri correctly handled 87% of multi-turn conversational tasks, up from 52% in iOS 25. Google Assistant leads at 91%. The gap is no longer a generation. It is a few percentage points, and Apple’s hardware integration advantage may overcome it in real-world usage where context from the screen and device state matters more than raw language model performance alone.

Apple’s Strategic Retreat

The AI assistant market in early 2026 has Google Assistant leading multi-turn task completion at 91%, Siri at 87% (with the Gemini-powered improvements), and Alexa at 73%. Siri is competitive. It is not leading. And the gap in foundation model capability that led Apple to this partnership traces to specific decisions made between 2018 and 2023.

Apple’s AI research during that period focused on on-device inference efficiency, privacy-preserving machine learning, and specific applications like image classification, speech recognition, and text prediction. The company was not running large-scale generative language model research at the level of Google DeepMind, OpenAI, or Anthropic. When consumer demand for generative AI assistants became clear in 2023, Apple did not have a comparable internal research program. By the time the company understood the gap fully, the training data investment, compute requirements, and time required to close it were measured in billions of dollars and multiple years.

Apple made a rational decision: spend $5 billion on Google’s already-built capability rather than $10 billion or more building it from scratch, and use the saved capital and engineering resources to maintain hardware and software integration advantages where Apple is genuinely differentiated. The iPhone’s A18 chip, the M5 Mac chips, and the software stack built around them remain differentiated from anything competitors ship. Apple is betting those differentiators, combined with the privacy architecture, matter more to consumers than whether the AI on their device was trained by Apple or distilled from a Google model.

The Competitive Context

Samsung targeted 800 million devices for Gemini integration by 2025. Apple is now building on the same foundation model. The two largest smartphone vendors in the world are both betting their AI assistant strategies on Google’s underlying technology.

This concentrates strategic risk with Google DeepMind. If Gemini models exhibit systematic failures, safety issues, or capability regressions, both Apple and Samsung are directly affected. The risk is partially mitigated by the distillation architecture (both companies run distilled models rather than live Gemini) but not eliminated. Distilled models inherit their parent model’s learned behaviors and, potentially, its failure modes.

For Google, the partnership represents a revenue stream that does not require end users to choose Google products. Apple users who never search with Google are now running models distilled from Gemini’s outputs on iPhones. Google’s technology advantage propagates into every iOS device whether or not the user has a Google account.

Apple considered Anthropic as a potential partner before settling on Google. What differentiated Google was not only Gemini’s technical performance but also Google’s cloud infrastructure at scale, existing commercial relationships with Apple around Google Search in Safari, and the organizational stability of partnering with Alphabet rather than a four-year-old AI startup. The $1 billion annual fee is not purely a model licensing fee. It is a bundle price for model capability, cloud infrastructure access, distillation rights, and the implicit stability that comes with the partnership’s scale.

Limitations of the Privacy Architecture

Apple’s privacy claims for the Gemini integration depend on architectural properties that have not been independently verified at full production scale. On paper, the three-layer design works as Apple’s documentation describes. Whether it holds under the pressure of hundreds of millions of concurrent users, or whether edge cases expose user data to Google’s inference infrastructure, requires independent security research that has not yet been published.

The more substantive limitation is what privacy means in this context. Apple’s distillation approach means user queries in production do not go to Google’s servers and user data does not train Gemini. But the behavior Apple’s models exhibit reflects patterns learned from Gemini, which reflects patterns in Google’s training data. The privacy boundary is technically meaningful. It is not the same as Apple having built its own model with no Google input at any stage of the process.

What Happens Next

OpenAI’s strategic direction provides a useful contrast. OpenAI shut down the Sora consumer video product to focus on enterprise contracts ahead of its IPO (covered here: Why OpenAI Killed Sora). Apple is making the opposite bet. It is investing in a consumer AI product that integrates deeply with iPhone workflows because its revenue model does not require AI to monetize directly. For Apple, AI needs to keep users buying iPhones and staying in the ecosystem. That is a clearer path to return on investment than monetizing inference directly, and it does not require frontier model quality. It requires good-enough model quality combined with exceptional hardware and software integration.

Two indicators to watch. First, whether the Gemini-powered features in iOS 26.5 demonstrate the quality Apple promised in January. If they deliver multi-turn accuracy and on-screen context awareness at the level Apple previewed, the delay was a quality control decision. If they ship with limited functionality, the integration challenges are deeper.

Second, whether Apple’s WWDC 2026 in June reveals further ambition for the Siri platform. Gurman has reported that iOS 27 will treat Siri more like a chatbot, with a dedicated app interface and more autonomous task execution. WWDC’s Siri announcements will signal how confident Apple is that the Gemini integration is producing the foundation it needs. A quiet WWDC on Siri would suggest the challenges are ongoing.

Apple confirmed Siri’s reimagined capabilities will run on Google’s Gemini models. Read past the announcement: Apple concluded it cannot build a competitive foundation model, and is betting that privacy and hardware integration outweigh intelligence parity. That is a significant strategic concession.
