Apple Is Paying Google $1 Billion a Year to Run a Custom 1.2 Trillion Parameter Gemini on Servers Google Cannot Watch

Apple is paying Google roughly $1 billion a year to license a 1.2 trillion parameter Gemini model built specifically for Siri. The model does not run on Google Cloud. It runs on Apple-designed servers inside Private Cloud Compute, the hardened enclave architecture Apple shipped in 2024 for Apple Intelligence. Google provides the weights. Google cannot see what happens to them after they arrive.

That is the architectural fact every outlet covering the January 12 deal missed. The business story is that Apple gave up on its own frontier model. The mechanism story is stranger. Apple picked the largest model it could get, built a custom variant with Google’s help, and placed it inside an environment that denies Google, Apple’s own staff, and any third party the ability to observe inference. The security paper backing this is public. The GitHub repository is public. Almost nobody reading about the deal has looked at either.

This article walks through every layer of the stack: the custom 1.2T Gemini at the top, the knowledge distillation that produces on-device student models, and Private Cloud Compute’s five enforceable guarantees underneath. It ends with the honest limits of the privacy story and what Apple still has to build.

What Apple actually licensed

Apple’s statement on January 12 was that Google’s technology provides the most capable foundation for Apple Foundation Models. Tim Cook clarified the scope on the Q1 2026 earnings call two weeks later. He called it a collaboration and said the personalized version of Siri relies on Google. Cook also said Apple will continue independent in-house work, but the specific product being shipped this year is Google-powered.

Mark Gurman at Bloomberg reported the financial terms in November 2025 and confirmed them in January: approximately $1 billion per year. The model itself is a 1.2 trillion parameter custom Gemini variant, eight times the size of the 150 billion parameter cloud model Apple currently runs for Apple Intelligence. Google built it for Apple, optimized for the two workloads Siri handles most: summarization and planning. Apple refers to it internally as Apple Foundation Models v10. A second version, v11, is planned for iOS 27 in September 2026.

The model did not come with a Google Cloud deployment. Apple got the weights. That is the crucial detail.

The first Gemini-powered Siri features were supposed to ship in iOS 26.4 in March 2026. Engineering pushed that to iOS 26.5 in May. The full conversational assistant, internally codenamed Project Campos, is targeted for iOS 27 in September 2026. Apple’s own Baltra AI server chip enters mass production in H2 2026, with dedicated data centers following in 2027.

The distillation layer

Apple confirmed on March 25, 2026 that it can now distill Google’s full 1.2T Gemini into smaller on-device student models that run without an internet connection. This matters more than the cloud deployment, because it determines which queries leave the device at all.

Knowledge distillation is a training procedure, not a compression one. A large teacher model produces soft probability distributions over its output vocabulary for a training corpus. A smaller student model is then trained to match those distributions rather than only the hard labels. The student inherits the teacher’s calibrated uncertainty, its reasoning patterns, and a substantial share of its benchmark performance at a fraction of the parameter count. The 2015 paper by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean is still the standard reference. DeepMind used distillation to make Gemini Nano small enough for Pixel phones.
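The soft-target objective described above can be sketched in a few lines. This is a minimal illustration of the textbook loss from the 2015 paper, not Apple’s or Google’s training code. The logits, the temperature of 2.0, and the function names are all assumptions chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Illustrative logits over a three-token vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
bad_student = [0.5, 4.0, 1.0]

print(distillation_loss(good_student, teacher))
print(distillation_loss(bad_student, teacher))
```

The student whose distribution tracks the teacher gets the smaller loss. In practice this term is usually mixed with an ordinary cross-entropy loss on the hard labels.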

Apple is running the same procedure with one unusual twist: the teacher belongs to another company. Most distillation flows from a lab’s own large model to the lab’s own small model. Here, Apple takes Google’s 1.2T teacher, supervises the distillation in Apple’s own infrastructure, and produces students tuned for Apple Silicon’s Neural Engine. That control over the process is what Apple gets for its billion dollars. Without it, Apple would be locked into whatever Gemini variant Google chose to compress. With it, Apple chooses the student architecture, the training data mixture, and the hardware target.

The on-device student handles the cases that dominate Siri’s query volume: short summaries, intent classification, on-screen reference resolution, cross-app routing. Everything it cannot handle is forwarded to the teacher. The teacher is the 1.2T parameter Gemini. That teacher is what runs inside Private Cloud Compute.
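The split between student and teacher can be pictured as a small routing decision. The sketch below is purely illustrative: Apple has not published how the handoff is decided, so the confidence score, the 0.8 threshold, and every name here are hypothetical. Only the four task categories come from the description above.

```python
# Task types the on-device student is described as handling.
ON_DEVICE_TASKS = {
    "short_summary",
    "intent_classification",
    "on_screen_reference_resolution",
    "cross_app_routing",
}

def route(task_type, student_confidence, threshold=0.8):
    """Decide where a query runs. The threshold is a hypothetical knob."""
    if task_type in ON_DEVICE_TASKS and student_confidence >= threshold:
        return "on_device_student"
    # Anything the student cannot handle goes to the teacher in PCC.
    return "pcc_teacher"

print(route("intent_classification", 0.95))  # on_device_student
print(route("intent_classification", 0.40))  # pcc_teacher
print(route("multi_step_planning", 0.99))    # pcc_teacher
```

Whatever the real mechanism looks like, the privacy-relevant property is the same: a query that stays on the first branch never leaves the device.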

Private Cloud Compute: five enforceable guarantees

Private Cloud Compute is the name of the server fleet, the hardened operating system running on each node, the attestation system that lets user devices verify which software is running, and the public transparency log that commits each release to the record. Apple published the security design in June 2024 and released the Virtual Research Environment and a subset of the source code in October 2024 under the apple/security-pcc GitHub repository. The architecture is built around five requirements Apple committed to making technically enforceable, not merely policy.

Stateless computation. User data arrives, is used only to fulfill the request, and is not retained once the response has been returned. The runtime on a PCC node has no path for writing user data to persistent storage. There is no general-purpose logging mechanism. Only pre-specified, structured, audited logs can exit a node, and multiple review layers vet what the allowlist contains. No persistent debug trace, no SIEM hook, no incident-response breadcrumb survives the request.
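The allowlist idea is easy to show in miniature. The following is a hypothetical sketch of a structured log filter, not the logic of any PCC component. The event names, field names, and the choice to reject a whole event when it carries an unexpected field are assumptions made for illustration.

```python
# Hypothetical allowlist: event name -> the only fields permitted to leave the node.
ALLOWED_EVENTS = {
    "request_completed": {"duration_ms", "status"},
    "node_boot": {"build_id"},
}

def filter_log(event):
    """Return a sanitized copy of the event, or None if it may not leave the node."""
    name = event.get("event")
    allowed_fields = ALLOWED_EVENTS.get(name)
    if allowed_fields is None:
        return None  # unknown event types are dropped entirely
    fields = event.get("fields", {})
    if set(fields) - allowed_fields:
        return None  # an unexpected field rejects the whole event
    return {"event": name, "fields": dict(fields)}

ok = {"event": "request_completed", "fields": {"duration_ms": 12, "status": "ok"}}
leaky = {"event": "request_completed", "fields": {"status": "ok", "prompt": "hi"}}
free_form = {"event": "debug", "fields": {"message": "user asked about X"}}

print(filter_log(ok))
print(filter_log(leaky))
print(filter_log(free_form))
```

Rejecting instead of silently stripping is a deliberate choice in this sketch: an unexpected field signals a bug upstream, and failing closed keeps it from leaking anything while it is found.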

Enforceable guarantees. The security claims hold because the components that implement them are small, auditable, and signed. Apple publishes the list. Third-party researchers can verify it. There is no “trust us” surface.

No privileged runtime access. PCC nodes do not run SSH. They cannot enable Developer Mode. The debug tooling a normal datacenter operator needs is absent at the binary level. Code Signing blocks new binaries from loading. Apple’s own staff cannot log into a running node during processing, even during a severe incident. The architectural choice here is that operability loses to privacy. If a node misbehaves, Apple reboots it and accepts the loss of forensic detail.

Non-targetability. An attacker who compromises one PCC node cannot steer a specific user’s traffic to it. Requests carry no user identifiers. Routing is randomized and attested. Even with a full node compromise, the privacy exposure scales with the attacker’s ability to compromise the whole fleet, not with the value of a single target. This is the requirement that prevents a Gemini-powered Siri from becoming a selective surveillance vector.
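A back-of-the-envelope calculation shows why exposure scales with the share of the fleet an attacker controls. The sketch assumes each request is routed uniformly at random across the fleet, which is a simplification of the real routing, and the fleet sizes are made-up numbers, not Apple’s.

```python
def hit_probability(compromised_nodes, fleet_size, requests=1):
    """Chance that at least one of `requests` lands on a compromised node,
    assuming uniform random routing across the fleet (a simplification)."""
    if fleet_size <= 0 or not 0 <= compromised_nodes <= fleet_size:
        raise ValueError("need fleet_size > 0 and 0 <= compromised_nodes <= fleet_size")
    miss = 1 - compromised_nodes / fleet_size
    return 1 - miss ** requests

# One compromised node in a hypothetical fleet of 1,000.
print(hit_probability(1, 1000))
# The same attacker waiting for any of a target's 50 requests.
print(hit_probability(1, 1000, requests=50))
# Exposure only climbs as more of the fleet is compromised.
print(hit_probability(100, 1000))
```

Under this model the attacker cannot raise the odds for one chosen victim without compromising more nodes, which is the property the requirement is named for.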

Verifiable transparency. Every PCC software release is committed to a public append-only log, modeled on the Key Transparency log used for iMessage Contact Key Verification. A user’s device refuses to send a request to any PCC node that cannot cryptographically attest to running a build on the log. Once a release is logged, it cannot be retracted without detection. Researchers can download the Virtual Research Environment, inspect the image, and test whether the binary matches the published source. This is what makes the security claim auditable by outsiders.
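The refuse-unless-logged rule can be illustrated with a toy append-only log. This is a sketch under heavy assumptions: a simple SHA-256 hash chain stands in for whatever structure the real log uses, signature checks on the hardware attestation are omitted, and the class and function names are hypothetical.

```python
import hashlib

GENESIS = "0" * 64

class TransparencyLog:
    """Toy append-only log: each head commits to every earlier entry."""

    def __init__(self):
        self.entries = []  # list of (release_digest, chained_head)

    def append(self, release_digest):
        prev = self.entries[-1][1] if self.entries else GENESIS
        head = hashlib.sha256((prev + release_digest).encode()).hexdigest()
        self.entries.append((release_digest, head))
        return head

    def verify_chain(self):
        """Recompute every head; any retroactive edit breaks the chain."""
        prev = GENESIS
        for digest, head in self.entries:
            if hashlib.sha256((prev + digest).encode()).hexdigest() != head:
                return False
            prev = head
        return True

    def contains(self, release_digest):
        return any(digest == release_digest for digest, _ in self.entries)

def measure(image):
    """Stand-in for a software measurement: the SHA-256 of the image bytes."""
    return hashlib.sha256(image).hexdigest()

def device_should_send(log, attested_digest):
    """Send only if the log is intact and the attested build is on it."""
    return log.verify_chain() and log.contains(attested_digest)

log = TransparencyLog()
log.append(measure(b"pcc-release-1"))
log.append(measure(b"pcc-release-2"))

print(device_should_send(log, measure(b"pcc-release-2")))    # True
print(device_should_send(log, measure(b"unlogged-build")))   # False
```

The chain is what makes retraction detectable in this toy version: rewriting an old entry changes every head after it, so a device holding an earlier head would notice.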

Three components in the published source implement these guarantees. CloudAttestation builds and verifies the node attestations. Thimble contains the privatecloudcomputed daemon, which runs on a user’s device and enforces verifiable transparency before transmitting data. splunkloggingd filters the narrow set of logs allowed to leave a node. All three are in the public repository.

What this means for Google

Google trained the 1.2T Gemini on Google hardware. Apple received the weights. From that handoff forward, Google has no telemetry, no metrics, no usage breakdown, no crash reports, no error logs, and no view of which users send which prompts. The Private Cloud Compute architecture denies Google any ability to observe inference. Whatever feedback loop Google runs on its other enterprise deployments of Gemini does not exist here.

This is a strange position for a frontier model provider. Google has optimized Gemini against inference workloads it can watch. Improvements come from logs, from RLHF on real usage, from drift detection in production. Apple is paying for the weights and denying Google the data that would improve them. The implication is that the Apple-variant Gemini will slowly diverge from the Google-variant Gemini. Apple’s internal post-training can re-tune. Google’s external post-training cannot reach into a PCC node.

For Apple, this is the point. For Google, it is the cost of being in the deal at all.

The limits of the privacy story

The architecture is the strongest cloud AI privacy design deployed at scale. It is not airtight.

Apple owns the attestation keys. The root of trust for PCC is hardware Apple designs, signs, and manufactures. A security researcher at Mithril noted in 2024 that this is a meaningful gap versus a full confidential-computing design where a separate silicon vendor owns the attestation report. If Apple is ever compelled to sign a PCC image containing a backdoor, the transparency log records it, but the log exists only because Apple publishes to it. The trust decomposition is stronger than anything else on the market and still not independent.

There is no enterprise observability layer. IT teams running PCC-dependent workflows cannot trace usage, apply conditional access, or feed events into SIEM. This is a known gap for regulated environments.

The ChatGPT integration Apple announced in 2024 still exists. When Siri delegates a complex query to ChatGPT, the data leaves the PCC trust boundary entirely. Apple told CNBC in January 2026 that this arrangement is unchanged. A user asking Siri an ordinary question is covered by PCC. A user asking Siri a question that routes to ChatGPT is covered by OpenAI’s data-handling terms.

And iOS 27 brings Siri Extensions that will allow third-party AI apps to receive Siri queries directly. Once a query crosses into a Gemini app, a Claude app, or a Copilot app, PCC’s guarantees no longer apply. Bloomberg’s Mark Gurman reported the Extensions architecture in March 2026. It represents a different privacy model, one Apple has not yet publicly explained.

What Apple still has to build

The 1.2T Gemini Apple licensed today is a bridge. Apple is building its own AI server chip, codenamed Baltra, for mass production in H2 2026. Dedicated data center facilities come online in 2027. The implied trajectory is that Apple keeps the Gemini teacher for iOS 27’s Project Campos, then either trains a successor in-house or negotiates a second-generation Google model for Apple Foundation Models v12.

Apple is also the party doing the distillation. Over time, Apple’s supervised student models will accumulate a specialized dataset of Siri interactions the Google teacher cannot see. That dataset is Apple’s, not Google’s. It is the raw material for whatever Apple builds next. The more Siri runs on Apple-distilled students, the more Apple can afford to stop paying for the Google teacher.

The ecosystem context matters. Google’s Gemini 3.1 Pro took the top of the benchmark tables in February 2026 and delivers those results at roughly half the blended cost of Claude Opus 4.6. Apple timed the deal to the moment Google was demonstrably leading. If Anthropic or OpenAI reclaims the top of the benchmark chart, Apple’s bargaining position in the next round of negotiations improves. Gurman reported that Apple already talked to Anthropic, which reportedly demanded several billion dollars annually, and to OpenAI, which Apple rejected partly because OpenAI is building its own consumer hardware with Jony Ive.

The architectural story here is not that Apple lost the frontier-model race. It is that Apple bought the frontier model it needed and arranged to run it inside a box the frontier-model provider cannot look into. That is new. No comparable deal has ever been structured this way, because no other cloud AI provider has a Private Cloud Compute equivalent to offer as the deployment target. The five PCC requirements that looked academic in 2024 now have commercial weight. Apple made them the price of admission, and Google paid.
