Google DeepMind – My Written Word

30 Days After QJL: What’s Actually Compressing the KV Cache

May 2, 2026

After QJL failed, three approaches own the KV cache frontier: TriAttention’s pre-RoPE selection, LRKV architectural compression, and adaptive bit-width.

Google Cloud Next 2026: The Agent Infrastructure Stack Explained

April 26, 2026

Google Cloud Next 2026 announced N4A Axion CPU instances for agent orchestration, GKE Agent Sandbox with gVisor isolation, and native A2A support in ADK. Here’s what each layer…

A2A Protocol v1.0: The Agent Communication Layer MCP Doesn’t Cover

April 26, 2026

A2A Protocol v1.0 introduced Signed Agent Cards and gRPC support. Here’s how agent-to-agent communication differs from MCP tool calls, why IBM merged ACP into A2A, and what the…

Gemini 3.1 Pro Cut Hallucinations 38 Points Without Learning Anything New. Its Accuracy Actually Went Down.

April 9, 2026

Google’s Gemini 3.1 Pro cut its hallucination rate on Artificial Analysis’s AA-Omniscience benchmark from 88 percent to 50 percent in three months, the largest single improvement ever measured…

Apple Is Paying Google $1 Billion a Year to Run a Custom 1.2 Trillion Parameter Gemini on Servers Google Cannot Watch

April 9, 2026

Apple’s January 12, 2026 deal with Google puts a custom 1.2 trillion parameter Gemini at the center of Siri. The model runs on Apple silicon inside Private Cloud…

When Your AI Agent Loses Your Money, Who Pays? Researchers Just Built the Protocol to Answer That.

April 9, 2026

Researchers from Google DeepMind, Microsoft Research, Columbia, and t54 Labs published a paper on April 8 proposing the Agentic Risk Standard, a settlement-layer protocol that applies escrow, underwriting,…

Google Published a KV Cache Compression Breakthrough. Six Teams Found Its Key Innovation Doesn’t Work.

April 5, 2026

Google Research published TurboQuant at ICLR 2026, claiming 6x KV cache compression with zero accuracy loss. Memory chip stocks dropped. Then six independent teams implemented it and discovered…

Google Gemma 4 Scores 89% on AIME With 31 Billion Parameters. Here Is How the Architecture Works.

April 3, 2026

Google DeepMind released Gemma 4 in four sizes under Apache 2.0, its first truly permissive open license. The 31B dense model ranks third globally among open models. The…

GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro: The Architecture Differences That Actually Decide Which Model Wins

March 30, 2026

March 2026 is the first month where three frontier AI models are genuinely competitive across every category. GPT-5.4 beats human experts on desktop tasks. Claude Opus 4.6 dominates…

iOS 27 Will Let Siri Route Your Queries to Gemini, Claude, or Any Installed AI. OpenAI’s Exclusive Is Over.

March 27, 2026

Bloomberg’s Mark Gurman reported on March 26, 2026, that Apple is building a Siri Extensions system for iOS 27 that will let installed AI apps (Gemini, Claude, Perplexity,…

Gemini 3.1 Flash Live: Google Collapsed the Voice AI Wait-Time Stack Into a Single Native Audio Process

March 27, 2026

Google launched Gemini 3.1 Flash Live on March 26, 2026 via the Gemini Live API. The core architecture change: traditional voice AI pipelines ran VAD, then STT, then…

Google Lyria 3 Pro: Full Songs, Not Clips. Here Is What Changed in the Architecture.

March 27, 2026

Google launched Lyria 3 Pro on March 25, 2026, one month after Lyria 3. The key advancement is structural composition awareness: users can now specify intros, verses, choruses,…

Gemini Now Imports Your ChatGPT and Claude History. The AI Portability Race Is Officially On.

March 27, 2026

Google launched two Gemini import tools on March 26, 2026: one transfers memory (preferences, relationships, context) via a copy-paste prompt, the other ingests full chat history via ZIP…

Five Companies Control AI. The Government Just Said That’s Fine.

March 27, 2026

Five companies control the AI infrastructure stack: OpenAI, Google DeepMind, Anthropic, Meta, and Microsoft build the models. NVIDIA builds the hardware. Three hyperscalers provide the compute. The White…

AI Overviews Appear on 30% of Searches. Everyone Acts Like It’s 100%.

March 27, 2026

Google AI Overviews appear on 13% of queries globally, with 32.76% category-level presence in some verticals. Organic CTR drops 62% when an AI Overview is shown. But 76.1%…

How Google TurboQuant Compresses LLM Memory by 6x With Zero Accuracy Loss

March 26, 2026

Google Research published TurboQuant on March 25, 2026: a KV cache compression algorithm that reduces LLM inference memory by 6x at 3-bit precision with zero accuracy loss and…

Tag: Google DeepMind