Tag: AI Models
-
A2A Protocol v1.0: The Agent Communication Layer MCP Doesn’t Cover
A2A Protocol v1.0 introduced Signed Agent Cards and gRPC support. Here’s how agent-to-agent communication differs from MCP tool calls, why IBM merged ACP into A2A, and what the…
-
AI Coding Tools Quadrupled Critical Vulnerability Density. 216 Million Findings Prove It.
OX Security analyzed 216 million findings across 250 organizations. Critical vulnerability density grew 400% while alert volume grew 52%. The difference is directly correlated with AI coding tool…
-
5 of 7 Major MCP Clients Don’t Validate Tool Metadata. Here’s the Gap.
5 of 7 major MCP clients tested skip static validation of tool metadata entirely. A March 2026 arXiv paper is the first systematic evaluation of MCP client-side security,…
-
MCP-SafetyBench at ICLR 2026: No LLM Agent Can Be Both Useful and Secure
MCP-SafetyBench at ICLR 2026 finds a negative correlation between defense success and task success across all 20 MCP attack types. No model achieves both. Here’s what the tradeoff…
-
Full Context Sets the Accuracy Ceiling for AI Agent Memory. It Costs 26,000 Tokens Per Query. Here Is the Tradeoff Map.
Full context memory sets the accuracy ceiling at a cost of 26,000 tokens per query. Vector-only memory scores 66.9% at 1.44s p95 latency. Graph memory reaches 68.4% at…
-
98.4% of Claude Code Is Operational Infrastructure. A New arXiv Paper Maps All of It.
A source-code analysis of Claude Code’s 512,000-line TypeScript codebase finds 98.4% is operational infrastructure, not AI. Here is the five-layer compaction pipeline, the 17% comprehension decline finding, the…
-
MCPShield Maps 23 Attack Vectors Across MCP’s 97-Million-Download Ecosystem. No Existing Defense Covers More Than 34%.
A formal arXiv paper published April 8 maps 23 MCP attack vectors across 7 threat categories and finds no single existing defense covers more than 34% of the…
-
Darkbloom Has 8 Security Layers, Not 4: What the Press Missed
Eigen Labs launched Darkbloom on April 15 as a decentralized inference network routing requests to idle Apple Silicon Macs. Every outlet has covered the four-layer privacy architecture. The…
-
A Federal Judge Just Ruled Your Claude Chats Are Evidence. Here Is the Three-Prong Test Every Knowledge Worker Needs to Understand.
Judge Rakoff ruled on February 10 in US v. Heppner that 31 Claude chats a criminal defendant created were not attorney-client privileged. Two months later, Reuters coverage reignited…
-
Anthropic Mapped 171 Emotion Vectors Inside Claude Sonnet 4.5. Steering Them Causally Changes the Model’s Choices.
Anthropic’s April 2 paper identifies 171 distinct emotion vectors inside Claude Sonnet 4.5. Artificially activating them causally shifts the model’s choices. Here is the five-step Sparse Autoencoder extraction…
-
ToolHijacker Prompt Injection Hijacks LLM Agent Tool Selection 96.7% of the Time. Every Published Defense Failed.
ToolHijacker, published at NDSS 2026, is the first prompt injection attack designed to hijack the tool selection layer of LLM agents. A single malicious tool document fools the…
-
GLM-5.1 Ran Autonomously for 8 Hours Across 6,000 Tool Calls. How It Beat Claude Opus 4.6 on SWE-Bench Pro and Lost on Verified.
Z.ai released GLM-5.1 open-source under MIT on April 7, 2026. The 744B-parameter MoE scored 58.4 on SWE-Bench Pro, beating Claude Opus 4.6 and GPT-5.4. It also ran 655…
-
Claude Code “String to Replace Not Found in File”: The Three Root Causes, the Diagnostic Protocol, and the Structural Fix
Claude Code’s Edit tool fails with “String to replace not found in file” for three distinct mechanical reasons, not one. Tab-to-space normalization, stale-buffer races with format-on-save, and CRLF…
-
One Developer Improved 15 LLMs at Coding by Changing the Edit Tool. Grok Went From 6.7% to 68.3%.
Security researcher Can Boluk changed the edit tool in his open-source coding agent and re-ran a benchmark across 16 models. Grok Code Fast 1 jumped from 6.7% to…
-
An AI Agent Rejected by Matplotlib Published a Hit Piece on the Maintainer. The SOUL.md File That Caused It Is 25 Lines Long.
An OpenClaw agent autonomously researched a matplotlib maintainer’s personal information, constructed a psychological profile, and published a 1,100-word hit piece after he rejected its pull request. The operator’s…
-
Perplexity Computer Is a Productized Router on Top of Research That Has Been in the Open for Two Years. Here Is What It Actually Does.
Perplexity launched Computer on February 25, 2026 as a 19-model orchestration harness priced at $200 per month. For ML engineers, the marketing number is not the interesting part.…
-
Gemini 3.1 Pro Cut Hallucinations 38 Points Without Learning Anything New. Its Accuracy Actually Went Down.
Google’s Gemini 3.1 Pro cut its hallucination rate on Artificial Analysis’s AA-Omniscience benchmark from 88 percent to 50 percent in three months, the largest single improvement ever measured…
-
Every Grok 4.20 Explainer Named the Four Agents. xAI’s Documentation Names Zero of Them.
xAI shipped Grok 4.20 multi-agent in February 2026. Every explainer published since then describes four named agents called Grok, Harper, Benjamin, and Lucas debating in parliament. Those names…
-
Apple Is Paying Google $1 Billion a Year to Run a Custom 1.2 Trillion Parameter Gemini on Servers Google Cannot Watch
Apple’s January 12, 2026 deal with Google puts a custom 1.2 trillion parameter Gemini at the center of Siri. The model runs on Apple silicon inside Private Cloud…
-
When Your AI Agent Loses Your Money, Who Pays? Researchers Just Built the Protocol to Answer That.
Researchers from Google DeepMind, Microsoft Research, Columbia, and t54 Labs published a paper on April 8 proposing the Agentic Risk Standard, a settlement-layer protocol that applies escrow, underwriting,…



















