Machine Learning – Page 2 – My Written Word

LLMs Give Novice Biologists 4x Uplift on Dangerous Tasks

May 10, 2026

A 2026 study found that LLM access gave novice biologists a 4.16x accuracy boost on biosecurity-relevant tasks, including results that beat expert baselines. Here is the mechanism and what it means…

MiniMax M2.7 Optimized Its Own Training Harness 100 Times. Here Is the Loop.

May 5, 2026

MiniMax M2.7 ran an internal agent that modified its own training scaffold 100 times in a row without human input and gained 30% on internal evaluations. Here is…

M-Trends 2026: Exploits Now Arrive Before Patches. The Mean Time-to-Exploit Is Negative 7 Days.

May 5, 2026

Mandiant M-Trends 2026 documents a mean time-to-exploit of negative 7 days. 28.3% of CVEs are being exploited within 24 hours of disclosure. Here is the AI attack chain…

KellyBench: 8 AI Models Bet the Premier League. All Lost Money.

May 5, 2026

General Reasoning put 8 frontier AI models through a full Premier League season with a 100k bankroll each. Every model lost money. The benchmark reveals three distinct failure…

DeepSeek V4’s Hybrid Attention Cuts KV Cache by 10x. Here’s the Architecture.

May 2, 2026

DeepSeek V4-Pro processes one million tokens using 10% of the KV cache V3.2 needed. The mechanism is Hybrid Attention: two complementary compressors interleaved across 61 layers. Here’s how…

Open-Weight LLM Rankings, April 2026: MMLU Is Saturated, Here’s What to Use Instead

April 26, 2026

MMLU is saturated. In April 2026, the metrics that matter are SWE-bench Verified, GPQA Diamond, and RULER’s effective context window. Chinese labs hold 4 of the top 5…

ARC-AGI-3 Is Live. Here’s Why Current Models Score in the Low Double Digits.

April 26, 2026

ARC-AGI-3 launched on Kaggle with a $1M prize, and the current leaders score in the low double digits. The benchmark adds Exploration, Modeling, and Planning that test-time compute scaling cannot solve.…

ICLR 2026 Outstanding Papers: What They Actually Found, and the Review Crisis Around Them

April 26, 2026

ICLR 2026 named two outstanding papers: LLMs Get Lost In Multi-Turn Conversation and Transformers are Inherently Succinct. The conference also documented a 45% identity leak and 21% AI-generated…

Agent Memory Architecture: Four Patterns, Four Tradeoffs

April 26, 2026

Agent memory is not one thing. It is four distinct patterns: full context window, hierarchical summarization, external vector store, and episodic log. Each has different performance, cost, failure…

Why 86% of Enterprise AI Agent Pilots Never Reach Production

April 26, 2026

Multiple independent studies in 2026 put the enterprise AI agent pilot failure rate at 86-89%. Six failure modes account for the losses. Here’s what they are, what causes…

Amazon Bedrock AgentCore: What Each Layer Does and Why It Matters

April 26, 2026

Amazon Bedrock AgentCore is six infrastructure services in one name. Here’s what each layer does: Runtime for serverless execution, Memory’s four tiers, Tool Execution’s sandboxing, Action Gateway’s enterprise…

Google Cloud Next 2026: The Agent Infrastructure Stack Explained

April 26, 2026

Google Cloud Next 2026 announced N4A Axion CPU instances for agent orchestration, GKE Agent Sandbox with gVisor isolation, and native A2A support in ADK. Here’s what each layer…

Half of Organizations Have No Visibility Into AI Agent Traffic

April 26, 2026

Salt Security’s H1 2026 report: 48.9% of organizations have zero visibility into AI agent traffic. WAFs were built for humans. Here’s why that gap exists structurally, what the…

SmolVM: Firecracker-Backed MicroVM Sandbox for AI Agent Code Execution

April 26, 2026

SmolVM gives AI agents a hardware-isolated disposable VM using Firecracker. Here’s why Docker containers are the wrong sandbox for LLM-generated code, how the snapshot-fork pattern works, and how…

AI Coding Tools Quadrupled Critical Vulnerability Density. 216 Million Findings Prove It.

April 24, 2026

OX Security analyzed 216 million findings across 250 organizations. Critical vulnerability density grew 400% while alert volume grew 52%. The difference is directly correlated with AI coding tool…

LMDeploy CVE-2026-33626: SSRF Weaponized in 13 Hours

April 24, 2026

LMDeploy SSRF bug CVE-2026-33626 was exploited 13 hours post-disclosure. Full attack chain, AWS credential blast radius, and why AI inference servers are unusually dangerous SSRF targets.

Full Context Sets the Accuracy Ceiling for AI Agent Memory. It Costs 26,000 Tokens Per Query. Here Is the Tradeoff Map.

April 21, 2026

Full context memory sets the accuracy ceiling at a cost of 26,000 tokens per query. Vector-only memory scores 66.9% at 1.44s p95 latency. Graph memory reaches 68.4% at…

When Your AI Agent Loses Your Money, Who Pays? Researchers Just Built the Protocol to Answer That.

April 9, 2026

Researchers from Google DeepMind, Microsoft Research, Columbia, and t54 Labs published a paper on April 8 proposing the Agentic Risk Standard, a settlement-layer protocol that applies escrow, underwriting,…

Meta Rebuilt Its AI Stack From Scratch and Closed the Open-Source Gates. Muse Spark Is What Came Out.

April 9, 2026

Meta shipped Muse Spark on April 8, the first model from Meta Superintelligence Labs, nine months after Mark Zuckerberg restructured his entire AI team. It is the first…

OpenAI Killed Sora, Lost Disney’s Billion Dollars, and Proved That Code Beats Video.

April 5, 2026

OpenAI killed Sora five days after shipping new features. Disney found out an hour before the public. The product was losing $1 million per day with fewer than…
