Category: Research
-
Alibaba’s Qwen 3.5 9B scores 81.7 on GPQA Diamond, beating GPT-OSS-120B despite being 13 times smaller. The architecture behind this result is genuinely new. Here is what it means for on-device AI and the closing gap between open-weight and closed commercial models.
-
NVIDIA released Nemotron 3 Super at GTC 2026 with 60.47% on SWE-Bench Verified, the highest open-weight score ever recorded. Here is the architecture, what the benchmark means, and why a GPU vendor giving away frontier models changes the competitive picture.
-
A sandboxed Claude Code agent running on a GPU cluster autonomously discovered adversarial attacks that outperform every known method for jailbreaking LLMs, achieving 100% attack success against Meta-SecAlign-70B. Here is the mechanism, the results, and what it means for AI safety.
-
Google Research published TurboQuant, a quantization algorithm that compresses LLM key-value caches to 3 bits with zero accuracy loss and 6x memory reduction. Here is how the math works, what the benchmarks show, and what developers should know.
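To make the 3-bit claim concrete, here is a minimal sketch of uniform 3-bit quantization of a KV-cache block. This is illustrative only: it does not reproduce TurboQuant's actual algorithm, and the function names and per-block min/max scaling scheme are assumptions for the example.

```python
# Minimal 3-bit uniform quantization sketch (NOT TurboQuant itself).
# A 3-bit code has 2^3 = 8 levels, so values map to integers in [0, 7].
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Quantize a float block to 3-bit codes with a per-block scale and offset."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 7 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)  # codes 0..7 (stored in uint8 here)
    return q, scale, lo

def dequantize_3bit(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Reconstruct approximate floats from 3-bit codes."""
    return q.astype(np.float32) * scale + lo

kv = np.random.randn(128).astype(np.float32)   # stand-in for one KV-cache block
q, s, lo = quantize_3bit(kv)
kv_hat = dequantize_3bit(q, s, lo)

# Uniform quantization bounds the per-value error by half a step.
assert np.max(np.abs(kv - kv_hat)) <= s / 2 + 1e-6
```

In practice the codes would be bit-packed (roughly 16/3 ≈ 5.3x smaller than fp16 before metadata, in this naive scheme); TurboQuant's reported 6x reduction and zero accuracy loss come from its own, more sophisticated construction.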
-
Epoch AI confirmed GPT-5.4 Pro solved a Ramsey hypergraph problem open since 2019. Four frontier models replicated the result. Meanwhile, the LiteLLM supply chain attack exposed 95 million monthly downloads to credential theft through a compromised security scanner.