AI Models – Page 2 – My Written Word

LLM Excessive Agency: Why Every Tool Your Agent Has Is a Risk

May 18, 2026

Every tool an LLM agent has is an attack surface. OWASP’s LLM06 and the b3 benchmark across 31 models show why: capability scope determines blast radius. Here is…

Julia Bazinska and the Science of Measurable AI Security

May 18, 2026

Julia Bazinska built the empirical tools that make LLM security measurable. From DeepMind RL to first-authoring b3, here is what her research at Lakera actually produced.

Gandalf the Red: What 279K Real Attacks Reveal About LLM Defense

May 18, 2026

Lakera’s ICML 2025 paper draws on 279K crowdsourced attacks to show what synthetic red-teaming misses. The D-SEC finding: system prompts degrade user experience without blocking attackers. Here is the…

Vision-Language Models: Architecture and the Benchmark Gap

May 18, 2026

How CLIP, SigLIP, Q-Former, and MLP adapters work in vision-language models. Why Qwen2.5-VL compresses visual tokens 4x, and what current VLMs still cannot do.

Chinchilla Scaling Laws: Three Methods and Why Labs Ignore Them

May 18, 2026

Chinchilla proved GPT-3 was undertrained. The 20:1 tokens-per-parameter rule is compute-optimal for training alone, which makes it a floor rather than a target. Three methods, their disagreements, and why frontier labs now exceed it.

Speculative Decoding: How LLMs Generate 3x Faster

May 18, 2026

Speculative decoding achieves 3-4x LLM speedup with zero output quality loss. The math proof, EAGLE-2’s 4.26x result, and when it does not help.

LLMs in Veterinary Clinical Practice: What the Evidence Actually Shows

May 10, 2026

ChatGPT-4.5 scored 90% on feline eye disease cases vs 96.7% for experienced veterinary ophthalmologists and significantly outperformed novices (56-67%). Where LLMs add clinical value in veterinary practice, where…

Generative AI for Small Molecule Drug Discovery: How It Works and What the Evidence Shows

May 10, 2026

Generative AI is producing novel molecules with VAEs, GANs, and diffusion models. Machine learning virtual screening has shown 75% hit validation rates against 106M-compound libraries. Why no AI-designed drug…

AI in Digital Pathology: What Computational Pathology Can and Cannot See

May 10, 2026

An NIH multi-institution study in Lancet Oncology classified 52 CNS tumor types from tissue images at 80% accuracy across 5,516 test samples. A Cancer Science paper simultaneously documented…

FDA Clearance for AI Medical Devices: What 510(k), De Novo, and PMA Actually Mean

May 10, 2026

The FDA has authorized 700+ AI medical devices through the 510(k), De Novo, and PMA pathways. A March 2026 European Radiology review documents how the EU AI Act, FDA…

AI-Driven ADMET Prediction: What the Blind Challenge Results Actually Show

May 10, 2026

Deep learning beat classical methods for ADME prediction in a 65-team blind challenge at the 2025 OpenADMET competition. An AI-PBPK platform predicted full human pharmacokinetic curves from molecular…

Poisoning the Medical Brain: RAG Attacks and Security in Clinical AI Systems

May 10, 2026

In JAMA testing, clinical LLMs were compromised by prompt injection 94% of the time. RAG systems face a harder attack: poisoned retrieved documents that the LLM cannot distinguish from legitimate sources. How…

What ASL-3 Actually Means: Anthropic’s Biorisk Threshold Explained

May 10, 2026

ASL-3 is Anthropic’s threshold at which models could provide serious uplift toward bioweapons with mass-casualty potential. What the Virology Capabilities Test actually evaluates, how the 4x novice uplift…

DNA Synthesis Screening Cannot Keep Up With AI-Designed Sequences

May 10, 2026

The IGSC DNA synthesis screening standard was built on sequence homology to known pathogens. AI-designed sequences can achieve dangerous functions while being novel enough that homology checks cannot recognize them. What…

ESM3: The Protein Language Model That Unifies Sequence, Structure and Function

May 10, 2026

ESM3 from EvolutionaryScale is a 98B-parameter generative protein language model that reasons across sequence, structure, and function simultaneously. How the VQ-VAE structural tokenization works, what the GFP design…

Radiology Foundation Models: What Merlin, the 22% Hallucination Rate, and ED Fracture Data Tell Us

May 10, 2026

Stanford published Merlin in Nature: a CT foundation model tested on 44,098 scans across 3 institutions. Meanwhile, 22% of AI radiology reports contain factual errors and LLMs miss…

AI in Radiology: Three Phases and What the Clinical Evidence Shows

May 10, 2026

Radiology AI has moved through three phases: rule-based CAD, the deep learning benchmark era, and clinical deployment validation. A 556-paper bibliometric analysis and a multicenter thymus CT validation…

AlphaFold 3: What It Gets Right and Where It Still Fails

May 10, 2026

AlphaFold 3 improves GPCR backbone prediction over AF2 but shows significant discrepancies in ligand-binding poses for ions, peptides, and protein ligands. PubMed-sourced evidence from the Shanghai Institute of…

Poisoning the Medical Brain: How RAG Attacks Corrupt Biomedical AI

May 10, 2026

When the knowledge base is the attack surface. RAG poisoning allows adversaries to redirect medical AI outputs without touching model weights. Five arXiv papers explain the mechanism and…

AI in Veterinary Medicine: What the Clinical Evidence Actually Shows

May 10, 2026

Veterinary AI is producing measurable results in canine radiology, equine PET imaging, gait analysis, and dairy herd monitoring. Six PubMed-indexed studies from 2024-2026 with specific accuracy numbers, and…
