LLMs – My Written Word

Why a 1M-Token Model Only Reasons Over 200K

July 28, 2026

Models advertise 1M-token windows but reason reliably over far less. The positional-encoding reason why, and how to measure your real ceiling.

The Jailbreak Hiding in Your JSON Schema

July 28, 2026

A CCS 2026 paper hides jailbreaks in JSON schemas, hitting 94-99% success against GPT-5 and Gemini. Why prompt filters never see it.

Ghost Vectors: Deleted Embeddings Stay Recoverable

July 28, 2026

Researchers tested three vector databases and found deleted embeddings stay intact on disk, recoverable at rates that break GDPR and HIPAA.

How Model Merging Actually Combines Separate LLMs

July 17, 2026

Some top open-weight models are merged, not trained. The math behind task vectors, TIES, DARE, and why the technique works at all.

How Model Quantization Actually Works: INT8 to INT4

July 16, 2026

A 70B model needs 140GB at 16-bit precision, 35GB at INT4. The rounding math, why naive quantization breaks, and how GPTQ and AWQ fix it.

How LLM Tokenization Actually Works: BPE Explained

July 16, 2026

The algorithm behind every LLM tokenizer was built in 1994 to compress files, not language. The mechanism, and why it breaks on math and non-English text.

How Mixture-of-Experts Actually Routes Every Token

July 16, 2026

DeepSeek V4 holds 1.6 trillion parameters and uses a fraction per token. The routing math, why naive versions collapse, and the modern fix.

Same Model, 20-Point Gap: Why Coding Benchmarks Mislead

July 15, 2026

Claude Opus 4.6 scores 58% or 80% on the same benchmark depending only on which harness wraps it. Here is why coding agent scores mislead.

How an Export Law Built for Chips Took Down Fable 5

July 15, 2026

A private Commerce Department letter used a dormant 2018 export authority to shut down Fable 5 and Mythos 5 worldwide. Here is the mechanism.

MCP Goes Stateless on July 28. Its Poisoning Problem Stays

July 7, 2026

MCP’s July 28 spec removes sessions and adds response caching. That solves scaling headaches and quietly widens the window for tool poisoning.

ShareLock Splits Malicious Prompts Across AI Agent Tools

July 7, 2026

Researchers used Shamir’s secret sharing to hide prompt-injection payloads across MCP tools, beating detectors with a 90%+ success rate.

MITRE ATLAS: The ATT&CK Framework for AI Systems

May 25, 2026

MITRE ATLAS provides the shared vocabulary for AI security threat intelligence: 14 tactic categories, techniques like AML.T0018 (Backdoor ML Model) and AML.T0043 (Craft Adversarial Data), and a crosswalk…

Neural Backdoor Attacks: From BadNets to LLM Trojans

May 25, 2026

Gu et al.’s BadNets (2017) installed hidden triggers via training poisoning. By 2023, instruction-following backdoors target RLHF pipelines directly. Rare-word triggers, weight poisoning, and universal adversarial triggers all…

LLM Watermarking: How Models Embed Detection Signals in Their Outputs

May 25, 2026

Kirchenbauer’s green-red token list (ICML 2023), Aaronson’s EMS, and Kuditipudi’s ITS scheme all embed detectable statistical signals into LLM outputs. But Zhang et al. proved no watermark is…

Differential Privacy for LLMs: The Training Privacy Guarantee

May 25, 2026

Differential privacy provides the only formal guarantee against LLM training data leakage. DP-SGD’s four steps, the Moments Accountant, Rényi DP, and the epsilon values that actually mean something…

Multiagent LLM Security: When Your Agent Talks to a Malicious Agent

May 25, 2026

When LLMs call other LLMs as tools, injection attacks jump the boundary. ConVerse (2026) found 88% privacy violations and 60% security breaches in plausible agent-to-agent discourse. Here is…

LLMail-Inject: What 208K Attacks Against an Email Agent Found

May 25, 2026

Microsoft Research’s LLMail-Inject challenge: 839 participants, 208,095 unique attacks against a simulated email agent with production defenses. The finding: adaptive attackers breach even well-designed defense stacks. Here is…

Adversarial Machine Learning: From Szegedy to LLM Attacks

May 25, 2026

Szegedy (2014) showed deep networks could be fooled by imperceptible perturbations. FGSM, PGD, and C&W followed. By 2025, the same mathematical framework governs jailbreaks, poisoning, and memorization extraction.…

How RLHF and Constitutional AI Build Safety Into Language Models

May 25, 2026

RLHF trains models to prefer human-preferred outputs. Constitutional AI uses AI self-critique guided by principles. Neither provides formal guarantees. Here is how both techniques work, what they install…

LLM Training Data Memorization: When Models Leak Their Training Sets

May 25, 2026

LLMs memorize verbatim sequences from training data. Carlini et al. demonstrated extraction of phone numbers, email addresses, and private keys from GPT-2. Here is the mechanism, what gets…
