How Protein Language Models Learned to Design Dangerous Proteins

3 models: open-source protein design models used to bypass DNA synthesis screening
ESM3: protein language model that learns sequence-structure-function relationships jointly
RFdiffusion: diffusion-based backbone generator that, paired with a sequence-design tool such as ProteinMPNN, can design functional analogs with novel sequences
Training data exclusion: proposed as a safety control, found ineffective in a 2025 study

In 2025, researchers at the Johns Hopkins Center for Health Security published a study in Science. It demonstrated that three open-source protein design models could generate protein sequences predicted to have dangerous properties, while those sequences showed low similarity to any protein in current DNA synthesis screening databases. The models used were ESM3, RFdiffusion combined with ProteinMPNN, and a third tool based on Chroma. The study is the first systematic empirical demonstration that AI protein design pipelines can bypass the primary biosecurity control applied to DNA synthesis.

How Protein Language Models Work

Protein language models are transformer architectures trained on protein sequence data, analogous in design to LLMs trained on text. Instead of predicting the next token in a sentence, they learn to predict masked amino acids in protein sequences. The training signal comes from statistical regularities across hundreds of millions of known protein sequences. These include amino acid substitution patterns that preserve structural and functional properties, coevolutionary signals between positions that contact each other in 3D space, and conservation patterns that reflect functional constraints. ESM3 extends this approach to model sequence, structure, and function jointly.
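The masked-prediction objective can be made concrete with a deliberately tiny stand-in. The sketch below is not a transformer and not how ESM3 works. It assumes a hypothetical toy alignment of made-up, equal-length strings and predicts a masked residue from smoothed column frequencies (a simple profile model). It then reports the negative log-likelihood of the true residue, which is the per-token quantity a masked language model is trained to minimize. Because it ignores the rest of the query, it captures conservation but not the coevolutionary signals a real model learns.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy "training set": made-up aligned strings, not real proteins.
TOY_ALIGNMENT = [
    "MKTAYIAK",
    "MKTAYLAK",
    "MRTAYIAK",
    "MKSAYIAR",
    "MKTAFIAK",
]


def column_distribution(alignment, position, pseudocount=1.0):
    """P(residue | position) from column counts, with additive (Laplace) smoothing."""
    counts = Counter(seq[position] for seq in alignment)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def predict_masked(alignment, query, position):
    """Mask one position of `query`, predict it, and score the true residue.

    Returns (masked query, most probable residue, NLL of the true residue).
    This profile model uses only the alignment column, not the query's context.
    """
    true_residue = query[position]
    if true_residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue {true_residue!r}")
    masked = query[:position] + "_" + query[position + 1:]
    probs = column_distribution(alignment, position)
    best = max(probs, key=probs.get)
    nll = -math.log(probs[true_residue])
    return masked, best, nll


if __name__ == "__main__":
    # The query is not in the toy alignment, so the answer is not leaked.
    masked, best, nll = predict_masked(TOY_ALIGNMENT, "MKSAYIAK", 5)
    print(masked, best, round(nll, 3))  # MKSAY_AK I 1.609
```

In the toy column, four of five sequences carry `I`, so the smoothed probability is 5/25 = 0.2 and the loss is ln 5 ≈ 1.609. A transformer replaces the per-column counts with a prediction conditioned on every other residue in the sequence.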

The Biosecurity Gap

DNA synthesis companies screen orders using sequence similarity algorithms that compare each ordered sequence against databases of known dangerous proteins, such as those on select agent lists. The 2025 Johns Hopkins study showed that ESM3 and RFdiffusion can generate sequences predicted to share structural and functional similarity with dangerous proteins while having low sequence similarity to any protein in screening databases. A synthesized sequence that passes screening but folds into the same structure as a toxin could retain the toxin's biological activity. The screening gap is structural: sequence homology screening cannot catch functional analogs that achieve the same three-dimensional shape through different amino acid sequences.

Why Training Data Exclusion Fails

One proposed safety measure is to exclude dangerous protein sequences from the training data of protein language models. The Johns Hopkins study tested this approach directly and found it ineffective. Models trained with dangerous sequences excluded still generated predicted functional analogs. The relevant function emerges from structural and biophysical principles encoded across the broader training distribution. You cannot excise the physics of protein folding from a dataset by removing particular sequences.

Limitations

The study used proxy measures of functionality rather than direct experimental demonstration of dangerous biological activity, for obvious biosafety reasons. The proteins generated were not synthesized and tested. The study measured structural and sequence characteristics predicted to correlate with dangerous function, not confirmed dangerous function.


Primary sources: Johns Hopkins Center for Health Security, Science 2025 (protein design biosecurity bypass study); ESM3 architecture: Hayes T et al., Science 2024.
