How Protein Language Models Learned to Design Dangerous Proteins

3 models: open-source protein design models used to bypass DNA synthesis screening

ESM3: protein language model that learns sequence-structure-function relationships jointly

RFdiffusion: diffusion-based backbone generator that can design functional analogs with novel sequences

Training data: exclusion proposed as safety control, found ineffective in 2025 study

In 2025, researchers at the Johns Hopkins Center for Health Security published a study in Science showing that three publicly available open-source protein design models could generate protein sequences predicted to retain dangerous functions while sharing little sequence similarity with any protein in current DNA synthesis screening databases. The models were ESM3, RFdiffusion combined with ProteinMPNN, and a third tool based on Chroma. The study is the first systematic empirical demonstration that an AI protein design pipeline can bypass the primary biosecurity control applied to DNA synthesis.

How Protein Language Models Work

Protein language models are transformer architectures trained on protein sequence data, analogous in design to the large language models (LLMs) trained on text. Instead of predicting the next token in a sentence, they learn to predict masked amino acids in protein sequences. The training signal comes from the statistical regularities in hundreds of millions of known protein sequences: amino acid substitution patterns that preserve structural and functional properties, coevolutionary signals between positions that contact each other in 3D space, and conservation patterns that reflect functional constraints. ESM3 extends this approach to reason jointly across sequence, structure, and function.
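The masked-prediction setup described above can be sketched in a few lines. This is a toy illustration only, not how ESM3 or any real model is implemented: the mask symbol, the 15% mask fraction, the function names, and the example string are all assumptions made for this sketch, and no model is actually trained here.

```python
import random

MASK = "#"  # hypothetical mask symbol chosen for this sketch


def mask_sequence(seq, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked string and a dict mapping each hidden
    position to the residue that was there.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_fraction))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = seq[pos]
        masked[pos] = MASK
    return "".join(masked), targets


def masked_accuracy(predictions, targets):
    """Fraction of hidden positions where the predicted residue is correct."""
    correct = sum(predictions.get(pos) == aa for pos, aa in targets.items())
    return correct / len(targets)


# Arbitrary letters from the 20-residue alphabet, not a real protein.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
masked, targets = mask_sequence(toy_sequence)
```

A real model sees only `masked` and is scored on how well it recovers `targets`. Repeating that across hundreds of millions of sequences is what forces it to absorb the substitution, coevolution, and conservation patterns listed above.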

The Biosecurity Gap

DNA synthesis companies screen orders with sequence similarity algorithms that compare each ordered sequence against databases of known dangerous proteins drawn from select agent lists. The 2025 Johns Hopkins study showed that ESM3 and RFdiffusion can generate sequences that are structurally and functionally similar to dangerous proteins yet share little sequence similarity with any protein in screening databases. A synthesized sequence that passes screening but folds into the same structure as a toxin would be expected to retain the toxin’s biological activity. The screening gap is structural: sequence homology screening cannot catch functional analogs that reach the same three-dimensional shape through different amino acid sequences.
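To make concrete what a sequence-similarity check measures, here is a minimal sketch. It assumes pre-aligned, equal-length strings and a single identity cutoff; real screening pipelines use alignment-based search and more elaborate decision rules, so the function names, the 80% threshold, and the toy strings below are all hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions with identical letters in two
    pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def flag_order(query, reference_db, threshold=80.0):
    """Return the names of reference entries whose identity to the
    query meets or exceeds the (hypothetical) threshold."""
    return [
        name
        for name, ref in reference_db.items()
        if percent_identity(query, ref) >= threshold
    ]


# Toy strings only; none of these correspond to real proteins.
reference_db = {"ref_A": "AAAAAAAAAA"}
near_copy = "AAAAAAAAAC"   # differs from ref_A at one position
divergent = "ACDEFGHIKL"   # shares only the first letter with ref_A
```

The check operates purely on letters. It has no access to how a sequence folds, which is why a comparison of this kind says nothing about two sequences that differ in letters but share a three-dimensional shape.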

Why Training Data Exclusion Fails

A proposed safety measure is to exclude dangerous protein sequences from training data for protein language models. The Johns Hopkins study tested this approach directly and found it ineffective. Models trained with dangerous sequences excluded still generated functional analogs because the dangerous function emerges from structural and biophysical principles that are encoded in the broader training distribution. You cannot excise the physics of protein folding from a dataset by removing particular sequences.

Limitations

The study used proxy measures of functionality rather than direct experimental demonstration of dangerous biological activity, for obvious biosafety reasons. The proteins generated were not synthesized and tested. The study measured structural and sequence characteristics predicted to correlate with dangerous function, not confirmed dangerous function.

Primary sources: Johns Hopkins Center for Health Security, Science 2025 (protein design biosecurity bypass study); ESM3 architecture: Hayes T et al., Science 2025 (preprint 2024).
