
In March 2025, Arc Institute published Evo 2, a genomic foundation model released in 7B- and 40B-parameter versions, in Science. The model was trained on 9.3 trillion base pairs from approximately 128,000 species and processes up to 1 million base pairs in a single context window.
The Architecture
Evo 2 is built on StripedHyena, a hybrid architecture that interleaves attention layers with Hyena long-range convolution operators. The computational cost of standard transformer attention grows quadratically with sequence length; Hyena operators scale sub-quadratically, which makes million-base-pair contexts tractable.
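To give a feel for the scaling gap described above, here is a minimal sketch comparing two idealized cost models. It assumes full attention costs on the order of L^2 operations and that a long convolution is evaluated with FFTs at roughly L * log2(L); both functions are illustrative assumptions that ignore constant factors, hidden width, and the attention layers that a hybrid like StripedHyena still contains. It is not a measurement of Evo 2.

```python
import math


def attention_cost(seq_len: int) -> float:
    """Idealized cost of full self-attention: every position attends to every other (L^2)."""
    return float(seq_len) ** 2


def fft_conv_cost(seq_len: int) -> float:
    """Assumed cost of an FFT-based long convolution: L * log2(L), constants ignored."""
    return seq_len * math.log2(seq_len)


def cost_ratio(seq_len: int) -> float:
    """How many times more expensive the attention model is than the convolution model."""
    return attention_cost(seq_len) / fft_conv_cost(seq_len)


# Compare a typical LLM context, a long context, and a million-base-pair context.
for seq_len in (8_192, 131_072, 1_000_000):
    print(f"{seq_len:>9,} positions: attention / convolution ~ {cost_ratio(seq_len):,.0f}x")
```

Under these assumptions the ratio simplifies to L / log2(L), so the advantage of the sub-quadratic operator keeps widening as the context grows.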
With 9.3 trillion training tokens on the 40B-parameter model, Evo 2 is an extreme case of inference-optimal training: roughly 230 tokens per parameter, more than ten times the Chinchilla-optimal ratio of about 20:1. The choice reflects the same logic that leads LLaMA and Qwen to train smaller models on more data: training runs once, but inference runs millions of times.
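The ratio quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in this article (9.3 trillion tokens, 40B parameters, a 20:1 Chinchilla ratio); the variable names are illustrative.

```python
TRAINING_TOKENS = 9.3e12   # tokens (base pairs) seen in training, as stated in the article
PARAMETERS = 40e9          # parameter count of the larger model
CHINCHILLA_RATIO = 20.0    # approximate compute-optimal tokens per parameter

# Tokens per parameter actually used.
tokens_per_param = TRAINING_TOKENS / PARAMETERS

# How far beyond the Chinchilla-optimal ratio that is.
multiple_of_chinchilla = tokens_per_param / CHINCHILLA_RATIO

# Token budget a 20:1 ratio would have suggested for a model of this size.
chinchilla_tokens = PARAMETERS * CHINCHILLA_RATIO

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"multiple of Chinchilla ratio: {multiple_of_chinchilla:.1f}x")
print(f"Chinchilla-style budget: {chinchilla_tokens / 1e12:.1f} trillion tokens")
```

In other words, a 20:1 budget for a 40B-parameter model would be about 0.8 trillion tokens, while the stated training run used more than eleven times that.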
Mutation Effect Prediction
Evo 2 achieves state-of-the-art variant effect prediction across both coding and non-coding genomic regions. Approximately 90% of disease-associated variants identified in genome-wide association studies (GWAS) fall in non-coding regions, where predicting functional impact has been a bottleneck for earlier protein-only models.
CRISPR System Generation
Evo 2 can generate complete CRISPR-Cas system designs that prove functional in experimental characterization while diverging in sequence from known natural systems.
Biosecurity Considerations
The same capability that enables the generation of novel functional genetic elements for therapeutic research could also be applied to pathogen enhancement. Arc Institute released Evo 2 as open source, and that release became a focal point in the debate over whether genomic foundation models at this scale should be openly available.
Primary source: Merchant A et al., Science 2025;388:eads9889.