ESMFold2

ESMFold2 is the successor to ESMFold. It sets a new state of the art for single-sequence structure prediction and enables the generation of new functional proteins by searching the latent space of the ESMC model. The model predicts high-resolution, all-atom 3D structures of biomolecular complexes directly from sequence, with optional multiple sequence alignment (MSA) input for higher accuracy on challenging targets.

Structure Prediction•Version 2026-04

Get Started

Install the `esm` Python package

pip install esm@git+https://github.com/Biohub/esm.git@main

Create an API key

Connect to the Biohub Platform API

from esm.sdk.forge import SequenceStructureForgeInferenceClient

client = SequenceStructureForgeInferenceClient(model="esmfold2-fast-2026-05", url="https://biohub.ai", token="<your API token>")
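The call above passes the token as a literal string. A common alternative is to read it from an environment variable so it never ends up in source control. This is a minimal sketch under assumptions: the variable name `BIOHUB_API_TOKEN` and the helper `read_api_token` are hypothetical and are not part of the `esm` package; only the client constructor in the commented usage comes from this page.

```python
import os


def read_api_token(env_var: str = "BIOHUB_API_TOKEN") -> str:
    """Return the API token stored in `env_var` (a hypothetical variable name).

    Fails loudly instead of sending an empty token to the server.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return token


# Usage (requires the esm package from the install step above):
# from esm.sdk.forge import SequenceStructureForgeInferenceClient
# client = SequenceStructureForgeInferenceClient(
#     model="esmfold2-fast-2026-05", url="https://biohub.ai", token=read_api_token()
# )
```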

Run your inference

Model Tutorials

Folding with ESMFold2

Fold proteins in combination with DNA, RNA, and small-molecule ligands.

Binder design

Design antibodies and minibinders with high hit rates. This notebook implements the protocol featured in our paper, which produced binders exhibiting nanomolar affinity, target specificity, and functional activity in laboratory assays.

Model Details

Model Card

Version

2026-05

Architecture

ESM representations power a series of looped folding layers. A diffusion model then projects the pairwise representations to atomic-resolution structure predictions.
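The page does not define "looped". One common reading is weight sharing: a single block is applied repeatedly, so effective depth grows without adding parameters. The toy below illustrates only that idea, using a 2D list as a stand-in for the pair representation. `ToyFoldingLayer`, its row/column-mean update, and `run_looped` are hypothetical and are not ESMFold2's actual layers; the diffusion stage is omitted.

```python
from typing import List

Pair = List[List[float]]  # toy L x L pair representation with a single scalar channel


class ToyFoldingLayer:
    """Hypothetical stand-in for one folding block: a residual update of each
    pair entry (i, j) from the mean of row i and the mean of column j."""

    def __init__(self, w_row: float, w_col: float):
        self.w_row = w_row
        self.w_col = w_col

    def __call__(self, pair: Pair) -> Pair:
        n = len(pair)
        row_mean = [sum(row) / n for row in pair]
        col_mean = [sum(pair[i][j] for i in range(n)) / n for j in range(n)]
        return [
            [pair[i][j] + self.w_row * row_mean[i] + self.w_col * col_mean[j] for j in range(n)]
            for i in range(n)
        ]


def run_looped(layer: ToyFoldingLayer, pair: Pair, num_loops: int) -> Pair:
    # "Looped": the SAME layer object (same weights) is applied num_loops times,
    # so depth increases while the parameter count stays fixed.
    for _ in range(num_loops):
        pair = layer(pair)
    return pair


layer = ToyFoldingLayer(w_row=0.1, w_col=0.1)
refined = run_looped(layer, [[0.0, 1.0], [1.0, 0.0]], num_loops=4)
```

Because only two weights exist no matter how many loops run, the number of iterations can be tuned independently of model size, which is the property this sketch is meant to show.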

Supported Modalities

Sequence and structure

Training Data

ESMFold2 was trained on sequences from the Protein Data Bank (PDB) and the AlphaFold DB (AFDB).

Intended Use

ESMFold2 predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input. Outputs include all-atom coordinates (backbone and side chains), confidence metrics, and optional distogram predictions for detailed analysis of the predicted structures.
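A distogram gives, for each residue pair, a probability distribution over distance bins. The sketch below shows two common ways to summarize one pair's logits: the expected distance and a contact probability (mass in bins centered below 8 Å, a widely used contact cutoff). The output layout is an assumption: it presumes one list of logits per residue pair and known bin centers, neither of which this page specifies, and `expected_distance` and `contact_probability` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits: Sequence[float], bin_centers: Sequence[float]) -> float:
    """Expected inter-residue distance (Å) under the predicted bin distribution."""
    if len(logits) != len(bin_centers):
        raise ValueError("need exactly one logit per distance bin")
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


def contact_probability(
    logits: Sequence[float], bin_centers: Sequence[float], cutoff: float = 8.0
) -> float:
    """Probability mass in bins whose center lies below `cutoff` Å."""
    if len(logits) != len(bin_centers):
        raise ValueError("need exactly one logit per distance bin")
    probs = softmax(logits)
    return sum(p for p, c in zip(probs, bin_centers) if c < cutoff)
```

For example, with four bins centered at 4, 8, 12 and 16 Å and uniform logits, the expected distance is 10 Å and the contact probability is 0.25.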

Limitations & Risks

The model predicts a single static conformation and is not designed to model protein dynamics, conformational flexibility, or multiple conformations of the same protein. Outputs should be validated experimentally, and the model is not intended for clinical or therapeutic applications without further validation.

This model is released under the MIT License.

Explore the Model

ESM Atlas Data

Dataset	Description	Size	CLI Command
Sequences	Protein sequences (6.8B proteins)	2.2 TB	
Structures	Protein structures (1B proteins)	68.9 TB	
SAE features	Per-protein and per-residue feature vectors (6.8B proteins)	306 TB	
SAE Clusters	Cluster-level organization based on SAE features (7.5M clusters)	26 GB	
HMM Results	Predicted Pfam and taxonomy labels (6.8B proteins)	653 MB	
Protein_to_accession	Mapping of protein IDs to accession numbers (6.8B proteins)	162 GB	
Normalization	SAE feature normalization	192 KB	
All Data	Complete set of sequences, structures, features, and clusters	377 TB	
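The individual sizes add up to the "All Data" figure, and dividing a dataset's size by its item count gives a rough per-record footprint, which helps when deciding which subsets to download. The helper below is a hypothetical sketch built only from the table's numbers; it treats KB/MB/GB/TB as decimal units (powers of 1000), an assumption since the page does not say.

```python
# Sizes copied from the ESM Atlas table; decimal units are an assumption.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

ATLAS_SIZES = {
    "Sequences": "2.2 TB",
    "Structures": "68.9 TB",
    "SAE features": "306 TB",
    "SAE Clusters": "26 GB",
    "HMM Results": "653 MB",
    "Protein_to_accession": "162 GB",
    "Normalization": "192 KB",
}


def to_bytes(size: str) -> float:
    """Convert a string such as '68.9 TB' to bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def download_tb(names) -> float:
    """Total size, in TB, of the named datasets."""
    return sum(to_bytes(ATLAS_SIZES[n]) for n in names) / UNITS["TB"]


def bytes_per_item(name: str, count: float) -> float:
    """Average on-disk size of one item (e.g. one structure) in a dataset."""
    return to_bytes(ATLAS_SIZES[name]) / count


print(round(download_tb(ATLAS_SIZES), 1))        # 377.3, matching "All Data" (377 TB)
print(round(bytes_per_item("Structures", 1e9)))  # 68900 bytes per structure
print(round(bytes_per_item("Sequences", 6.8e9))) # 324 bytes per sequence
```

Almost all of the total comes from the SAE features (306 TB) and structures (68.9 TB); the sequences, clusters, annotations, and mappings together fit in under 2.5 TB.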