pip install esm@git+https://github.com/Biohub/esm.git@main
ESMFold2
ESMFold2, the successor to ESMFold, sets a new state of the art for single-sequence structure prediction and enables the generation of new functional proteins by searching the ESMC model’s latent space. The model predicts high-resolution, all-atom 3D structures of biomolecular complexes directly from sequence, with optional multiple sequence alignment (MSA) input for enhanced accuracy on challenging targets.
Get Started
Quickstart Guide
Install the esm Python package
Create an API key
Connect to the Biohub Platform API
from esm.sdk.forge import SequenceStructureForgeInferenceClient
client = SequenceStructureForgeInferenceClient(model="esmfold2-fast-2026-05", url="https://biohub.ai", token="<your API token>")
Run your inference
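For the connection step, here is a minimal sketch that reads the API token from an environment variable instead of hard-coding it in source. The variable name `ESM_API_TOKEN` and the helpers `forge_client_kwargs` and `make_client` are illustrative assumptions, not part of the esm package; the model name, URL, and client class are the ones shown in the quickstart.

```python
import os


def forge_client_kwargs(env_var: str = "ESM_API_TOKEN") -> dict:
    """Collect the quickstart's constructor arguments, taking the token from the environment.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your API token before connecting.")
    return {
        "model": "esmfold2-fast-2026-05",
        "url": "https://biohub.ai",
        "token": token,
    }


def make_client():
    """Build the client exactly as in the quickstart, using the kwargs above."""
    # Imported lazily so forge_client_kwargs() works without the esm package installed.
    from esm.sdk.forge import SequenceStructureForgeInferenceClient

    return SequenceStructureForgeInferenceClient(**forge_client_kwargs())
```

Keeping the token out of the source file makes it harder to commit it to version control by accident.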
Model Tutorials
Explore All Tutorials
Folding with ESMFold2
Fold proteins in combination with DNA, RNA, and small-molecule ligands.
Binder design
Design antibodies and minibinders with high hit rates. Implements the protocol featured in our paper, which produced binders exhibiting nanomolar affinity, target specificity, and functional activity in laboratory assays.
Model Details
Model Card
Version
2026-05
Architecture
ESM representations power a series of looped folding layers. A diffusion model then projects the pairwise representations to atomic-resolution predictions.
Supported Modalities
Sequence and structure
Training Data
ESMFold2 was trained on sequences from the Protein Data Bank (PDB) and the AlphaFold DB (AFDB).
Intended Use
ESMFold2 predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input. Outputs include all-atom coordinates (backbone and side chains), confidence metrics, and optional distogram predictions for detailed analysis of predicted structures.
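Because the input is a plain amino acid sequence, it can help to normalize and check it locally before sending a request. The sketch below is a hypothetical helper, not part of the esm package, and it assumes only the 20 standard one-letter residue codes are wanted; whether the service accepts other letters (such as X or U) is not stated here.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, upper-case the sequence, and reject non-standard letters."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if bad:
        raise ValueError(f"unsupported residue letters: {''.join(bad)}")
    return seq
```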
Limitations & Risks
The model predicts a single static conformation and is not designed for modeling protein dynamics, conformational flexibility, or multiple conformations of the same protein. Outputs should be validated experimentally. The model is not intended for clinical or therapeutic applications without further validation.
Explore the Model
ESM Atlas Data
| Dataset | Description | Size | CLI Command |
|---|---|---|---|
| Sequences | Protein sequences (6.8B proteins) | 2.2 TB | |
| Structures | Protein structures (1B proteins) | 68.9 TB | |
| SAE features | Per-protein and per-residue feature vectors (6.8B proteins) | 306 TB | |
| SAE Clusters | Cluster-level organization based on SAE features (7.5M clusters) | 26 GB | |
| HMM Results | Predicted Pfam and taxonomy (6.8B proteins) | 653 MB | |
| Protein_to_accession | Mapping of protein IDs to accession numbers (6.8B proteins) | 162 GB | |
| Normalization | SAE feature normalization | 192 KB | |
| All Data | Complete set of sequences, structures, features, and clusters | 377 TB | |
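As a quick sanity check when planning storage, the individual dataset sizes can be summed and compared with the "All Data" figure. This sketch assumes decimal units (1 TB = 10^12 bytes), which the table does not specify; the sizes are copied from the table and the helper name `to_bytes` is illustrative.

```python
# Assumption: decimal (SI) units; the table does not say whether sizes are decimal or binary.
UNIT_TO_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Sizes as listed in the table, excluding the "All Data" row.
SIZES = {
    "Sequences": "2.2 TB",
    "Structures": "68.9 TB",
    "SAE features": "306 TB",
    "SAE Clusters": "26 GB",
    "HMM Results": "653 MB",
    "Protein_to_accession": "162 GB",
    "Normalization": "192 KB",
}


def to_bytes(size: str) -> float:
    """Convert a string such as '2.2 TB' to a byte count."""
    value, unit = size.split()
    return float(value) * UNIT_TO_BYTES[unit]


total_tb = sum(to_bytes(s) for s in SIZES.values()) / UNIT_TO_BYTES["TB"]
print(f"{total_tb:.1f} TB")  # prints 377.3 TB
```

Under that assumption the parts add up to about 377.3 TB, which agrees with the listed 377 TB total after rounding; the three largest datasets account for nearly all of it.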