Overview
The ESM Atlas is a free, open-access resource for exploring the global protein universe. It provides predicted 3D structures from ESMFold2 and interpretable feature annotations, such as function, from the ESMC language model for over a billion proteins. These proteins come from organisms ranging from bacteria and archaea to fungi, viruses, and animals, and many have never been characterized.
The Atlas is built on ESMC, a protein language model trained on billions of protein sequences from across the tree of life. Rather than organizing proteins by sequence similarity or curated database annotations, the Atlas organizes them by learned biological signals: patterns the model detected directly in protein sequence data.
What you can do with the Atlas:
Use the agent to search the Atlas for any protein or protein function
View a protein’s predicted 3D structure and top activated biological features
Find structurally and functionally similar proteins, even ones with no obvious sequence similarity to your query
Explore clusters of related proteins and understand their biological context
Download structures, sequences, and feature data for use in your own analyses
What you can do with the API:
Retrieve an Atlas protein: fetch the full stored record for a protein by its MD5 hash, including metadata (source, accession), sequence, predicted structure (PDB with per-residue pLDDT and an overall pTM score), sparse autoencoder (SAE) features and per-residue activations, and a pointer to its cluster representative
Predict SAE features: compute a sequence’s SAE activations using ESMC and the trained SAE; returns protein-level activations, per-residue activations, and the top-K enriched features with biological labels
Predict 3D structure: predict an ESMFold2 structure for an amino acid sequence shorter than 700 residues
Search for similar proteins: find proteins in the Atlas similar to a query sequence using SAE feature embedding similarity
Explore SAE features: retrieve descriptions of the 16,384-dimensional feature space that characterizes proteins in the Atlas
Inspect a protein cluster: for any protein, retrieve information about the cluster it belongs to, including member proteins, top Pfam domains, top SAE features, and taxonomy distribution
Batch protein download: submit many proteins as an asynchronous job and download their sequences, SAE features, structures, and cluster membership from S3
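Several of the operations above take either a raw amino acid sequence or an MD5 hash as input, so a client will usually want to normalize its query first. The following is a minimal, hypothetical sketch and not part of the Atlas API. It assumes the MD5 key is computed over the uppercase one-letter sequence with whitespace removed, and that only the 20 standard residue codes are accepted; neither assumption is confirmed on this page, so check the API Reference before relying on it. The helper names (normalize_sequence, sequence_md5, fits_structure_limit) are invented for illustration.

```python
import hashlib

# Assumption: only the 20 standard one-letter residue codes are accepted.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# From the overview: structure prediction takes sequences under 700 residues.
MAX_STRUCTURE_LENGTH = 700


def normalize_sequence(sequence: str) -> str:
    """Remove all whitespace, uppercase, and reject unexpected residue codes."""
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    unknown = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {''.join(unknown)}")
    return cleaned


def sequence_md5(sequence: str) -> str:
    """Hypothetical lookup key: MD5 hex digest of the normalized sequence."""
    normalized = normalize_sequence(sequence)
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def fits_structure_limit(sequence: str) -> bool:
    """Return True if the normalized sequence is shorter than 700 residues."""
    return len(normalize_sequence(sequence)) < MAX_STRUCTURE_LENGTH


if __name__ == "__main__":
    query = "mkt ayiakqr qisfvk"  # arbitrary example sequence
    print(normalize_sequence(query))
    print(sequence_md5(query))
    print(fits_structure_limit(query))
```

Normalizing before hashing keeps the key stable across differences in case and line wrapping, for example when a sequence is copied out of a FASTA file. If the Atlas hashes sequences differently, only sequence_md5 needs to change.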
See the API Reference for full endpoint details and request/response schemas.