Overview

The ESM Atlas is a free, open-access resource for exploring the global protein universe. It provides predicted 3D structures from ESMFold2 and interpretable feature annotations, such as function, from the ESMC language model for over a billion proteins, spanning organisms from bacteria and archaea to fungi, viruses, and animals, many of which have never been characterized.

The Atlas is built on top of ESMC, a protein language model trained on billions of protein sequences from across the tree of life. Rather than organizing proteins by sequence similarity or curated database annotations, the Atlas organizes them by learned biological signals: patterns the model detected directly from protein sequence data.

What you can do with the Atlas:

  • Use the agent to search the Atlas for any protein or protein function

  • View a protein’s predicted 3D structure and its top-activated biological features

  • Find structurally and functionally similar proteins, even ones with no obvious sequence similarity to your query

  • Explore clusters of related proteins and understand their biological context

  • Download structures, sequences, and feature data for use in your own analyses

What you can do with the API:

  • Retrieve an Atlas protein: fetch the full stored record for a protein by its MD5 hash, including metadata (source, accession), sequence, predicted structure (PDB with per-residue pLDDT, plus pTM), sparse autoencoder (SAE) features and per-residue activations, and a pointer to its cluster representative

  • Predict SAE features: compute a sequence’s SAE activations using ESMC and the trained SAE, returning protein-level activations, per-residue activations, and top-K enriched features with biological labels

  • Predict 3D structure: predict an ESMFold2 structure for an amino acid sequence of fewer than 700 residues

  • Search for similar proteins: find proteins in the Atlas similar to a query sequence using SAE feature embedding similarity

  • Explore SAE features: retrieve descriptions of the 16,384-dimensional feature space that characterizes proteins in the Atlas

  • Inspect a protein cluster: for any protein, retrieve information about the cluster it belongs to, including member proteins, top Pfam domains, top SAE features, and taxonomy distribution

  • Batch protein download: submit many proteins as an asynchronous job and download their sequences, SAE features, structures, and cluster membership from S3
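
Two of the operations listed above depend on properties you can check client-side before sending a request: record lookup is keyed by an MD5 hash, and structure prediction is limited to sequences under 700 amino acids. The Python sketch below shows one way to prepare a sequence under those constraints. It is not the Atlas client. It assumes the hash is taken over the whitespace-stripped, uppercased sequence and that only the 20 standard residues are accepted; neither detail is stated in this overview, so confirm both against the API Reference. The function names are hypothetical.

```python
import hashlib

# From the overview: structure prediction accepts sequences under 700 aa.
MAX_FOLD_LENGTH = 700

# Assumption: only the 20 standard amino acids are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_sequence(sequence: str) -> str:
    """Remove all whitespace and uppercase the sequence (assumed normalization)."""
    return "".join(sequence.split()).upper()


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of the normalized sequence.

    Assumption: the Atlas lookup key is the MD5 of the sequence text itself.
    """
    return hashlib.md5(normalize_sequence(sequence).encode("ascii")).hexdigest()


def can_predict_structure(sequence: str) -> bool:
    """Check that a sequence is non-empty, standard, and under the length limit."""
    seq = normalize_sequence(sequence)
    return 0 < len(seq) < MAX_FOLD_LENGTH and set(seq) <= VALID_RESIDUES
```

For example, `sequence_md5("mkt aya")` and `sequence_md5("MKTAYA")` give the same digest under this normalization, and `can_predict_structure("A" * 700)` is false because the limit is exclusive.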

See the API Reference for full endpoint details and request/response schemas.
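
To give a feel for what similarity over SAE feature embeddings and top-K features can mean in practice, here is a small, self-contained Python sketch that works on protein-level activations you have already downloaded. It is an illustration only: this overview does not state which similarity metric the Atlas uses or how enrichment is computed, so cosine similarity and ranking by raw activation are assumptions, and every name in the block is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Sparse protein-level activations: feature index -> activation value.
SparseVector = Dict[int, float]


def cosine_similarity(a: SparseVector, b: SparseVector) -> float:
    """Cosine similarity between two sparse activation vectors (assumed metric)."""
    dot = sum(value * b[index] for index, value in a.items() if index in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_features(activations: SparseVector, k: int) -> List[Tuple[int, float]]:
    """Return the k features with the largest activations, highest first."""
    ranked = sorted(activations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def rank_by_similarity(
    query: SparseVector, candidates: Dict[str, SparseVector]
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort by descending similarity."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A sparse dictionary is used because only a small fraction of the 16,384 features is expected to be active for any one protein; a dense array would work equally well for small batches.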