ESMC

ESMC is the latest in the ESM family of protein language models, establishing a new frontier in representation learning for protein biology. Trained on billions of evolutionary sequences, it learns representations that reflect a mechanistic reduction of protein structure and function.

Protein Language Model • Version 2026-04

Get Started

Quickstart Guide

Install the `esm` Python package

Shell

pip install esm@git+https://github.com/Biohub/esm.git@main

Create an API key

Connect to the Biohub Platform API

Python

from esm.sdk.forge import ESMCForgeInferenceClient

client = ESMCForgeInferenceClient(model="esmc-6b-2024-12", url="https://biohub.ai", token="<your API token>")

Run your inference

Model Tutorials

Embedding sequences with ESMC

Embed protein sequences and explore how different transformer layers encode structural and functional information.
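As a rough illustration of comparing layers, the sketch below mean-pools per-residue embeddings into one vector per layer. The `hidden_states[layer][residue][dim]` layout and the toy numbers are assumptions made for this example, not the documented output format of ESMC.

```python
from statistics import fmean


def mean_pool(residue_embeddings):
    """Average per-residue vectors (length L, width D) into one D-dim vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    # zip(*rows) iterates over embedding dimensions (columns).
    return [fmean(column) for column in zip(*residue_embeddings)]


# Toy stand-in for model output (assumed layout): hidden_states[layer][residue][dim].
hidden_states = [
    [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]],  # layer 0, three residues
    [[0.5, 0.5], [1.5, 2.5], [1.0, 0.0]],  # layer 1, three residues
]

# One fixed-size vector per layer, ready for a downstream probe or plot.
pooled_by_layer = [mean_pool(layer) for layer in hidden_states]
print(pooled_by_layer)
```

Mean pooling is only one choice; a task that depends on specific sites may be better served by the per-residue vectors themselves.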

Zero-shot entropy and mutation analysis

Compute per-position entropy and log-likelihood ratios to identify constrained vs. mutation-tolerant sites.
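A minimal sketch of both quantities, assuming you already have one vector of 20 amino-acid logits per position. The alphabet ordering and the toy logits are hypothetical; the notebook may use a different vocabulary or a masked scoring scheme.

```python
import math

# Assumed ordering of the 20 standard amino acids (hypothetical for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def position_entropy(probs):
    """Shannon entropy in nats; low values suggest a constrained site."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def log_likelihood_ratio(probs, wild_type, mutant):
    """log p(mutant) - log p(wild type); negative values disfavor the mutation."""
    p_mut = probs[AMINO_ACIDS.index(mutant)]
    p_wt = probs[AMINO_ACIDS.index(wild_type)]
    return math.log(p_mut) - math.log(p_wt)


# Toy positions: one with no preference, one strongly preferring glycine.
uniform = softmax([0.0] * 20)
peaked_logits = [0.0] * 20
peaked_logits[AMINO_ACIDS.index("G")] = 10.0
peaked = softmax(peaked_logits)

uniform_entropy = position_entropy(uniform)
peaked_entropy = position_entropy(peaked)
llr_g_to_a = log_likelihood_ratio(peaked, wild_type="G", mutant="A")
print(uniform_entropy, peaked_entropy, llr_g_to_a)
```

The uniform position reaches the maximum entropy of ln(20), while the glycine-dominated position has low entropy and a strongly negative ratio for G to A.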

Layer sweep for enzyme function classification

Sweep all layers to find which one yields the best representations, using enzyme classification as the benchmark task.
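A layer sweep can be sketched as: fit the same simple probe on each layer's features and keep the layer with the best held-out score. The nearest-centroid probe, the toy one-dimensional features, and the `EC1`/`EC2` labels below are illustrative assumptions, not the notebook's actual setup.

```python
def nearest_centroid_accuracy(train_x, train_y, val_x, val_y):
    """Fit class centroids on the training split and score the validation split."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    correct = sum(predict(x) == y for x, y in zip(val_x, val_y))
    return correct / len(val_y)


def sweep_layers(features_by_layer, labels, train_idx, val_idx):
    """Return (best_layer, scores) where scores maps layer -> validation accuracy."""
    scores = {}
    for layer, feats in features_by_layer.items():
        scores[layer] = nearest_centroid_accuracy(
            [feats[i] for i in train_idx], [labels[i] for i in train_idx],
            [feats[i] for i in val_idx], [labels[i] for i in val_idx],
        )
    return max(scores, key=scores.get), scores


# Toy data: six proteins, two hypothetical enzyme classes, two layers.
labels = ["EC1", "EC1", "EC2", "EC2", "EC1", "EC2"]
features_by_layer = {
    0: [[0.0], [0.2], [1.0], [0.8], [0.9], [0.7]],  # separates classes poorly
    1: [[0.0], [0.1], [1.0], [0.9], [0.2], [0.8]],  # separates classes well
}
best_layer, scores = sweep_layers(
    features_by_layer, labels, train_idx=[0, 1, 2, 3], val_idx=[4, 5]
)
print(best_layer, scores)
```

In practice the layer should be selected on a validation split and the final number reported on a separate test split, so the choice of layer does not inflate the result.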

Understanding proteins with SAE features

Extract and visualize sparse autoencoder features, rank by peak activation, and map activations onto 3D structures.
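Ranking by peak activation can be sketched as follows, assuming the SAE activations for one protein arrive as a residue-by-feature matrix. The `activations[residue][feature]` layout and the values are made up for this example.

```python
def rank_features_by_peak(activations, top_k=3):
    """Rank SAE features by their maximum activation over all residues.

    activations[residue][feature] is an assumed layout for this sketch.
    Returns up to top_k tuples of (feature_index, peak_value, peak_residue).
    """
    n_features = len(activations[0])
    peaks = []
    for f in range(n_features):
        column = [row[f] for row in activations]
        peak = max(column)
        peaks.append((f, peak, column.index(peak)))
    # Highest peak first.
    peaks.sort(key=lambda item: item[1], reverse=True)
    return peaks[:top_k]


# Toy matrix: four residues, three features.
activations = [
    [0.0, 0.1, 0.0],
    [0.9, 0.0, 0.2],
    [0.3, 0.0, 2.5],
    [0.0, 0.4, 0.1],
]
top_features = rank_features_by_peak(activations, top_k=2)
print(top_features)
```

The residue index of each peak is what you would carry over when coloring a 3D structure by feature activation.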

Model Details

Model Card

Version: 2026-04
Architecture: Transformer
Supported Modalities: Sequence
Training Data: Up to 6 billion proteins

Intended Use

ESMC is designed for protein science research, including structure prediction, function annotation, protein design, and the study of evolutionary relationships between proteins. It can generate novel proteins given partial sequence, structure, or functional constraints.

Limitations & Risks

Outputs should be validated experimentally. The model may generate proteins that are not synthesizable or functional. It is not intended for clinical or therapeutic applications without further validation.

This model is released under the MIT License.

Explore the Model

ESM Atlas Data

Dataset	Description	Size
Sequences	Protein sequences (6.8B proteins)	2.2 TB
Structures	Protein structures (1B proteins)	68.9 TB
SAE features	Per-protein and per-residue feature vectors (6.8B proteins)	306 TB
SAE Clusters	Cluster-level organization based on SAE features (7.5M clusters)	26 GB
HMM Results	Predicted Pfam and taxonomy (6.8B proteins)	653 MB
Protein_to_accession	Mapping of protein IDs to accession numbers (6.8B proteins)	162 GB
Normalization	SAE feature normalization	192 KB
All Data	Complete set of sequences, structures, features, and clusters	377 TB
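For storage planning, the listed sizes can be summed to check the "All Data" figure. This sketch assumes decimal units (1 TB = 10^12 bytes), which the page does not state; the dictionary simply restates the sizes listed above.

```python
# Decimal unit multipliers (an assumption; the page does not say TB vs TiB).
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(size):
    """Parse a string such as '2.2 TB' into a byte count."""
    value, unit = size.split()
    return float(value) * UNIT_BYTES[unit]


# Sizes as listed for each individual dataset.
sizes = {
    "Sequences": "2.2 TB",
    "Structures": "68.9 TB",
    "SAE features": "306 TB",
    "SAE Clusters": "26 GB",
    "HMM Results": "653 MB",
    "Protein_to_accession": "162 GB",
    "Normalization": "192 KB",
}

total_tb = sum(to_bytes(s) for s in sizes.values()) / UNIT_BYTES["TB"]
print(f"{total_tb:.2f} TB")
```

Under that assumption the parts add up to roughly 377.29 TB, consistent with the 377 TB listed for the complete set.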
