Warning

Alpha API. This is an experimental release. Endpoints, schemas, and behavior may change without notice. Not recommended for production use.

API Reference

Base URL

https://biohub.ai/

Authentication

No client-side authentication is required to call the ESM Atlas API itself in the current alpha release.

Endpoints

GET /esm/protein/api/v1alpha1/features

Get Features

Return all SAE features (feature_index, label, description).

Status Codes:
  • 200 OK – Successful Response

GET /esm/protein/api/v1alpha1/features/{feature_index}

Get detailed metadata for a single SAE feature

Return all available metadata for one SAE feature: longform description, top activating UniRef90 and SwissProt proteins, decoder nearest neighbors, and activation statistics.

Parameters:
  • feature_index (integer)

Status Codes:

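The two feature endpoints above can be called with Python's standard library alone. The sketch below assumes the endpoint path is appended directly to the base URL and that responses are JSON; the response shape is not specified in this reference, so the parsed body is returned unchanged. The names `features_url` and `fetch_json` are illustrative, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://biohub.ai"  # base URL from this reference, trailing slash dropped
API_PREFIX = "/esm/protein/api/v1alpha1"


def features_url(feature_index=None):
    """Build the URL for the feature list, or for one feature if an index is given."""
    url = f"{BASE_URL}{API_PREFIX}/features"
    if feature_index is not None:
        url += f"/{int(feature_index)}"
    return url


def fetch_json(url, timeout=30):
    """GET a URL and parse the body as JSON (assumption: the API returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_json(features_url())` would request the full feature list, and `fetch_json(features_url(12))` the metadata for the feature at index 12.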
GET /esm/protein/api/v1alpha1/clusters/{protein_hash}

Get cluster info and member hashes for a representative protein

Look up cluster metadata and member protein hashes for a cluster representative protein identified by its MD5 hash.

Parameters:
  • protein_hash (string)

Query Parameters:
  • topk_features (integer)

Status Codes:

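Because the cluster is addressed by an MD5 hash, a client can reject malformed identifiers before making a request. The sketch below relies on the fact that an MD5 digest is 128 bits, which is 32 hexadecimal characters; that the API expects the hexadecimal form is an assumption, and `cluster_url` is an illustrative helper name.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://biohub.ai"
API_PREFIX = "/esm/protein/api/v1alpha1"

# An MD5 digest is 128 bits, i.e. 32 hexadecimal characters.
# Assumption: the API expects the hex-encoded form of the digest.
_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def cluster_url(protein_hash, topk_features=None):
    """Build the cluster lookup URL after checking the hash format."""
    if not _MD5_HEX.fullmatch(protein_hash):
        raise ValueError(f"not a 32-character hex MD5 digest: {protein_hash!r}")
    url = f"{BASE_URL}{API_PREFIX}/clusters/{protein_hash}"
    if topk_features is not None:
        url += "?" + urlencode({"topk_features": int(topk_features)})
    return url
```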
GET /esm/protein/api/v1alpha1/proteins/{protein_hash}/thumbnail/{thumbnail_type}

Get a protein structure thumbnail PNG

Stream a pre-rendered protein structure thumbnail PNG. This is a sub-resource of /proteins/{protein_hash}. The PNG is content-addressed by hash, so the response is cached aggressively (immutable, 1 year).

Parameters:
  • protein_hash (string)

  • thumbnail_type (string)

Status Codes:

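Since a thumbnail is content-addressed and served as immutable, a client can keep a local copy and never re-download it. The following is a minimal sketch of that idea, not part of the API: `cached_thumbnail` and the `download` callable (which would perform the actual GET) are hypothetical names, and the file naming scheme is an arbitrary choice.

```python
from pathlib import Path


def cached_thumbnail(cache_dir, protein_hash, thumbnail_type, download):
    """Return PNG bytes, calling download() only if no cached copy exists.

    download is a caller-supplied function (protein_hash, thumbnail_type) -> bytes.
    """
    path = Path(cache_dir) / f"{protein_hash}_{thumbnail_type}.png"
    if path.exists():
        # Immutable content: a cached file never needs revalidation.
        return path.read_bytes()
    data = download(protein_hash, thumbnail_type)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```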
GET /esm/protein/api/v1alpha1/proteins/{protein_hash}

Get protein metadata, sequence, and top SAE features

Look up a protein by its MD5 hash and return its metadata, amino acid sequence, and top-activating SAE features.

Parameters:
  • protein_hash (string)

Query Parameters:
  • topk_features (integer)

  • fold_on_miss (boolean)

  • normalize_features (boolean) – When true (default), feature activation values are scaled per-feature so they are comparable across proteins (matches the ranking shown in the UI). When false, returns raw SAE activations — useful for callers doing their own normalization.

  • feature_indices (array or null) – When provided, return values for exactly these feature indices (in the given order) instead of the top-K ranking. Features with no recorded activation are returned with value 0.0; indices not in the catalog are skipped. Capped at 100 entries.

Status Codes:

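The query parameters above can be assembled as in the sketch below. Two encoding details are assumptions that this reference does not confirm: booleans are sent as `true`/`false`, and the `feature_indices` array is sent as a repeated query key. The 100-entry cap is checked on the client side so an oversized request fails early. `protein_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://biohub.ai"
API_PREFIX = "/esm/protein/api/v1alpha1"
MAX_FEATURE_INDICES = 100  # cap stated in this reference


def _flag(value):
    # Assumption: booleans are encoded as lowercase "true" / "false".
    return "true" if value else "false"


def protein_url(protein_hash, topk_features=None, fold_on_miss=None,
                normalize_features=None, feature_indices=None):
    """Build the protein lookup URL; parameters left as None are omitted."""
    params = []
    if topk_features is not None:
        params.append(("topk_features", int(topk_features)))
    if fold_on_miss is not None:
        params.append(("fold_on_miss", _flag(fold_on_miss)))
    if normalize_features is not None:
        params.append(("normalize_features", _flag(normalize_features)))
    if feature_indices is not None:
        if len(feature_indices) > MAX_FEATURE_INDICES:
            raise ValueError("feature_indices is capped at 100 entries")
        # Assumption: arrays are sent as repeated keys, order preserved.
        params.extend(("feature_indices", int(i)) for i in feature_indices)
    url = f"{BASE_URL}{API_PREFIX}/proteins/{protein_hash}"
    return url + ("?" + urlencode(params) if params else "")
```

Passing `normalize_features=False` together with an explicit `feature_indices` list would request raw activations for exactly those features, in the order given.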
POST /esm/protein/api/v1alpha1/proteins/batch

Batch protein lookup

Look up multiple proteins by hash. Small batches return a zip file with per-data-type files (200). Large batches return a job handle for async polling (202).

Request body:

{
  "protein_hashes":{
    "items":{
      "type":"string"
    },
    "type":"array",
    "title":"Protein Hashes",
    "description":"MD5 hashes of proteins to look up. Max 500 unique entries; duplicates are deduplicated."
  },
  "topk_features":{
    "type":"integer",
    "title":"Topk Features",
    "description":"Number of top SAE features to include per protein (1\u2013100).",
    "default":10
  },
  "include_structure":{
    "type":"boolean",
    "title":"Include Structure",
    "description":"If true, include PDB structure, pTM, and per-residue pLDDT.",
    "default":true
  },
  "include_cluster_info":{
    "type":"boolean",
    "title":"Include Cluster Info",
    "description":"If true, include cluster representative metadata for each protein.",
    "default":true
  },
  "include_sequence":{
    "type":"boolean",
    "title":"Include Sequence",
    "description":"If true, include the amino acid sequence.",
    "default":true
  },
  "include_features":{
    "properties":{
      "protein_level":{
        "type":"boolean",
        "title":"Protein Level",
        "description":"Include protein-level SAE feature activations.",
        "default":true
      },
      "per_residue":{
        "type":"boolean",
        "title":"Per Residue",
        "description":"Include per-residue SAE feature activations.",
        "default":true
      }
    },
    "type":"object",
    "title":"FeatureOptions",
    "description":"Sub-options for feature data in batch downloads."
  }
}
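
Status Codes:
  • 200 OK – Small batch; the response is a zip file with per-data-type files
  • 202 Accepted – Large batch; the response is a job handle for async polling
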
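A request body matching the schema above might be built as follows. This is a sketch under stated assumptions: the body is sent as JSON with a `Content-Type: application/json` header, and the limits from the field descriptions (500 unique hashes, `topk_features` between 1 and 100) are enforced client-side. `build_batch_body` and `batch_request` are illustrative names.

```python
import json
import urllib.request

BATCH_URL = "https://biohub.ai/esm/protein/api/v1alpha1/proteins/batch"
MAX_UNIQUE_HASHES = 500  # limit from the protein_hashes description


def build_batch_body(protein_hashes, topk_features=10, include_structure=True,
                     include_cluster_info=True, include_sequence=True,
                     protein_level=True, per_residue=True):
    """Return a dict following the batch schema, with defaults as documented."""
    unique = list(dict.fromkeys(protein_hashes))  # dedupe, keep first-seen order
    if len(unique) > MAX_UNIQUE_HASHES:
        raise ValueError("at most 500 unique protein hashes per batch")
    if not 1 <= topk_features <= 100:
        raise ValueError("topk_features must be between 1 and 100")
    return {
        "protein_hashes": unique,
        "topk_features": topk_features,
        "include_structure": include_structure,
        "include_cluster_info": include_cluster_info,
        "include_sequence": include_sequence,
        "include_features": {
            "protein_level": protein_level,
            "per_residue": per_residue,
        },
    }


def batch_request(body):
    """Wrap the body in a POST request (assumption: JSON-encoded body)."""
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The server deduplicates hashes itself; deduplicating locally only serves to check the 500-entry limit before sending.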
GET /esm/protein/api/v1alpha1/proteins/batch/jobs/{job_id}

Poll batch protein lookup job status

Check the status of an async batch protein lookup. Returns 200 with a zip download URL when complete, 202 when pending, 410 when expired.

Parameters:
  • job_id (string)

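Status Codes:
  • 200 OK – Job complete; the response includes a zip download URL
  • 202 Accepted – Job still pending
  • 410 Gone – Job expired
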
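The 200/202/410 contract described above maps naturally onto a polling loop. The sketch below keeps the HTTP call out of the loop: `get_status` is a hypothetical caller-supplied function returning a `(status_code, body)` pair, and the polling interval and attempt limit are arbitrary defaults, not values recommended by the API.

```python
import time


def poll_job(get_status, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job completes; return the body of the 200 response.

    get_status is a caller-supplied function () -> (status_code, body).
    """
    for _ in range(max_attempts):
        code, body = get_status()
        if code == 200:
            return body  # complete: body carries the zip download URL
        if code == 410:
            raise RuntimeError("batch job expired (410)")
        if code != 202:
            raise RuntimeError(f"unexpected status {code}")
        sleep(interval_s)  # 202: still pending, wait and retry
    raise TimeoutError("job still pending after max_attempts polls")
```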
DELETE /esm/protein/api/v1alpha1/proteins/batch/jobs/{job_id}

Cancel an async batch protein lookup job

Request cancellation of an async batch protein lookup job. Cancellation is idempotent: returns 204 whether the job is still pending, already complete, or already cancelled. A job that has already produced results keeps them; subsequent GETs on the job will return the original outcome. Returns 404 if no job with this id is known.

Parameters:
  • job_id (string)

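Status Codes:
  • 204 No Content – Cancellation accepted (job pending, already complete, or already cancelled)
  • 404 Not Found – No job with this id is known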
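Because cancellation is idempotent, a client only needs to distinguish "accepted" from "unknown job". A minimal sketch using the standard library follows; `cancel_request` and `cancel_job` are illustrative names, and the `opener` argument exists only so the HTTP call can be replaced in tests.

```python
import urllib.error
import urllib.request

JOBS_URL = "https://biohub.ai/esm/protein/api/v1alpha1/proteins/batch/jobs"


def cancel_request(job_id):
    """Build the DELETE request for a batch job."""
    return urllib.request.Request(f"{JOBS_URL}/{job_id}", method="DELETE")


def cancel_job(job_id, opener=urllib.request.urlopen):
    """Return True if cancellation was accepted (204), False if the job is unknown (404)."""
    try:
        with opener(cancel_request(job_id)) as resp:
            return resp.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other error status is left to the caller
```

A `True` result does not mean the job was stopped before finishing: as noted above, a job that already produced results keeps them.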

Search for similar proteins using SAE feature vectors

Submit an amino acid sequence and find similar proteins in the atlas based on SAE feature vector similarity.

Query Parameters:
  • sequence (string) – Amino acid sequence (Required)

  • topk_results (integer) – Number of similar proteins to return

  • topk_features (integer) – Number of top features to return

  • min_similarity (number) – Minimum similarity score; results below this threshold are excluded

  • cluster_pct_characterized_max (integer or null) – If set, only return hits whose cluster_pct_characterized is <= this value. Use 0 to find clusters whose members have no characterized Pfam annotations (proxy for uncharacterized / novel).

  • include_cluster_info (boolean) – If true, each result includes the representative protein’s cluster_size and human-readable protein_name, sparing the caller a follow-up /clusters/{protein_hash} request per result.

Status Codes:
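The search parameters above can be encoded into a query string as sketched below. The endpoint path for this operation is not given in this reference, so the helper returns only the query string and leaves the URL to the caller. Encoding booleans as `true`/`false` is an assumption, and `search_query` is an illustrative name.

```python
from urllib.parse import urlencode


def search_query(sequence, topk_results=None, topk_features=None,
                 min_similarity=None, cluster_pct_characterized_max=None,
                 include_cluster_info=None):
    """Encode similarity-search parameters; parameters left as None are omitted."""
    if not sequence:
        raise ValueError("sequence is required")
    params = {"sequence": sequence}
    optional = {
        "topk_results": topk_results,
        "topk_features": topk_features,
        "min_similarity": min_similarity,
        "cluster_pct_characterized_max": cluster_pct_characterized_max,
        "include_cluster_info": include_cluster_info,
    }
    for name, value in optional.items():
        # Compare against None, not truthiness: 0 is a meaningful value
        # for cluster_pct_characterized_max.
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        params[name] = value
    return urlencode(params)
```

For example, passing `cluster_pct_characterized_max=0` keeps the parameter in the query, which restricts results to clusters with no characterized Pfam annotations.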