Warning
Alpha API. This is an experimental release. Endpoints, schemas, and behavior may change without notice. Not recommended for production use.
API Reference
Base URL
https://biohub.ai/
Authentication
No client-side authentication is required to call the ESM Atlas API itself in the current alpha release.
Endpoints
- GET /esm/protein/api/v1alpha1/features
Get Features
Return all SAE features (feature_index, label, description).
- Status Codes:
200 OK – Successful Response
- GET /esm/protein/api/v1alpha1/features/{feature_index}
Get detailed metadata for a single SAE feature
Return all available metadata for one SAE feature: longform description, top activating UniRef90 and SwissProt proteins, decoder nearest neighbors, and activation statistics.
- Parameters:
feature_index (integer)
- Status Codes:
200 OK – Successful Response
422 Unprocessable Entity – Validation Error
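The two feature endpoints above can be called with nothing but the standard library. The sketch below is a minimal illustration, not an official client: it assumes the base URL and paths join as shown and that responses are JSON. The response shape is not specified in this reference, so the parsed value is returned untouched. The helper names `features_url` and `get_json` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://biohub.ai"            # base URL from this reference
API_PREFIX = "/esm/protein/api/v1alpha1"  # common path prefix of all endpoints


def features_url(feature_index=None):
    """Build the URL for the feature list, or for one feature if an index is given."""
    url = f"{BASE_URL}{API_PREFIX}/features"
    if feature_index is not None:
        url += f"/{int(feature_index)}"  # feature_index is documented as an integer
    return url


def get_json(url, timeout=30):
    """GET a URL and parse the body as JSON (assumption: the API returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `get_json(features_url())` would fetch the full list, and `get_json(features_url(42))` the metadata for feature 42 (an arbitrary example index).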
- GET /esm/protein/api/v1alpha1/clusters/{protein_hash}
Get cluster info and member hashes for a representative protein
Look up cluster metadata and member protein hashes for a cluster representative protein identified by its MD5 hash.
- Parameters:
protein_hash (string)
- Query Parameters:
topk_features (integer)
- Status Codes:
200 OK – Successful Response
422 Unprocessable Entity – Validation Error
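A cluster lookup URL can be assembled as sketched below. This is an illustration under assumptions: the reference only says the hash is an MD5 hash, so the check for a 32-character lowercase hex digest is a guess at the encoding, and `is_md5_hex` and `cluster_url` are hypothetical helper names.

```python
import re
from urllib.parse import urlencode

CLUSTERS = "https://biohub.ai/esm/protein/api/v1alpha1/clusters"

# Assumption: hashes are hex-encoded MD5 digests (128 bits = 32 hex characters).
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def is_md5_hex(value):
    """Return True if value looks like a lowercase hex MD5 digest."""
    return isinstance(value, str) and _MD5_HEX.fullmatch(value) is not None


def cluster_url(protein_hash, topk_features=None):
    """Build the cluster lookup URL, adding topk_features only when it is set."""
    if not is_md5_hex(protein_hash):
        raise ValueError("protein_hash does not look like a hex MD5 digest")
    url = f"{CLUSTERS}/{protein_hash}"
    if topk_features is not None:
        url += "?" + urlencode({"topk_features": int(topk_features)})
    return url
```

Validating the hash client-side avoids a round trip that would likely end in a 422, but the check should be relaxed if the service turns out to accept another encoding.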
- GET /esm/protein/api/v1alpha1/proteins/{protein_hash}/thumbnail/{thumbnail_type}
Get a protein structure thumbnail PNG
Stream a pre-rendered protein structure thumbnail PNG. Sub-resource of /proteins/{protein_hash}. The PNG is content-addressed by hash, so the response is cached aggressively (immutable, 1 year).
- Parameters:
protein_hash (string)
thumbnail_type (string)
- Status Codes:
200 OK – Successful Response
422 Unprocessable Entity – Validation Error
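Because the thumbnail is content-addressed and immutable, a client can keep a local copy and never re-download it. The following is a minimal sketch of that idea. The cache directory layout and the helper names are hypothetical, and the accepted values of thumbnail_type are not listed in this reference, so the caller must supply a valid one.

```python
import os
import urllib.request
from urllib.parse import quote

PROTEINS = "https://biohub.ai/esm/protein/api/v1alpha1/proteins"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG file


def thumbnail_url(protein_hash, thumbnail_type):
    """Build the thumbnail URL, percent-encoding both path segments."""
    return (f"{PROTEINS}/{quote(protein_hash, safe='')}"
            f"/thumbnail/{quote(thumbnail_type, safe='')}")


def is_png(data):
    """Return True if the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def fetch_thumbnail(protein_hash, thumbnail_type, cache_dir="thumbnails", timeout=30):
    """Return the path of a cached thumbnail, downloading it only on first use."""
    path = os.path.join(cache_dir, f"{protein_hash}_{thumbnail_type}.png")
    if os.path.exists(path):
        return path  # immutable content: a cached copy never goes stale
    with urllib.request.urlopen(thumbnail_url(protein_hash, thumbnail_type),
                                timeout=timeout) as resp:
        data = resp.read()
    if not is_png(data):
        raise ValueError("response body is not a PNG")
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```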
- GET /esm/protein/api/v1alpha1/proteins/{protein_hash}
Get protein metadata, sequence, and top SAE features
Look up a protein by its MD5 hash and return its metadata, amino acid sequence, and top-activating SAE features.
- Parameters:
protein_hash (string)
- Query Parameters:
topk_features (integer)
fold_on_miss (boolean)
normalize_features (boolean) – When true (default), feature activation values are scaled per-feature so they are comparable across proteins (matches the ranking shown in the UI). When false, returns raw SAE activations, which is useful for callers doing their own normalization.
feature_indices (array or null) – When provided, return values for exactly these feature indices (in the given order) instead of the top-K ranking. Features with no recorded activation are returned with value 0.0; indices not in the catalog are skipped. Capped at 100 entries.
- Status Codes:
200 OK – Successful Response
422 Unprocessable Entity – Validation Error
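The query parameters above combine as in the sketch below. Two points are assumptions rather than documented behavior: that array parameters are sent as repeated keys (`feature_indices=7&feature_indices=3`), and that booleans are accepted as the strings `true` and `false`. The function name `protein_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

PROTEINS = "https://biohub.ai/esm/protein/api/v1alpha1/proteins"


def protein_url(protein_hash, topk_features=None, fold_on_miss=None,
                normalize_features=None, feature_indices=None):
    """Build a protein lookup URL; parameters left as None are omitted."""
    params = []
    if topk_features is not None:
        params.append(("topk_features", str(int(topk_features))))
    for name, flag in (("fold_on_miss", fold_on_miss),
                       ("normalize_features", normalize_features)):
        if flag is not None:
            params.append((name, "true" if flag else "false"))
    if feature_indices is not None:
        indices = list(feature_indices)
        if len(indices) > 100:  # documented cap on feature_indices
            raise ValueError("feature_indices is capped at 100 entries")
        # Assumption: arrays are encoded as repeated query keys, order preserved.
        params.extend(("feature_indices", str(int(i))) for i in indices)
    url = f"{PROTEINS}/{quote(protein_hash, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

Passing `normalize_features=False` together with an explicit `feature_indices` list requests raw activations for exactly those features, in the order given.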
- POST /esm/protein/api/v1alpha1/proteins/batch
Batch protein lookup
Look up multiple proteins by hash. Small batches return a zip file with per-data-type files (200). Large batches return a job handle for async polling (202).
Request body:
{
  "protein_hashes": {
    "items": { "type": "string" },
    "type": "array",
    "title": "Protein Hashes",
    "description": "MD5 hashes of proteins to look up. Max 500 unique entries; duplicates are deduplicated."
  },
  "topk_features": {
    "type": "integer",
    "title": "Topk Features",
    "description": "Number of top SAE features to include per protein (1\u2013100).",
    "default": 10
  },
  "include_structure": {
    "type": "boolean",
    "title": "Include Structure",
    "description": "If true, include PDB structure, pTM, and per-residue pLDDT.",
    "default": true
  },
  "include_cluster_info": {
    "type": "boolean",
    "title": "Include Cluster Info",
    "description": "If true, include cluster representative metadata for each protein.",
    "default": true
  },
  "include_sequence": {
    "type": "boolean",
    "title": "Include Sequence",
    "description": "If true, include the amino acid sequence.",
    "default": true
  },
  "include_features": {
    "properties": {
      "protein_level": {
        "type": "boolean",
        "title": "Protein Level",
        "description": "Include protein-level SAE feature activations.",
        "default": true
      },
      "per_residue": {
        "type": "boolean",
        "title": "Per Residue",
        "description": "Include per-residue SAE feature activations.",
        "default": true
      }
    },
    "type": "object",
    "title": "FeatureOptions",
    "description": "Sub-options for feature data in batch downloads."
  }
}
- Status Codes:
200 OK – Small batch; the response is a zip file
202 Accepted – Large batch; the response is a job handle for async polling
422 Unprocessable Entity – Validation Error
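A request body matching the schema above could be built as follows. This is a sketch, not an official client: it mirrors the documented defaults and limits (at most 500 unique hashes, topk_features between 1 and 100) on the client side, and the helper names are hypothetical. The format of the job handle returned with 202 is not described in this reference, so the sketch stops at constructing the request.

```python
import json
import urllib.request

BATCH_URL = "https://biohub.ai/esm/protein/api/v1alpha1/proteins/batch"


def build_batch_body(protein_hashes, topk_features=10, include_structure=True,
                     include_cluster_info=True, include_sequence=True,
                     protein_level=True, per_residue=True):
    """Build the JSON body for a batch lookup, using the documented defaults."""
    unique = list(dict.fromkeys(protein_hashes))  # drop duplicates, keep order
    if len(unique) > 500:
        raise ValueError("at most 500 unique protein hashes per batch")
    if not 1 <= topk_features <= 100:
        raise ValueError("topk_features must be between 1 and 100")
    return {
        "protein_hashes": unique,
        "topk_features": topk_features,
        "include_structure": include_structure,
        "include_cluster_info": include_cluster_info,
        "include_sequence": include_sequence,
        "include_features": {
            "protein_level": protein_level,
            "per_residue": per_residue,
        },
    }


def build_batch_request(body):
    """Wrap the body in a POST request; pass it to urllib.request.urlopen to send."""
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

After sending, the response status distinguishes the two outcomes described above: 200 carries the zip file directly, while 202 carries a job handle to poll.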
- GET /esm/protein/api/v1alpha1/proteins/batch/jobs/{job_id}
Poll batch protein lookup job status
Check the status of an async batch protein lookup. Returns 200 with a zip download URL when complete, 202 when pending, 410 when expired.
- Parameters:
job_id (string)
- Status Codes:
200 OK – Job complete; the response includes a zip download URL
202 Accepted – Job still pending
410 Gone – Job expired
422 Unprocessable Entity – Validation Error
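The polling behavior described above (200 when complete, 202 while pending, 410 once expired) maps onto a simple loop. The sketch below rests on assumptions: the 200 body is taken to be JSON, and the poll interval and attempt limit are arbitrary choices, not values recommended by the API. `fetch_job` and `wait_for_job` are hypothetical names.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote

JOBS_URL = "https://biohub.ai/esm/protein/api/v1alpha1/proteins/batch/jobs"


def fetch_job(job_id, timeout=30):
    """Poll once; return (status_code, parsed JSON body or None)."""
    url = f"{JOBS_URL}/{quote(job_id, safe='')}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as err:
        return err.code, None  # urllib raises on 4xx/5xx, e.g. 410 for expired jobs


def wait_for_job(job_id, fetch=fetch_job, interval=5.0, max_polls=60,
                 sleep=time.sleep):
    """Poll until the job completes; return the payload of the 200 response."""
    for _ in range(max_polls):
        status, payload = fetch(job_id)
        if status == 200:
            return payload
        if status == 410:
            raise RuntimeError(f"job {job_id} has expired")
        if status != 202:
            raise RuntimeError(f"unexpected status {status} for job {job_id}")
        sleep(interval)  # still pending: wait before the next poll
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access or real delays.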
- DELETE /esm/protein/api/v1alpha1/proteins/batch/jobs/{job_id}
Cancel an async batch protein lookup job
Request cancellation of an async batch protein lookup job. Cancellation is idempotent: returns 204 whether the job is still pending, already complete, or already cancelled. A job that has already produced results keeps them; subsequent GETs on the job will return the original outcome. Returns 404 if no job with this id is known.
- Parameters:
job_id (string)
- Status Codes:
204 No Content – Successful Response
404 Not Found – No job with this id is known
422 Unprocessable Entity – Validation Error
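Since cancellation is idempotent, a client only needs to tell apart "accepted" (204) from "unknown job" (404). A minimal sketch, with hypothetical helper names and an injectable `opener` so it can be tried without network access:

```python
import urllib.error
import urllib.request
from urllib.parse import quote

JOBS_URL = "https://biohub.ai/esm/protein/api/v1alpha1/proteins/batch/jobs"


def build_cancel_request(job_id):
    """Build the DELETE request for one job id."""
    return urllib.request.Request(f"{JOBS_URL}/{quote(job_id, safe='')}",
                                  method="DELETE")


def cancel_job(job_id, opener=urllib.request.urlopen, timeout=30):
    """Return True if the cancellation was accepted (204), False if the job is unknown (404)."""
    try:
        with opener(build_cancel_request(job_id), timeout=timeout) as resp:
            return resp.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # anything else (e.g. 422) is surfaced to the caller
```

Calling `cancel_job` twice on the same job is safe: per the description above, the second call also returns 204.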
- GET /esm/protein/api/v1alpha1/similarity-search
Search for similar proteins using SAE feature vectors
Submit an amino acid sequence and find similar proteins in the atlas based on SAE feature vector similarity.
- Query Parameters:
sequence (string) – Amino acid sequence (Required)
topk_results (integer) – Number of similar proteins to return
topk_features (integer) – Number of top features to return
min_similarity (number) – Minimum similarity score; results below this threshold are excluded
cluster_pct_characterized_max (integer or null) – If set, only return hits whose cluster_pct_characterized is <= this value. Use 0 to find clusters whose members have no characterized Pfam annotations (proxy for uncharacterized / novel).
include_cluster_info (boolean) – If true, each result includes the representative protein’s cluster_size and human-readable protein_name, sparing the caller a follow-up /clusters/{protein_hash} request per result.
- Status Codes:
200 OK – Successful Response
422 Unprocessable Entity – Validation Error
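A similarity-search URL can be assembled as sketched below. One detail deserves care: `cluster_pct_characterized_max=0` is a meaningful value, so optional parameters are tested against `None` rather than for truthiness. Sending booleans as `true`/`false` is an assumption, and `similarity_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://biohub.ai/esm/protein/api/v1alpha1/similarity-search"


def similarity_search_url(sequence, topk_results=None, topk_features=None,
                          min_similarity=None, cluster_pct_characterized_max=None,
                          include_cluster_info=None):
    """Build a similarity-search URL; parameters left as None are omitted."""
    if not sequence:
        raise ValueError("sequence is required")
    params = {"sequence": sequence}
    optional = {
        "topk_results": topk_results,
        "topk_features": topk_features,
        "min_similarity": min_similarity,
        "cluster_pct_characterized_max": cluster_pct_characterized_max,
        "include_cluster_info": include_cluster_info,
    }
    for name, value in optional.items():
        if value is None:  # compare with None so that 0 and False are still sent
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For example, `similarity_search_url(seq, cluster_pct_characterized_max=0)` restricts hits to clusters with no characterized Pfam annotations, as described above.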