Example: Sequence Similarity Search
Find proteins in the Atlas similar to a query amino acid sequence, ranked by SAE feature embedding similarity.
Endpoint: GET /esm/protein/api/v1alpha1/similarity-search
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequence | string | required | Amino acid sequence (max 800 residues) |
| topk_results | int | 10 | Number of similar proteins to return (max 100) |
| | int | 20 | Number of common SAE features to summarize across results (max 100) |
| | float | none | Drop results below this cosine similarity |
| | int | none | Restrict to clusters whose Pfam-characterized fraction is at or below this percentage |
| include_cluster_info | bool | false | Include cluster size and human-readable cluster name on each result |
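The documented limits can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: `build_search_params` and the two constants are hypothetical names, and the lower bound of 1 on `topk_results` is an assumption (the table only states the maximum).

```python
MAX_SEQUENCE_LENGTH = 800  # documented limit on the query sequence
MAX_TOPK_RESULTS = 100     # documented limit on returned proteins


def build_search_params(sequence, topk_results=10, include_cluster_info=False):
    """Validate inputs against the documented limits and return a params dict.

    Hypothetical helper; only the three parameter names that appear in the
    request examples are used here.
    """
    sequence = sequence.strip()
    if not sequence:
        raise ValueError("sequence is required")
    if len(sequence) > MAX_SEQUENCE_LENGTH:
        raise ValueError(
            f"sequence has {len(sequence)} residues; max is {MAX_SEQUENCE_LENGTH}"
        )
    # Lower bound of 1 is an assumption; the table only documents the maximum.
    if not 1 <= topk_results <= MAX_TOPK_RESULTS:
        raise ValueError(f"topk_results must be between 1 and {MAX_TOPK_RESULTS}")
    return {
        "sequence": sequence,
        "topk_results": topk_results,
        "include_cluster_info": include_cluster_info,
    }
```

The returned dict can be passed directly as the `params` argument of `httpx.get`.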
curl
curl "https://biohub.ai/esm/protein/api/v1alpha1/similarity-search?sequence=FVNQHLCGSHLVEALYLVCGERGFFYTPKT&topk_results=5&include_cluster_info=true"
Python
import httpx

response = httpx.get(
    "https://biohub.ai/esm/protein/api/v1alpha1/similarity-search",
    params={
        "sequence": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
        "topk_results": 5,
        "include_cluster_info": True,
    },
)
# Fail early on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()
data = response.json()

# Print each hit's accession and its similarity to the query.
for protein in data["similar_proteins"]:
    print(protein["protein_accession"], protein["similarity_score"])
Response
{
  "query_sequence": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
  "protein_hash": "abc123...",
  "similar_proteins": [
    {
      "protein_hash": "def456...",
      "protein_accession": "uniprotkb:P01308",
      "sequence_length": 110,
      "similarity_score": 0.97,
      "pdb": "REMARK 0 LICENSE\n...",
      "ptm": 0.42,
      "mean_plddt": 0.71,
      "residues_plddt": [0.62, 0.71, 0.84],
      "cluster_size": 12,
      "protein_name": "Insulin"
    }
  ],
  "top_features_across_results": [
    {
      "feature_index": 1234,
      "occurrence_count": 4,
      "min_activation": 0.42,
      "max_activation": 0.91,
      "mean_activation": 0.67
    }
  ],
  "restricted_count": 0
}
protein_hash at the top level is populated only when the query sequence is
already in the Atlas. cluster_size and protein_name are populated only
when include_cluster_info=true. restricted_count reports how many
otherwise-similar proteins were withheld by the biosecurity filter.
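Because several fields are only conditionally populated, client code should not assume they are present. Here is a minimal sketch of tolerant response handling, assuming that an unpopulated field is either absent or `null`; `summarize_response` and its `min_similarity` argument are hypothetical names, and the threshold is applied on the client rather than through a request parameter.

```python
def summarize_response(data, min_similarity=None):
    """Condense a similarity-search response into a small summary dict.

    Hypothetical helper. Assumes unpopulated fields are absent or null,
    so every conditional field is read with .get().
    """
    proteins = []
    for protein in data.get("similar_proteins", []):
        score = protein["similarity_score"]
        # Optional client-side cutoff on cosine similarity.
        if min_similarity is not None and score < min_similarity:
            continue
        proteins.append(
            {
                "accession": protein["protein_accession"],
                "score": score,
                # Populated only when include_cluster_info=true.
                "cluster_size": protein.get("cluster_size"),
                "protein_name": protein.get("protein_name"),
            }
        )
    return {
        # Top-level protein_hash is populated only for sequences already in the Atlas.
        "query_in_atlas": data.get("protein_hash") is not None,
        # Number of hits withheld by the biosecurity filter.
        "restricted_count": data.get("restricted_count", 0),
        "proteins": proteins,
    }
```

A non-zero `restricted_count` in the summary means the returned list is shorter than the full set of similar proteins, which is worth surfacing to the user rather than silently ignoring.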