FAQs
My protein isn’t in the Atlas. What happens when I search for it? If your sequence isn’t in the pre-computed dataset, the Atlas folds it on the fly using ESMFold2 and computes its SAE features in real time. This takes longer than retrieving a pre-computed result, but it works for any valid protein sequence.
Why does my protein have low feature activation scores across the board? Some proteins, particularly very short sequences, highly disordered proteins, or proteins from poorly represented lineages, may show weaker feature signals. Low activation doesn’t necessarily mean the protein is unimportant; it may simply reflect that the model has less evolutionary context to work with.
How is this different from searching AlphaFold DB or UniProt? AlphaFold DB provides predicted structures but organizes proteins by species and sequence identity, not by learned functional signals. UniProt provides curated functional annotations, but coverage is uneven, and many proteins have no annotation at all. The Atlas offers a complementary, model-derived view: it surfaces functional relationships between proteins that may share no sequence similarity and have no annotations, based purely on the biological signals ESMC detected in their sequences.
Can I use the Atlas API without a login? Yes. The Atlas API is fully public and requires no account or authentication. API documentation is available within the Atlas web app.
The similarity search returned proteins from a completely different organism than I expected. Is that right? Yes, and it’s often the most interesting result. Feature similarity is not constrained by taxonomy. A bacterial protein and a human protein can share strong feature similarity if they’ve evolved to perform the same molecular function. This is especially useful for placing uncharacterized proteins in a biological context when no close relatives exist in model organisms.
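To make the idea of taxonomy-independent feature similarity concrete, here is a minimal sketch. It assumes each protein is summarized as a sparse map from SAE feature index to activation strength, and it compares proteins with cosine similarity. Both the representation and the metric are assumptions for illustration, as are the feature indices and values; the metric the Atlas actually uses is not described here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature-activation maps.

    Each map goes from a (hypothetical) SAE feature index to an
    activation strength. Features missing from a map count as zero.
    """
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein with no active features matches nothing
    return dot / (norm_a * norm_b)

# Illustrative, made-up activations: the organism never enters the comparison.
bacterial_protein = {101: 0.9, 2048: 0.4, 77: 0.1}
human_protein = {101: 0.8, 2048: 0.5, 9: 0.2}
unrelated_protein = {5: 0.7, 9: 0.3}

same_function = cosine_similarity(bacterial_protein, human_protein)
different_function = cosine_similarity(bacterial_protein, unrelated_protein)
```

In this toy data the bacterial and human proteins score high because they activate the same features, while the unrelated protein scores zero against the bacterial one because they share no active features. Nothing in the calculation refers to species, which is why cross-kingdom matches can appear.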