bowang-lab/Orthrus: V1.0 Release
Description
Title: v1.0.0: Orthrus – Evolutionary and Functional RNA Foundation Models
Summary
We introduce Orthrus, a Mamba-based foundation model for mature RNA. Unlike existing genomic models trained on text-based reconstruction objectives, Orthrus uses a novel contrastive learning framework. The objective maximizes embedding similarity between biologically related pairs: splice isoforms of the same gene and orthologous transcripts derived from the Zoonomia Project. As a result, the learned representations cluster by evolutionary and functional similarity rather than by sequence identity alone.
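To make the contrastive objective concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an illustration only: the release notes do not state which contrastive loss Orthrus uses, and the function names, the cosine similarity, and the temperature value are assumptions. Each anchor embedding (for example, one transcript) is pulled toward its paired positive (a splice isoform or ortholog), while the other positives in the batch act as negatives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] form a positive pair; every other
    positives[j] serves as a negative for anchors[i].
    The temperature is a hypothetical default, not a value from the source.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings: two anchors and their (correctly or wrongly) paired positives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce(anchors, aligned)
loss_shuffled = info_nce(anchors, shuffled)
```

The loss is small when each anchor is closest to its own positive and large when the pairing is scrambled, which is the behavior a contrastive objective rewards.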
Key Features & Architecture
- Biologically Inspired Contrastive Learning: We train on 887 million unique positive pairs, leveraging alternative splicing variations and 400+ mammalian species alignments to identify function-preserving sequence diversity.
- Mamba Backbone: The architecture employs selective state space modeling, enabling linear memory scaling with sequence length and effective filtering of non-informative context.
- Parameter Efficiency: Orthrus (10.1M parameters) outperforms or matches genomic foundation models with billions of parameters (e.g., Evo2, Nucleotide Transformer) on mRNA property prediction tasks.
- Isoform Awareness: The model successfully distinguishes functionally divergent isoforms within the same gene, clustering sequences by functional role (e.g., apoptosis regulation in BCL2L1) rather than just sequence overlap.
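The selective state space idea behind the Mamba backbone can be illustrated with a toy, single-channel recurrence. This is a sketch under stated assumptions, not the Orthrus layer: the scalar weights `w_delta`, `w_b`, and `w_c` are hypothetical stand-ins for learned projections, and real Mamba blocks use many channels plus a hardware-aware parallel scan. What the sketch does show is that the step size and the input and output maps depend on the current input (the "selective" part), and that the hidden state has a fixed size regardless of sequence length.

```python
import math

def softplus(z):
    """Smooth positive function used to keep the step size above zero."""
    return math.log1p(math.exp(z))

def selective_scan(xs, a, w_delta, w_b, w_c):
    """Toy single-channel selective SSM.

    xs: input sequence (list of floats)
    a: diagonal state matrix entries (negative floats, so the state decays)
    w_delta, w_b, w_c: hypothetical weights that make the step size,
        input map B_t and output map C_t depend on the current input
    """
    h = [0.0] * len(a)  # fixed-size hidden state, independent of len(xs)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size
        b = [w * x for w in w_b]        # input-dependent B_t
        c = [w * x for w in w_c]        # input-dependent C_t
        # Discretized update: h_t = exp(delta * A) * h_{t-1} + delta * B_t * x_t
        h = [math.exp(delta * a_n) * h_n + delta * b_n * x
             for a_n, h_n, b_n in zip(a, h, b)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return ys

outputs = selective_scan(
    xs=[1.0, 0.0, 2.0],
    a=[-1.0, -0.5],
    w_delta=1.0,
    w_b=[0.5, -0.3],
    w_c=[1.0, 1.0],
)
```

Because only `h` is carried between steps, the working state stays constant in size and total memory grows linearly with the number of outputs.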
Available Models
We release weights for four model variants, available via Hugging Face: antichronology/orthrus
- Orthrus Base 6-Track (Recommended): The standard ~10M parameter model. It utilizes a detailed 6-track encoding that incorporates biological context, including splice sites and coding sequence (CDS) markers.
- Orthrus Small 6-Track: A lightweight ~1M parameter version of the 6-track model, optimized for low-resource inference while retaining biological context awareness.
- Orthrus Base 4-Track: The standard ~10M parameter model using a simplified 4-track (one-hot) encoding of the mRNA sequence.
- Orthrus Small 4-Track: A lightweight ~1M parameter version of the 4-track model.
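The difference between the 4-track and 6-track inputs can be sketched as follows. This is an illustrative encoder, not the repository's preprocessing code: the track order (A, C, G, U, then CDS, then splice site) and the convention of passing marker positions as index lists are assumptions. The first four rows are the one-hot sequence used by the 4-track models; the last two rows carry the coding sequence and splice site annotations that the 6-track models add.

```python
NUCLEOTIDES = "ACGU"  # assumed track order for the one-hot rows

def encode_tracks(seq, cds_positions=(), splice_positions=()):
    """Return a 6 x L list-of-lists encoding of an mRNA sequence.

    Rows 0-3: one-hot nucleotides (the 4-track encoding).
    Row 4: 1 at each index in cds_positions (hypothetical CDS marker track).
    Row 5: 1 at each index in splice_positions (hypothetical splice track).
    """
    seq = seq.upper().replace("T", "U")
    tracks = [[0] * len(seq) for _ in range(6)]
    for i, base in enumerate(seq):
        if base in NUCLEOTIDES:  # unknown bases such as N stay all-zero
            tracks[NUCLEOTIDES.index(base)][i] = 1
    for i in cds_positions:
        tracks[4][i] = 1
    for i in splice_positions:
        tracks[5][i] = 1
    return tracks

six_track = encode_tracks("AUGC", cds_positions=[0], splice_positions=[3])
four_track = six_track[:4]  # dropping the annotation rows gives the one-hot input
```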
Performance Highlights
- Linear Probing: A simple linear model trained on Orthrus embeddings exceeds supervised ab initio baselines on key benchmarks, including mRNA half-life and mean ribosome load.
- Few-Shot Learning: Orthrus maintains competitive performance with as few as 30 labeled training examples, significantly outperforming supervised baselines in low-data regimes.
- Functional Prediction: In-silico perturbation analysis (Categorical Jacobian) accurately identifies fitness-promoting exons and functional domains (e.g., WD40 domains in TAF5).
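Linear probing, as described in the first bullet above, means fitting a linear model on frozen embeddings without updating the encoder. The sketch below uses closed-form ridge regression in NumPy. It makes several assumptions: the embeddings are random synthetic stand-ins rather than real Orthrus outputs, the helper names are hypothetical, and the actual benchmark protocol (regularization, splits, metrics) is not specified in these notes.

```python
import numpy as np

def fit_linear_probe(embeddings, targets, l2=1e-3):
    """Closed-form ridge regression on frozen embeddings.

    Returns (weights, bias). The intercept is handled by centering,
    so it is not penalized.
    """
    x_mean = embeddings.mean(axis=0)
    y_mean = targets.mean()
    centered = embeddings - x_mean
    gram = centered.T @ centered + l2 * np.eye(centered.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (targets - y_mean))
    bias = y_mean - x_mean @ weights
    return weights, bias

def predict(embeddings, weights, bias):
    """Apply the fitted probe to new embeddings."""
    return embeddings @ weights + bias

# Synthetic stand-in for precomputed embeddings and a regression target
# such as half-life; 30 examples echoes the few-shot setting above.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(30, 8))
true_w = rng.normal(size=8)
train_y = train_x @ true_w + 0.5
weights, bias = fit_linear_probe(train_x, train_y, l2=1e-6)
predictions = predict(train_x, weights, bias)
```

In practice `train_x` would hold one pooled embedding per transcript, and the probe would be evaluated on held-out transcripts.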
Files
| Name | Size | Checksum |
|---|---|---|
| bowang-lab/Orthrus-v1.0.0.zip | 3.7 MB | md5:30d9ff2ceecc61c91f9b6a77a9c45186 |
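The MD5 checksum listed for the archive can be used to check a download. The following is a small sketch using only the Python standard library; the local file path in the usage comment is hypothetical.

```python
import hashlib

# Checksum as listed for bowang-lab/Orthrus-v1.0.0.zip in this record.
EXPECTED_MD5 = "30d9ff2ceecc61c91f9b6a77a9c45186"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (hypothetical local path):
# assert md5_of_file("Orthrus-v1.0.0.zip") == EXPECTED_MD5
```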
Additional details
Related works
- Is supplement to: Software: https://github.com/bowang-lab/Orthrus/tree/v1.0.0
Software
- Repository URL: https://github.com/bowang-lab/Orthrus