Evaluating transformer-based models for structural characterization of orphan proteins
Authors/Creators
Description
This repository contains the data and scripts for the study evaluating transformer-based models for the structural characterization of orphan proteins. Together they form the complete computational framework required to reproduce the analyses presented in the manuscript.
The repository is organized into two main directories: data/ and scripts/.
The data/ directory includes raw outputs and processed results from structure prediction tools (AlphaFold2, ESMFold, OmegaFold), intrinsic disorder predictors (AIUPred, flDPnn, LoRa-DR), secondary structure analyses (DSSP, ProtT5), structural similarity searches (Foldseek), relative solvent accessibility (RSA) calculations, and integrated feature tables used for downstream analyses.
The scripts/ directory contains all code used both to run external tools and to process their outputs. The run/ subdirectory documents execution pipelines for structure prediction, disorder prediction, secondary structure analysis, RSA calculation, structural comparison, and Foldseek searches. The treat/ subdirectory contains scripts for statistical analyses, correlation analyses, and figure generation.
All analyses described in the manuscript can be reproduced using the provided data and treatment scripts.
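As a quick sanity check after extracting the archive, the layout described above can be verified with a minimal sketch like the following. The directory names `data/`, `scripts/run/`, and `scripts/treat/` come from the description; the extraction root (`orphan-analysis`) and the helper name `missing_directories` are assumptions made for illustration and may not match the actual top-level folder in the archive.

```python
from pathlib import Path

# Directories named in the repository description, relative to the extraction root.
EXPECTED_DIRS = ["data", "scripts", "scripts/run", "scripts/treat"]


def missing_directories(root):
    """Return the expected directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed extraction root; adjust to wherever the archive was unpacked.
    root = Path("orphan-analysis")
    missing = missing_directories(root)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout matches the description.")
```

An empty result only confirms that the top-level structure is present; it says nothing about whether individual tool outputs inside `data/` are complete.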
Files
| Name | Size | MD5 |
|---|---|---|
| orphan-analysis.zip | 5.0 GB | c21e6c68dc52621d3876ba0a8febc2e5 |
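Because the archive is large (5.0 GB), it may be worth confirming that the download is intact before extracting it. The sketch below compares the file's MD5 digest with the checksum listed for orphan-analysis.zip, using only Python's standard `hashlib`. The local file path and the helper names `md5_of_file` and `archive_is_intact` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed on this record for orphan-analysis.zip.
EXPECTED_MD5 = "c21e6c68dc52621d3876ba0a8febc2e5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to the actual path.
    archive = Path("orphan-analysis.zip")
    if not archive.exists():
        print(f"{archive} not found")
    elif archive_is_intact(archive):
        print("Checksum OK")
    else:
        print("Checksum mismatch: the download may be incomplete or corrupted")
```

Reading in chunks keeps memory use constant regardless of archive size. MD5 is adequate for detecting transfer corruption, which is the purpose here, but it is not a safeguard against deliberate tampering.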
Additional details
Dates
- Created: 2026-02-26