Published February 26, 2026 | Version v1
Software Open

Evaluating transformer-based models for structural characterization of orphan proteins

  • 1. Inria d'Université Côte d'Azur
  • 2. Institut Sophia Agrobiotech
  • 3. Inria Centre de Recherche Sophia Antipolis Méditerranée
  • 4. Sorbonne Université

Description

This repository contains all data and scripts generated for the study evaluating transformer-based models for the structural characterization of orphan proteins. It provides the complete computational framework required to reproduce the analyses presented in the manuscript.

The repository is organized into two main directories: data/ and scripts/.

The data/ directory includes raw outputs and processed results from structure prediction tools (AlphaFold2, ESMFold, OmegaFold), intrinsic disorder predictors (AIUPred, flDPnn, LoRa-DR), secondary structure analyses (DSSP, ProtT5), structural similarity searches (Foldseek), relative solvent accessibility (RSA) calculations, and integrated feature tables used for downstream analyses.

The scripts/ directory contains all code used both to run external tools and to process their outputs. The run/ subdirectory documents execution pipelines for structure prediction, disorder prediction, secondary structure analysis, RSA calculation, structural comparison, and Foldseek searches. The treat/ subdirectory contains scripts for statistical analyses, correlation analyses, and figure generation.

All analyses described in the manuscript can be reproduced from the provided data using the treatment scripts in scripts/treat/.
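As a quick check of the layout described above, the following minimal sketch lists the leading path components found in a downloaded copy of the archive. The function name `archive_prefixes` and its `depth` parameter are hypothetical and are not part of the repository. Whether data/ and scripts/ sit at the top level of the archive or under a single root folder is not stated in this record, so the depth may need adjusting.

```python
import zipfile
from pathlib import PurePosixPath


def archive_prefixes(archive_path, depth=1):
    """Return the sorted, unique leading path components of a zip archive.

    depth=1 gives the top-level entries; depth=2 also shows the
    second level (for example scripts/run and scripts/treat, if the
    archive is laid out as the description suggests).
    """
    prefixes = set()
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Zip member names always use forward slashes.
            parts = PurePosixPath(name).parts[:depth]
            if parts:
                prefixes.add("/".join(parts))
    return sorted(prefixes)
```

For example, `archive_prefixes("orphan-analysis.zip", depth=2)` reads only the archive index, so it runs quickly even on the full 5.0 GB file and does not extract anything.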

Files

Files (5.0 GB)

Name: orphan-analysis.zip
Size: 5.0 GB
md5: c21e6c68dc52621d3876ba0a8febc2e5
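The published checksum can be used to confirm that a download completed without corruption. The sketch below is one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_published_md5` are hypothetical, and the local file path is whatever name the archive was saved under.

```python
import hashlib

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "c21e6c68dc52621d3876ba0a8febc2e5"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_published_md5("orphan-analysis.zip")` should return `True` for an intact download. A `False` result most likely indicates an incomplete or corrupted transfer.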

Additional details

Dates

Created
2026-02-26

Software

Programming language
Python, Shell