Paradoxical G-quadruplex distribution in coronavirus genomes reveals functional constraints and antiviral therapeutic opportunities
# G-quadruplex Distribution in Coronavirus Genomes: Analysis Code and Data
## Overview
This dataset contains all code, data, and supplementary materials for the manuscript "Paradoxical G-quadruplex distribution in coronavirus genomes reveals functional constraints and antiviral therapeutic opportunities", published in *Virus Research* (2025).
## Key Findings
- **Genome-wide G4 depletion**: Fold-change = 0.56 (95% CI: 0.24-2.30)
- **Regional enrichment in critical proteins** (incidence rate ratio, IRR):
  - Spike protein: IRR = 17.9 (95% CI: 11.7-27.6)
  - Nucleocapsid protein: IRR = 14.4 (95% CI: 8.3-25.1)
- **Therapeutic targets**: 38 thermodynamically stable G4 candidates (ΔG < -5 kcal/mol)
- **Primary target**: GGCTGGCAATGGCGG (ΔG = -7.35 kcal/mol, 54.8% conservation)
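
As an illustration of what a sequence-level G4 search can look like, the sketch below scans for a relaxed two-tetrad motif: four runs of at least two guanines separated by loops of 1 to 7 nucleotides. The pattern, the loop bounds, and the `find_g4_motifs` helper are assumptions made for this example and are not necessarily the detection rule used by the scripts in this dataset. The primary target sequence listed above satisfies this relaxed pattern.

```python
import re

# Assumed relaxed motif: G{2,} N{1,7} G{2,} N{1,7} G{2,} N{1,7} G{2,}.
# The lookahead makes the search report motifs that overlap each other.
G4_PATTERN = re.compile(r"(?=(G{2,}(?:[ACGTU]{1,7}?G{2,}){3}))")


def find_g4_motifs(sequence):
    """Return (start, motif) pairs for putative G4 motifs in `sequence`.

    `find_g4_motifs` is a hypothetical helper written for this README,
    not a function from the analysis scripts.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in G4_PATTERN.finditer(seq)]


# The primary target from Key Findings: GG-CT-GG-CAAT-GG-C-GG
primary_target = "GGCTGGCAATGGCGG"
hits = find_g4_motifs(primary_target)
```

A pattern match only flags a candidate; stability (ΔG) and conservation are separate assessments.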
## Dataset Contents
### Genomes (31 files)
- **SARS-CoV-2 variants** (n=19): Alpha, Beta, Gamma, Delta, Omicron sublineages (BA.1, BA.2, BA.5, BQ.1.1, XBB.1.5, XBB.1.16, JN.1), Epsilon, Eta, Kappa, Lambda, Mu, D614G, B.1, Wuhan reference
- **Other coronaviruses** (n=12): SARS-CoV, MERS-CoV, bat coronaviruses (RmYN02, RaTG13, BANAL-52), pangolin coronaviruses, HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63
### Code (95 Python scripts)
- G4 detection and regional mapping
- Statistical robustness analysis (Bootstrap, Poisson GLM, GEE models)
- Thermodynamic stability assessment (ΔG calculations)
- Bayesian network analysis (pgmpy)
- Machine learning predictions (XGBoost)
- Cross-virus comparative analysis
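
To make the IRR values in Key Findings easier to interpret, the sketch below computes an incidence rate ratio with a Wald confidence interval on the log scale. This closed form gives the same point estimate as a Poisson GLM with a single region indicator and a log-length offset, but it is only an illustration: the `incidence_rate_ratio` function is hypothetical, the counts and lengths are made-up round numbers rather than values from this dataset, and the published intervals may come from a different model (for example GEE or bootstrap).

```python
import math


def incidence_rate_ratio(count_region, length_region,
                         count_background, length_background, z=1.96):
    """Rate ratio of motif counts per nucleotide, with a Wald CI.

    Hypothetical helper for illustration; both counts must be positive.
    """
    rate_region = count_region / length_region
    rate_background = count_background / length_background
    irr = rate_region / rate_background
    # Standard error of log(IRR) for two independent Poisson counts
    se_log = math.sqrt(1.0 / count_region + 1.0 / count_background)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Illustrative numbers only: 20 motifs in a 4,000 nt region
# versus 10 motifs in 26,000 nt of background sequence.
irr, lower, upper = incidence_rate_ratio(20, 4000, 10, 26000)
```

An IRR above 1 with an interval that excludes 1 indicates more motifs per nucleotide in the region than in the background.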
### Results
- Supplementary Tables S1-S7 and S6A
- Complete analysis outputs (CSV format)
- Reproducible analysis pipeline
## Usage
```bash
# Install dependencies
pip install -r requirements.txt
# Run complete analysis
./run_all.sh
# Or use Make
make all