Published October 3, 2024 | Version v1
Dataset
Open
Evolving patterns of co-mutations from tumor initiation to metastatic progression
Description
# README
This repository includes two supplementary datasets. Their contents are described below.
## Supplementary Data 1
- **Data version:** GENIE v15
- **Raw data:** https://doi.org/10.7303/syn53210170
This dataset contains 19 folders, each corresponding to an AACR Project GENIE participating center. Each folder includes `.rds` files containing processed GAMs, with one file for each gene sequencing panel used by that center.
Each `.rds` file stores a list containing two GAMs:
- one built from **missense mutations**
- one built from **truncating mutations**
### How to read the data in R
```r
# Load the GAMs for one panel; the path is relative to the unpacked dataset folder
data <- readRDS("MSK/MSK-IMPACT468.rds")
```
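The folder layout described above (one folder per center, one `.rds` file per panel, two GAMs per file) can be explored with base R alone. The sketch below builds a small toy stand-in in a temporary directory so that it runs without the real download. The element names (`missense`, `truncating`), the sample-by-gene orientation, and the toy values are assumptions made for illustration, so check the real objects with `str()` before relying on any of them.

```r
# Toy stand-in for the unpacked dataset: one center folder with one panel file.
# Element names and matrix orientation are assumptions, not the documented format.
root <- file.path(tempdir(), "supp_data_1_demo")
dir.create(file.path(root, "MSK"), recursive = TRUE, showWarnings = FALSE)

toy <- list(
  missense = matrix(c(1, 0, 0, 1, 1, 0), nrow = 3,
                    dimnames = list(c("s1", "s2", "s3"), c("TP53", "KRAS"))),
  truncating = matrix(c(0, 0, 1, 0, 0, 0), nrow = 3,
                      dimnames = list(c("s1", "s2", "s3"), c("TP53", "KRAS")))
)
saveRDS(toy, file.path(root, "MSK", "MSK-IMPACT468.rds"))

# Index every panel file: center = folder name, panel = file name without extension
files <- list.files(root, pattern = "\\.rds$", recursive = TRUE, full.names = TRUE)
index <- data.frame(
  center = basename(dirname(files)),
  panel = tools::file_path_sans_ext(basename(files)),
  path = files,
  stringsAsFactors = FALSE
)
print(index[, c("center", "panel")])

# Load one file and inspect the two GAMs it stores
data <- readRDS(index$path[1])
str(data, max.level = 1)
gam_dims <- lapply(data, dim)
print(gam_dims)
```

With the real data, pointing `root` at the unpacked Supplementary Data 1 folder should produce one row in `index` per center and panel.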
## Supplementary Data 2
This dataset contains two main folders:
- `pan_cancer` — for joint analysis across all tumor classes
- `tumor_class` — for tumor class-specific analyses
Each of these folders contains three subfolders corresponding to:
- **MSK**
- **DFCI**
- **TCGA** (https://www.cancer.gov/ccg/research/genome-sequencing/tcga)
Each subfolder contains `.rds` files with processed `run_data` objects for running **SelectSim** (https://github.com/CSOgroup/SelectSim). These objects include:
- missense and truncating mutation GAMs
- tumor mutational burden (TMB) estimates for each sample
- sample classes (tumor subtypes)
Within each of the three subfolders:
- files at the top level contain GAMs built using **all available patients** and **OncoKB genes** (`n = 396`), for example `pan_can_dfci_primary_run_data_v15.rds`
- files within subfolders named `gene_panel` contain GAMs restricted to the corresponding gene panel and to patients sequenced using that panel
Each file name includes labels indicating:
- the cohort (**MSK**, **DFCI**, or **TCGA**)
- the tumor class (or the label `pan_can` for pan-cancer files)
- the gene sequencing panel (for example, `p_505`)
- the metastatic status of the analyzed patients (`primary` or `meta`)
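Because the labels are embedded in the file names, they can be pulled out with regular expressions when sorting many files. The following is a minimal sketch that assumes the labels are lower case and separated by underscores, as in the two file names quoted in this README. `extract_label` is a hypothetical helper, not part of SelectSim.

```r
# Both file names below are quoted from this README
file_names <- c(
  "pan_can_dfci_primary_run_data_v15.rds",
  "pan_can_tcga_run_data.rds"
)

# Hypothetical helper: return the first capture group of `pattern`, or NA if absent
extract_label <- function(x, pattern) {
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits,
         function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

labels <- data.frame(
  file = file_names,
  cohort = toupper(extract_label(file_names, "_(msk|dfci|tcga)_")),
  status = extract_label(file_names, "_(primary|meta)_"),
  stringsAsFactors = FALSE
)
print(labels)
```

A missing label is returned as `NA` rather than guessed, which makes files that deviate from the assumed naming pattern easy to spot.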
### How to read the data in R
```r
data <- readRDS("pan_can_tcga_run_data.rds")
```
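To separate the all-patient files from the panel-restricted ones, it is enough to list each cohort folder without recursion and then list its `gene_panel` subfolder. The sketch below recreates that layout in a temporary directory so it can run without the real download. The folder capitalisation, the panel file name, and the placeholder object are assumptions, and real `run_data` objects are richer than the placeholder used here.

```r
# Toy directory tree mirroring the described layout (names partly hypothetical)
root <- file.path(tempdir(), "supp_data_2_demo")
cohort_dir <- file.path(root, "pan_cancer", "DFCI")
dir.create(file.path(cohort_dir, "gene_panel"),
           recursive = TRUE, showWarnings = FALSE)

# Placeholder object standing in for a real run_data object
toy_run_data <- list(placeholder = TRUE)
saveRDS(toy_run_data,
        file.path(cohort_dir, "pan_can_dfci_primary_run_data_v15.rds"))
saveRDS(toy_run_data,
        file.path(cohort_dir, "gene_panel", "hypothetical_panel_file.rds"))

# Top level: all available patients, OncoKB genes
all_patient_files <- list.files(cohort_dir, pattern = "\\.rds$",
                                full.names = TRUE)

# gene_panel subfolder: restricted to one panel and its patients
panel_files <- list.files(file.path(cohort_dir, "gene_panel"),
                          pattern = "\\.rds$", full.names = TRUE)

# Load one object and look at its top-level structure before using it
run_data <- readRDS(all_patient_files[1])
str(run_data, max.level = 1)
```

Running `str()` on a real file is the quickest way to find where the GAMs, TMB estimates, and sample classes sit inside the object.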
## Contact Details
For any questions, please contact:
- **Arvind Iyer** — ayalurarvind@gmail.com, arvind.iyer@unil.ch
- **Miljan Petrovic** — miljan.petrovic@unil.ch
**Lead contact:**
- **Giovanni Ciriello** — giovanni.ciriello@unil.ch
## Acknowledgments
The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as the members of the consortium for their commitment to data sharing. Interpretations are the responsibility of the study authors.
Files (20.1 MB)

| Checksum | Size |
|---|---|
| md5:15bb4b139928af663bf73b8e8559184a | 3.4 MB |
| md5:4d68081e47f8bd35454bafc2b5a73697 | 16.7 MB |
Additional details
Software
- Repository URL: https://github.com/CSOgroup/SelectSim
- Programming language: R
- Development Status: Active