Published October 3, 2024 | Version v1
Dataset Open

Evolving patterns of co-mutations from tumor initiation to metastatic progression

  • 1. University of Lausanne
  • 2. SIB Swiss Institute of Bioinformatics
  • 3. Swiss Cancer Center Léman

Description

# README


This repository includes two supplementary datasets. Their contents are described below.

## Supplementary Data 1


- **Data version:** GENIE v15
- **Raw data:** https://doi.org/10.7303/syn53210170

This dataset contains 19 folders, each corresponding to an AACR Project GENIE participating center. Each folder includes `.rds` files containing processed GAMs, with one file for each gene sequencing panel used by that center.
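The folder layout described above can be explored before loading anything. The following is a minimal sketch, assuming the archive was extracted to a directory called `Supplementary_Data_1` (a hypothetical name; use the actual location) and that the center folders sit directly below it. The helper `list_panel_files` is illustrative and not part of the dataset or of SelectSim.

```r
# Hypothetical location of the extracted Supplementary Data 1 archive.
root <- "Supplementary_Data_1"

# Build a table with one row per .rds file: the center folder it sits in,
# the panel name taken from the file name, and the full path to the file.
list_panel_files <- function(root) {
  files <- list.files(root, pattern = "\\.rds$", recursive = TRUE)
  data.frame(
    center = dirname(files),
    panel  = sub("\\.rds$", "", basename(files)),
    path   = file.path(root, files),
    stringsAsFactors = FALSE
  )
}

# Only run when the directory is actually present.
if (dir.exists(root)) {
  panels <- list_panel_files(root)
  # Number of panel files found in each center folder
  print(table(panels$center))
}
```

The `path` column can then be passed to `readRDS()` to load the GAMs for a given center and panel.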

Each `.rds` file stores a list containing two GAMs:
- one built from **missense mutations**
- one built from **truncating mutations**

### How to read the data in R


```r
# Load the GAMs for one center (MSK) and one of its panels (MSK-IMPACT468)
data <- readRDS("MSK/MSK-IMPACT468.rds")
```
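Because each file stores a list of two GAMs, it can help to check what the loaded object contains before using it. The sketch below summarises each list element by class and dimensions. The helper `describe_elements` is illustrative only, and the README does not state the element names or which dimension holds samples and which holds genes, so the code makes no assumption about either.

```r
# Path taken from the example above; adjust it to where the archive was extracted.
path <- "MSK/MSK-IMPACT468.rds"

# Summarise each element of a list: its name (or position), class and dimensions.
# Elements without dimensions get NA for n_row and n_col.
describe_elements <- function(x) {
  labels <- names(x)
  if (is.null(labels)) labels <- as.character(seq_along(x))
  dim_or_na <- function(e, i) {
    d <- dim(e)
    if (is.null(d)) NA_integer_ else as.integer(d[i])
  }
  data.frame(
    element = labels,
    class   = vapply(x, function(e) class(e)[1], character(1), USE.NAMES = FALSE),
    n_row   = vapply(x, dim_or_na, integer(1), i = 1, USE.NAMES = FALSE),
    n_col   = vapply(x, dim_or_na, integer(1), i = 2, USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Only run when the file is actually present.
if (file.exists(path)) {
  data <- readRDS(path)
  print(describe_elements(data))
}
```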

## Supplementary Data 2


This dataset contains two main folders:

- `pan_cancer` — for joint analysis across all tumor classes
- `tumor_class` — for tumor class-specific analyses

Each of these folders contains three subfolders corresponding to:
- **MSK**
- **DFCI**
- **TCGA** (https://www.cancer.gov/ccg/research/genome-sequencing/tcga)

Each subfolder contains `.rds` files with processed `run_data` objects for running **SelectSim** (https://github.com/CSOgroup/SelectSim). These objects include:

- missense and truncating mutation GAMs
- tumor mutational burden (TMB) estimates for each sample
- sample classes (tumor subtypes)

Within each of the three subfolders:

- files at the top level contain GAMs built using **all available patients** and **OncoKB genes** (`n = 396`)
  - for example: `pan_can_dfci_primary_run_data_v15.rds`
- files within subfolders named `gene_panel` contain GAMs restricted to the corresponding gene panel and to patients sequenced using that panel

Each file name includes labels indicating:
- the cohort (**MSK**, **DFCI**, or **TCGA**)
- the tumor class (or the label `pan_can` for pan-cancer files)
- the gene sequencing panel (for example, `p_505`)
- the metastatic status of the analyzed patients (`primary` or `meta`)
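
The labels listed above can be pulled out of a file name with simple token matching. This is a heuristic sketch based only on the example names shown in this README. The exact naming scheme, including the lower-case cohort tokens and the `pan_can` prefix, is an assumption, and the helper `parse_run_data_name` is hypothetical.

```r
# Extract cohort, panel and metastatic status from a single run_data file name.
# Labels that are not found in the name are returned as NA.
parse_run_data_name <- function(file_name) {
  base   <- tolower(basename(file_name))
  tokens <- strsplit(sub("\\.rds$", "", base), "_")[[1]]
  first_or_na <- function(x) if (length(x) == 0) NA_character_ else x[1]

  cohort <- first_or_na(intersect(tokens, c("msk", "dfci", "tcga")))
  if (!is.na(cohort)) cohort <- toupper(cohort)

  data.frame(
    file       = basename(file_name),
    cohort     = cohort,
    # Panel labels are assumed to look like "p_505"
    panel      = first_or_na(regmatches(base, regexpr("p_[0-9]+", base))),
    status     = first_or_na(intersect(tokens, c("primary", "meta"))),
    # Assumed prefix for pan-cancer files, as in the example name above
    pan_cancer = grepl("^pan_can", base),
    stringsAsFactors = FALSE
  )
}

print(parse_run_data_name("pan_can_dfci_primary_run_data_v15.rds"))
```

Tumor class labels are not parsed here because the README does not list them.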

### How to read the data in R


```r
# Load one run_data object (here: pan-cancer, TCGA cohort)
data <- readRDS("pan_can_tcga_run_data.rds")
```

## Contact Details


For any questions, please contact:
- **Arvind Iyer** — ayalurarvind@gmail.com, arvind.iyer@unil.ch
- **Miljan Petrovic** — miljan.petrovic@unil.ch

**Lead contact:**
- **Giovanni Ciriello** — giovanni.ciriello@unil.ch

## Acknowledgments


The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as the members of the consortium for their commitment to data sharing. Interpretations are the responsibility of the study authors.

Files (20.1 MB)

  • md5:15bb4b139928af663bf73b8e8559184a (3.4 MB)
  • md5:4d68081e47f8bd35454bafc2b5a73697 (16.7 MB)

Additional details

Software

  • Repository URL: https://github.com/CSOgroup/SelectSim
  • Programming language: R
  • Development status: Active