Heterogeneity in the gene regulatory landscape of leiomyosarcoma

Tatiana Belova; Nicola Biondi; Ping-Han Hsieh; Pavlo Lutsik; Priya Chudasama; Marieke L. Kuijjer

doi:10.5281/zenodo.8105729

Published May 10, 2023 | Version v2

Journal article Open

We reconstructed gene regulatory networks for 80 TCGA and 37 DKFZ leiomyosarcoma samples, and used these networks as input to the PORCUPINE (Principal Components Analysis to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates) method to identify pathways driving leiomyosarcoma heterogeneity. In short, PORCUPINE combines knowledge of biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population.
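
As a rough illustration of the idea described above, the following sketch scores one pathway by the share of variance captured by the first principal component of its edge weights across samples, and compares that score with scores for random gene sets of the same size. It is a simplified toy in pure Python, not the code of the PORCUPINE package: the function names, the toy network, and the use of same-size random gene sets as the null model are assumptions made for illustration.

```python
import random


def pc1_variance_share(rows, n_iter=100):
    """Fraction of total variance captured by the first principal component.

    rows: one list of edge weights per edge, ordered by sample.
    """
    n = len(rows[0])
    # Center every edge across samples (edges are the PCA features).
    centered = []
    for row in rows:
        mean = sum(row) / n
        centered.append([v - mean for v in row])
    # Sample-by-sample Gram matrix; it shares its nonzero eigenvalues
    # with the edge-by-edge scatter matrix but is much smaller.
    gram = [[sum(c[i] * c[j] for c in centered) for j in range(n)]
            for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    if total == 0:
        return 0.0
    # Power iteration for the largest eigenvalue.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0:
            return 0.0
        vec = [x / norm for x in new]
    top = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n))
              for i in range(n))
    return top / total


def score_gene_set(by_target, genes):
    """Score a gene set using all edges that point to its genes."""
    rows = [weights for g in genes for weights in by_target[g]]
    return pc1_variance_share(rows)


def permutation_p_value(by_target, pathway_genes, n_perm=200, seed=1):
    """Compare the pathway score with same-size random gene sets (assumed null)."""
    rng = random.Random(seed)
    observed = score_gene_set(by_target, pathway_genes)
    universe = sorted(by_target)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(universe, len(pathway_genes))
        if score_gene_set(by_target, random_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def make_toy_network(seed=0):
    """Made-up network: 20 genes, 2 incoming edges each, 6 samples."""
    rng = random.Random(seed)
    group = [1, 1, 1, -1, -1, -1]  # two hypothetical patient subgroups
    by_target = {}
    for g in range(20):
        edges = []
        for _ in range(2):
            if g < 4:  # pathway genes: weights differ between subgroups
                edges.append([2 * s + rng.gauss(0, 0.1) for s in group])
            else:      # all other genes: unstructured noise
                edges.append([rng.gauss(0, 1) for _ in group])
        by_target[f"gene_{g}"] = edges
    return by_target


network = make_toy_network()
pathway = ["gene_0", "gene_1", "gene_2", "gene_3"]
observed, p_value = permutation_p_value(network, pathway)
print(f"PC1 variance share: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

In this toy setting, a pathway whose edges separate two sample subgroups receives a high first-component share relative to random gene sets, which is the kind of signal the pcp_tcga_res.txt and pcp_dkfz_res.txt result files summarize per pathway.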

This repository contains the following:

“rse_gene.Rdata” - A RangedSummarizedExperiment-class object for the TCGA RNA-seq data.
“LMS_37_readCount.txt” - Raw expression count data for 37 DKFZ-LMS samples. This file contains a 57,820 by 38 dataframe, where the first column is the gene ID.
“GN_ensemblID_symbol.txt” - A 55,476 by 2 dataframe where the first column is the Ensembl ID and the second column is the gene symbol, corresponding to features in “LMS_37_readCount.txt”.
“preprocessing_normalization.R” - R script with the preprocessing and normalization workflow for the data.
“TCGA_sarc_clinical_info.txt” - Clinical data for TCGA sarcoma samples.
“pcp_tcga_res.txt” - This file contains a 1,454 by 6 dataframe with PORCUPINE results for 80 TCGA-LMS tumors.
“pcp_dkfz_res.txt” - This file contains a 1,454 by 6 dataframe with PORCUPINE results for 37 DKFZ-LMS tumors.
“combine_networks.R” - A script to combine the LIONESS networks into a single file and calculate gene targeting scores.
“indegree_scores_206_sarcomas.txt” - Gene targeting scores for 206 TCGA sarcoma samples.
“umap_sarcoma.R” - A script to perform UMAP clustering of 206 TCGA sarcoma samples.
“pcp_pathways_bubble_plot.R” - A script to reproduce Figure 4.
“80_tcga_lms_net.RData” - Patient-specific gene regulatory networks for 80 TCGA leiomyosarcoma samples. This file contains an 11,151,077 by 80 dataframe that includes edge weights for each sample. Edge order corresponds to the edge order in the edges.RData file.
“37_dkfz_lms_net.RData” - Patient-specific gene regulatory networks for 37 DKFZ leiomyosarcoma samples. This file contains an 11,151,077 by 37 dataframe that includes edge weights for each sample. Edge order corresponds to the edge order in the edges.RData file.
“edges.RData” - Regulatory edge information. This file contains an 11,151,077 by 3 dataframe with three columns: reg (the transcription factor's gene ID), tar (the target gene ID), and prior (information from the “prior” network).
“PORCUPINE.zip” - The PORCUPINE R package.

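Because the network files store only edge weights, each row has to be matched by position to the row with the same index in edges.RData. The sketch below illustrates that row alignment on toy data and computes a gene targeting score per sample, assuming the score is the sum of the weights of all edges pointing to a target gene (an assumption suggested by the file name indegree_scores_206_sarcomas.txt; combine_networks.R remains the authoritative implementation). The gene names, sample names, and weights are made up.

```python
from collections import defaultdict

# Toy stand-in for edges.RData: one (reg, tar, prior) tuple per edge.
edges = [
    ("TF1", "GENE_A", 1),
    ("TF2", "GENE_A", 0),
    ("TF1", "GENE_B", 0),
    ("TF2", "GENE_B", 1),
]

# Toy stand-in for a network file: row i holds the weights of edges[i],
# with one value per sample.
samples = ["sample_1", "sample_2"]
weights = [
    [2.0, -1.0],
    [0.5, 0.5],
    [-1.5, 3.0],
    [1.0, 1.0],
]


def gene_targeting_scores(edges, weights, samples):
    """Sum incoming edge weights per target gene and sample (assumed definition)."""
    if len(edges) != len(weights):
        raise ValueError("edges and weights must have the same number of rows")
    scores = defaultdict(lambda: [0.0] * len(samples))
    for (_reg, tar, _prior), row in zip(edges, weights):
        for k, weight in enumerate(row):
            scores[tar][k] += weight
    return dict(scores)


scores = gene_targeting_scores(edges, weights, samples)
for gene, values in sorted(scores.items()):
    print(gene, dict(zip(samples, values)))
```

The length check matters in practice: the alignment between a network file and edges.RData is purely positional, so any reordering or filtering of one without the other would silently assign weights to the wrong edges.
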
The REACTOME directory contains the following:

list_of_pathways.txt – This file maps each REACTOME identifier to a pathway name and the corresponding species.
pathways_hierarchy.txt – The pathway hierarchy, given as two columns of REACTOME stable identifiers that define parent-child relationships between pathways. The first column provides the parent pathway stable identifier, and the second column provides the child pathway stable identifier.
reactome_pathways_hsa_id.txt – This file maps each REACTOME stable identifier to the REACTOME pathway name used in the .gmt file.
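
The parent-child layout of pathways_hierarchy.txt can be walked upward to find the top-level pathway or pathways that a given pathway belongs to. The sketch below assumes a tab-separated file without a header line and uses made-up identifiers; a pathway may have more than one parent, so the result is a set.

```python
from collections import defaultdict

# Made-up hierarchy in the assumed layout: parent<TAB>child, one pair per line.
hierarchy_text = (
    "R-HSA-0000001\tR-HSA-0000002\n"
    "R-HSA-0000002\tR-HSA-0000003\n"
    "R-HSA-0000001\tR-HSA-0000004\n"
    "R-HSA-0000005\tR-HSA-0000003\n"
)


def read_parents(text):
    """Map each child pathway id to the set of its direct parents."""
    parents = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        parent, child = line.split("\t")[:2]
        parents[child].add(parent)
    return parents


def top_level_ancestors(pathway, parents):
    """Follow parent links upward until pathways without parents are reached."""
    seen, roots, stack = set(), set(), [pathway]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        direct = parents.get(node)
        if direct:
            stack.extend(direct)
        else:
            roots.add(node)
    return roots


parents = read_parents(hierarchy_text)
print(sorted(top_level_ancestors("R-HSA-0000003", parents)))
```

For the real file, the text would come from reading pathways_hierarchy.txt instead of the inline string, and the resulting identifiers could be translated to names with list_of_pathways.txt.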

The input data used for modeling regulatory networks with PANDA and LIONESS is available in “/networks/input/”

“prior.txt” - Prior information on potential regulatory interactions, obtained by scanning promoter regions in the human genome for known TF motifs. This prior network was previously published in Lopes-Ramos et al. 2021 Cancer Research (PMID 34493595). This file contains an 11,151,077 by 3 dataframe, where the first column is the transcription factor's gene ID, the second column is the target gene ID, and the third column shows the presence (1) or absence (0) of a motif of the TF in the promoter region of the gene.
“exp_r.txt” - Gene expression data, contains a 17,899 by 11,322 dataframe including normalized expression data for each sample. The first column is the gene ID. The order of the columns corresponds to the first column in the samples.txt file.
“samples.txt” - Samples, corresponding to the columns of exp_r.txt. The first 243 samples are sarcoma samples.
“ppi.txt” - Protein-protein interactions between TFs obtained from StringDb (https://string-db.org/), as in Lopes-Ramos et al. 2021 (PMID 34493595). The file contains an 80,037 by 3 dataframe, where the first two columns contain protein IDs and the third column contains a score for each interaction.
Instructions on how to model PANDA and LIONESS networks are provided in the README.txt file in the “/networks/input/” directory. Additionally, the output of PANDA is provided in “/networks/mat/”.
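
Since the columns of exp_r.txt are identified only through the row order of samples.txt, selecting a subset of samples (for example, the leading sarcoma samples) means slicing both files by the same positions. The sketch below shows this on toy data; it assumes tab-separated files without header lines, which should be checked against the real files, and all sample and gene names are made up.

```python
import csv
import io

# Toy stand-ins for samples.txt and exp_r.txt (assumed: tab-separated, no header).
samples_txt = "S1\tsarcoma\nS2\tsarcoma\nS3\tother\n"
exp_txt = "GENE_A\t1.0\t2.0\t3.0\nGENE_B\t4.0\t5.0\t6.0\n"


def read_sample_ids(handle):
    """Return the first column of the samples file, in file order."""
    return [row[0] for row in csv.reader(handle, delimiter="\t") if row]


def subset_expression(handle, sample_ids, n_first):
    """Keep the first n_first sample columns of every gene row."""
    subset = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        gene, values = row[0], row[1:]
        if len(values) != len(sample_ids):
            raise ValueError(f"{gene}: column count does not match samples file")
        subset[gene] = [float(v) for v in values[:n_first]]
    return sample_ids[:n_first], subset


sample_ids = read_sample_ids(io.StringIO(samples_txt))
kept_ids, kept_expr = subset_expression(io.StringIO(exp_txt), sample_ids, n_first=2)
print(kept_ids, kept_expr)
```

With the real files, the io.StringIO objects would be replaced by open file handles and n_first would be 243 to keep the sarcoma samples described above.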

Notes

This work was supported by the Norwegian Research Council, Helse Sør-Øst, and University of Oslo through the Centre for Molecular Medicine Norway (187615), the Norwegian Research Council (313932), Familien Blix Fond, as well as the Emmy Noether Programme Grant from the German Research Foundation (DFG, No. CH 2302/1-1).

Files

Files (7.6 GB)

Name	MD5	Size
37_dkfz_lms_net.RData	9c5461efe01f3b91dd9807b7c5bfe458	845.5 MB
80_tcga_lms_net.RData	6e7d8fb914134f606ccef7cb23d4d10e	1.8 GB
combine_networks.R	079432ef5b4562886a2c00038b6579fb	2.4 kB
edges.RData	d674301b8ca599e459eb307fa0d594ca	3.4 MB
GN_ensemblID_symbol.txt	2942fe1191386790bb8f0aa55f909c27	1.6 MB
indegree_scores_206_sarcomas.txt	b4262ebdd6e94b55a3a1e5c7bb70dcbc	45.7 MB
LMS_37_readCount.txt	60e18eb7a7d1e595aa99af397e9e4717	7.0 MB
networks.zip	579e14f26dca6bc92106da1176e8d1d5	3.2 GB
pcp_dkfz_res.txt	b1cea338e76df320c6c30dc648883807	141.0 kB
pcp_pathways_bubble_plot.R	ce06f5c227336c4790da25cc844088b6	5.2 kB
pcp_tcga_res.txt	d680a944fad878b284031917e27b1111	141.0 kB
PORCUPINE.zip	9e6d525943e9ec0c67744f347b7a444b	290.2 kB
preprocessing_normalization.R	6719c789cea83fbd1e6a176cbd78a364	7.3 kB
REACTOME.zip	0e2b61c0512d12b0287793e619f5b8f1	484.1 kB
rse_gene.Rdata	4419fc3dab97b6ad91e5a38d83e9648a	1.6 GB
TCGA_sarc_clinical_info.txt	929facb2838bb19d6956e1b10d0e978b	76.2 kB
umap_sarcoma.R	be811e1c5b583a0c4963758eda2fe12f	2.5 kB
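
The md5 values in the file listing can be used to check that a download completed without corruption, which is worth doing for the multi-gigabyte files. A minimal sketch follows; the helper names are hypothetical, the files are assumed to sit in the current working directory, and only two entries of the listing are copied into the lookup table.

```python
import hashlib

# Two checksums copied from the file listing; extend as needed.
EXPECTED_MD5 = {
    "edges.RData": "d674301b8ca599e459eb307fa0d594ca",
    "pcp_tcga_res.txt": "d680a944fad878b284031917e27b1111",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True if the file's md5 matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()


# Example use after downloading (not executed here):
# for name, checksum in EXPECTED_MD5.items():
#     print(name, "OK" if is_intact(name, checksum) else "MISMATCH")
```

Reading in chunks keeps memory use constant, so the same function works for networks.zip and the other large archives.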

