Heterogeneity in the gene regulatory landscape of leiomyosarcoma
Description
We reconstructed gene regulatory networks for 80 TCGA and 37 DKFZ leiomyosarcoma samples and used these networks as input to the PORCUPINE (Principal Components Analysis to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates) method to identify pathways driving leiomyosarcoma heterogeneity. In short, PORCUPINE combines knowledge of biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population.
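The idea of combining a principal components analysis with a permutation test can be illustrated with a small sketch. This is not the PORCUPINE package's implementation: it assumes, for illustration only, that the statistic of interest is the fraction of variance captured by the first principal component of a pathway's edge weights, and that the null distribution comes from random edge sets of the same size. All function names and the toy data are hypothetical.

```python
import random


def pc1_variance_fraction(rows, iters=200, seed=0):
    """Fraction of total variance captured by the first principal component.

    rows: list of edges, each a list of edge weights (one value per sample).
    """
    n = len(rows[0])
    # Center each edge across samples.
    centered = []
    for row in rows:
        mean = sum(row) / n
        centered.append([v - mean for v in row])
    # Sample-by-sample Gram matrix; its eigenvalues are proportional to PC variances.
    gram = [[sum(c[i] * c[j] for c in centered) for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    if total == 0:
        return 0.0
    # Power iteration to approximate the largest eigenvalue.
    rng = random.Random(seed)
    vec = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0:
            return 0.0
        vec = [v / norm for v in nxt]
    top = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n))
    return top / total


def pathway_heterogeneity_pvalue(edge_weights, pathway_edges, n_perm=99, seed=0):
    """Compare a pathway's PC1 variance fraction with random edge sets of equal size.

    edge_weights: dict mapping (reg, tar) to a list of per-sample weights.
    Returns (observed statistic, empirical p-value).
    """
    rng = random.Random(seed)
    observed = pc1_variance_fraction([edge_weights[e] for e in pathway_edges])
    all_edges = sorted(edge_weights)
    hits = 0
    for _ in range(n_perm):
        drawn = rng.sample(all_edges, len(pathway_edges))
        if pc1_variance_fraction([edge_weights[e] for e in drawn]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 40 background edges of noise plus 5 pathway edges
# that all follow the same two-group pattern across 6 samples.
toy_rng = random.Random(1)
edge_weights = {
    ("TF%d" % i, "bg_gene%d" % i): [toy_rng.gauss(0, 1) for _ in range(6)]
    for i in range(40)
}
pattern = [1, 1, 1, -1, -1, -1]
pathway_edges = [("TFp", "path_gene%d" % i) for i in range(5)]
for i, edge in enumerate(pathway_edges):
    edge_weights[edge] = [(i + 1) * p for p in pattern]

observed, p_value = pathway_heterogeneity_pvalue(edge_weights, pathway_edges)
```

In this toy setting the pathway edges vary together across samples, so their first component captures nearly all of the variance, whereas random edge sets spread variance over several components. For the actual analysis, refer to the PORCUPINE R package provided in this repository.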
This repository contains the following:
- “rse_gene.RData” - A RangedSummarizedExperiment-class object for the TCGA RNA-seq data.
- “LMS_37_readCount.txt” - Raw expression count data for 37 DKFZ-LMS samples. This file contains a 57,820 by 38 dataframe, where the first column is the gene ID.
- “GN_ensemblID_symbol.txt” - A 55,476 by 2 dataframe where the first column is the Ensembl ID and the second column is the gene symbol, corresponding to features in “LMS_37_readCount.txt”.
- “preprocessing_and_normalization.R” - R script with the preprocessing and normalization workflow for the data.
- “TCGA_sarc_clinical_info.txt” - Clinical data for TCGA sarcoma samples.
- “pcp_tcga_res.txt” - A 1,454 by 6 dataframe with PORCUPINE results for 80 TCGA-LMS tumors.
- “pcp_dkfz_res.txt” - A 1,454 by 6 dataframe with PORCUPINE results for 37 DKFZ-LMS tumors.
- “combine_networks.R” - A script to combine the LIONESS networks into a single file and calculate gene targeting scores.
- “indegree_scores_206_sarcomas.txt” - Gene targeting scores for 206 TCGA sarcoma samples.
- “umap_sarcoma.R” - A script to perform UMAP clustering of 206 TCGA sarcoma samples.
- “pcp_pathways_bubble_plot.R” - A script to reproduce Figure 4.
- “80_tcga_lms_net.RData” - Patient-specific gene regulatory networks for 80 TCGA leiomyosarcoma samples. This file contains an 11,151,077 by 80 dataframe that includes edge weights for each sample. Edge order corresponds to the edge order in the “edges.RData” file.
- “37_dkfz_lms_net.RData” - Patient-specific gene regulatory networks for 37 DKFZ leiomyosarcoma samples. This file contains an 11,151,077 by 37 dataframe that includes edge weights for each sample. Edge order corresponds to the edge order in the “edges.RData” file.
- “edges.RData” - Regulatory edge information. This file contains an 11,151,077 by 3 dataframe with three columns: reg (transcription factor gene IDs), tar (target gene IDs), and prior (information from the “prior” network).
- “PORCUPINE.zip” - The PORCUPINE R package.
- The REACTOME directory contains the following:
  - “list_of_pathways.txt” - Maps each REACTOME identifier to a pathway name and the corresponding species.
  - “pathways_hierarchy.txt” - Pathway hierarchy relationships. The file consists of two columns of REACTOME stable identifiers that define the relationships between pathways within the hierarchy: the first column gives the parent pathway and the second column gives the child pathway.
  - “reactome_pathways_hsa_id.txt” - Maps each REACTOME stable identifier to the REACTOME pathway name used in the .gmt file.
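Because the network files store one row per edge in the same order as “edges.RData”, per-sample gene targeting scores can be derived by aligning the two tables row by row. The sketch below is in Python rather than R and is not the content of “combine_networks.R”. It assumes a gene targeting score is the sum of the weights of all edges pointing to a target gene (an in-degree, as the file name “indegree_scores_206_sarcomas.txt” suggests). The function name and toy values are hypothetical.

```python
from collections import defaultdict


def gene_targeting_scores(edges, weights):
    """Sum incoming edge weights per target gene and per sample.

    edges:   list of (reg, tar) pairs, one per edge.
    weights: list of rows in the same edge order, each holding one weight per sample.
    Returns a dict mapping each target gene to a list of per-sample scores.
    """
    if len(edges) != len(weights):
        raise ValueError("edges and weights must have the same number of rows")
    n_samples = len(weights[0]) if weights else 0
    scores = defaultdict(lambda: [0.0] * n_samples)
    for (_reg, tar), row in zip(edges, weights):
        acc = scores[tar]
        for sample_idx, weight in enumerate(row):
            acc[sample_idx] += weight
    return dict(scores)


# Toy example (hypothetical values): three edges, two samples.
toy_edges = [("TF1", "G1"), ("TF2", "G1"), ("TF1", "G2")]
toy_weights = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
toy_scores = gene_targeting_scores(toy_edges, toy_weights)
```

Keeping edge identities and edge weights in separate, row-aligned tables avoids repeating the reg and tar columns in every network file, at the cost of requiring that the row order is never changed.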
The input data used for modeling regulatory networks with PANDA and LIONESS are available in “/networks/input/”:
- “prior.txt” - Prior information on potential regulatory interactions, obtained by scanning promoter regions in the human genome for known TF motifs. This prior network was previously published in Lopes-Ramos et al. 2021 Cancer Research (PMID 34493595). This file contains an 11,151,077 by 3 dataframe, where the first column contains the transcription factor gene IDs, the second column contains the target gene IDs, and the third column indicates the presence (1) or absence (0) of the TF's motif in the promoter region of the gene.
- “exp_r.txt” - Gene expression data. This file contains a 17,899 by 11,322 dataframe with normalized expression data for each sample. The first column is the gene ID. The order of the columns corresponds to the first column of the “samples.txt” file.
- “samples.txt” - Samples corresponding to the columns of “exp_r.txt”. The first 243 samples are sarcoma samples.
- “ppi.txt” - Protein-protein interactions between TFs obtained from StringDb (https://string-db.org/), as in Lopes-Ramos et al. 2021 (PMID 34493595). This file contains an 80,037 by 3 dataframe, where the first two columns contain protein IDs and the third column contains a score for each interaction.

Instructions on how to model PANDA and LIONESS networks are provided in the README.txt file in the “/networks/input/” directory. Additionally, the output of PANDA is provided in “/networks/mat/”.
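Since the columns of “exp_r.txt” follow the order of the first column of “samples.txt”, a subset such as the leading sarcoma samples can be selected by position. The following Python sketch shows one way to do this. It assumes both files are tab-delimited with no header row and that each expression row holds a gene ID followed by one value per sample; these are assumptions to check against the actual files. The function name is hypothetical.

```python
import csv
import io


def subset_expression(expr_lines, sample_lines, n_first, delimiter="\t"):
    """Keep the first n_first samples of an expression table.

    expr_lines:   iterable of text lines, each "gene_id<delim>v1<delim>v2...".
    sample_lines: iterable of text lines whose first field is the sample ID.
    Returns (kept sample IDs, dict mapping gene ID to list of float values).
    """
    samples = [row[0] for row in csv.reader(sample_lines, delimiter=delimiter) if row]
    kept = samples[:n_first]
    values = {}
    for row in csv.reader(expr_lines, delimiter=delimiter):
        if not row:
            continue
        gene, fields = row[0], row[1:]
        if len(fields) != len(samples):
            raise ValueError("row for %s does not match the sample list" % gene)
        values[gene] = [float(v) for v in fields[:n_first]]
    return kept, values


# Toy input (hypothetical): two genes, three samples, keep the first two.
toy_expr = io.StringIO("G1\t1.0\t2.0\t3.0\nG2\t4.0\t5.0\t6.0\n")
toy_samples = io.StringIO("S1\tsarcoma\nS2\tsarcoma\nS3\tother\n")
kept_samples, kept_values = subset_expression(toy_expr, toy_samples, 2)
```

With the real files, the same call would take open file handles for “exp_r.txt” and “samples.txt” and `n_first=243`, provided the layout assumptions above hold.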
Files (7.6 GB)

| MD5 checksum | Size |
|---|---|
| 9c5461efe01f3b91dd9807b7c5bfe458 | 845.5 MB |
| 6e7d8fb914134f606ccef7cb23d4d10e | 1.8 GB |
| 079432ef5b4562886a2c00038b6579fb | 2.4 kB |
| d674301b8ca599e459eb307fa0d594ca | 3.4 MB |
| 2942fe1191386790bb8f0aa55f909c27 | 1.6 MB |
| b4262ebdd6e94b55a3a1e5c7bb70dcbc | 45.7 MB |
| 60e18eb7a7d1e595aa99af397e9e4717 | 7.0 MB |
| 579e14f26dca6bc92106da1176e8d1d5 | 3.2 GB |
| b1cea338e76df320c6c30dc648883807 | 141.0 kB |
| ce06f5c227336c4790da25cc844088b6 | 5.2 kB |
| d680a944fad878b284031917e27b1111 | 141.0 kB |
| 9e6d525943e9ec0c67744f347b7a444b | 290.2 kB |
| 6719c789cea83fbd1e6a176cbd78a364 | 7.3 kB |
| 0e2b61c0512d12b0287793e619f5b8f1 | 484.1 kB |
| 4419fc3dab97b6ad91e5a38d83e9648a | 1.6 GB |
| 929facb2838bb19d6956e1b10d0e978b | 76.2 kB |
| be811e1c5b583a0c4963758eda2fe12f | 2.5 kB |
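The MD5 values listed above can be used to check that a download completed without corruption. A minimal Python sketch follows; the function names are hypothetical, and the file is read in chunks so that the multi-gigabyte network files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected


# Usage (path and value are placeholders):
# matches_checksum("path/to/downloaded_file", "md5:<value from the file listing>")
```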