Published May 10, 2023 | Version v2
Journal article Open

Heterogeneity in the gene regulatory landscape of leiomyosarcoma

Description

We reconstructed gene regulatory networks for 80 TCGA and 37 DKFZ leiomyosarcoma samples, and used these networks as input to the PORCUPINE (Principal Components Analysis to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates) method to identify pathways driving leiomyosarcoma heterogeneity. In short, PORCUPINE combines knowledge of biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population.
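The idea of combining pathway membership, principal components analysis, and permutations can be illustrated with a minimal sketch. This is a simplified illustration under stated assumptions, not the PORCUPINE package itself: the function names, the toy data, and the choice of "fraction of variance captured by the first principal component" as the test statistic are assumptions made for the example.

```python
import numpy as np


def pc1_variance_fraction(edge_weights):
    """Fraction of total variance captured by the first principal component.

    edge_weights: array of shape (n_edges, n_samples).
    """
    x = np.asarray(edge_weights, dtype=float).T   # samples x edges
    if x.size == 0:
        return 0.0
    x = x - x.mean(axis=0)                        # center each edge
    sd = x.std(axis=0, ddof=1)
    x = x / np.where(sd > 0, sd, 1.0)             # scale; constant edges stay at zero
    variances = np.linalg.svd(x, compute_uv=False) ** 2
    total = variances.sum()
    return float(variances[0] / total) if total > 0 else 0.0


def pathway_heterogeneity_test(weights, edge_targets, pathway_genes, n_perm=200, seed=0):
    """Compare a pathway's PC1 variance fraction with random gene sets of equal size."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    edge_targets = np.asarray(edge_targets)
    all_genes = np.unique(edge_targets)
    pathway = np.intersect1d(all_genes, list(pathway_genes))
    observed = pc1_variance_fraction(weights[np.isin(edge_targets, pathway)])
    null = np.empty(n_perm)
    for i in range(n_perm):
        random_genes = rng.choice(all_genes, size=pathway.size, replace=False)
        null[i] = pc1_variance_fraction(weights[np.isin(edge_targets, random_genes)])
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Toy data (hypothetical): 20 genes, 3 regulators each, 30 samples.
rng = np.random.default_rng(1)
genes = np.array([f"g{i}" for i in range(20)])
edge_targets = np.repeat(genes, 3)                # 60 edges, grouped by target gene
weights = rng.normal(size=(60, 30))
factor = rng.normal(size=30)                      # shared patient-level signal
weights[:15] += 3.0 * factor                      # edges targeting g0..g4 co-vary
observed, p_value = pathway_heterogeneity_test(weights, edge_targets, genes[:5])
```

In this toy setup the edges targeting the first five genes share a common signal across samples, so their first principal component captures far more variance than that of random gene sets of the same size.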

This repository contains the following:

  • “rse_gene.RData” - A RangedSummarizedExperiment-class object for the TCGA RNA-seq data.

  • “LMS_37_readCount.txt” - Raw expression count data for 37 DKFZ-LMS samples. This file contains a 57,820 by 38 dataframe, where the first column is the gene ID.

  • “GN_ensemblID_symbol.txt” - A 55,476 by 2 dataframe where the first column is the Ensembl ID and the second column is the gene symbol, corresponding to features in “LMS_37_readCount.txt”.

  • “preprocessing_and_normalization.R” - R script with the preprocessing and normalization workflow for the data.

  • TCGA_sarc_clinical_info.txt – clinical data for TCGA sarcoma samples.

  • pcp_tcga_res.txt – This file contains a 1,454 by 6 dataframe with PORCUPINE results for 80 TCGA-LMS tumors.

  • pcp_dkfz_res.txt – This file contains a 1,454 by 6 dataframe with PORCUPINE results for 37 DKFZ-LMS tumors.

  • combine_networks.R - A script to combine the LIONESS networks in a single file and calculate gene targeting scores.

  • indegree_scores_206_sarcomas.txt - Gene targeting scores for 206 TCGA sarcoma samples.

  • umap_sarcoma.R - A script to perform UMAP clustering of 206 TCGA sarcoma samples.

  • pcp_pathways_bubble_plot.R - A script to reproduce Figure 4.

  • “80_tcga_lms_net.RData” - patient-specific gene regulatory networks for 80 TCGA leiomyosarcoma samples. This file contains an 11,151,077 by 80 dataframe that includes edge weights for each sample. Edge order corresponds to edge order in the edges.RData file.

  • “37_dkfz_lms_net.RData” - patient-specific gene regulatory networks for 37 DKFZ leiomyosarcoma samples. This file contains an 11,151,077 by 37 dataframe that includes edge weights for each sample. Edge order corresponds to edge order in the edges.RData file.

  • “edges.RData” - regulatory edge information, stored as an 11,151,077 by 3 dataframe with three columns: reg (the transcription factor's gene IDs), tar (the target gene IDs), prior (information from the “prior” network).

  • "PORCUPINE.zip" - PORCUPINE R package

  • The REACTOME directory contains the following:

  1. list_of_pathways.txt – This file maps the REACTOME identifier to a pathway name and corresponding species.

  2. pathways_hierarchy.txt – Pathway hierarchy relationships. The file consists of two columns of REACTOME identifiers that define the relationship between pathways within the pathway hierarchy. The first column provides the parent pathway stable identifier, and the second column provides the child pathway stable identifier.

  3. reactome_pathways_hsa_id.txt - This file maps the REACTOME stable identifier to the REACTOME pathway name used in the .gmt file.
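Because the network files store one row per edge and one column per sample, with edge order given by edges.RData, a per-sample gene targeting score can be sketched as the sum of the weights of all edges pointing to a gene (its weighted in-degree). The sketch below assumes that definition, which is suggested by the name of indegree_scores_206_sarcomas.txt; it is not the contents of combine_networks.R, and the function name and toy values are hypothetical.

```python
def gene_targeting_scores(targets, weights):
    """Sum incoming edge weights per target gene, separately for each sample.

    targets: target gene ID for each edge (the "tar" column of the edge table).
    weights: one row per edge, one value per sample, in the same edge order.
    Returns a dict mapping gene ID to a list of per-sample scores.
    """
    scores = {}
    for gene, row in zip(targets, weights):
        totals = scores.setdefault(gene, [0.0] * len(row))
        for j, w in enumerate(row):
            totals[j] += w
    return scores


# Toy example (hypothetical values): three edges, two samples.
toy_targets = ["A", "A", "B"]
toy_weights = [[1.0, 2.0], [0.5, -1.0], [3.0, 4.0]]
toy_scores = gene_targeting_scores(toy_targets, toy_weights)
```

For the full networks (about 11 million edges per sample) a vectorized or chunked implementation would be more practical; the loop above is only meant to make the definition explicit.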

The input data used for modeling regulatory networks with PANDA and LIONESS is available in “/networks/input/”:

  • “prior.txt” - Prior information on potential regulatory interactions, obtained by scanning promoter regions in the human genome for known TF motifs. This prior network was previously published in Lopes-Ramos et al. 2021 Cancer Research (PMID 34493595). This file contains an 11,151,077 by 3 dataframe, where the first column is the transcription factor's gene ID, the second column is the target gene ID, and the third column shows the presence (1) or absence (0) of a TF motif in the promoter region of a gene.

  • exp_r.txt - Gene expression data, stored as a 17,899 by 11,322 dataframe with normalized expression data for each sample. The first column is the gene ID. The order of columns corresponds to the first column in the samples.txt file.

  • “samples.txt” – Samples, corresponding to the columns of exp_r.txt. The first 243 samples are sarcoma samples.

  • “ppi.txt” – protein-protein interactions between TFs obtained from StringDb (https://string-db.org/), as in Lopes-Ramos et al. 2021 (PMID 34493595). The file contains an 80,037 by 3 dataframe with three columns, where the first two columns contain protein IDs and the third column contains a score for each interaction.

  • Instructions on how to model PANDA and LIONESS networks are provided in the README.txt file in the “/networks/input/” directory. Additionally, the output of PANDA is provided in “/networks/mat/”.
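Since the columns of exp_r.txt follow the order of samples.txt and the sarcoma samples come first, the sarcoma subset can be extracted by position. The sketch below is a minimal illustration under assumptions that should be checked against the actual files: both files are taken to be tab-delimited with no header row, and the function names are hypothetical. Small in-memory stand-ins are used instead of the real files.

```python
import csv
import io


def read_sample_ids(handle, delimiter="\t"):
    """Return the first column of the sample table, in file order."""
    return [row[0] for row in csv.reader(handle, delimiter=delimiter) if row]


def iter_expression_subset(handle, n_keep, delimiter="\t"):
    """Yield (gene_id, values) keeping only the first n_keep sample columns.

    Assumes the first column is the gene ID and there is no header row.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if row:
            yield row[0], [float(v) for v in row[1:1 + n_keep]]


# In-memory stand-ins for samples.txt and exp_r.txt (hypothetical content).
samples = io.StringIO("s1\tsarcoma\ns2\tsarcoma\ns3\tother\n")
expression = io.StringIO("ENSG1\t1.5\t2.0\t9.0\nENSG2\t0.0\t3.5\t7.0\n")

sample_ids = read_sample_ids(samples)[:2]   # with the real data: the first 243
subset = dict(iter_expression_subset(expression, n_keep=2))
```

Reading row by row keeps memory use low, which matters for a matrix with more than eleven thousand columns.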

Notes

This work was supported by the Norwegian Research Council, Helse Sør-Øst, and University of Oslo through the Centre for Molecular Medicine Norway (187615), the Norwegian Research Council (313932), Familien Blix Fond, as well as the Emmy Noether Programme Grant from the German Research Foundation (DFG, No. CH 2302/1-1).

Files

Files (7.6 GB)

Checksum Size
md5:9c5461efe01f3b91dd9807b7c5bfe458 845.5 MB
md5:6e7d8fb914134f606ccef7cb23d4d10e 1.8 GB
md5:079432ef5b4562886a2c00038b6579fb 2.4 kB
md5:d674301b8ca599e459eb307fa0d594ca 3.4 MB
md5:2942fe1191386790bb8f0aa55f909c27 1.6 MB
md5:b4262ebdd6e94b55a3a1e5c7bb70dcbc 45.7 MB
md5:60e18eb7a7d1e595aa99af397e9e4717 7.0 MB
md5:579e14f26dca6bc92106da1176e8d1d5 3.2 GB
md5:b1cea338e76df320c6c30dc648883807 141.0 kB
md5:ce06f5c227336c4790da25cc844088b6 5.2 kB
md5:d680a944fad878b284031917e27b1111 141.0 kB
md5:9e6d525943e9ec0c67744f347b7a444b 290.2 kB
md5:6719c789cea83fbd1e6a176cbd78a364 7.3 kB
md5:0e2b61c0512d12b0287793e619f5b8f1 484.1 kB
md5:4419fc3dab97b6ad91e5a38d83e9648a 1.6 GB
md5:929facb2838bb19d6956e1b10d0e978b 76.2 kB
md5:be811e1c5b583a0c4963758eda2fe12f 2.5 kB