Published April 28, 2023 | Version v1
Software Open

tcgaAnalyses

  • 1. The University of Texas MD Anderson Cancer Center

Description

Scripts to analyze TCGA cancer genomic data.

Script 1

vga_makeBoxPlotRsem.sh - compare RNA-Seq gene expression between tumor and matched normal.

  • Synopsis: vga_makeBoxPlotRsem.sh generates a high-quality PNG box plot of the mRNA expression of a given gene for 15 TCGA tumor sets and their matched normal controls. The plot is suitable for publication after minimal editing. Only cancer sets with at least 10 normal controls are included, which limits the number of tumor/normal pairs.

  • Usage: vga_makeBoxPlotRsem.sh <GENE_NAME> - where <GENE_NAME> is an official gene name in capital letters.

Example: vga_makeBoxPlotRsem.sh ERCC1

  • Notes: Edit lines 8-11 to load any module required for R, and edit DIR0 on line 13 to point to the RNA gene expression files. These files were obtained using TCGA-Assembler v.2.0, and a copy is available at ResearchGate under the project TCGA Analyses (see References). Box plots are drawn in the order listed on lines 63-77; to change the ranking, for example to plot by p-value, reorder those lines. vga_makeBoxPlotRsem.sh calls vga_pngBoxPlotRsem.R. The options in vga_pngBoxPlotRsem.R that control the main aesthetic features are: notch (true, false) on line 46; y-axis range (ylim) on line 52; p-values (on, off) on line 53 (stat_compare_means); the x-axis line (axis.line.x) on line 71; and plot colors (scale_fill_manual) on line 73. vga_makeBoxPlotRsem.sh can be scaled up using vga_submitMpiJob, which is described in the directory submitMpi.
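The inclusion rule in the synopsis (keep only cancer sets with at least 10 normal controls) can be illustrated with a minimal Python sketch. It is not the script's own code. It assumes samples are identified by standard TCGA barcodes, where the two-digit sample-type code in the fourth field is 01-09 for tumor and 10-19 for normal; the function names and the barcodes below are made up for illustration.

```python
from collections import Counter

MIN_NORMALS = 10  # threshold stated in the synopsis


def sample_kind(barcode):
    """Classify a TCGA barcode by the sample-type code in its fourth field."""
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


def has_enough_normals(barcodes, min_normals=MIN_NORMALS):
    """Return True when a cancer set has at least min_normals normal samples."""
    counts = Counter(sample_kind(b) for b in barcodes)
    return counts["normal"] >= min_normals


# Made-up barcodes: 12 tumor samples (code 01) and 10 normal samples (code 11).
barcodes = [f"TCGA-XX-{i:04d}-01A-01R-0000-07" for i in range(12)]
barcodes += [f"TCGA-XX-{i:04d}-11A-01R-0000-07" for i in range(10)]

print(has_enough_normals(barcodes))  # True: exactly 10 normals meets the threshold
```

A cancer set that fails this check would simply be left out of the plot.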

Script 2

vga_survivalCurve.sh - make Kaplan-Meier survival curve.

  • Synopsis: vga_survivalCurve.sh generates a PNG plot of a Kaplan-Meier survival curve for TCGA patients, comparing samples with high (above-mean) expression of a given gene against samples with low (below-mean) expression.

  • Usage: vga_survivalCurve.sh <TCGA_TUMOR> <GENE_NAME> - where TCGA_TUMOR is the TCGA tumor code and GENE_NAME an official gene name, both in capital letters.

Example: vga_survivalCurve.sh KIRC ERCC1

  • Notes: Edit lines 8-11 to load any module required for R. Line 19 launches the vga_spotLight binary (see Script 3): specify its path, and edit the path given to the --optFdat option so that it points to the TCGA gene expression files. Edit line 21 to point to the TCGA clinical data files. Line 32 calls vga_survival.R; verify its path. The example above generates a graphic file named kirc_ercc1.png and a text file named survival_ercc1_kirc.out. vga_survivalCurve.sh can be scaled up using vga_submitMpiJob, which is described in the directory submitMpi.
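The high/low grouping described in the synopsis can be sketched in a few lines of Python. This is only an illustration of splitting samples at the mean expression, not the code in vga_survival.R; the function name, sample identifiers, and expression values are hypothetical.

```python
from statistics import mean


def split_by_mean(expression):
    """Split samples into high (above mean) and low (below mean) groups.

    expression maps a sample identifier to its expression value.
    A sample exactly at the mean lands in neither group here, because the
    source only defines "above" and "below"; how the real script treats
    such ties is not stated.
    """
    cutoff = mean(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v < cutoff)
    return high, low


# Hypothetical expression values for four patients (mean is 6.0).
expression = {"P1": 2.0, "P2": 8.0, "P3": 5.0, "P4": 9.0}
high, low = split_by_mean(expression)
print("high:", high)  # high: ['P2', 'P4']
print("low:", low)    # low: ['P1', 'P3']
```

The two groups are what a Kaplan-Meier comparison would then take as its strata.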

Script 3

vga_geneExprMain.cpp - general utility to process TCGA gene expression and mutation files.

  • Synopsis:

Option A: find a correlation between the gene expression of 2 genes.

Option B: find a correlation between the gene expression of 1 gene and all genes in the dataset.

Option C: find a correlation between the gene expression of 1 gene from dataset1 and mutations in dataset2.

Option D: find correlations between the gene expression of all genes in dataset1 and mutations in dataset2.

Option E: find correlations between the gene expression of all genes and mutations in all datasets.

Option F: output gene expression data for one gene.

Option G: get the gene expression of 2 genes for survival curves (used by vga_survivalCurve.sh).

  • Usage and Examples:

Option A: Example: ibrun -n 1 vga_spotLight --optAdat ACC__geneExprT.txt --optAgene1 GRB2 --optAgene2 FGFR2
Output file will be 'ACC_GRB2_FGFR2_expr.txt'

Option B: Example: ibrun -n 1 vga_spotLight --optBdat ACC__geneExprT.txt --optBgene GRB2
Output file will be 'GRB2_toAll_ACC_T.txt'

Option C: Example: ibrun -n 1 vga_spotLight --optCdat1 ACC__geneExpT.txt --optCdat2 ACC__somMutT_geneLevel.txt --optCgene GRB2
Output file will be 'ACC_expr_mutsOne.txt'

Option D: Example: ibrun -n 1 vga_spotLight --optDdat1 ACC__geneExpT.txt --optDdat2 ACC__somMutT_geneLevel.txt --procs 16
Output file will be 'ACC_expr_mutsAll.txt'

Option E: Example: ibrun -n x vga_spotLight --optE expMutAll
Output files will be 'ACC_expr_mutsAll.txt ... BLCA_expr_mutsAll.txt ... etc.'

Option F: Example: ibrun -n 1 vga_spotLight --optFdat ACC__geneExprT.txt --optFgene GRB2
Output file will be 'ACC_GRB2_exprOne.txt'

Option G: Example: ibrun -n 1 vga_spotLight --optGdat ACC__geneExprT.txt --optGgene1 GRB2 --optGgene2 FGFR2
Output file will be 'ACC_GRB2_FGFR2_forKM.txt'

  • Notes: Edit the Makefile to point to the Boost library, and preload any module required for MPI. Edit lines 96 and 97 of vga_geneExprUsage.hpp to point to the directories containing the gene expression and mutation data. The file testStart.sh may be used as a guide for testing the compiled vga_spotLight binary.
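To make Option A concrete, the following Python sketch computes a correlation between the expression values of two genes across the same samples. The description above does not say which correlation statistic vga_spotLight uses, so Pearson's r is an assumption here, and the function name and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values of two genes in the same five samples.
gene1 = [10.2, 11.5, 9.8, 12.1, 10.9]
gene2 = [5.1, 6.0, 4.7, 6.4, 5.5]
print(round(pearson(gene1, gene2), 3))
```

Option B would apply the same calculation between one gene and every other gene in the dataset.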

Notes

With R version 3.5.1, vga_makeBoxPlotRsem.sh may raise the following error, which is caused by a bug in rlang:

/opt/apps/intel18/impi18_0/Rstats/3.5.1/lib64/R/bin/BATCH: line 60: 78714 Segmentation fault ${R_HOME}/bin/R -f ${in} ${opts} ${R_BATCH_OPTIONS} > ${out} 2>&1

This can be fixed by installing a development version of rlang:

install.packages("pak", repos = "https://r-lib.github.io/p/pak/dev/")
pak::pkg_install("r-lib/rlang")
