Published June 9, 2024 | Version v2
Dataset Open

Dataset 2. Variant Distribution of Majority Molecular Phenotype Classifications Relative to All Classifications

  • 1. CytoGnomix Inc
  • 2. Western University
  • 3. Western University, CytoGnomix Inc

Description

A variant may occur in multiple individuals, tissue types, or splice sites. ValidSpliceMut displays a consensus (majority) molecular phenotype classification for each variant, generated by observing all affected cases and their associated classifications. For each expression-validated variant, the phenotypes are classified as per Figure 2 of Shirley et al. 2019 (https://doi.org/10.12688/f1000research.17204.3). The bins on the horizontal axis of these histograms represent the count of occurrences of a splicing variant that have the most frequent classification, divided by the total number of cases affected by that variant. Bar height represents the number of variants having the majority classification within the appropriate bin.
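The majority fraction described above can be illustrated with a minimal sketch. This is not the software distributed with this dataset. The function name `majority_fraction` and the example labels are hypothetical, and the sketch assumes one classification label per affected case. When two labels are tied, `Counter.most_common` returns the one seen first, which may differ from how ValidSpliceMut resolves ties.

```python
from collections import Counter


def majority_fraction(classifications):
    """Return (majority_label, fraction) for one variant's per-case labels.

    fraction = cases carrying the most frequent label / all affected cases.
    """
    if not classifications:
        raise ValueError("at least one classification is required")
    counts = Counter(classifications)
    label, n_majority = counts.most_common(1)[0]
    return label, n_majority / len(classifications)


# Hypothetical variant observed in five cases (labels are illustrative only).
cases = ["aberrant", "aberrant", "likely aberrant", "aberrant", "allele-specific"]
label, fraction = majority_fraction(cases)
print(label, fraction)  # aberrant 0.6
```

A variant whose cases all share one classification has a fraction of 1.0 and falls in the rightmost bin.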

Below each histogram is a table that describes the distribution of majority molecular phenotypes in each bin. The three bins consist of mutations designated as variants that cause allele-specific alternative splicing (category 1), mutations that are likely aberrant (category 2), and mutations that are aberrant (category 3). For a full description of how these molecular phenotype classifications are determined, see Shirley et al. 2019. Figure 2 of the paper describes the algorithm we use to ensure that variants with no effect on splicing have been eliminated prior to this analysis.

To gain more information about the mutations in a particular bin, a data file has been provided for each of these figures. To identify which mutations occur in each bin, sort the data file by the `%Mol_Phenotype` column, which was used to designate histogram bins. Once a mutation of interest has been found, download Dataset 1 on Zenodo (Filename: ‘ValidSpliceMut_TabDelimited_v4.txt’) and search for the mutation (mutation names will be identical). The mutation will appear in Dataset 1 once for each patient in whom it was identified (from the TCGA and ICGC Cancer Projects). From Dataset 1, users can acquire the mutation’s associated gene, its position within the gene, the information theory-based prediction of the effect the mutation would have on splicing, and the evidence of splicing given by the Veridical algorithm. The latter reports how many reads in the RNAseq data supported an aberrant splicing event (exon skipping, intron inclusion, etc.).
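The sort-and-search workflow described above could be scripted roughly as follows. This is a sketch under assumptions: only the `%Mol_Phenotype` column name comes from this description. The `Mutation` column name, the tab-delimited layout of the per-figure data file, and the sample rows are hypothetical placeholders and should be replaced with the headers found in the actual files.

```python
import csv
import io


def sort_by_phenotype_fraction(handle, column="%Mol_Phenotype", delimiter="\t"):
    """Read a delimited data file and return its rows sorted by `column`.

    Assumes the column holds plain numbers (no trailing '%' sign).
    """
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    return sorted(rows, key=lambda row: float(row[column]))


def find_mutation(handle, mutation, column="Mutation", delimiter="\t"):
    """Return every row whose `column` exactly matches `mutation`.

    `column` is a hypothetical header name; check the real file's header.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader if row[column] == mutation]


# Made-up stand-ins for a per-figure data file and for Dataset 1.
figure_data = io.StringIO("Mutation\t%Mol_Phenotype\nmutB\t100\nmutA\t60\n")
dataset1 = io.StringIO("Mutation\tSample\nmutA\tcase1\nmutB\tcase2\nmutA\tcase3\n")

sorted_rows = sort_by_phenotype_fraction(figure_data)
hits = find_mutation(dataset1, sorted_rows[0]["Mutation"])
print([row["Mutation"] for row in sorted_rows])  # ['mutA', 'mutB']
print(len(hits))  # 2
```

For the real files, the `io.StringIO` objects would be replaced by handles such as `open(path, newline="", encoding="utf-8")`. If a per-figure data file is not delimited text, it would first need to be exported to that form.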

Files/archives provided:

1. Supplementary figure to Shirley et al. F1000Research 2019, 7:1908. Dataset 2 contains six histograms and supporting tables (generated with GraphPad Prism v6.0). A variant may occur in multiple individuals, tissue types, or splice sites. Histogram bins show the fraction of splicing molecular phenotype classifications that match the majority (consensus) phenotype among all cases, for variants that are present in at least 2, 3, 5, 10, 15, or 20 different tissue types.

2. Aggregated variant data/consensus phenotypes used to generate histograms, including tissue types.

3. Software to generate histograms from variants validated in Dataset 1 (DOI 10.5281/zenodo.1488210).
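As a rough illustration of how histogram counts could be derived for each tissue-type threshold, the sketch below filters variants by a minimum number of tissue types and bins their majority fractions. It is not the software provided in this archive. The record layout `(tissue_type_count, majority_fraction)`, the equal-width binning, the default of four bins, and the sample values are all assumptions made for the example.

```python
THRESHOLDS = (2, 3, 5, 10, 15, 20)  # minimum tissue-type counts listed above


def bin_counts(fractions, n_bins=4):
    """Count fractions in [0, 1] across n_bins equal-width bins.

    A fraction of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * n_bins
    for fraction in fractions:
        index = min(int(fraction * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def histograms_by_threshold(variants, thresholds=THRESHOLDS, n_bins=4):
    """Map each threshold to bin counts for variants meeting that threshold.

    `variants` is an iterable of (tissue_type_count, majority_fraction) pairs.
    """
    variants = list(variants)
    return {
        threshold: bin_counts(
            [fraction for tissues, fraction in variants if tissues >= threshold],
            n_bins,
        )
        for threshold in thresholds
    }


# Made-up records: (number of tissue types, majority fraction).
example = [(2, 0.5), (3, 1.0), (12, 0.75), (25, 0.6)]
result = histograms_by_threshold(example)
print(result[2])   # [0, 0, 2, 2]
print(result[20])  # [0, 0, 1, 0]
```

Each higher threshold keeps a subset of the variants counted at the lower ones, so the counts can only stay the same or shrink as the threshold rises.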

Files

Dataset 2. Aggregated variant data and consensusphenotypes used to generate histograms.zip

Additional details

Related works

Is published in
Publication: https://doi.org/10.12688/f1000research.17204.3 (URL)

References