Dataset 2. Variant Distribution of Majority Molecular Phenotype Classifications Relative to All Classifications

Shirley, BC; Mucaki, EJ; Rogan, PK

doi:10.5281/zenodo.11541211

Published June 9, 2024 | Version v2

Dataset Open

1. CytoGnomix Inc
2. Western University
3. Western University, CytoGnomix Inc

A variant may occur in multiple individuals, tissue types, or splice sites. The consensus (majority) molecular phenotype classification displayed for each variant on ValidSpliceMut is derived from all affected cases and their associated classifications. For each expression-validated variant, the phenotypes are classified as per Figure 2 of Shirley et al. 2019 (https://doi.org/10.12688/f1000research.17204.3). The bins on the horizontal axis of these histograms represent, for each variant, the number of cases with the most frequent classification divided by the total number of cases affected by that variant. Bar height represents the number of variants having the majority classification within the appropriate bin.
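The ratio described above can be sketched in a few lines of Python. This is only an illustration, not the software distributed with this dataset: the function name `majority_fraction` and the classification labels are hypothetical, and the tie-breaking rule (first label encountered wins) is an assumption, since the description does not say how ties are resolved.

```python
from collections import Counter


def majority_fraction(classifications):
    """Return (majority_label, fraction) for one variant.

    `classifications` holds one molecular phenotype label per affected case.
    The fraction is the count of the most frequent label divided by the
    total number of cases for that variant.
    """
    if not classifications:
        raise ValueError("a variant needs at least one classified case")
    counts = Counter(classifications)
    # Assumption: on a tie, the label seen first is reported as the majority.
    label, count = counts.most_common(1)[0]
    return label, count / len(classifications)


# Hypothetical variant observed in four cases.
cases = ["aberrant", "aberrant", "likely aberrant", "aberrant"]
label, fraction = majority_fraction(cases)
print(label, fraction)
```

A variant whose cases all share one classification yields a fraction of 1.0, so it would fall in the highest bin of a histogram.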

Below each histogram is a table that describes the distribution of majority molecular phenotypes in each bin. The three bins consist of mutations designated as variants that cause allele-specific alternative splicing (category 1), those that are likely aberrant (category 2), and mutations that are aberrant (category 3). For a full description of how these molecular phenotype classifications are determined, see Shirley et al. 2019. Figure 2 of the paper describes the algorithm we use to ensure that variants with no effect on splicing have been eliminated prior to this analysis.

To gain more information about the mutations in a particular bin, a data file has been provided for each of these figures. To identify which mutations occur in each bin, sort the data file by the `%Mol_Phenotype` column, which was used to designate histogram bins. Once a mutation of interest has been found, download Dataset 1 on Zenodo (Filename: ‘ValidSpliceMut_TabDelimited_v4.txt’) and search for the mutation (mutation names will be identical). The mutation will appear in Dataset 1 once for each patient in whom it was identified (from the TCGA and ICGC Cancer Projects). From Dataset 1, users can obtain the mutation’s associated gene, its position within the gene, the information theory-based prediction of the effect the mutation would have on splicing, and the evidence of splicing reported by the Veridical algorithm. The latter indicates how many reads in the RNAseq data supported an aberrant splicing event (exon skipping, intron inclusion, etc.).
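The lookup described above might be scripted roughly as follows. This is a minimal sketch using only the Python standard library, and it rests on several assumptions: that the per-figure data file is available as tab-delimited text, that `%Mol_Phenotype` holds a numeric value (optionally followed by a percent sign), and that the mutation name sits in a column called `Mutation`. That column name, the `Gene` column, and the toy rows are hypothetical placeholders, so check the actual headers before use.

```python
import csv
import io


def read_tab_delimited(handle):
    """Read a tab-delimited file into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def sort_by_fraction(rows, column="%Mol_Phenotype"):
    """Sort rows in ascending order of the column used to define histogram bins."""
    return sorted(rows, key=lambda row: float(row[column].rstrip("%")))


def find_mutation(rows, name, name_column="Mutation"):
    """Return every row (one per patient) whose mutation name matches exactly."""
    return [row for row in rows if row[name_column] == name]


# Toy stand-ins for a histogram data file and for Dataset 1 (hypothetical content).
histogram_file = io.StringIO("Mutation\t%Mol_Phenotype\nmutB\t100\nmutA\t66.7\n")
dataset1_file = io.StringIO("Mutation\tGene\nmutA\tGENE1\nmutB\tGENE2\nmutA\tGENE1\n")

ranked = sort_by_fraction(read_tab_delimited(histogram_file))
matches = find_mutation(read_tab_delimited(dataset1_file), ranked[0]["Mutation"])
print(len(matches), "rows found for", ranked[0]["Mutation"])
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles for the downloaded data file and for ‘ValidSpliceMut_TabDelimited_v4.txt’.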

Files/archives provided:

1. Supplementary figure to Shirley et al. F1000Research 2019, 7:1908. Dataset 2 contains six histograms and supporting tables (generated with GraphPad Prism v6.0). A variant may occur in multiple individuals, tissue types, or splice sites. Histogram bins show the fraction of splicing molecular phenotype classifications relative to the major consensus phenotype among all cases for variants that are present in at least 2, 3, 5, 10, 15, and 20 different tissue types.

2. Aggregated variant data/consensus phenotypes used to generate histograms, including tissue types.

3. Software to generate histograms from variants validated in Dataset 1 (DOI 10.5281/zenodo.1488210).

Files (4.0 MB)

Name	MD5	Size
Dataset 2. Aggregated variant data and consensusphenotypes used to generate histograms.zip	daa38d709bd25f46d1d8b21d9b10dadc	3.7 MB
Dataset_2 Histograms and accompanying tables .updated.pdf	4a574a0f4b4ed24d9423cf5935eb02bf	363.2 kB
Software to generate Dataset 2.zip	20fb4b0b26856be8e1f47f096e62bf35	23.8 kB

Additional details

Is published in: https://doi.org/10.12688/f1000research.17204.3

Shirley BC, Mucaki EJ and Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations [version 3]. F1000Research 2019, 7:1908 (https://doi.org/10.12688/f1000research.17204.3)

	All versions	This version
Views	358	47
Downloads	218	166
Data volume	118.6 MB	104.0 MB
