Published November 2, 2022 | Version v1 | Dataset | Open
The impact of genetically controlled splicing on exon inclusion and protein structure
Authors/Creators
- 1. Columbia University
- 2. KTH University
- 3. New York Genome Center
Description
This repository contains the raw and processed files used in Einson et al. 2022.
Files contained in each subdirectory:
01_raw_psi
- {tissue_id}_v8.psi.tsv.gz: Unfiltered PSI output from IPSA-nf, one file per tissue. See Methods for how the files were created.
- gtex_v8_exon_id_map.tsv: Mapping between exon coordinates and Ensembl gene IDs, using the version suffixes of the GENCODE annotation used in GTEx v8.
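PSI (percent spliced-in, ψ) is the fraction of a gene's transcripts that include a given exon. As background only, a common exon-centric estimate from junction read counts is sketched below; the exact formula and filters applied by IPSA-nf are those given in Methods, and the function and argument names are hypothetical.

```python
from typing import Optional

def exon_psi(inc_upstream: int, inc_downstream: int, exc: int) -> Optional[float]:
    """Estimate percent spliced-in (psi) for one exon from junction read counts.

    inc_upstream / inc_downstream: reads on the two junctions that include the exon.
    exc: reads on the junction that skips the exon.
    Inclusion reads are averaged over the two junctions so that each inclusion
    transcript is counted about once, like each skipping transcript.
    Returns None when no informative reads are available.
    """
    inc = (inc_upstream + inc_downstream) / 2
    total = inc + exc
    if total == 0:
        return None
    return inc / total
```

For example, 10 and 14 inclusion reads with 4 skipping reads give ψ = 12 / 16 = 0.75.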
02_qtl_results
- cross_tissue
  - top_sQTLs_MAF05.tsv: List of top sQTLs across tissues, with one exon and top variant per tissue. See Methods for details and the matching file for column descriptions.
  - top_sQTLs_median_psi.tsv: Median, mean, and standard deviation of PSI for each significant exon in top_sQTLs_MAF05.tsv, computed across all GTEx individuals with available data.
  - top_sQTLs_MAF05_w_anc_allele.tsv: List of top sQTLs across tissues, with additional columns for the ancestral and derived alleles of the top ψQTL variant, where available.
- per_tissue
  - {tissue_id}_combined_sQTLs.tsv.gz: Raw ψQTL calls from QTLtools, run per tissue in grouped permutation mode with exons grouped by gene. See Methods for details, and https://qtltools.github.io/qtltools/ for column descriptions.
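Calling exons significant from the per-gene permutation p-values reported by QTLtools requires a multiple-testing correction across genes. The correction actually used is described in Methods; purely as an illustration, the sketch below applies the Benjamini-Hochberg procedure to a dict of per-gene p-values (for example, the beta-approximated permutation p-values). The function name and the 0.05 FDR level are assumptions.

```python
from typing import Dict, List

def benjamini_hochberg(pvals: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """Return the IDs whose p-values pass Benjamini-Hochberg at the given FDR."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr;
    # every ID ranked at or below k is called significant.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * fdr:
            cutoff = k
    return [gene for gene, _ in ranked[:cutoff]]
```

Storey's q-value method, as used in the GTEx pipeline, is a common alternative to this procedure.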
03_qtl_credible_sets
- GTEx_psi_{tissue_id}.collapsed.txt.gz: Output of the eQTL Catalogue fine-mapping pipeline (https://github.com/eQTL-Catalogue/qtlmap), run on all exons and tissues and collapsed using the procedure described in Methods.
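Fine-mapping methods of this kind report, for each association signal, a posterior probability that each variant is causal. As background, the usual way to turn those probabilities into a 95% credible set is sketched below. This is not the collapsing procedure described in Methods, and the function name and variant IDs are hypothetical.

```python
from typing import Dict, List

def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    If the probabilities never reach the coverage, all variants are returned.
    """
    selected: List[str] = []
    total = 0.0
    # Greedily add variants from most to least probable until coverage is reached.
    for variant, prob in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(variant)
        total += prob
        if total >= coverage:
            break
    return selected
```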
04_qtl_coloc
- combined_coloc_results_full.tsv.gz: Combined output of running coloc on ψQTLs from the 18 GTEx tissues against 87 sets of GWAS summary statistics. This file contains all results, including non-significant associations. Results from a nominal QTLtools pass were used as input; these are not included in this repository because of their size, so contact the authors for access to the nominal QTL calls.
- top_sQTLs_with_top_coloc_event.tsv: The QTLs in top_sQTLs_MAF05.tsv, with additional columns for the GWAS with the highest posterior probability of colocalization. Note that for some genes the tissue and top variant may differ from those in top_sQTLs_MAF05.tsv.
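coloc reports posterior probabilities for five hypotheses (H0 to H4), where H4 is that the QTL and the GWAS trait share a single causal variant. Keeping, for each gene, the GWAS with the highest posterior probability of colocalization can be sketched as follows. Each record is assumed to be a hypothetical (gene, GWAS, PP.H4) tuple; check the actual column names in combined_coloc_results_full.tsv.gz before adapting it.

```python
from typing import Dict, Iterable, Tuple

def top_coloc_per_gene(
    records: Iterable[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Map each gene to (GWAS, PP.H4) for the GWAS with the highest PP.H4.

    On ties, the first record seen for a gene is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for gene, gwas, pp_h4 in records:
        if gene not in best or pp_h4 > best[gene][1]:
            best[gene] = (gwas, pp_h4)
    return best
```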
05_exon_features: See the matching files for a description of each column.
- cross_tissue_constitutive_exons_with_AF.tsv: Detailed features of cross-tissue constitutive exons. See Methods for the definition of constitutive exons.
- cross_tissue_nonsignificant_genes_with_AF.tsv: Detailed features of sufficiently variable exons with no significant variant across tissues. See methods for more details.
- top_sQTLs_MAF05_with_AF.tsv: Detailed features of top sQTLs.
- top_sQTLs_with_top_coloc_with_AF.tsv: Detailed features of sQTLs that colocalize with at least one GWAS trait, including the Euclidean distance between PAE matrices and the RMSD between isoform structures for genes with a significant GWAS colocalization event.
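As background for the structural comparison columns: the Euclidean distance between two PAE matrices of equal shape is the square root of the summed squared element-wise differences, and RMSD is the root-mean-square distance between matched atoms (for example, Cα atoms) of two superimposed structures. The sketch below assumes the matrices already have the same shape and the coordinates are already matched and superimposed; how isoforms of different lengths were compared is described in Methods, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Coord = Tuple[float, float, float]

def pae_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean (Frobenius) distance between two PAE matrices of equal shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("PAE matrices must have the same shape")
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

def rmsd(p: Sequence[Coord], q: Sequence[Coord]) -> float:
    """RMSD between matched, already superimposed atom coordinates."""
    if len(p) != len(q) or not p:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(p, q)
    )
    return math.sqrt(squared / len(p))
```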
06_predicted_structures: Each prediction was run 5 times, and the manuscript reports the best model.
- {protein.id}[_mutant].result
  - {protein.id}[_mutant]{_run.id}_coverage.png.gz: Plot of the number of sequences per position in the MSA
  - {protein.id}[_mutant]{_run.id}_PAE.png.gz: PAE matrix plots for each model
  - {protein.id}[_mutant]{_run.id}_plddt.png.gz: pLDDT plots for each model
  - {protein.id}[_mutant]{_run.id}_predicted_aligned_error_v1.json.gz: PAE matrix for the best model, in AlphaFold DB format
  - {protein.id}[_mutant]{_run.id}_unrelaxed_rank_{rank.num}_model_{model.num}_scores.json.gz: Per-model scores: the PAE matrix (a list of lists), a list of pLDDT values, and the pTM score
  - {protein.id}[_mutant]{_run.id}_unrelaxed_rank_{rank.num}_model_{model.num}_pdb.gz: Per-model predicted structure in PDB format
  - {protein.id}[_mutant]{_run.id}.a3m.gz: Input MSA in A3M format
  - cite.bibtex: BibTeX file with citations for all tools and databases used
  - config.json: Model input parameters
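The per-model score files can be read with the Python standard library. The sketch below assumes each JSON file has a `plddt` list and a `ptm` number (ColabFold-style keys; verify them in your copy). Picking the file with the highest mean pLDDT is only one possible criterion and is not necessarily how the best model in the manuscript was chosen; the function names are hypothetical.

```python
import gzip
import json
from statistics import mean
from typing import Any, Dict, Iterable, Optional, Tuple

def load_scores(path: str) -> Dict[str, Any]:
    """Read one gzipped *_scores.json.gz file into a dict."""
    with gzip.open(path, "rt") as handle:
        return json.load(handle)

def best_model(paths: Iterable[str]) -> Tuple[str, float, float]:
    """Return (path, mean pLDDT, pTM) for the score file with the highest mean pLDDT.

    Assumes each file has a 'plddt' list and a 'ptm' number.
    """
    best: Optional[Tuple[str, float, float]] = None
    for path in paths:
        scores = load_scores(path)
        candidate = (path, mean(scores["plddt"]), scores["ptm"])
        if best is None or candidate[1] > best[1]:
            best = candidate
    if best is None:
        raise ValueError("no score files given")
    return best
```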
07_other_data
- cross_tissue_constitutive_exons.tsv: List of exons that are constitutively spliced across multiple tissues. See methods for details.
- cross_tissue_nonsignificant_genes.tsv: List of variably spliced exons with no significant sVariant in any tissue. See Methods for details.
- gtex_v8_exon_id_map.rds: R (RDS) representation of the mapping between exon IDs, as used in the modified version of GENCODE v26, and hg38 exon coordinates.
- gtex_v8_n_exons_per_gene.tsv: Number of exons per gene, as annotated in the modified version of GENCODE v26 used in GTEx v8.
Files (2.8 GB)

| Size | MD5 checksum |
|---|---|
| 612.6 MB | c720e98079eb72bcaa4382ba02497c8a |
| 10.3 MB | 5b054fb68c0410aa6e494d5e7fc2079b |
| 2.3 MB | 66f5295dc5b71864b151f008584c0650 |
| 261.0 MB | 05171f0a469209dcd75128695c75045f |
| 4.5 MB | a4607968ef0d69d47db0055cad901057 |
| 1.9 GB | f84ca22981e96633dfc0fb794ba6299c |
| 8.4 MB | 9d24a0a6e2f70f4ac3262ab500b95229 |
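The integrity of a downloaded file can be checked against the MD5 checksums listed above. A minimal standard-library sketch; the function names are illustrative.

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected: str) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

For example, `verify(path, "c720e98079eb72bcaa4382ba02497c8a")` checks a downloaded file against the first checksum in the table.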