Published November 2, 2022 | Version v3
Dataset Open

The impact of genetically controlled splicing on exon inclusion and protein structure

  • 1. Columbia University
  • 2. KTH University
  • 3. New York Genome Center

Description

This repository contains raw and processed files used in Einson et al. 2022.

Code used to generate these files can be found here: https://github.com/jeinson/sqtl_manuscript

Descriptions of the files contained within each subdirectory

01_raw_psi

  • {GTEx_tissue_id}_v8.psi.tsv.gz: Unfiltered PSI output from IPSA-nf, per tissue. See methods for details about how files were created. 
  • gtex_v8_exon_id_map.tsv: Mapping file between exon coordinates and Ensembl gene IDs, with suffix used in GTEx v8 gencode annotation. 
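
The PSI tables are gzip-compressed, tab-separated text, so they can be streamed without decompressing them to disk first. The sketch below uses only the Python standard library and assumes each file starts with a header row; the column names are not listed here, so the function simply returns whatever headers it finds. The function name `iter_tsv_gz` is illustrative and is not part of the published code.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_tsv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a gzip-compressed TSV as a dict keyed by header name.

    Assumption: the first line of the file is a header row.
    """
    # "rt" opens the gzip stream in text mode so csv can parse it directly.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield row
```

For example, `next(iter_tsv_gz("01_raw_psi/Example_Tissue_v8.psi.tsv.gz"))` would return the first data row, where `Example_Tissue` is a hypothetical stand-in for a real GTEx tissue ID. Streaming row by row keeps memory use low for the larger per-tissue files.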

02_qtl_results

  • cross_tissue
    • top_sQTLs_MAF05.tsv: List of top GTEx v8 sQTLs across tissues, with one exon and top variant per tissue. See methods for details. See matching file for column descriptions. 
    • top_sQTLs_median_psi.tsv: The median, mean, and standard deviation of PSI of each significant exon from the previous file, taken across all individuals from GTEx with data available.
    • top_sQTLs_MAF05_w_anc_allele.tsv: List of top sQTLs across tissues, with additional columns for the top ψQTL ancestral and derived alleles, where available. 
  • per_tissue
    • {GTEx_tissue_id}_combined_sQTLs.tsv.gz: Raw output of ψQTL calling using QTLtools in grouped permutation mode per tissue, with groups specified by gene. See methods for more details, and https://qtltools.github.io/qtltools/ for column descriptions.
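
Because the per-tissue files follow the `{GTEx_tissue_id}_combined_sQTLs.tsv.gz` naming pattern, the tissue ID can be recovered from each file name. The following is a minimal sketch, assuming the archive has been unpacked into a local directory; the function name `index_per_tissue_files` is hypothetical.

```python
from pathlib import Path
from typing import Dict

# Suffix taken from the file naming pattern described above.
SUFFIX = "_combined_sQTLs.tsv.gz"


def index_per_tissue_files(directory: str) -> Dict[str, Path]:
    """Map each tissue ID to its per-tissue QTL results file."""
    index: Dict[str, Path] = {}
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        # Everything before the fixed suffix is treated as the tissue ID.
        tissue_id = path.name[: -len(SUFFIX)]
        index[tissue_id] = path
    return index
```

Calling `index_per_tissue_files("02_qtl_results/per_tissue")` would then return one entry per tissue, which is convenient for looping over tissues in a fixed, sorted order.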

03_qtl_credible_sets

  • GTEx_psi_{GTEx_tissue_id}.collapsed.txt.gz: Output of the eQTL Catalogue fine-mapping pipeline (https://github.com/eQTL-Catalogue/qtlmap), run on all exons and tissues, and collapsed using the procedure described in Methods.

04_qtl_coloc

  • combined_coloc_results_full.tsv.gz: Combined output of running coloc on ψQTLs from the 18 GTEx tissues against 87 sets of GWAS summary statistics. This file contains all results, including non-significant associations. A nominal QTLtools pass was used as input. The nominal QTL calls are not included in this repository due to size limitations; contact the authors if you need access to them.
  • top_sQTLs_with_top_coloc_event.tsv: The QTLs in top_sQTLs_MAF05.tsv with additional columns for the GWAS with the highest posterior probability of a colocalization event. Importantly, the tissue and top variant may not match the main top_sQTLs_MAF05.tsv file for every gene. 

05_exon_features: See matching files for description of each column. 

  • cross_tissue_constitutive_exons_with_AF.tsv: Detailed features of cross tissue constitutive exons. See methods for definition of constitutive exons. 
  • cross_tissue_nonsignificant_genes_with_AF.tsv: Detailed features of sufficiently variable exons with no significant variant across tissues. See methods for more details. 
  • top_sQTLs_MAF05_with_AF.tsv: Detailed features of top sQTLs. 
  • top_sQTLs_with_top_coloc_with_AF.tsv: Detailed features of sQTLs that colocalize with at least one GWAS trait. Contains columns for Euclidean distances between PAE matrices and RMSD between isoforms, among genes with a significant GWAS colocalization event. 
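
The exact procedure used to compute the Euclidean distance between PAE matrices is described in the manuscript, not here. As an illustration only, one common reading is the square root of the summed squared element-wise differences between two matrices of the same shape. The sketch below implements that reading; it assumes both matrices have already been restricted to a shared set of residues, which is an assumption made for this example.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def pae_euclidean_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean (Frobenius) distance between two equally shaped matrices."""
    same_rows = len(a) == len(b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("PAE matrices must have the same shape")
    # Sum squared differences over every cell, then take the square root.
    total = sum(
        (x - y) ** 2
        for ra, rb in zip(a, b)
        for x, y in zip(ra, rb)
    )
    return math.sqrt(total)
```

Isoforms that differ by an included or skipped exon have different lengths, so their full PAE matrices differ in shape; how positions are matched before comparison is a design choice this sketch deliberately leaves to the caller.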

06_predicted_structures: Each prediction was run 5 times, and we report the best model in the manuscript. 

  • {protein.id}[_mutant].result
    • {protein.id}[_mutant]{_run.id}_coverage.png.gz: Plot of the number of sequences per position in MSA
    • {protein.id}[_mutant]{_run.id}_PAE.png.gz: PAE matrix plots for each model
    • {protein.id}[_mutant]{_run.id}_plddt.png.gz: pLDDT plots for each model
    • {protein.id}[_mutant]{_run.id}_predicted_aligned_error_v1.json.gz: A PAE matrix for the best model using AlphaFold-DB's format
    • {protein.id}[_mutant]{_run.id}_unrelaxed_rank_{rank.num}_model_{model.num}_scores.json.gz: Per-model scores: the PAE array (list of lists), a list of the average pLDDT, and the pTM score.
    • {protein.id}[_mutant]{_run.id}_unrelaxed_rank_{rank.num}_model_{model.num}_pdb.gz: Per-model predicted structure in PDB format
    • {protein.id}[_mutant]{_run.id}.a3m.gz: A3M formatted input MSA
    • cite.bibtex: BibTex file with citations for all used tools and databases
    • config.json: Model input parameters
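
The per-model score files are gzip-compressed JSON, so they can be read directly with the standard library. The sketch below assumes the JSON object uses the keys `pae`, `plddt`, and `ptm`; these key names are a guess based on the description above and should be checked against an actual file. The function name `summarize_scores` is hypothetical.

```python
import gzip
import json
from statistics import mean
from typing import Any, Dict


def summarize_scores(path: str) -> Dict[str, Any]:
    """Summarize one *_scores.json.gz file.

    Assumed keys (not confirmed by the file descriptions):
    "pae" (list of lists), "plddt" (list of numbers), "ptm" (number).
    """
    with gzip.open(path, "rt") as handle:
        scores = json.load(handle)
    pae = scores["pae"]
    return {
        "n_residues": len(pae),               # PAE is a square matrix
        "mean_plddt": mean(scores["plddt"]),  # average over the pLDDT list
        "ptm": scores["ptm"],
    }
```

Running this over the score files of all runs for one protein gives a compact table for comparing models, for instance when checking which run produced the reported best model.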

07_other_data

  • cross_tissue_constitutive_exons.tsv: List of exons that are constitutively spliced across multiple tissues. See methods for details. 
  • cross_tissue_nonsignificant_genes.tsv: List of variably spliced exons with no significant sVariant in any tissue. See methods for details.
  • gtex_v8_exon_id_map.rds: RDS representation of a map between exon IDs, as used in the modified version of gencode v26, and exon hg38 coordinates.
  • gtex_v8_n_exons_per_gene.tsv: Number of exons per gene, as annotated in the modified version of gencode v26 used in GTEx v8. 

08_geuvadis

  • geuvadis_psi.tsv.gz: Unfiltered PSI output from IPSA-nf, run on Geuvadis BAM files. See methods for details. (Raw data was downloaded from ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/GEUV)
  • geuvadis_sQTLs.tsv.gz: Raw output of ψQTL calling using QTLtools in grouped permutation mode for Geuvadis data, with groups specified by gene. See methods for more details, and https://qtltools.github.io/qtltools/ for column descriptions.
  • remapped_gencode.v26.GRCh37.GTEx_v8.nochr.genes.gtf.gz: Lifted-over version of the gencode v26 GTF file, used to define exons for PSI and QTL mapping in the Geuvadis analysis. The original version that was used in the GTEx analysis is based on GRCh38, and is available here: https://storage.googleapis.com/gtex_analysis_v8/reference/gencode.v26.GRCh38.genes.gtf

Files

01_raw_psi.zip

Files (2.8 GB)

Checksum Size
md5:c720e98079eb72bcaa4382ba02497c8a 612.6 MB
md5:5b054fb68c0410aa6e494d5e7fc2079b 10.3 MB
md5:2e44ce38c34e098130224a9f0f58a384 2.5 MB
md5:05171f0a469209dcd75128695c75045f 261.0 MB
md5:a4607968ef0d69d47db0055cad901057 4.5 MB
md5:f84ca22981e96633dfc0fb794ba6299c 1.9 GB
md5:9d24a0a6e2f70f4ac3262ab500b95229 8.4 MB
md5:4cc85959c95c82395f721e68dc03cbc2 35.1 MB
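
The md5 values listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's `hashlib`; the helper names `md5_of_file` and `matches_listed_checksum` are illustrative, and the file is read in chunks so that the multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Drop an optional 'md5:' prefix and normalize case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("01_raw_psi.zip", "md5:<value from the list>")` returns `True` only if the local file hashes to the listed value.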