Published December 9, 2024 | Version v2
Dataset Open

Multi-ancestry transcriptome prediction with functionally informed variants in TOPMed MESA improves performance of transcriptome-wide association studies

  • 1. University of Virginia

Description

This Zenodo file collection includes three transcriptome prediction models built with functionally informed variants (FIVs). The models were trained on TOPMed MESA multi-ancestry participants using TOPMed Freeze 8 whole-genome sequencing (WGS) data and RNA-seq data from peripheral blood mononuclear cells (PBMCs). They can be used for transcriptome-wide association study (TWAS) analysis by integrating them with GWAS summary statistics.

EN-FM: Elastic Net with Fine-Mapped variants. We first used the script "SuSiE_fine_mapping.R" to perform SuSiE fine-mapping (PMID:37220626) for each gene in 1,287 TOPMed MESA multi-ancestry participants, which yielded the fine-mapped variants. We then built EN models on these fine-mapped variants using "EN-FM_model.R". Example input data for the models are provided at https://github.com/hakyimlab/PredictDB-Tutorial.
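As a rough illustration of the EN-FM idea (not the contents of "EN-FM_model.R"), the sketch below keeps only variants flagged as fine-mapped and fits an elastic net on them by coordinate descent. The PIP cutoff, the penalty values, the function names, and the toy data are all assumptions made for this example; the description does not state how fine-mapped variants were selected from the SuSiE output.

```python
# Illustrative sketch only: elastic net restricted to fine-mapped variants.
# The PIP threshold, lam/alpha values, and all names are assumptions.

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in lasso/elastic-net updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam=0.01, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    X is a list of rows (samples x variants); columns and y should be centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # covariance of column j with the partial residual (b_j added back)
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            denom = sum(v * v for v in col) / n + lam * (1.0 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                b[j] = new_bj
    return b


def fit_en_fm(genotypes, expression, pip, pip_threshold=0.1, **kwargs):
    """Keep variants whose PIP passes a (hypothetical) threshold, then fit EN."""
    keep = [v for v in genotypes if pip.get(v, 0.0) >= pip_threshold]
    if not keep:
        return {}
    n = len(expression)
    y_mean = sum(expression) / n
    y = [e - y_mean for e in expression]
    cols = []
    for v in keep:
        m = sum(genotypes[v]) / n
        cols.append([g - m for g in genotypes[v]])  # center each dosage column
    X = [[cols[j][i] for j in range(len(keep))] for i in range(n)]
    weights = elastic_net(X, y, **kwargs)
    return dict(zip(keep, weights))


# Toy data: expression is driven by rs1; rs3 has a low PIP and is dropped.
toy_genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2, 0, 1],
    "rs2": [1, 0, 1, 2, 0, 1, 2, 0],
    "rs3": [2, 2, 0, 1, 1, 0, 0, 1],
}
toy_pip = {"rs1": 0.95, "rs2": 0.20, "rs3": 0.01}
toy_expression = [2.0 * g for g in toy_genotypes["rs1"]]
toy_weights = fit_en_fm(toy_genotypes, toy_expression, toy_pip, lam=0.001)
```

In this toy setting the weight for rs1 should land close to 2 and rs3 never enters the model, which mirrors the intent of restricting the elastic net to fine-mapped variants.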

PUMICE: Prediction Using Models Informed by Chromatin conformation and Epigenomics (PMID:35672318). 3D genomic data and epigenomic annotations from EBV-transformed lymphocytes were used to construct the PUMICE models. We built the models by following the code provided in the PUMICE GitHub repository (https://github.com/ckhunsr1/PUMICE). Specifically, we first ran "PUMICE_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE_compute_weights.sh" to obtain the weights of the SNPs included in the model for each gene. Before using these scripts, follow the instructions in the PUMICE GitHub repository to install the required R packages. Example input data for the models are provided at https://github.com/ckhunsr1/PUMICE/blob/master/examples/example_input.zip.
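For readers unfamiliar with nested cross-validation, the following is a minimal sketch of the general technique, not of "PUMICE_nested_cv.sh". The stand-in model (a one-predictor ridge regression), the candidate penalty grid, and the fold counts are assumptions chosen only to keep the example self-contained.

```python
# Illustrative sketch of nested cross-validation; not the PUMICE code.
# The stand-in model, the grid, and the fold counts are assumptions.

def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(f, n, k)) for f in range(k)]


def fit_ridge_1d(x, y, lam):
    """Closed-form slope for y ~ b*x with an L2 penalty lam (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def mse(x, y, slope):
    """Mean squared prediction error of a fitted slope."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(x)


def cv_error(x, y, lam, k):
    """Mean held-out error of one candidate lam across k folds."""
    errs = []
    for fold in k_folds(len(x), k):
        held = set(fold)
        tr = [i for i in range(len(x)) if i not in held]
        slope = fit_ridge_1d([x[i] for i in tr], [y[i] for i in tr], lam)
        errs.append(mse([x[i] for i in fold], [y[i] for i in fold], slope))
    return sum(errs) / len(errs)


def nested_cv(x, y, grid, outer_k=5, inner_k=4):
    """Inner loop picks lam on training data only; outer loop scores that choice."""
    results = []
    for fold in k_folds(len(x), outer_k):
        held = set(fold)
        tr = [i for i in range(len(x)) if i not in held]
        x_tr, y_tr = [x[i] for i in tr], [y[i] for i in tr]
        best_lam = min(grid, key=lambda lam: cv_error(x_tr, y_tr, lam, inner_k))
        slope = fit_ridge_1d(x_tr, y_tr, best_lam)
        test_err = mse([x[i] for i in fold], [y[i] for i in fold], slope)
        results.append((best_lam, test_err))
    return results


# Toy data: y = 3x exactly, so the smallest penalty should be selected.
toy_x = [float(i % 5) - 2.0 for i in range(20)]
toy_y = [3.0 * v for v in toy_x]
toy_results = nested_cv(toy_x, toy_y, grid=[0.0, 1.0, 10.0])
```

The key property is that the held-out outer fold is never seen while the parameter is being chosen, so the outer error is an honest estimate of how the tuned model performs on new samples.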

PUMICE-FM: PUMICE with Fine-Mapped variants. The PUMICE-FM model is a variation of the PUMICE model that replaces the epigenomic annotation with fine-mapping data and replaces the 3D genome window with a constant window size (e.g., 250kb). The fine-mapped variants used for the PUMICE-FM models are the same variants obtained from the SuSiE fine-mapping described above. The procedure for building a PUMICE-FM model is similar to that for a PUMICE model. Specifically, we first ran "PUMICE-FM_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE-FM_compute_weights.sh" to obtain the weights of the SNPs included in the model for each gene.
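To make the constant-window idea concrete, here is a small sketch of selecting candidate variants within a fixed distance of a gene. It is not taken from the PUMICE-FM scripts: anchoring the window on the gene start and end, the tuple layout, and the function name are assumptions, and only the 250kb value comes from the description.

```python
# Illustrative sketch: pick variants inside a constant window around a gene.
# Anchoring the window on the gene start/end is an assumption made here.

WINDOW_BP = 250_000  # the 250kb example value from the description


def variants_in_window(variants, chrom, gene_start, gene_end, window=WINDOW_BP):
    """Return IDs of variants on `chrom` within `window` bp of the gene body.

    `variants` is an iterable of (variant_id, chromosome, position) tuples.
    """
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return [vid for vid, c, pos in variants if c == chrom and lo <= pos <= hi]


toy_variants = [
    ("rs_a", "1", 900_000),    # 100kb upstream of the gene start: kept
    ("rs_b", "1", 1_300_000),  # exactly 250kb past the gene end: kept
    ("rs_c", "1", 1_400_000),  # outside the window: dropped
    ("rs_d", "2", 1_000_000),  # different chromosome: dropped
]
toy_selected = variants_in_window(toy_variants, "1", 1_000_000, 1_050_000)
```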

TWAS: We applied the Summary-PrediXcan (S-PrediXcan, PMID:29739930) pipeline to integrate our prediction models with publicly available GWAS summary statistics and obtain TWAS results. Before running our TWAS code, please follow the S-PrediXcan tutorial (https://github.com/hakyimlab/MetaXcan/wiki/S-PrediXcan-Command-Line-Tutorial) to install the required software. We first ran "TWAS.sh" to obtain TWAS results for each GWAS trait via the S-PrediXcan framework. We then applied "postTWAS.R" to perform follow-up analyses on the TWAS results, including correcting TWAS inflation and conducting the omnibus approach.
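For intuition about what this step computes, the sketch below evaluates the S-PrediXcan gene-level approximation, Z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the model weights, z_l the GWAS z-scores, σ_l² the reference variance of SNP l, and σ_g² = wᵀΓw the variance of predicted expression. It then combines p-values from several models with a Cauchy combination test. The description does not say which omnibus test "postTWAS.R" uses, so the Cauchy combination is an assumption, as are all function names and toy numbers; real analyses should use S-PrediXcan itself.

```python
# Illustrative sketch only; S-PrediXcan should be used for actual analyses.
# The Cauchy combination is an assumed stand-in for the omnibus approach.
import math


def spredixcan_z(weights, gwas_z, cov):
    """Z_g ~= sum_l w_l * (sigma_l / sigma_g) * z_l,
    with sigma_l^2 = cov[l][l] and sigma_g^2 = w' cov w."""
    p = len(weights)
    var_g = sum(weights[i] * cov[i][j] * weights[j]
                for i in range(p) for j in range(p))
    if var_g <= 0:
        raise ValueError("predicted expression has non-positive variance")
    numerator = sum(w * math.sqrt(cov[l][l]) * z
                    for l, (w, z) in enumerate(zip(weights, gwas_z)))
    return numerator / math.sqrt(var_g)


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def cauchy_combination(pvalues, weights=None):
    """Combine p-values via the Cauchy combination statistic."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t / sum(weights)) / math.pi


# Toy gene with two SNPs in a hypothetical model.
toy_w = [0.4, -0.2]
toy_gwas_z = [3.1, -1.2]
toy_cov = [[0.42, 0.10],
           [0.10, 0.35]]
toy_gene_z = spredixcan_z(toy_w, toy_gwas_z, toy_cov)

# Hypothetical z-scores for the same gene from three models, combined.
toy_model_z = {"EN-FM": toy_gene_z, "PUMICE": 2.4, "PUMICE-FM": 3.0}
toy_omnibus_p = cauchy_combination([two_sided_p(z) for z in toy_model_z.values()])
```

A Cauchy combination is a common choice when the per-model tests are correlated, as they would be here because all three models predict the same gene from overlapping variants.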

Figures: The script "Figures.R", provided below, generates our main figures.

The code files mentioned above and the model files are provided below.

The manuscript has been accepted in principle at AJHG. The preprint is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5194962.

Files

Files (304.6 MB)

Checksum                              Size
md5:cef56000a48c477c16f36ea776ed1566  3.3 MB
md5:f0a7f5ca57b9ff5e9d25903b6f366ee5  6.7 MB
md5:edb7017238e958c07770ccca13e87d53  13.6 kB
md5:43a31e190121df2a5949c0f9ed8e9064  4.4 kB
md5:955807fb5d829367e94859705fe94aa9  821 Bytes
md5:1ebb1dee90ee8d67dddc594bcefe7e4d  354 Bytes
md5:297be5aa86e359a29815e7e8ebfb6426  107.3 MB
md5:67e3914f1fb79355bf502e6658826654  28.4 MB
md5:14a2c03410399e5f63255dfba8b120c8  557 Bytes
md5:e4e4cdfee4328a92fcbc329b80d8e4d9  28.8 kB
md5:db9d36d0d9c0fd164135c41fe49f84fd  30.5 kB
md5:c6c5e3ac65081124219355b9f58fa865  536 Bytes
md5:c1fbb98bd1276450e04fdcdaec244537  124.9 MB
md5:2d668444ce40b84409ca4d8dee016d0f  33.9 MB
md5:1d27eefb236af7365f82d62f49935622  633 Bytes
md5:e7a81da3d6efd049eda93a65cdab8eea  629 Bytes
md5:523dcb8df396193692ef6bd9f4ebd2cb  525 Bytes
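
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names are made up for this example, and the file path in the usage comment is a placeholder because the listing above does not show file names.

```python
# Illustrative sketch: verify a downloaded file against a listed MD5 checksum.
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Return True if the file matches `expected`.

    `expected` may be written as 'md5:<hex>' (as in the listing) or as bare hex.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the file you actually downloaded):
# verify_download("path/to/downloaded_file", "md5:cef56000a48c477c16f36ea776ed1566")
```

Reading in chunks keeps memory use flat, which matters for the larger files in this record (over 100 MB each).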

Additional details

Funding

National Institutes of Health
R01-HL153248
National Institutes of Health
R01-ES036042

Dates

Copyrighted
2026-02-14