Multi-ancestry transcriptome prediction with functionally informed variants in TOPMed MESA improves performance of transcriptome-wide association studies

Hu, Xiaowei

doi:10.5281/zenodo.18644222

Published December 9, 2024 | Version v2

Dataset Open

Hu, Xiaowei (Contact person)¹

1. University of Virginia

This Zenodo file collection includes three transcriptome prediction models built with functionally informed variants (FIVs). The models were trained on multi-ancestry TOPMed MESA participants using TOPMed Freeze 8 whole-genome sequencing (WGS) data and RNA-seq data from peripheral blood mononuclear cells (PBMCs). They can be combined with GWAS summary statistics to run transcriptome-wide association study (TWAS) analyses.

EN-FM: Elastic Net with Fine-Mapped variants. We first ran "SuSiE_fine_mapping.R" to perform SuSiE fine-mapping (PMID:37220626) for each gene in 1,287 multi-ancestry TOPMed MESA participants, yielding a set of fine-mapped variants per gene. We then built elastic net (EN) models on these fine-mapped variants with "EN-FM_model.R". Example input data for the models are available at https://github.com/hakyimlab/PredictDB-Tutorial.

PUMICE: Prediction Using Models Informed by Chromatin conformation and Epigenomics (PMID:35672318). 3D genomic data and epigenomic annotation from EBV-transformed lymphocytes were used to construct the PUMICE models. We built the models following the code provided in the PUMICE GitHub repository (https://github.com/ckhunsr1/PUMICE). Specifically, we first ran "PUMICE_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE_compute_weights.sh" to obtain the weights of the SNPs included in each gene's model. Before using these scripts, follow the instructions in the PUMICE GitHub repository to install the required R packages. Example input data for the models are available at https://github.com/ckhunsr1/PUMICE/blob/master/examples/example_input.zip.

PUMICE-FM: PUMICE with Fine-Mapped variants. The PUMICE-FM model is a variant of the PUMICE model that replaces the epigenomic annotation with fine-mapping data and the 3D genome window with a constant window size (e.g., 250 kb). The fine-mapped variants used for the PUMICE-FM models are the same SuSiE fine-mapped variants described above. The procedure for building a PUMICE-FM model is similar to that for a PUMICE model. Specifically, we first ran "PUMICE-FM_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE-FM_compute_weights.sh" to obtain the weights of the SNPs included in each gene's model.

TWAS: We applied the Summary-PrediXcan (S-PrediXcan, PMID:29739930) pipeline to integrate our prediction models with publicly available GWAS summary statistics and obtain TWAS results. Before running our TWAS code, follow the S-PrediXcan tutorial (https://github.com/hakyimlab/MetaXcan/wiki/S-PrediXcan-Command-Line-Tutorial) to install the required software. We first ran "TWAS.sh" to obtain TWAS results for each GWAS trait within the S-PrediXcan framework. We then ran "postTWAS.R" for follow-up analyses of the TWAS results, including correcting TWAS inflation and applying the omnibus approach.
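As an illustration of what the S-PrediXcan step computes for a single gene, the sketch below implements the gene-level association statistic described in the S-PrediXcan paper: Z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the model weights, z_l the GWAS z-scores, σ_l² the variant variances, and σ_g² = wᵀΓw the variance of predicted expression under the variant covariance Γ. This is not the code in "TWAS.sh" or in MetaXcan. The function name and the dictionary-based inputs are assumptions made for the example, and treating a missing covariance entry as zero is a simplification.

```python
import math

def spredixcan_z(weights, gwas_z, cov):
    """Gene-level z-score from summary statistics (illustrative sketch).

    weights: dict mapping variant ID -> prediction weight for this gene
    gwas_z:  dict mapping variant ID -> GWAS z-score (beta / se)
    cov:     dict mapping (variant_i, variant_j) -> genotype covariance;
             variances are stored as (variant, variant)
    Returns None if no variant is shared or the predicted variance is not positive.
    """
    # Use only variants present in the model, the GWAS, and the covariance.
    snps = [s for s in weights if s in gwas_z and (s, s) in cov]
    if not snps:
        return None

    def c(a, b):
        # Covariance lookup in either order; missing pairs are treated as 0
        # (a simplification for this sketch).
        return cov.get((a, b), cov.get((b, a), 0.0))

    # Variance of predicted expression: w' * Gamma * w
    var_g = sum(weights[a] * weights[b] * c(a, b) for a in snps for b in snps)
    if var_g <= 0:
        return None

    # Weighted sum of GWAS z-scores, each scaled by the variant's standard deviation.
    numerator = sum(weights[s] * math.sqrt(c(s, s)) * gwas_z[s] for s in snps)
    return numerator / math.sqrt(var_g)

if __name__ == "__main__":
    # Toy inputs with made-up values, for illustration only.
    w = {"rs1": 0.4, "rs2": -0.2}
    z = {"rs1": 2.5, "rs2": -1.0}
    gamma = {("rs1", "rs1"): 0.30, ("rs2", "rs2"): 0.42, ("rs1", "rs2"): 0.05}
    print(spredixcan_z(w, z, gamma))
```

For a model with a single variant, the statistic reduces to that variant's GWAS z-score multiplied by the sign of its weight, which is a convenient sanity check.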

Figures: The code "Figures.R", provided below, generates our main figures.

The code files mentioned above and the model files are provided below.
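Before running a TWAS, it can help to look inside a downloaded model file. The sketch below assumes the "*_model.db" files are SQLite databases, which is the format S-PrediXcan expects for prediction models. It makes no assumption about table or column names and simply reports whatever tables, columns, and row counts it finds. The function name is hypothetical.

```python
import os
import sqlite3

def summarize_model_db(path):
    """Return {table_name: {"columns": [...], "rows": n}} for a SQLite file.

    Assumes `path` points to a SQLite database (an assumption about the
    "*_model.db" files in this collection).
    """
    if not os.path.exists(path):
        # sqlite3.connect would otherwise silently create an empty file.
        raise FileNotFoundError(path)
    con = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        summary = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            columns = [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]
            n_rows = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            summary[table] = {"columns": columns, "rows": n_rows}
        return summary
    finally:
        con.close()

if __name__ == "__main__":
    # File name taken from the file list; adjust the path to your download location.
    db_path = "EN-FM_model.db"
    if os.path.exists(db_path):
        for name, info in summarize_model_db(db_path).items():
            print(name, info["rows"], info["columns"])
```

Listing the schema first avoids hard-coding table names, so the same helper works on all three model files.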

The manuscript has been accepted in principle at AJHG. The preprint is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5194962.

Files

Files (304.6 MB)

Name	MD5	Size
EN-FM_covariance.txt.gz	cef56000a48c477c16f36ea776ed1566	3.3 MB
EN-FM_model.db	f0a7f5ca57b9ff5e9d25903b6f366ee5	6.7 MB
EN-FM_model.R	edb7017238e958c07770ccca13e87d53	13.6 kB
Figures.R	43a31e190121df2a5949c0f9ed8e9064	4.4 kB
postTWAS.R	955807fb5d829367e94859705fe94aa9	821 Bytes
PUMICE-FM_compute_weights.sh	1ebb1dee90ee8d67dddc594bcefe7e4d	354 Bytes
PUMICE-FM_covariance.txt.gz	297be5aa86e359a29815e7e8ebfb6426	107.3 MB
PUMICE-FM_model.db	67e3914f1fb79355bf502e6658826654	28.4 MB
PUMICE-FM_nested_cv.sh	14a2c03410399e5f63255dfba8b120c8	557 Bytes
PUMICE.compute_weights.R	e4e4cdfee4328a92fcbc329b80d8e4d9	28.8 kB
PUMICE.nested_cv.R	db9d36d0d9c0fd164135c41fe49f84fd	30.5 kB
PUMICE_compute_weights.sh	c6c5e3ac65081124219355b9f58fa865	536 Bytes
PUMICE_covariance.txt.gz	c1fbb98bd1276450e04fdcdaec244537	124.9 MB
PUMICE_model.db	2d668444ce40b84409ca4d8dee016d0f	33.9 MB
PUMICE_nested_cv.sh	1d27eefb236af7365f82d62f49935622	633 Bytes
SuSiE_fine_mapping.R	e7a81da3d6efd049eda93a65cdab8eea	629 Bytes
TWAS.sh	523dcb8df396193692ef6bd9f4ebd2cb	525 Bytes
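The MD5 values listed with each file can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the file is assumed to sit in the current working directory.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected`.

    `expected` may be given with or without the "md5:" prefix used in the
    file list.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected

if __name__ == "__main__":
    # File name and checksum copied from the file list above.
    name, checksum = "TWAS.sh", "md5:523dcb8df396193692ef6bd9f4ebd2cb"
    if os.path.exists(name):
        print(name, "OK" if verify_md5(name, checksum) else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for the larger covariance files in this collection.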

Additional details

National Institutes of Health
R01-HL153248
National Institutes of Health
R01-ES036042

Copyrighted: 2026-02-14

	All versions	This version
Views	67	11
Downloads	110	2
Data volume	4.9 GB	4.9 kB
