Multi-ancestry transcriptome prediction with functionally informed variants in TOPMed MESA improves performance of transcriptome-wide association studies
Description
This Zenodo file collection includes three transcriptome prediction models built with functionally informed variants (FIVs). The models were developed using TOPMed MESA multi-ancestry participants with TOPMed Freeze 8 whole-genome sequencing (WGS) data and RNA-seq data from peripheral blood mononuclear cells (PBMCs). These prediction models can be used for transcriptome-wide association study (TWAS) analysis by integrating them with GWAS summary statistics.
EN-FM: Elastic Net with Fine-Mapped variants. We first used the code "SuSiE_fine_mapping.R" to perform SuSiE fine-mapping (PMID:37220626) for each gene on 1,287 TOPMed MESA multi-ancestry participants to obtain fine-mapped variants. We then built EN models on the fine-mapped variants using "EN-FM_model.R". Example input data for the models are provided at https://github.com/hakyimlab/PredictDB-Tutorial.
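To illustrate the idea behind EN-FM, the following is a minimal Python sketch, not the authors' "EN-FM_model.R". It assumes centered genotype dosages and expression values, a fixed penalty `lam` and mixing parameter `alpha` (in practice these would be tuned by cross-validation), and the common elastic net objective (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2). The function names and the input layout are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for the elastic net.

    X is a list of rows; its columns and y are assumed to be centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of column j with the partial residual (excluding j)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # monomorphic variant with no ridge penalty
            new_b = soft_threshold(rho, lam * alpha) / denom
            delta = new_b - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_b
    return beta


def fit_en_fm(dosages, variant_ids, expression, fine_mapped, lam, alpha=0.5):
    """Restrict predictors to fine-mapped variants, then fit the elastic net.

    Returns a dict mapping variant ID to its non-zero weight.
    """
    keep = [j for j, v in enumerate(variant_ids) if v in fine_mapped]
    if not keep:
        return {}
    X = [[row[j] for j in keep] for row in dosages]
    beta = elastic_net(X, expression, lam, alpha)
    return {variant_ids[j]: b for j, b in zip(keep, beta) if b != 0.0}
```

Variants that are not in the fine-mapped set never enter the model, and variants whose coefficients shrink to zero are dropped from the returned weights.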
PUMICE: Prediction Using Models Informed by Chromatin conformation and Epigenomics (PMID:35672318). 3D genomic data and epigenomic annotations from EBV-transformed lymphocytes were used to construct the PUMICE models. We built the PUMICE models by following the code provided in the PUMICE GitHub repository (https://github.com/ckhunsr1/PUMICE). More specifically, we first ran "PUMICE_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE_compute_weights.sh" to obtain the weights of the SNPs included in the model for each gene. You will need to follow the instructions on the PUMICE GitHub page to install the required R packages before using the code mentioned above. Example input data for the models are provided at https://github.com/ckhunsr1/PUMICE/blob/master/examples/example_input.zip.
PUMICE-FM: PUMICE with Fine-Mapped variants. The PUMICE-FM model is a variant of the PUMICE model that replaces the epigenomic annotation with fine-mapping data and replaces the 3D genome window with a constant window size (e.g., 250 kb). The fine-mapped variants used for the PUMICE-FM models are the same variants obtained from the SuSiE fine-mapping described above. The procedure for building a PUMICE-FM model is similar to that for a PUMICE model. More specifically, we first ran "PUMICE-FM_nested_cv.sh" to find the optimal parameter values for each gene, and then ran "PUMICE-FM_compute_weights.sh" to obtain the weights of the SNPs included in the model for each gene.
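As a rough illustration of what a constant window means in practice, the sketch below selects the variants that fall within a fixed distance of a gene. It is not taken from the PUMICE-FM scripts: the function name, the `(variant_id, position)` input layout, and the choice to measure the window from the gene start and end on the same chromosome are all assumptions.

```python
def variants_in_window(variants, gene_start, gene_end, window=250_000):
    """Return IDs of variants within `window` bp of the gene boundaries.

    `variants` is an iterable of (variant_id, position) pairs that are
    assumed to lie on the same chromosome as the gene.
    """
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return [vid for vid, pos in variants if lo <= pos <= hi]


# Hypothetical example: a gene spanning 1,000,000-1,050,000 bp
example = [("rs_a", 700_000), ("rs_b", 760_000), ("rs_c", 1_300_000), ("rs_d", 1_300_001)]
selected = variants_in_window(example, 1_000_000, 1_050_000)
```

Here `selected` contains the variants at 760,000 bp and 1,300,000 bp, because only those two positions lie within 250 kb of the gene boundaries.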
TWAS: We applied the Summary-PrediXcan (S-PrediXcan, PMID:29739930) pipeline to integrate our prediction models with publicly available GWAS summary statistics and obtain TWAS results. Before running our TWAS code, please follow the S-PrediXcan tutorial (https://github.com/hakyimlab/MetaXcan/wiki/S-PrediXcan-Command-Line-Tutorial) to install the required software. We first ran "TWAS.sh" to obtain TWAS results for each GWAS trait via the S-PrediXcan framework. We then applied "postTWAS.R" to perform follow-up analyses based on the TWAS results, including correcting TWAS inflation and applying the omnibus approach.
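For readers unfamiliar with how model weights and GWAS summary statistics are combined, the sketch below computes a gene-level z-score in the spirit of the published S-PrediXcan formulation, Z_g ≈ Σ_l w_lg (σ_l / σ_g) Z_l, where σ_g² = wᵀΓw and Γ is the variant covariance in a reference panel. It is an illustrative reimplementation under those assumptions, not the MetaXcan software and not the contents of "TWAS.sh"; the function name and dictionary-based inputs are hypothetical.

```python
import math


def spredixcan_z(weights, gwas_z, cov):
    """Approximate gene-level z-score from variant-level summary statistics.

    weights: {variant_id: model weight}
    gwas_z:  {variant_id: GWAS z-score (beta / se)}
    cov:     {(variant_a, variant_b): covariance in a reference panel};
             each pair needs to be stored in one order only.
    Returns None when the predicted expression variance is not positive.
    """
    # Use only variants present in both the model and the GWAS
    snps = sorted(s for s in weights if s in gwas_z)

    def c(a, b):
        return cov[(a, b)] if (a, b) in cov else cov[(b, a)]

    # Variance of predicted expression: w' * Gamma * w
    var_g = sum(weights[a] * weights[b] * c(a, b) for a in snps for b in snps)
    if var_g <= 0.0:
        return None
    num = sum(weights[s] * math.sqrt(c(s, s)) * gwas_z[s] for s in snps)
    return num / math.sqrt(var_g)
```

Variants missing from the GWAS are simply skipped in this sketch, which mirrors the usual practice of reporting how many model variants were actually used for each gene.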
Figures: We provide the code "Figures.R" below for generating our main figures.
The code files mentioned above and the model files are provided below.
The manuscript has been accepted in principle at AJHG. The preprint is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5194962.
Files
(304.6 MB)
| Checksum | Size |
|---|---|
| md5:cef56000a48c477c16f36ea776ed1566 | 3.3 MB |
| md5:f0a7f5ca57b9ff5e9d25903b6f366ee5 | 6.7 MB |
| md5:edb7017238e958c07770ccca13e87d53 | 13.6 kB |
| md5:43a31e190121df2a5949c0f9ed8e9064 | 4.4 kB |
| md5:955807fb5d829367e94859705fe94aa9 | 821 Bytes |
| md5:1ebb1dee90ee8d67dddc594bcefe7e4d | 354 Bytes |
| md5:297be5aa86e359a29815e7e8ebfb6426 | 107.3 MB |
| md5:67e3914f1fb79355bf502e6658826654 | 28.4 MB |
| md5:14a2c03410399e5f63255dfba8b120c8 | 557 Bytes |
| md5:e4e4cdfee4328a92fcbc329b80d8e4d9 | 28.8 kB |
| md5:db9d36d0d9c0fd164135c41fe49f84fd | 30.5 kB |
| md5:c6c5e3ac65081124219355b9f58fa865 | 536 Bytes |
| md5:c1fbb98bd1276450e04fdcdaec244537 | 124.9 MB |
| md5:2d668444ce40b84409ca4d8dee016d0f | 33.9 MB |
| md5:1d27eefb236af7365f82d62f49935622 | 633 Bytes |
| md5:e7a81da3d6efd049eda93a65cdab8eea | 629 Bytes |
| md5:523dcb8df396193692ef6bd9f4ebd2cb | 525 Bytes |
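The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard `hashlib` module follows; the helper names are hypothetical, and the file path passed in is whatever local path the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use low for the larger model files, some of which exceed 100 MB.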
Additional details
Funding
- National Institutes of Health: R01-HL153248
- National Institutes of Health: R01-ES036042
Dates
- Copyrighted: 2026-02-14