Published September 14, 2023 | Version 0.0.1
Dataset Open

A scalable, accurate, and universal analysis framework using individual-level allele frequency for large-scale genetic association studies in an admixed population

  • 1. Peking University

Contributors

Contact person:

  • 1. Peking University

Description

Including individuals with diverse or admixed genetic ancestries is crucial for discovering novel findings that may be missed by genomic analyses rooted solely in Caucasian populations. Here, we present an analysis framework, SPAmix, which scales to biobank-scale analyses of hundreds of thousands of admixed individuals and is applicable to many types of complex traits, including binary, quantitative, time-to-event, and longitudinal traits. For each genetic variant, SPAmix uses genotype data and genetic principal components (PCs) to estimate individual-level allele frequencies, which are subsequently used to calibrate p values via a retrospective analysis. A hybrid strategy that includes saddlepoint approximation (SPA) greatly increases accuracy when analyzing rare genetic variants, especially if the phenotypic distribution is unbalanced or extremely unbalanced. Unlike Tractor, SPAmix does not require local ancestry information and is straightforwardly applicable to multi-way admixed populations. SPAmix can also be extended to SPAmixlocal, which incorporates local ancestry when it is available. In addition, we propose SPAmixCCT, which combines the p values of SPAmix and SPAmixlocal via the Cauchy combination test (CCT). SPAmixlocal performs similarly to Tractor when analyzing quantitative traits and is more accurate when analyzing binary traits with an unbalanced case-control ratio. SPAmixCCT is an optimal unified approach for various cross-ancestry genetic architectures. Extensive simulation studies and real data analyses of 369,314 UK Biobank individuals from multiple ancestries demonstrated that SPAmix is scalable and can discover novel hits while controlling type I error rates well.
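To illustrate the Cauchy combination step mentioned in the description, the following is a minimal sketch of the standard Cauchy combination formula applied to two p values. It is not the SPAmixCCT implementation: the function name `cauchy_combination`, the equal default weights, and the example p values are assumptions made for illustration only.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the standard Cauchy combination formula.

    Hypothetical helper for illustration; not taken from the SPAmix code.
    Each p value is mapped to a Cauchy variate tan((0.5 - p) * pi), the
    variates are averaged with the given weights, and the weighted sum is
    mapped back to a p value through the Cauchy survival function.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    if weights is None:
        # Assumption: equal weights that sum to one.
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    for p in p_values:
        if not 0.0 < p < 1.0:
            raise ValueError("p values must lie strictly between 0 and 1")

    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )
    return 0.5 - math.atan(statistic) / math.pi


# Made-up p values standing in for one variant tested by two methods.
p_spamix = 3e-6
p_spamix_local = 4e-4
combined = cauchy_combination([p_spamix, p_spamix_local])
print(f"combined p value: {combined:.3g}")
```

With a single p value the function returns that p value unchanged, which is a quick sanity check on the transformation. For p values far below about 1e-15 the tangent and arctangent lose precision in double arithmetic, so a production implementation would need a dedicated approximation for that range.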

Files

Longitudinal_Beta_G_SPAmix_loci_allTraits.csv

Files (21.6 GB)

MD5 checksum    Size
md5:98f148356f80058a6d704323969bdd32    1.0 GB
md5:9e20041b511fe9148af1e5fed93c2527    1.0 GB
md5:306da23a66a71f5bacdddbe0a288ea7b    1.0 GB
md5:72345a4f430c75b6bf6a81ef6bd16acb    1.0 GB
md5:b107eeeb9da233dd37b323d39ed685d8    1.0 GB
md5:721e942b895da735db2c0ffec9262c82    1.0 GB
md5:03603c22fb83166e4255e3d1bfbb6125    1.0 GB
md5:39eca1315fa4a1ebdd4864064fa546d5    1.0 GB
md5:f8b6a3a3803eee059effda0c2396d99c    1.0 GB
md5:133008987cb9b8131091f56322c1f772    99.6 kB
md5:b8f6fb67c580a6870b3b7b661d330f6a    1.0 GB
md5:6e83af4b8cb0833f20862c9bbc1019d3    1.0 GB
md5:ee3d128159842040ba4968f5a1a3b480    1.0 GB
md5:c6925fb9e7548585aaa64babb6b181d1    1.0 GB
md5:d24529df3549150a73d387834ad4436d    1.0 GB
md5:16928a8a1accff26fe3e82b75e74e827    1.0 GB
md5:3f4691517b05648b96a642dde567ee14    1.0 GB
md5:f7b2f2b369a45da4b73bc6d41814693c    1.0 GB
md5:1ecb9651eccdded4e803efb15d133533    1.0 GB
md5:727ece5d379c9215a4e60eeca016eef3    1.0 GB
md5:ac75e4a9187a9ce4ca1e9281318d2c50    1.0 GB
md5:8a309676707e73f081fabf073d8cc765    28.0 kB
md5:5c6f037be47e8f6928ba5e9a50f4de45    1.0 GB
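A downloaded file can be checked against the MD5 values listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listing` are hypothetical, and because the listing does not show which file name belongs to which checksum, the path and checksum in the usage note are placeholders to be filled in by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use small for the roughly 1 GB files
    in this record.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file's MD5 with a listed value such as 'md5:98f1...'."""
    # Drop the optional 'md5:' prefix used in the file listing.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("path/to/downloaded_file", "md5:<value from the listing>")` returns `True` when the download is intact and `False` when the file is corrupted or paired with the wrong checksum.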