Scalable mixed model approaches for set-based association studies on large-scale categorical data analysis and its application to 450k exome sequencing data in UK Biobank
Description
The ongoing release of large-scale sequencing data in the UK Biobank makes it possible to identify associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach for set-based association tests on quantitative and binary traits. For ordinal categorical phenotypes, however, applying SAIGE-GENE+ either by treating the trait as quantitative or by binarizing it can inflate type I error rates or reduce power. In this study, we propose POLMM-GENE, a novel method for rare-variant association tests in which a proportional odds logistic mixed model characterizes ordinal categorical phenotypes while adjusting for sample relatedness. Because POLMM-GENE fully uses the categorical nature of the phenotypes, it controls type I error rates well while remaining powerful. In analyses of UK Biobank 450k whole-exome sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
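For orientation, a proportional odds logistic mixed model can be written in the generic form below. This is a sketch of one common parameterization, not necessarily the exact specification used by POLMM-GENE; all symbols ($\varepsilon_j$, $\beta$, $\gamma$, $b_i$, $\tau$, $\Psi$) are assumptions introduced here for illustration and do not appear in this record.

$$
\operatorname{logit} P(y_i \le j \mid X_i, G_i, b_i) = \varepsilon_j - \left( X_i^{\top}\beta + G_i^{\top}\gamma + b_i \right), \qquad j = 1, \dots, J-1,
$$

$$
b = (b_1, \dots, b_n)^{\top} \sim \mathcal{N}(0, \tau \Psi), \qquad \varepsilon_1 < \varepsilon_2 < \dots < \varepsilon_{J-1}.
$$

Here $y_i \in \{1, \dots, J\}$ is the ordinal phenotype of subject $i$, $X_i$ holds covariates, $G_i$ holds the genotypes of the variants in the tested set, $\varepsilon_j$ are ordered category cutpoints, and the random effect $b_i$ with relatedness matrix $\Psi$ accounts for sample relatedness. Under this form a set-based test would examine $H_0\colon \gamma = 0$. The same linear predictor shifts every cumulative logit, which is the "proportional odds" assumption.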
Files
Total size: 684.8 MB

Checksum | Size |
---|---|
md5:95b65ca91e3db0a99045a2fcdbe0a2b6 | 141.6 MB |
md5:b921da1c6f788fa60d3da02508a73f61 | 13.2 MB |
md5:e532147756c5b3d6215d806570b7f506 | 135.9 MB |
md5:3c60e50e85e165b30a2be27ce74ab790 | 113.4 MB |
md5:358adaa1201f96ba642093d80d156c22 | 140.3 MB |
md5:41eae3a6c913fa998c626164c5fabd72 | 140.4 MB |
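The checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `matches_record` are hypothetical, and because the file names are not shown in this listing, the path passed in is whatever name the file was saved under locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so files of 100+ MB are never loaded fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed_checksum):
    """Compare a local file against a checksum written as 'md5:<hex>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_record("downloaded_file", "md5:b921da1c6f788fa60d3da02508a73f61")` would return `True` only if the local file (here given the placeholder name `downloaded_file`) has the checksum listed for the 13.2 MB entry.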