Published September 26, 2022 | Version v1
Dataset Open

Scalable mixed model approaches for set-based association studies on large-scale categorical data analysis and its application to 450k exome sequencing data in UK Biobank

Creators

Contributors

Contact person:

  • 1. Peking University

Description

The ongoing release of large-scale sequencing data in the UK Biobank makes it possible to identify associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach for conducting set-based association tests on quantitative and binary traits. For ordinal categorical phenotypes, however, applying SAIGE-GENE+ by either treating the trait as quantitative or binarizing it can inflate type I error rates or reduce power. In this study, we propose POLMM-GENE, a novel method for rare-variant association tests in which a proportional odds logistic mixed model characterizes ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of the phenotypes and can therefore control type I error rates well while remaining powerful. In analyses of UK Biobank 450k whole-exome sequencing data for 5 ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
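To make the phrase "proportional odds logistic model" concrete, the following is a minimal sketch of how such a model turns a linear predictor into probabilities for each ordinal category. It is not the POLMM-GENE implementation. The function name `category_probabilities`, the sign convention P(Y <= j) = logistic(cutpoint_j - eta), and the example cutpoints are all assumptions made for illustration. In a mixed model, `eta` would also include a random effect that accounts for sample relatedness; here it is simply passed in as a number.

```python
import math


def category_probabilities(cutpoints, eta):
    """Return [P(Y = 1), ..., P(Y = J)] under a proportional odds model.

    Assumed convention (illustrative only):
        P(Y <= j) = logistic(cutpoints[j - 1] - eta)
    where `cutpoints` holds J - 1 strictly increasing thresholds and
    `eta` is the linear predictor (covariates, genotype effect and,
    in a mixed model, a random effect).
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities; the last category always closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]

    # Differences of consecutive cumulative values give category probabilities.
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


if __name__ == "__main__":
    # Hypothetical three-level trait with two cutpoints.
    print(category_probabilities([-1.0, 1.0], eta=0.0))
    print(category_probabilities([-1.0, 1.0], eta=2.0))
```

Under this convention, a larger `eta` moves probability mass toward the higher categories while the same cutpoints are shared across all individuals. That shared-cutpoint structure is the ordering information that binarizing the trait, or treating it as quantitative, does not use.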

Files

Files (684.8 MB)

Checksum Size
md5:95b65ca91e3db0a99045a2fcdbe0a2b6 141.6 MB
md5:b921da1c6f788fa60d3da02508a73f61 13.2 MB
md5:e532147756c5b3d6215d806570b7f506 135.9 MB
md5:3c60e50e85e165b30a2be27ce74ab790 113.4 MB
md5:358adaa1201f96ba642093d80d156c22 140.3 MB
md5:41eae3a6c913fa998c626164c5fabd72 140.4 MB
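The md5 values listed above can be used to confirm that a download completed without corruption. Below is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the path in the usage example is a placeholder, since file names do not appear in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks
    so that files of 100+ MB are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:95b6...'."""
    expected = listed[4:] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage (placeholder path): python verify.py <downloaded_file> <md5:...>
    if len(sys.argv) == 3:
        ok = matches_listed_checksum(sys.argv[1], sys.argv[2])
        print("checksum OK" if ok else "checksum MISMATCH")
```

A mismatch usually means the download was truncated or altered, and the file should be fetched again before use.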