GEMINI genome-wide association study (GWAS) summary statistics v1
Description
GEMINI: Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions
GWAS summary statistics for 72 long-term conditions (LTCs). Up to three sources of genetic data are used, depending on the condition: UK Biobank, FinnGen, and consortium-published meta-analyses (where available).
If you use these resources, please cite the paper below and include the resource release version:
Murrin et al. (2025) A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data. eBioMedicine. https://doi.org/10.1016/j.ebiom.2025.105584
See our GitHub repos for more information: https://github.com/GEMINI-multimorbidity
Summary information
See the `conditions.txt` file for a list of conditions included, plus file suffix, studies included, and effective sample size.
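The record does not say how the effective sample size in `conditions.txt` was calculated. As a rough illustration only, the sketch below uses one common convention for case-control GWAS, 4 / (1/N_cases + 1/N_controls). The function name and the counts are hypothetical, and the values in `conditions.txt` may have been derived differently.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective N under the common convention 4 / (1/n_cases + 1/n_controls).

    Assumption: this convention is not confirmed by the record itself.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical counts, for illustration only (not taken from conditions.txt).
balanced = effective_sample_size(10_000, 10_000)     # equals the total N
unbalanced = effective_sample_size(10_000, 400_000)  # far below the total N
```

Under this convention a balanced study has an effective N equal to its total N, while a study with few cases and many controls has an effective N much smaller than its total N.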
GWAS files are provided in GWAS Catalog format, with positions mapped to genome build 37 (GRCh37).
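As a minimal sketch of working with a file in this format, the example below scans a tab-separated table and keeps variants below the conventional genome-wide significance threshold. The column names (`chromosome`, `base_pair_location`, `p_value` and so on) follow the GWAS Catalog summary statistics convention and are an assumption here: check the header of each downloaded file. The rows are made up for illustration.

```python
import csv
import io

P_THRESHOLD = 5e-8  # conventional genome-wide significance level


def significant_variants(handle, p_threshold=P_THRESHOLD):
    """Yield (chromosome, position, p_value) for rows with p below the threshold.

    Assumes a tab-separated file with GWAS Catalog style column names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        p_value = float(row["p_value"])
        if p_value < p_threshold:
            yield row["chromosome"], int(row["base_pair_location"]), p_value


# Made-up rows standing in for a real GEMINI file (positions on build 37).
example = io.StringIO(
    "chromosome\tbase_pair_location\teffect_allele\tother_allele"
    "\tbeta\tstandard_error\tp_value\n"
    "1\t1000\tA\tG\t0.05\t0.01\t5.7e-07\n"
    "2\t2000\tT\tC\t0.12\t0.02\t2.0e-09\n"
)
hits = list(significant_variants(example))
```

For a downloaded file, the same function can be given an open text handle, for example from `open(path)` or, if the file is gzip-compressed, `gzip.open(path, "rt")`.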
For each GWAS file there is a README detailing the source of the summary statistics:
1) UK Biobank [UKB], a large population-based prospective study with 450,197 individuals of European genetic ancestry.
2) FinnGen, a large-scale genomics initiative including over 500,000 participants with linked health diagnosis data.
3) Disease-specific GWAS meta-analysis summary statistics, when available for a given long-term condition (LTC).
See the GEMINI GitHub for details on LTC diagnostic codes: https://github.com/GEMINI-multimorbidity
An extract of the methods from Murrin 2025 (https://doi.org/10.1016/j.ebiom.2025.105584) is included below:
UK Biobank data
To perform the genetic analyses, we ascertained diagnoses of LTCs using both linked primary-care data (available for 45% of participants; censoring date 28/02/2016; Read v2 and CTV3 codes, truncated to 5 bytes) and hospital inpatient diagnoses (available for all participants; censoring date 31/10/2022; ICD-10 codes).
Participants were genotyped using two near-identical microarray platforms (>95% shared variants, n=805,426 total): the Affymetrix Axiom UK Biobank array (in 438,427 participants) and the Affymetrix UKBiLEVE array (in 49,950 participants). UK Biobank centrally performed genotype imputation in 487,442 participants using data from the Haplotype Reference Consortium and UK10K reference panels, increasing the number of genetic variants to ~96 million [8]. We excluded genetic variants with a minor allele frequency (MAF) <0.1% or an imputation INFO score <0.3, leaving ~16 million for GWAS analysis.
GWAS were performed in up to 451,197 participants genetically similar to the 1000 Genomes EUR population (described previously [9]). In brief, individuals from UK Biobank were projected into the 1000 Genomes principal component (PC) space using the SNP loadings derived from the initial PC analysis, to minimise confounding of PC values due to varying degrees of relatedness within UK Biobank [10]. Using the means derived from the 1000 Genomes reference dataset, we then performed K-means clustering to determine which UK Biobank individuals could be classified as EUR-like.
GWAS were run in these participants for 84 LTCs, using the same clinical code lists as in the CPRD analysis, with the REGENIE software (v3.1.3) to account for population structure and relatedness [11], adjusting for age at baseline assessment, sex, genotyping chip, and assessment centre. For quality control, we restricted variants to those with a MAF >0.1% and an imputation INFO score ≥0.3.
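The variant quality-control rule described above can be sketched as a simple filter. This is an illustration only, not the authors' pipeline: the `Variant` class, its field names and the example values are hypothetical, and the thresholds are taken from the text (MAF >0.1%, INFO ≥0.3).

```python
from dataclasses import dataclass

MAF_MIN = 0.001  # 0.1% minor allele frequency, as stated in the text
INFO_MIN = 0.3   # minimum imputation INFO score, as stated in the text


@dataclass
class Variant:
    variant_id: str            # hypothetical identifier
    effect_allele_freq: float  # frequency of the effect allele, 0..1
    info: float                # imputation INFO score


def minor_allele_frequency(freq: float) -> float:
    """Fold an allele frequency so it refers to the rarer allele."""
    return min(freq, 1.0 - freq)


def passes_qc(variant: Variant) -> bool:
    """Keep variants with MAF > 0.1% and INFO >= 0.3."""
    maf = minor_allele_frequency(variant.effect_allele_freq)
    return maf > MAF_MIN and variant.info >= INFO_MIN


# Made-up variants for illustration.
variants = [
    Variant("v1", 0.25, 0.95),    # common and well imputed: kept
    Variant("v2", 0.0005, 0.99),  # too rare: dropped
    Variant("v3", 0.9996, 0.99),  # minor allele too rare: dropped
    Variant("v4", 0.10, 0.25),    # poorly imputed: dropped
]
kept = [v.variant_id for v in variants if passes_qc(v)]
```

Folding the frequency first matters because a summary statistics file may report the frequency of either allele, so a value near 1 can still correspond to a very rare minor allele.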
FinnGen data
FinnGen is a large-scale genomics initiative that contains data from over 500,000 participants, linked to health diagnosis data. We used GWAS summary statistics from the FinnGen cohort (release 9; 377,277 participants), which are provided for predetermined diseases (“endpoints”) defined using ICD-10-FM (Finnish Modification) [12].
Disease-specific GWAS
Disease-specific GWAS meta-analysis summary statistics were used when available for each LTC. We searched the GWAS Catalog (https://www.ebi.ac.uk/gwas) [13] and disease-specific public repositories, and contacted authors of the latest GWAS, to identify relevant studies with aligned disease definitions and participants of European ancestry, enabling comparison with UKB and FinnGen. The LTCs below had published GWAS summary statistics available and were used in the genetics analysis (see Supplementary Table 1 for further information).
• Anxiety disorders [14]
• Asthma [15]
• Atrial fibrillation [16]
• Chronic kidney disease [17]
• Chronic obstructive pulmonary disease [18]
• Coronary heart disease [19]
• Depression [20]
• Erectile dysfunction [21]
• Gastro-oesophageal reflux disease [22]
• Glaucoma [23]
• Gout [24]
• Hearing loss [25]
• Heart failure [26]
• Hyperthyroidism, hypothyroidism [27]
• Irritable bowel syndrome [28]
• Migraine [29]
• Osteoarthritis [30]
• Primary breast malignancy [31]
• Rheumatoid arthritis [32]
• Schizophrenia, schizotypal and delusional disorders [33]
• Type 2 diabetes [34]
• Ulcerative colitis [35]
GWAS meta-analysis
For the 72 conditions meeting the heritability criteria above, we meta-analysed genome-wide summary data from up to three data sources: UKB, FinnGen and disease-specific GWAS (referred to as Consortium data). See Supplementary Figure 2 for the analysis flowchart, and Supplementary Table 1 for effective sample sizes and other information. We used a cross-trait LD-score regression framework [40] to estimate the within-condition, between-dataset genetic correlation, as a measure of how similar the genetic signal for a condition was across datasets. The FinnGen and Consortium data were added to the meta-analysis when the within-condition genetic correlation (R_g) with UK Biobank was >0.8. Where Consortium data included UK Biobank or FinnGen data, the Consortium data were used in place of the overlapping dataset (i.e., if UKB was in the consortium GWAS, we meta-analysed only Consortium + FinnGen). Studies were meta-analysed using GWAMA [41].
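The dataset-selection rule and the pooling step can be sketched as follows. This is one reading of the rule as described, not the authors' code. The function names are hypothetical, and the pooling shown is a generic fixed-effect inverse-variance meta-analysis, which is an assumption: the extract names GWAMA but does not state which model or options were used.

```python
import math

RG_MIN = 0.8  # genetic-correlation threshold stated in the text


def select_sources(rg_with_ukb, consortium_contains=frozenset()):
    """Choose which datasets to meta-analyse for one condition.

    rg_with_ukb: genetic correlation with UKB for "FinnGen" and/or
        "Consortium" (omit a key when that dataset is unavailable).
    consortium_contains: datasets already inside the consortium GWAS.
    """
    sources = ["UKB"]
    for name in ("FinnGen", "Consortium"):
        rg = rg_with_ukb.get(name)
        if rg is not None and rg > RG_MIN:
            sources.append(name)
    # Avoid sample overlap: the consortium replaces datasets it contains.
    if "Consortium" in sources:
        sources = [s for s in sources if s not in consortium_contains]
    return sources


def inverse_variance_meta(estimates):
    """Pool (beta, standard_error) pairs with inverse-variance weights.

    Returns the pooled beta, its standard error and the z-score.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Made-up values for illustration only.
chosen = select_sources({"FinnGen": 0.92, "Consortium": 0.95}, {"UKB"})
pooled_beta, pooled_se, pooled_z = inverse_variance_meta(
    [(0.10, 0.02), (0.14, 0.02)]
)
```

In this example the consortium GWAS already contains UKB, so only FinnGen and the Consortium data are pooled, matching the case given in the text.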
References
1 Amell A, Roso-Llorach A, Palomero L, et al. Disease networks identify specific conditions and pleiotropy influencing multimorbidity in the general population. Sci Rep 2018; 8: 15970.
2 Fadason T, Schierding W, Lumley T, O’Sullivan JM. Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities. Nat Commun 2018; 9: 5198.
3 Dong G, Feng J, Sun F, Chen J, Zhao X-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med 2021; 13: 110.
4 Kim S-S, Hudgins AD, Gonzalez B, et al. A Compendium of Age-Related PheWAS and GWAS Traits for Human Genetic Association Studies, Their Networks and Genetic Correlations. Front Genet 2021; 12. DOI:10.3389/fgene.2021.680560.
5 West CE, Karim M, Falaguera MJ, et al. Integrative GWAS and co-localisation analysis suggests novel genes associated with age-related multimorbidity. Sci Data 2023; 10: 655.
6 Recalde M, Rodríguez C, Burn E, et al. Data Resource Profile: The Information System for Research in Primary Care (SIDIAP). Int J Epidemiol 2022; 51: e324–36.
7 Sudlow C, Gallacher J, Allen N, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 2015; 12: e1001779.
8 Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562: 203–9.
9 Casanova F, Tian Q, Atkins JL, et al. Iron and risk of dementia: Mendelian randomisation analysis in UK Biobank. J Med Genet 2024; jmg-2023-109295.
10 Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 2020; 48: D941–7.
11 Mbatchou J, Barnard L, Backman J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53: 1097–103.
12 Kurki MI, Karjalainen J, Palta P, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613: 508–18.
13 Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2023; 51: D977–85.
14 Otowa T, Hek K, Lee M, et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol Psychiatry 2016; 21: 1391–9.
15 Olafsdottir TA, Theodors F, Bjarnadottir K, et al. Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 2020; 11. DOI:10.1038/S41467-019-14144-8.
16 Roselli C, Chaffin MD, Weng LC, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 2018; 50: 1225–33.
17 Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019; 51: 957–72.
18 Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet 2019; 51: 494–505.
19 Aragam KG, Jiang T, Goel A, et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet 2022; 54: 1803–15.
20 Howard DM, Adams MJ, Clarke TK, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci 2019; 22: 343–52.
21 Bovijn J, Jackson L, Censin J, et al. GWAS Identifies Risk Locus for Erectile Dysfunction and Implicates Hypothalamic Neurobiology and Diabetes in Etiology. Am J Hum Genet 2019; 104: 157–63.
22 An J, Gharahkhani P, Law MH, et al. Gastroesophageal reflux GWAS identifies risk loci that also associate with subsequent severe esophageal diseases. Nat Commun 2019; 10. DOI:10.1038/S41467-019-11968-2.
23 Gharahkhani P, Jorgenson E, Hysi P, et al. Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat Commun 2021; 12. DOI:10.1038/S41467-020-20851-4.
24 Tin A, Marten J, Halperin Kuhns VL, et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat Genet 2019; 51: 1459–74.
25 Praveen K, Dobbyn L, Gurski L, et al. Population-scale analysis of common and rare genetic variation associated with hearing loss in adults. Commun Biol 2022; 5: 540.
26 Shah S, Henry A, Roselli C, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun 2020; 11. DOI:10.1038/S41467-019-13690-5.
27 Teumer A, Chaker L, Groeneweg S, et al. Genome-wide analyses identify a role for SLC17A4 and AADAT in thyroid hormone regulation. Nat Commun 2018; 9. DOI:10.1038/S41467-018-06356-1.
28 Eijsbouts C, Zheng T, Kennedy NA, et al. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat Genet 2021; 53: 1543–52.
29 Gormley P, Anttila V, Winsvold BS, et al. Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine. Nat Genet 2016; 48: 856–66.
30 Boer CG, Hatzikotoulas K, Southam L, et al. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 2021; 184: 4784-4818.e17.
31 Zhang H, Ahearn TU, Lecarpentier J, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 2020; 52: 572–81.
32 Saevarsdottir S, Stefansdottir L, Sulem P, et al. Multiomics analysis of rheumatoid arthritis yields sequence variants that have large effects on risk of the seropositive subset. Ann Rheum Dis 2022; 81: 1085–95.
33 Trubetskoy V, Pardiñas AF, Qi T, et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 2022; 604: 502–8.
34 Mahajan A, Spracklen CN, Zhang W, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 2022; 54: 560–72.
35 De Lange KM, Moutsianas L, Lee JC, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet 2017; 49: 256–61.
36 Denaxas S, Gonzalez-Izquierdo A, Direk K, et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc 2019; 26: 1545–59.
37 Calderón-Larrañaga A, Vetrano DL, Onder G, et al. Assessing and Measuring Chronic Multimorbidity in the Older Population: A Proposal for Its Operationalization. J Gerontol A Biol Sci Med Sci 2016; glw233.
38 Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 2017; 49: 1304–10.
39 Bulik-Sullivan BK, Loh P-R, Finucane HK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47: 291–5.
40 Bulik-Sullivan B, Finucane HK, Anttila V, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 2015; 47: 1236–41.
41 Mägi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 2010; 11: 288.
Files
(36.1 GB)
| MD5 checksum | Size |
|---|---|
| 02944544c7844dda9ef6a58c6d397954 | 8.3 kB |
| 016c632681bc6d9a59744d99b524eb94 | 494.4 MB |
| f7f0230b5e08162710ef9f0ce601bf59 | 385.3 MB |
| 468237f8dcbd01098ed517c87eb66dab | 569.1 MB |
| e34ef8ded330914df1930ae3d599ce26 | 491.4 MB |
| 2ba1bde3d45d2c9dbd92d765dafd0fce | 490.6 MB |
| 998576ec7b3b332513ba24ec844c4ff6 | 412.9 MB |
| d02456b90255b9c26ba7871064c89467 | 568.5 MB |
| 6d4e5b0483d14ca557203e13f315385e | 490.0 MB |
| e83d9b9cd09fb55065c0cfa05a4912ef | 569.1 MB |
| 1721a5849f295cb0ef1ca79a9a568328 | 491.4 MB |
| db9347faa06ac0b9e78ce6b2c34e9833 | 489.7 MB |
| d54518562f63aacf9d730355e249856c | 490.7 MB |
| 3b3899fbebb67ad61660c0490b9c9941 | 496.7 MB |
| 4569c83a209fdac79283b3dbf4adc934 | 491.6 MB |
| 7ed1637555d1355310594bef5f5520a8 | 437.0 MB |
| a7c5713b5c874c6609d7f6b1eda5f9f0 | 221.9 MB |
| 300046df9c16f57a4559209021b5f4a2 | 379.7 MB |
| 4e1b0d91b42825989d5b85b4a50f026e | 490.7 MB |
| eec15e5c0c3600e2ab513133eb8400ce | 568.9 MB |
| 73421fa70dc48b882e2ecd253b9edfb9 | 491.7 MB |
| 057cd9f144c6175947ed851ba414f16d | 490.6 MB |
| 87d2f120e017af9ec49902aabb9c38dd | 573.5 MB |
| 77a5683ed02421aaf328795c854da88c | 568.2 MB |
| ebf06a2ca2a44d375cdf9b0a0beb9e99 | 490.6 MB |
| 9772a0e799c27d38c281fab32c873423 | 568.8 MB |
| 0929f3cdc1663deedf0c316969020b24 | 568.5 MB |
| ece8a4683fbef2f6bf05674a713ba0c4 | 573.3 MB |
| 2461b6de6a449490c6911524c4290234 | 497.7 MB |
| 93c14d481fa4557c4d373d24d587cd66 | 378.1 MB |
| 17646179732a68064d5de59fd8f41ec7 | 552.0 MB |
| e921e9442824ed698564c50f5a08a92c | 501.6 MB |
| 77419663060121692b7abe9af3bbac05 | 373.1 MB |
| 6cbb7237d16031ee90afe1f8b0330775 | 490.8 MB |
| 323258205187bf992437256fb68609cd | 492.9 MB |
| 1a11f95aeebc4f2c56f6b8d85d4c452c | 491.4 MB |
| 0ae043f02e0aef108816b2e4fe7f95ab | 310.7 MB |
| 23f14a0712f39c6f282e58b26b2b406f | 570.8 MB |
| 43b62e3240438b8ff36c9efba614a5dc | 568.9 MB |
| 2697c3d57c87e1bba814bf0b182d37f5 | 490.3 MB |
| 8f2528d4724189752cb7fabee1957259 | 569.1 MB |
| 987258841851f9151f485c505a89c728 | 525.8 MB |
| 1b88eeea164b2c2d99fba7589a426425 | 436.9 MB |
| 62006f469ced86809c9b5fdf36b49297 | 491.4 MB |
| 9ab26a11b239da7d8519919d0a783e26 | 570.5 MB |
| 9b142c1cf489353393446029b3b47ca2 | 569.0 MB |
| 8849a4b0426a801b25d69d395a307848 | 568.6 MB |
| fe077c0a8f7d0f894c11f12ccf3b5ef1 | 568.6 MB |
| dad5c02ec6197eda94b2c9b3e37b8a31 | 568.5 MB |
| 56973594c0503eeb31daa227050b74ca | 403.6 MB |
| 625815e6075214dc814060137edc0de4 | 492.7 MB |
| bc08d885d86ec923114bbb9419f0bf6b | 491.3 MB |
| c0ce5ef968c5ae294a6be059e87f2190 | 491.6 MB |
| 9062ca506dd3ce352d4f3d123eec8ce9 | 492.2 MB |
| 5a00295a31e6929ff16906ff327bcdfe | 401.0 MB |
| 2e761b1b95791f517f7444e1de0dfd9b | 568.6 MB |
| a14d206ea725145015e8b3f8bd7685aa | 568.7 MB |
| c7b6a2a02eff8348098ed8186b325210 | 494.2 MB |
| 59e8cc4b8d63ad2d70a127856bc203e6 | 569.1 MB |
| 7809970e9faa672e9fb11e207a9836b0 | 490.7 MB |
| e087f2afd7576f3c727c4fc3e6fb2153 | 568.5 MB |
| b4bb2d93e6b507a57c337bdfd9ef789d | 571.6 MB |
| 21ecca29da3b2f562f1db6388ae7b69d | 505.9 MB |
| 9bb840c2204f45a4a9add2ce1ff49a44 | 490.7 MB |
| 30b476614dfd88a635ab7fb206ea1084 | 490.4 MB |
| 22726236ceaf7bcb47b88fc465612308 | 491.2 MB |
| cad5634d4c7b7efb020499d20473d6f4 | 491.5 MB |
| 6759433fe711ef2d8345e5614eb5c98f | 492.0 MB |
| d85004e85c4c0b03064c519f3508e49e | 507.4 MB |
| 8be1a1d90ce44c7158bbcd71b05532b0 | 491.3 MB |
| 05aa3dce4d6bf418bdead2fc9dc50e7d | 570.4 MB |
| b59707dffe88102d985df331887ed4e5 | 492.0 MB |
| 6ca31acd691112b4615d885484af12f3 | 491.3 MB |
| afe271d5e467e73a8e793fd93dabe59b | 5.7 kB |
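
Each file in the listing above carries an MD5 checksum, which can be used to confirm that a download completed without corruption. The sketch below shows one way to do this with the Python standard library. The function names are hypothetical, and the demonstration uses a small temporary file rather than a real GEMINI download.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a listed value such as 'md5:0123...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file; replace with a downloaded file's path
# and the checksum shown for it in the listing.
payload = b"example bytes standing in for a GWAS file\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

listed = "md5:" + hashlib.md5(payload).hexdigest()
ok = matches_listing(demo_path, listed)
bad = matches_listing(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
```

Reading in chunks keeps memory use low, which matters for files of several hundred megabytes.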
Additional details
Funding
- UK Research and Innovation
- Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions - GEMINI MR/V005359/1