Published December 5, 2024 | Version 1.0
Dataset Open

GEMINI genome-wide association study (GWAS) summary statistics v1

Description

GEMINI: Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions

GWAS summary statistics for 72 long-term conditions. Up to three sources of genetics data are used, depending on the condition: UK Biobank, FinnGen, and consortium-published meta-analyses (where available).

If you use these resources, please cite the paper below and include the resource release version:

Murrin et al. (2025) A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data. eBioMedicine. https://doi.org/10.1016/j.ebiom.2025.105584

See our GitHub repos for more information: https://github.com/GEMINI-multimorbidity

 

Summary information

See the `conditions.txt` file for the list of included conditions, together with each condition's file suffix, contributing studies, and effective sample size.
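This page does not state how the effective sample size in `conditions.txt` is defined. For case-control GWAS, one widely used definition is N_eff = 4 / (1/N_cases + 1/N_controls), summed across studies in a meta-analysis. The sketch below assumes that definition; the function names and the case/control counts in the example are hypothetical and are not taken from this dataset.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control GWAS (assumed definition).

    N_eff = 4 / (1/N_cases + 1/N_controls). It equals the total sample size
    when cases and controls are balanced, and shrinks as they become unbalanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def meta_effective_sample_size(studies):
    """Sum per-study N_eff over (n_cases, n_controls) pairs, e.g. UKB + FinnGen."""
    return sum(effective_sample_size(cases, controls) for cases, controls in studies)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(effective_sample_size(10_000, 10_000)))  # 20000
    print(round(effective_sample_size(5_000, 400_000)))  # heavily unbalanced design
```

With a rare condition, N_eff is driven almost entirely by the number of cases, which is why it is a more informative measure of power than the total number of participants.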

GWAS files are provided in GWAS Catalog format, with positions mapped to genome build 37 (GRCh37).
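As a starting point for working with the files, the sketch below streams a tab-separated, optionally gzip-compressed summary-statistics file and reports variants passing a p-value threshold. It assumes the column names of the GWAS Catalog summary-statistics standard (`chromosome`, `base_pair_location`, `p_value`, and so on) and that missing values are written as `NA`; check each file's README for the actual header. The 5e-8 threshold is the conventional genome-wide significance level, not a value taken from this release, and the helper names are hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, Iterable, Iterator, Tuple

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_sumstats(handle) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant from a tab-separated file with a header row."""
    yield from csv.DictReader(handle, delimiter="\t")


def significant_hits(
    rows: Iterable[Dict[str, str]], threshold: float = GENOME_WIDE_P
) -> Iterator[Tuple[str, int, float]]:
    """Yield (chromosome, GRCh37 position, p-value) for variants below threshold."""
    for row in rows:
        p = row.get("p_value", "")
        if p in ("", "NA"):
            continue  # skip variants without a p-value
        if float(p) < threshold:
            yield row["chromosome"], int(row["base_pair_location"]), float(p)


if __name__ == "__main__":
    demo = io.StringIO(
        "chromosome\tbase_pair_location\teffect_allele\tother_allele\tbeta\t"
        "standard_error\teffect_allele_frequency\tp_value\n"
        "1\t1000\tA\tG\t0.12\t0.01\t0.30\t3e-12\n"
        "2\t2000\tC\tT\t0.01\t0.02\t0.45\t0.6\n"
    )
    for hit in significant_hits(read_sumstats(demo)):
        print(hit)  # ('1', 1000, 3e-12)
```

For a real file, replace the in-memory demo with `read_sumstats(open_text(path))` inside a `with` block. Streaming row by row keeps memory use flat, which matters for files of roughly 0.5 GB each.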

Each GWAS file has a README detailing the source(s) of its summary statistics, drawn from up to three sources:
1) UK Biobank (UKB), a large population-based prospective study with 450,197 individuals of European genetic ancestry.
2) FinnGen, a large-scale genomics initiative including over 500,000 participants with linked health diagnosis data.
3) Disease-specific GWAS meta-analysis summary statistics, where available for each long-term condition (LTC).
See the GEMINI GitHub for details of the diagnostic codes used to define each LTC: https://github.com/GEMINI-multimorbidity

 

An extract of the methods from Murrin et al. 2025 (https://doi.org/10.1016/j.ebiom.2025.105584) is included below:

 

UK Biobank data
To perform the genetic analyses, we ascertained diagnoses of LTCs using both primary-care linked data (available for 45% of participants; censoring date 28/02/2016; Read v2 and CTV3 codes, truncated to 5 bytes) and hospital inpatient diagnoses (available for all participants; censoring date 31/10/2022; ICD-10 codes). Participants were genotyped on two near-identical microarray platforms (>95% shared variants; 805,426 variants in total): the Affymetrix Axiom UK Biobank array (438,427 participants) and the Affymetrix UK BiLEVE array (49,950 participants). UK Biobank centrally imputed genotypes in 487,442 participants using the Haplotype Reference Consortium and UK10K reference panels, increasing the number of genetic variants to ~96 million [8]. For quality control, we retained variants with a minor allele frequency (MAF) >0.1% and an imputation INFO score ≥0.3, leaving ~16 million variants for GWAS analysis.

GWAS were performed in up to 451,197 participants genetically similar to the 1000 Genomes EUR population, as described previously [9]. In brief, UK Biobank individuals were projected into the 1000 Genomes principal component (PC) space using the SNP loadings derived from the initial PC analysis, to minimise confounding of PC values by varying degrees of relatedness within UK Biobank [10]. Using the means derived from the 1000 Genomes reference dataset, we then performed K-means clustering to determine which UK Biobank individuals could be classified as EUR-like.

GWAS for 84 LTCs were run in these EUR-like UKB participants using REGENIE (v3.1.3) [11] to account for population structure and relatedness, with the same clinical code lists as used for the Clinical Practice Research Datalink (CPRD) analyses, adjusting for age at baseline assessment, sex, genotyping chip, and assessment centre.
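The ancestry step described above can be sketched as Lloyd's K-means with one centroid seeded at each 1000 Genomes super-population mean. This is an illustration under stated assumptions, not the study's code: it assumes UK Biobank samples have already been projected into the reference PC space, and the number of PCs, the set of seeded populations, and the function names are all hypothetical. The extract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points in PC space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_reference(
    samples: Sequence[Point], ref_means: Dict[str, Point], n_iter: int = 50
) -> List[str]:
    """Lloyd's k-means with one centroid seeded at each reference population mean.

    Each sample is labelled with the reference population whose seed centroid
    its final cluster grew from (e.g. "EUR").
    """
    labels = list(ref_means)
    centroids = [tuple(ref_means[k]) for k in labels]
    assign: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_assign = [
            min(range(len(centroids)), key=lambda j: _sq_dist(s, centroids[j]))
            for s in samples
        ]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [s for s, a in zip(samples, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [labels[a] for a in assign]


def eur_like_indices(samples, ref_means, label: str = "EUR") -> List[int]:
    """Indices of samples whose final cluster was seeded by the EUR mean."""
    return [i for i, lab in enumerate(kmeans_from_reference(samples, ref_means)) if lab == label]


if __name__ == "__main__":
    # Toy 2-PC coordinates; real analyses use more PCs and numpy/scikit-learn.
    ref = {"EUR": (0.0, 0.0), "AFR": (10.0, 0.0), "EAS": (0.0, 10.0)}
    pcs = [(0.5, -0.2), (-0.3, 0.4), (9.6, 0.3), (0.2, 9.9), (1.0, 0.5)]
    print(eur_like_indices(pcs, ref))  # [0, 1, 4]
```

Seeding at the reference means, rather than at random points, is what lets each cluster keep a population label after the centroids move.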


FinnGen data
FinnGen is a large-scale genomics initiative that contains data from over 500,000 participants linked to health diagnosis data. We used GWAS summary statistics from FinnGen release 9 (377,277 participants), which are provided for predetermined disease endpoints defined using ICD-10-FM (Finnish Modification) [12].


Disease-specific GWAS
We used disease-specific GWAS meta-analysis summary statistics where available for each LTC. To identify relevant studies with aligned disease definitions and participants of European ancestry, enabling comparison with UKB and FinnGen, we searched the GWAS Catalog (https://www.ebi.ac.uk/gwas) [13] and disease-specific public repositories, and contacted the authors of the latest GWAS. The following LTCs had published and accessible GWAS summary statistics and were used in the genetic analysis (see Supplementary Table 1 for further information):
•    Anxiety disorders [14]
•    Asthma [15]
•    Atrial fibrillation [16]
•    Chronic kidney disease [17]
•    Chronic obstructive pulmonary disease [18]
•    Coronary heart disease [19]
•    Depression [20]
•    Erectile dysfunction [21]
•    Gastro-oesophageal reflux disease [22]
•    Glaucoma [23]
•    Gout [24]
•    Hearing loss [25]
•    Heart failure [26]
•    Hyperthyroidism, hypothyroidism [27]
•    Irritable bowel syndrome [28]
•    Migraine [29]
•    Osteoarthritis [30]
•    Primary breast malignancy [31]
•    Rheumatoid arthritis [32]
•    Schizophrenia, schizotypal and delusional disorders [33]
•    Type 2 diabetes [34]
•    Ulcerative colitis [35]

 

GWAS meta-analysis

For the 72 conditions meeting the heritability criteria described in Murrin et al., we meta-analysed genome-wide summary data from up to three data sources: UKB, FinnGen and disease-specific GWAS (referred to as Consortium data). See Supplementary Figure 2 for the analysis flowchart, and Supplementary Table 1 for effective sample sizes and other information. We used a cross-trait LD score regression framework [40] to estimate the within-condition, between-dataset genetic correlation (r_g), i.e. how similar the genetic signal for the same condition is across data sources. FinnGen and Consortium data were added to the meta-analysis when their within-condition r_g with UK Biobank was >0.8. Where the Consortium data already included UK Biobank or FinnGen, the Consortium data were used in place of the overlapping dataset (e.g. if UKB was part of the Consortium GWAS, we meta-analysed only Consortium + FinnGen). Studies were meta-analysed using GWAMA [41].
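The inclusion rules and the pooling step above can be sketched as follows. `select_sources` is a hypothetical helper that encodes the r_g > 0.8 rule and the overlap rule as this text states them. `ivw_fixed_effect` implements the standard inverse-variance weighted fixed-effect model, one of the models GWAMA supports; it is not GWAMA itself, and it assumes effect sizes are already aligned to the same effect allele across studies (GWAMA also handles allele alignment and other checks that are omitted here).

```python
import math
from typing import List, Optional, Sequence, Tuple

RG_THRESHOLD = 0.8  # minimum within-condition r_g with UKB for a dataset to be added


def select_sources(
    rg_finngen: Optional[float],
    rg_consortium: Optional[float],
    consortium_has_ukb: bool = False,
    consortium_has_finngen: bool = False,
) -> List[str]:
    """Hypothetical encoding of the dataset-inclusion rules described above."""
    sources = ["UKB"]
    if rg_finngen is not None and rg_finngen > RG_THRESHOLD:
        sources.append("FinnGen")
    if rg_consortium is not None and rg_consortium > RG_THRESHOLD:
        # Prefer the consortium GWAS over any cohort it already contains,
        # so that no participants are counted twice.
        if consortium_has_ukb:
            sources.remove("UKB")
        if consortium_has_finngen and "FinnGen" in sources:
            sources.remove("FinnGen")
        sources.append("Consortium")
    return sources


def ivw_fixed_effect(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis of (beta, se) pairs.

    Returns (pooled beta, pooled se, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


if __name__ == "__main__":
    print(select_sources(0.92, 0.95, consortium_has_ukb=True))  # ['FinnGen', 'Consortium']
    # Hypothetical per-study (beta, se) for one variant.
    print(ivw_fixed_effect([(0.20, 0.10), (0.00, 0.10)]))
```

Each study is weighted by 1/se², so the pooled estimate leans toward the most precise dataset, and the pooled standard error is smaller than any single study's.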

 

References
1    Amell A, Roso-Llorach A, Palomero L, et al. Disease networks identify specific conditions and pleiotropy influencing multimorbidity in the general population. Sci Rep 2018; 8: 15970.
2    Fadason T, Schierding W, Lumley T, O’Sullivan JM. Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities. Nat Commun 2018; 9: 5198.
3    Dong G, Feng J, Sun F, Chen J, Zhao X-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med 2021; 13: 110.
4    Kim S-S, Hudgins AD, Gonzalez B, et al. A Compendium of Age-Related PheWAS and GWAS Traits for Human Genetic Association Studies, Their Networks and Genetic Correlations. Front Genet 2021; 12. DOI:10.3389/fgene.2021.680560.
5    West CE, Karim M, Falaguera MJ, et al. Integrative GWAS and co-localisation analysis suggests novel genes associated with age-related multimorbidity. Sci Data 2023; 10: 655.
6    Recalde M, Rodríguez C, Burn E, et al. Data Resource Profile: The Information System for Research in Primary Care (SIDIAP). Int J Epidemiol 2022; 51: e324–36.
7    Sudlow C, Gallacher J, Allen N, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 2015; 12: e1001779.
8    Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562: 203–9.
9    Casanova F, Tian Q, Atkins JL, et al. Iron and risk of dementia: Mendelian randomisation analysis in UK Biobank. J Med Genet 2024; : jmg-2023-109295.
10    Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 2020; 48: D941–7.
11    Mbatchou J, Barnard L, Backman J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53: 1097–103.
12    Kurki MI, Karjalainen J, Palta P, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613: 508–18.
13    Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2023; 51: D977–85.
14    Otowa T, Hek K, Lee M, et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol Psychiatry 2016; 21: 1391–9.
15    Olafsdottir TA, Theodors F, Bjarnadottir K, et al. Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 2020; 11. DOI:10.1038/S41467-019-14144-8.
16    Roselli C, Chaffin MD, Weng LC, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 2018; 50: 1225–33.
17    Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019; 51: 957–72.
18    Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet 2019; 51: 494–505.
19    Aragam KG, Jiang T, Goel A, et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet 2022; 54: 1803–15.
20    Howard DM, Adams MJ, Clarke TK, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci 2019; 22: 343–52.
21    Bovijn J, Jackson L, Censin J, et al. GWAS Identifies Risk Locus for Erectile Dysfunction and Implicates Hypothalamic Neurobiology and Diabetes in Etiology. Am J Hum Genet 2019; 104: 157–63.
22    An J, Gharahkhani P, Law MH, et al. Gastroesophageal reflux GWAS identifies risk loci that also associate with subsequent severe esophageal diseases. Nat Commun 2019; 10. DOI:10.1038/S41467-019-11968-2.
23    Gharahkhani P, Jorgenson E, Hysi P, et al. Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat Commun 2021; 12. DOI:10.1038/S41467-020-20851-4.
24    Tin A, Marten J, Halperin Kuhns VL, et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat Genet 2019; 51: 1459–74.
25    Praveen K, Dobbyn L, Gurski L, et al. Population-scale analysis of common and rare genetic variation associated with hearing loss in adults. Commun Biol 2022; 5: 540.
26    Shah S, Henry A, Roselli C, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun 2020; 11. DOI:10.1038/S41467-019-13690-5.
27    Teumer A, Chaker L, Groeneweg S, et al. Genome-wide analyses identify a role for SLC17A4 and AADAT in thyroid hormone regulation. Nat Commun 2018; 9. DOI:10.1038/S41467-018-06356-1.
28    Eijsbouts C, Zheng T, Kennedy NA, et al. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat Genet 2021; 53: 1543–52.
29    Gormley P, Anttila V, Winsvold BS, et al. Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine. Nat Genet 2016; 48: 856–66.
30    Boer CG, Hatzikotoulas K, Southam L, et al. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 2021; 184: 4784-4818.e17.
31    Zhang H, Ahearn TU, Lecarpentier J, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 2020; 52: 572–81.
32    Saevarsdottir S, Stefansdottir L, Sulem P, et al. Multiomics analysis of rheumatoid arthritis yields sequence variants that have large effects on risk of the seropositive subset. Ann Rheum Dis 2022; 81: 1085–95.
33    Trubetskoy V, Pardiñas AF, Qi T, et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 2022; 604: 502–8.
34    Mahajan A, Spracklen CN, Zhang W, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 2022; 54: 560–72.
35    De Lange KM, Moutsianas L, Lee JC, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet 2017; 49: 256–61.
36    Denaxas S, Gonzalez-Izquierdo A, Direk K, et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc 2019; 26: 1545–59.
37    Calderón-Larrañaga A, Vetrano DL, Onder G, et al. Assessing and Measuring Chronic Multimorbidity in the Older Population: A Proposal for Its Operationalization. J Gerontol A Biol Sci Med Sci 2016; : glw233.
38    Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 2017; 49: 1304–10.
39    Bulik-Sullivan BK, Loh P-R, Finucane HK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47: 291–5.
40    Bulik-Sullivan B, Finucane HK, Anttila V, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 2015; 47: 1236–41.
41    Mägi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 2010; 11: 288.
 

Files

conditions.txt

Files (36.1 GB)

md5:02944544c7844dda9ef6a58c6d397954  8.3 kB
md5:016c632681bc6d9a59744d99b524eb94  494.4 MB
md5:f7f0230b5e08162710ef9f0ce601bf59  385.3 MB
md5:468237f8dcbd01098ed517c87eb66dab  569.1 MB
md5:e34ef8ded330914df1930ae3d599ce26  491.4 MB
md5:2ba1bde3d45d2c9dbd92d765dafd0fce  490.6 MB
md5:998576ec7b3b332513ba24ec844c4ff6  412.9 MB
md5:d02456b90255b9c26ba7871064c89467  568.5 MB
md5:6d4e5b0483d14ca557203e13f315385e  490.0 MB
md5:e83d9b9cd09fb55065c0cfa05a4912ef  569.1 MB
md5:1721a5849f295cb0ef1ca79a9a568328  491.4 MB
md5:db9347faa06ac0b9e78ce6b2c34e9833  489.7 MB
md5:d54518562f63aacf9d730355e249856c  490.7 MB
md5:3b3899fbebb67ad61660c0490b9c9941  496.7 MB
md5:4569c83a209fdac79283b3dbf4adc934  491.6 MB
md5:7ed1637555d1355310594bef5f5520a8  437.0 MB
md5:a7c5713b5c874c6609d7f6b1eda5f9f0  221.9 MB
md5:300046df9c16f57a4559209021b5f4a2  379.7 MB
md5:4e1b0d91b42825989d5b85b4a50f026e  490.7 MB
md5:eec15e5c0c3600e2ab513133eb8400ce  568.9 MB
md5:73421fa70dc48b882e2ecd253b9edfb9  491.7 MB
md5:057cd9f144c6175947ed851ba414f16d  490.6 MB
md5:87d2f120e017af9ec49902aabb9c38dd  573.5 MB
md5:77a5683ed02421aaf328795c854da88c  568.2 MB
md5:ebf06a2ca2a44d375cdf9b0a0beb9e99  490.6 MB
md5:9772a0e799c27d38c281fab32c873423  568.8 MB
md5:0929f3cdc1663deedf0c316969020b24  568.5 MB
md5:ece8a4683fbef2f6bf05674a713ba0c4  573.3 MB
md5:2461b6de6a449490c6911524c4290234  497.7 MB
md5:93c14d481fa4557c4d373d24d587cd66  378.1 MB
md5:17646179732a68064d5de59fd8f41ec7  552.0 MB
md5:e921e9442824ed698564c50f5a08a92c  501.6 MB
md5:77419663060121692b7abe9af3bbac05  373.1 MB
md5:6cbb7237d16031ee90afe1f8b0330775  490.8 MB
md5:323258205187bf992437256fb68609cd  492.9 MB
md5:1a11f95aeebc4f2c56f6b8d85d4c452c  491.4 MB
md5:0ae043f02e0aef108816b2e4fe7f95ab  310.7 MB
md5:23f14a0712f39c6f282e58b26b2b406f  570.8 MB
md5:43b62e3240438b8ff36c9efba614a5dc  568.9 MB
md5:2697c3d57c87e1bba814bf0b182d37f5  490.3 MB
md5:8f2528d4724189752cb7fabee1957259  569.1 MB
md5:987258841851f9151f485c505a89c728  525.8 MB
md5:1b88eeea164b2c2d99fba7589a426425  436.9 MB
md5:62006f469ced86809c9b5fdf36b49297  491.4 MB
md5:9ab26a11b239da7d8519919d0a783e26  570.5 MB
md5:9b142c1cf489353393446029b3b47ca2  569.0 MB
md5:8849a4b0426a801b25d69d395a307848  568.6 MB
md5:fe077c0a8f7d0f894c11f12ccf3b5ef1  568.6 MB
md5:dad5c02ec6197eda94b2c9b3e37b8a31  568.5 MB
md5:56973594c0503eeb31daa227050b74ca  403.6 MB
md5:625815e6075214dc814060137edc0de4  492.7 MB
md5:bc08d885d86ec923114bbb9419f0bf6b  491.3 MB
md5:c0ce5ef968c5ae294a6be059e87f2190  491.6 MB
md5:9062ca506dd3ce352d4f3d123eec8ce9  492.2 MB
md5:5a00295a31e6929ff16906ff327bcdfe  401.0 MB
md5:2e761b1b95791f517f7444e1de0dfd9b  568.6 MB
md5:a14d206ea725145015e8b3f8bd7685aa  568.7 MB
md5:c7b6a2a02eff8348098ed8186b325210  494.2 MB
md5:59e8cc4b8d63ad2d70a127856bc203e6  569.1 MB
md5:7809970e9faa672e9fb11e207a9836b0  490.7 MB
md5:e087f2afd7576f3c727c4fc3e6fb2153  568.5 MB
md5:b4bb2d93e6b507a57c337bdfd9ef789d  571.6 MB
md5:21ecca29da3b2f562f1db6388ae7b69d  505.9 MB
md5:9bb840c2204f45a4a9add2ce1ff49a44  490.7 MB
md5:30b476614dfd88a635ab7fb206ea1084  490.4 MB
md5:22726236ceaf7bcb47b88fc465612308  491.2 MB
md5:cad5634d4c7b7efb020499d20473d6f4  491.5 MB
md5:6759433fe711ef2d8345e5614eb5c98f  492.0 MB
md5:d85004e85c4c0b03064c519f3508e49e  507.4 MB
md5:8be1a1d90ce44c7158bbcd71b05532b0  491.3 MB
md5:05aa3dce4d6bf418bdead2fc9dc50e7d  570.4 MB
md5:b59707dffe88102d985df331887ed4e5  492.0 MB
md5:6ca31acd691112b4615d885484af12f3  491.3 MB
md5:afe271d5e467e73a8e793fd93dabe59b  5.7 kB
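Each download is listed with an MD5 checksum, so a transfer can be checked before analysis. The sketch below streams a file through Python's standard `hashlib` in 1 MiB chunks, so multi-hundred-megabyte files never need to fit in memory. The file path in any real call is up to you, and the pairing of a checksum with a specific file name has to be read from the repository page, since this listing shows only checksums and sizes. The `md5sum` and `verify` names are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str) -> bool:
    """Compare a file against a listed checksum, with or without the 'md5:' prefix."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify_md5.py <downloaded file> <md5:checksum from the listing>
    file_path, listed = sys.argv[1], sys.argv[2]
    print("OK" if verify(file_path, listed) else "MISMATCH")
```

`str.removeprefix` requires Python 3.9 or later.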

Additional details

Funding

UK Research and Innovation
Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions - GEMINI MR/V005359/1