Published March 8, 2026 | Version 1.0.0
Dataset | Open access

GWAS summary statistics for five UK Biobank phenotypes used in covImpute

Description

This repository contains genome-wide association study (GWAS) summary statistics used in the paper:

“Domain-Aware Matrix Completion for Phenotype Imputation Using Electronic Health Record Data with Applications in Genomic Research”
(Annals of Applied Statistics).

The repository includes GWAS summary statistics derived from the UK Biobank for five phenotypes:

  • severe depression (MDD)

  • breast cancer (BrCa)

  • prostate cancer (PrCa)

  • high blood pressure (HTN)

  • bowel cancer (CRC)

For each phenotype, GWAS summary statistics are provided for five analysis approaches (four imputation methods and one baseline using observed status):

  • COVV3C: covImpute

  • LTPI: LTPI

  • SOFT: softImpute

  • AUTO: autoComplete

  • GWAS: GWAS based on observed case-control status

In total, the repository contains 25 GWAS summary statistic files.

These summary statistics were used in the real-data analyses to compare the imputation performance of matrix completion methods and to evaluate the impact of imputation on downstream genomic analyses.

Each file contains SNP-level GWAS summary statistics with the following columns:

  • CHROM: chromosome (hg19)

  • GENPOS: genomic position (hg19)

  • ID: SNP identifier in the format CHR-GENPOS-ALLELE0-ALLELE1

  • ALLELE1: effect allele

  • ALLELE0: non-effect allele

  • BETA: effect size estimate

  • SE: standard error

  • LOG10P: -log10 of the p-value

  • N: sample size

  • TEST: REGENIE output field indicating the test type (e.g., ADD for the additive model)
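
A minimal sketch for reading one file with only the Python standard library. It assumes the files are whitespace-delimited with a single header row (as REGENIE output typically is), that "NA" marks missing values, and that LOG10P stores -log10(p). The function name read_sumstats and the derived fields P and ID_OK are hypothetical illustrations, not part of the dataset or of covImpute.

```python
import gzip
from typing import Callable, Dict, Iterator, Optional

# Numeric columns from the dataset description and how to parse them.
NUMERIC: Dict[str, Callable[[str], object]] = {
    "GENPOS": int, "BETA": float, "SE": float, "LOG10P": float, "N": int,
}


def parse_value(raw: str, cast: Callable[[str], object]) -> Optional[object]:
    """Convert one field, treating 'NA' (assumed missing-value marker) as None."""
    return None if raw == "NA" else cast(raw)


def read_sumstats(path: str) -> Iterator[Dict[str, object]]:
    """Hypothetical reader: yield one dict per variant from a gzipped,
    whitespace-delimited summary-statistics file."""
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row: Dict[str, object] = dict(zip(header, fields))
            for col, cast in NUMERIC.items():
                if col in row:
                    row[col] = parse_value(str(row[col]), cast)
            # Assumes LOG10P = -log10(p), so p = 10 ** (-LOG10P).
            log10p = row.get("LOG10P")
            row["P"] = None if log10p is None else 10.0 ** (-float(log10p))
            # Flag rows whose ID does not follow CHR-GENPOS-ALLELE0-ALLELE1.
            expected_id = "-".join(
                str(row.get(c)) for c in ("CHROM", "GENPOS", "ALLELE0", "ALLELE1")
            )
            row["ID_OK"] = row.get("ID") == expected_id
            yield row
```

Streaming rows one at a time keeps memory use small for files of roughly 18 MB compressed; loading everything into a data frame would work equally well if a dependency such as pandas is acceptable.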

All variants are aligned to the hg19 genome build used in the analyses described in the paper.

The file names follow the convention:

<Phenotype>_<Method>.txt.gz

Examples include:

  • Severe_depression_COVV3C.txt.gz

  • Breast_cancer_LTPI.txt.gz

  • Prostate_cancer_SOFT.txt.gz

  • High_blood_pressure_AUTO.txt.gz

  • Bowel_cancer_GWAS.txt.gz
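
Because the phenotype part of a file name can itself contain underscores (e.g., High_blood_pressure), a file name is safest to split at its last underscore. The sketch below does that and also enumerates the 25 expected names; it assumes every phenotype/method pair follows the same pattern as the five examples, and the helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Phenotype labels as they appear in the example file names.
PHENOTYPES = ["Severe_depression", "Breast_cancer", "Prostate_cancer",
              "High_blood_pressure", "Bowel_cancer"]
# Method codes from the description.
METHODS = ["COVV3C", "LTPI", "SOFT", "AUTO", "GWAS"]
SUFFIX = ".txt.gz"


def expected_file_names() -> List[str]:
    """Build <Phenotype>_<Method>.txt.gz for every phenotype/method pair."""
    return [f"{p}_{m}{SUFFIX}" for p, m in product(PHENOTYPES, METHODS)]


def parse_file_name(name: str) -> Tuple[str, str]:
    """Split a file name into (phenotype, method) at the last underscore."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected suffix: {name}")
    stem = name[: -len(SUFFIX)]
    phenotype, method = stem.rsplit("_", 1)
    if method not in METHODS:
        raise ValueError(f"unknown method code: {method}")
    return phenotype, method
```

For example, parse_file_name("High_blood_pressure_AUTO.txt.gz") returns ("High_blood_pressure", "AUTO").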

Files

Total size: 462.0 MB

  • md5:8b01da3f63e8bf668b1ec77f236d2f7e (18.6 MB)

  • md5:63330a24f5c83561c07bcdce057461c7 (18.6 MB)

  • md5:2b149a13987c4d6b22da45cc4301279b (18.5 MB)

  • md5:fc6d6b424491d26090219f55b636b441 (18.6 MB)

  • md5:db5fbb754b328720ee094556dcdc12bd (18.6 MB)

  • md5:4ff1ffc6bf7749290e7b374e7cbb1d8f (18.5 MB)

  • md5:71a7ff6f28e03e43d18c990a15eaa2d5 (18.5 MB)

  • md5:a1b8fcc9cf3e5fb44b9744f2d71d1da2 (18.2 MB)

  • md5:5dd03159af9ef723c4bfd02ce4820c9c (18.4 MB)

  • md5:2fc336dd1c00db1aa9df28dad93a9ddf (18.5 MB)

  • md5:d5b8a46a30bd55d6b26c7cc0f5d58651 (18.4 MB)

  • md5:33277103d2623762b3f76eb8d0a9d70c (18.4 MB)

  • md5:fa4211c671c3cded36c332b5a60b0e5c (18.3 MB)

  • md5:9ec472a32d348d95c8fd4ae0a1957540 (18.5 MB)

  • md5:dac93eb4d5830541ee277270f0533442 (18.4 MB)

  • md5:e2af637a1923c3eb84718446764ade04 (18.6 MB)

  • md5:ca9216143ec5443613026552cf39d9b8 (18.4 MB)

  • md5:09dc3f1e9af46b635f653db5bc5cc120 (18.3 MB)

  • md5:87ab34b0c1ddc3107b162ac596b4ed53 (18.4 MB)

  • md5:df52556a8a99cdc59ec506f6c456b656 (18.6 MB)

  • md5:c70a5b32974f8b4b9b7bd05becd55011 (18.5 MB)

  • md5:2d6b16265245e4ad076e7b5fecc2cf26 (18.6 MB)

  • md5:0e38fff16efd66042ade7cde5e962954 (18.4 MB)

  • md5:60fe07a27dc992ab6666c28246464f90 (18.6 MB)

  • md5:65c26cc4d344cd96078300a5d19e00a2 (18.5 MB)
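
The checksums listed for the 25 files can be used to confirm that downloads are intact. A minimal sketch using Python's hashlib, assuming you compare each downloaded file's MD5 against the set of listed checksums; the helper names are hypothetical.

```python
import hashlib
from typing import Iterable


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize(checksum: str) -> str:
    """Drop the 'md5:' prefix used in the listing and lower-case the hex digits."""
    checksum = checksum.strip().lower()
    return checksum[4:] if checksum.startswith("md5:") else checksum


def is_listed(path: str, listed: Iterable[str]) -> bool:
    """Return True if the file's MD5 matches any checksum in the listing."""
    return md5sum(path) in {normalize(c) for c in listed}
```

Reading in chunks avoids loading a whole file into memory; on most systems the command-line tool md5sum gives the same digest.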

Additional details

Dates

Accepted: 2026-03-08

Software

Repository URL: https://github.com/Slangevar/covImpute

Programming language: Python

Development status: Active