Published January 4, 2024 | Version v2
Dataset Open

Datasets used in the benchmarking study of MR methods

Description

We conducted a benchmarking analysis of 16 Mendelian randomization (MR) methods that use summary-level data for causal inference. We applied them to five real-world genetic datasets and focused on three key aspects: type I error control, the accuracy of causal effect estimates, and replicability and power.

The datasets used in the MR benchmarking study can be downloaded here:

  1. "dataset-GWASATLAS-negativecontrol.zip": the GWASATLAS dataset for evaluating type I error control in confounding scenario (a): Population stratification;
  2. "dataset-NealeLab-negativecontrol.zip": the Neale Lab dataset for evaluating type I error control in confounding scenario (a): Population stratification;
  3. "dataset-PanUKBB-negativecontrol.zip": the Pan UKBB dataset for evaluating type I error control in confounding scenario (a): Population stratification;
  4. "dataset-Pleiotropy-negativecontrol": the dataset used for evaluating type I error control in confounding scenario (b): Pleiotropy;
  5. "dataset-familylevelconf-negativecontrol.zip": the dataset used for evaluating type I error control in confounding scenario (c): Family-level confounders;
  6. "dataset_ukb-ukb.zip": the dataset used for evaluating the accuracy of causal effect estimates;
  7. "dataset-LDL-CAD_clumped.zip": the dataset used for evaluating replicability and power.

Each of the datasets contains the following files:

  1. "Tested Trait pairs": the exposure-outcome trait pairs to be analyzed;
  2. "MRdat": the summary statistics after instrumental variable (IV) selection (p-value < 5e-05) and PLINK linkage disequilibrium (LD) clumping with a clumping window of 1000 kb and an r^2 threshold of 0.001;
  3. "bg_paras": the estimated background parameters "Omega" and "C", which are used for MR estimation in MR-APSS.
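The IV selection and clumping steps behind "MRdat" can be illustrated with a small greedy sketch: keep SNPs with p-value < 5e-05, then repeatedly take the most significant remaining SNP as an index SNP and discard nearby SNPs (within 1000 kb) whose r^2 with it exceeds 0.001. This is a simplified illustration, not the PLINK implementation and not code from MRbenchmarking or MR-APSS; `Snp`, `select_and_clump`, and the `r2` lookup function are hypothetical names, and in practice the pairwise LD values come from a reference panel via PLINK.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: int
    pos: int     # base-pair position
    pval: float  # p-value for association with the exposure


def select_and_clump(
    snps: List[Snp],
    r2: Callable[[str, str], float],  # hypothetical pairwise LD (r^2) lookup
    p_threshold: float = 5e-05,
    window_kb: int = 1000,
    r2_threshold: float = 0.001,
) -> List[Snp]:
    # IV selection: keep SNPs below the p-value threshold,
    # ordered from most to least significant.
    candidates = sorted(
        (s for s in snps if s.pval < p_threshold), key=lambda s: s.pval
    )
    window_bp = window_kb * 1000
    kept: List[Snp] = []
    handled = set()  # rsids already chosen as index SNPs or clumped away

    for index_snp in candidates:
        if index_snp.rsid in handled:
            continue
        # The most significant remaining SNP becomes an index SNP (an IV).
        kept.append(index_snp)
        handled.add(index_snp.rsid)
        # Drop remaining candidates that are within the window and in LD
        # with the index SNP above the r^2 threshold.
        for other in candidates:
            if other.rsid in handled:
                continue
            near = (
                other.chrom == index_snp.chrom
                and abs(other.pos - index_snp.pos) <= window_bp
            )
            if near and r2(index_snp.rsid, other.rsid) > r2_threshold:
                handled.add(other.rsid)
    return kept
```

For example, two SNPs 50 kb apart with r^2 = 0.5 collapse to the more significant one, while a SNP several megabases away is kept as a separate IV. PLINK's clumping has further options (such as a secondary p-value threshold) that this sketch omits.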

Note:

  1. The formatted datasets after quality control are available in our GitHub repository (https://github.com/YangLabHKUST/MRbenchmarking).
  2. Details on quality control of GWAS summary statistics, formatting of GWAS summary statistics, and LD clumping for IV selection can be found in the MR-APSS software tutorial on the MR-APSS GitHub page (https://github.com/YangLabHKUST/MR-APSS).
  3. R code for running the MR methods is also available at https://github.com/YangLabHKUST/MRbenchmarking.

Files

dataset-familylevelconf-negativecontrol.zip

Files (56.4 MB)

  1. md5:9d685fc35228d227dcf25c3ea250f240 (1.9 MB)
  2. md5:7e9dbb36101995dc4a198870bc4e8811 (27.2 MB)
  3. md5:97304fc02abd1ec0e3a7b14a40663c70 (125.8 kB)
  4. md5:f592479ff87f671f58ffd8fd74eab4d5 (18.9 MB)
  5. md5:515bfbcb8e0ca1f1310b6b912abc1116 (7.3 MB)
  6. md5:29e414a0a5e010bf73f5b340975358f2 (843.9 kB)
  7. md5:312b6362a57c1755ff78c113e600326b (69.6 kB)
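After downloading, the integrity of a file can be checked against the MD5 checksum listed for it on this record. The sketch below uses Python's standard `hashlib` module; `md5sum` and `verify` are illustrative helper names, and it assumes the checksum is given in the "md5:<hex>" form shown above.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with a checksum such as 'md5:9d68...'."""
    # Strip the "md5:" prefix used on the record page (Python 3.9+).
    expected = expected.strip().removeprefix("md5:").lower()
    return md5sum(path) == expected
```

Usage: `verify(Path("dataset-familylevelconf-negativecontrol.zip"), "md5:<checksum listed for that file>")` returns `True` when the download is intact. Reading in chunks keeps memory use low for the larger archives.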

Additional details

Related works

Is new version of
Dataset: https://zenodo.org/records/10929572 (Other)

Dates

Available
2024-08-08

Software

Repository URL
https://github.com/YangLabHKUST/MRbenchmarking
Programming language
R

References

  • Xianghong Hu, Mingxuan Cai, Jiashun Xiao, Xiaomeng Wan, Zhiwei Wang, Hongyu Zhao, Can Yang. Benchmarking Mendelian randomization methods for causal inference using genome-wide association study summary statistics. The American Journal of Human Genetics, 111(8), 1717-1735. medRxiv preprint: https://medrxiv.org/cgi/content/short/2024.01.03.24300765v1.