Published June 28, 2020 | Version 1
Dataset Open

Combining genome-wide studies of breast, prostate, ovarian and endometrial cancers maps cross-cancer susceptibility loci and identifies new genetic associations

Description

This data set accompanies the paper "Combining genome-wide studies of breast, prostate, ovarian and endometrial cancers maps cross-cancer susceptibility loci and identifies new genetic associations". A preprint of the paper is available at https://doi.org/10.1101/2020.06.16.146803.

 

cross_cancer_sum_stats.txt.gz contains summary genome-wide association statistics for susceptibility to single cancers: breast (BR), prostate (PR), ovarian (OV), endometrial (EN), estrogen receptor (ER)-positive breast (POS), ER-negative breast (NEG), and high-grade serous ovarian (HGS) cancers. It also contains the statistics from the cross-cancer meta-analyses (main [main] and subtype-focused [sub]). In the header, EA is the effect allele, OA is the other allele, EAF is the effect allele frequency in the largest of the single-cancer data sets (BR), IMPR2 is the imputation quality in the largest of the single-cancer data sets (BR), SE is the standard error, PVAL is the P-value, RE2Cs1 is the mean-effect part of the RE2C statistic, RE2Cs2 is the heterogeneity part of the RE2C statistic, and RE2Cp* is the RE2C* P-value. More on RE2Cp* can be found at http://software.buhmhan.com/RE2C/index.php?mid=contact&act=dispBoardWrite and in https://academic.oup.com/bioinformatics/article/33/14/i379/3953957. SNP names in cross_cancer_sum_stats.txt.gz include the chromosome and build 37 position.
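Because the summary statistics file is large (about 1.0 GB compressed), it may be convenient to stream it rather than load it whole. The following is a minimal sketch using only the Python standard library. It assumes the file is tab-delimited with a single header row, which is not stated in this record, and the function name and the column name passed to it are hypothetical; check the actual header for the exact spelling of each P-value column.

```python
import csv
import gzip


def iter_significant(path, pval_column, threshold=5e-8, delimiter="\t"):
    """Yield rows (as dicts) whose P-value in `pval_column` is below `threshold`.

    Assumption: the file is delimited text with one header row.
    """
    # Open in text mode so the csv module receives decoded lines, not bytes.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if pval_column not in (reader.fieldnames or []):
            raise KeyError(f"{pval_column!r} not in header: {reader.fieldnames}")
        for row in reader:
            try:
                pval = float(row[pval_column])
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric entries such as "NA"
            if pval < threshold:
                yield row
```

Rows are yielded one at a time, so memory use stays small regardless of file size. The default threshold of 5e-8 is the conventional genome-wide significance level, not a value taken from this record.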

 

main_tetrachoric_corr_matrix.txt and subtype_tetrachoric_corr_matrix.txt provide the tetrachoric correlation matrices used in the main and subtype-focused meta-analyses. The same matrices were used to specify the cryptic.cor argument of the exh.abf function of MetABF. More on MetABF can be found at https://github.com/trochet/metabf and in https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.22202.
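Before passing a correlation matrix to downstream software, a quick sanity check can catch parsing mistakes. The sketch below assumes each matrix file holds only whitespace-separated numbers, one matrix row per line, with no row or column labels; the file layout is not described in this record, so adjust the parser if labels are present. Both function names are hypothetical.

```python
def read_square_matrix(path):
    """Parse a whitespace-delimited numeric matrix into a list of row lists.

    Assumption: the file has no header or row labels.
    """
    with open(path) as handle:
        rows = [[float(tok) for tok in line.split()] for line in handle if line.strip()]
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("matrix is not square")
    return rows


def is_valid_correlation_matrix(matrix, tol=1e-8):
    """Return True if the matrix is symmetric, has a unit diagonal,
    and every entry lies in [-1, 1] (within `tol`)."""
    n = len(matrix)
    for i in range(n):
        if abs(matrix[i][i] - 1.0) > tol:
            return False
        for j in range(n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
            if not -1.0 - tol <= matrix[i][j] <= 1.0 + tol:
                return False
    return True
```

This check covers symmetry, the diagonal, and the value range only; it does not test positive semi-definiteness.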

 

prior_sigmas_for_metabf.txt contains the values used to specify the prior.sigma argument of the exh.abf function in MetABF.

 

The breast cancer data used are described in PMID 29059683 and can be downloaded from http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/oncoarray-and-combined-summary-result/gwas-summary-results-breast-cancer-risk-2017/ (this link also includes acknowledgements). The prostate cancer data used are described in PMID 29892016 and can be downloaded from http://practical.icr.ac.uk/blog/?page_id=8164 (this link also includes acknowledgements). The ovarian cancer data used are described in PMID 28346442 and can be downloaded from https://www.ebi.ac.uk/gwas/studies/GCST004415. The endometrial cancer data used are described in PMID 30093612 and can be downloaded from https://www.ebi.ac.uk/gwas/studies/GCST006464. These links point to the same data that form the basis of the cross_cancer_sum_stats.txt.gz file.

 

The sample size and precision of the data presented should preclude identification of any individual study participant. However, in downloading these data, you undertake not to attempt to identify individual study participants and not to re-post these data to a third-party website. Please cite the PMIDs highlighted above, along with the appropriate acknowledgements, if you use the cross_cancer_sum_stats.txt.gz file.

 

If you have any questions about this repository, please email Siddhartha Kar at siddhartha dot kar at bristol dot ac dot uk

Files

Files (1.0 GB)

Checksum Size
md5:5cc94b1b551cf1d4a5be618b0e690c07 1.0 GB
md5:98edd899e61fe11567903a8f53f1152b 88 Bytes
md5:11ec275c6861e6f8ba4ff011e82394fb 58 Bytes
md5:202e6232223d8e41e841dea64bd6f784 147 Bytes
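The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is read in chunks so that the 1.0 GB archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listing("cross_cancer_sum_stats.txt.gz", "md5:5cc94b1b551cf1d4a5be618b0e690c07") would check the summary statistics archive, assuming the 1.0 GB entry in the listing corresponds to that file; the listing itself does not name the files.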