Published May 29, 2026 | Version 1.0
Dataset Restricted

PanProstate Cancer Group Variants Public Release 1

Authors/Creators

Description

This dataset contains the whole genome sequencing variants detected in prostate cancer samples used in the first tranche of papers from the Pan Prostate Cancer Group. Further details on this data release will be available in the data release paper.

Pan Prostate Cancer Group

The Pan Prostate Cancer Group (PPCG) is a multidisciplinary international consortium that has coordinated the harmonised analysis of DNA Whole Genome Sequencing (WGS), RNAseq, and DNA methylation data from a combined total of 2,021 prostate cancer donors, including samples from metastatic and early-onset disease, collected from men across a range of ancestries.


The aim of the PPCG is to target key scientific and clinical problems for men diagnosed with prostate cancer, including understanding aetiology, the early detection of aggressive disease, and the development of improved treatments. 
The PPCG includes molecular biologists, mathematicians, bioinformaticians, genetic anthropologists, histopathologists, clinical trial experts, oncologists, and surgeons. 


Sequencing data from participating study groups (UK, Germany, Denmark, France, Australia, USA, Canada, and Southern Africa) has been analysed using a range of analytical pipelines, providing a rich data resource. 

Terms of use

Acknowledgement of Source

The Data User agrees to explicitly acknowledge the PanProstate Cancer Group in any oral presentation, abstract, peer-reviewed manuscript, or preprint that utilises this dataset. 
 

Standard Citation Format 

Any publication utilising this data must include the following standard citation text in the "Acknowledgements" or "Methods" section: 
> "Some of the data used in this study was provided by the PanProstate Cancer Group (http://panprostate.org/)." 
 
Additionally, the Data User shall cite the primary descriptor publication: [Insert Data Release Citation]. 
 

Data Versioning Disclosure 

Because variant annotations and other data are updated dynamically, the Data User must specify the exact data release version and access date within their publication's supplementary methods (e.g., "PPCG Data Release vX, accessed October 2026").
 

Research Use Only Limitation

The genomic variant data provided under this Agreement is intended strictly for research purposes. The data has not been cleared, approved, or certified by the FDA, EMA, or any other regulatory body for clinical diagnostics or therapeutic decision-making.
 

Alignment with Original Patient Consent

The Data User acknowledges that the genomic data was gathered under specific Institutional Review Board (IRB) or Independent Ethics Committee (IEC) approved protocols and informed consent forms signed by the clinical participants. The Data User agrees to restrict their data utilisation strictly within the bounds of those protocols and consent forms. See the data release publication for more details.
 

Local Ethics Approval Requirements

The Data User certifies that their specific research protocol utilising this dataset has been reviewed and approved (or granted an explicit waiver) by their own institution’s IRB, IEC, or equivalent human subjects protection committee. Documentation of this local approval must be maintained by the Data User and provided to the Data Provider immediately upon request.
 

Compliance with International Ethical Frameworks

The Data User agrees to conduct all research utilising this dataset in strict accordance with recognised international ethical standards, including but not limited to the Declaration of Helsinki, the CIOMS International Ethical Guidelines, and the local data privacy regulations governing genetic data (e.g., GDPR, HIPAA, or equivalent national legislation).
 

Non-Re-identification 

The Data User explicitly agrees not to attempt to identify, contact, or re-identify any individual tissue donor or participant from whom the cancer variant data was derived, or any of their blood (consanguineous) relatives. This includes, but is not limited to, the cross-referencing of somatic or germline variant files (VCFs/BAMs) with public genealogical databases, voter registries, or external clinical data repositories.

Prohibition of Stigmatisation 

The Data User shall not use the data to generate claims, algorithms, or publications that promote racial, ethnic, or geographic stigmatisation. Research findings must accurately differentiate between somatic mutations (acquired in tissue) and germline variants (inherited) to prevent unintended demographic discrimination.

Files

Restricted

The record is publicly accessible, but files are restricted.
