This dataset contains the variants detected by whole genome sequencing (WGS) of the prostate cancer samples used in the first tranche of papers from the Pan Prostate Cancer Group. Further details on this data release will be available in the data release paper.
Pan Prostate Cancer Group
The Pan Prostate Cancer Group (PPCG) is a multidisciplinary international consortium that has coordinated the harmonised analysis of DNA whole genome sequencing (WGS), RNA-seq, and DNA methylation data from a combined total of 2,021 prostate cancer donors. These include donors with metastatic and early-onset disease, and men from a range of ancestries.
The aim of the PPCG is to target key scientific and clinical problems for men diagnosed with prostate cancer, including understanding aetiology, the early detection of aggressive disease, and the development of improved treatments.
The PPCG includes molecular biologists, mathematicians, bioinformaticians, genetic anthropologists, histopathologists, clinical trial experts, oncologists, and surgeons.
Sequencing data from participating study groups (UK, Germany, Denmark, France, Australia, USA, Canada, and Southern Africa) has been analysed using a range of analytical pipelines, providing a rich data resource.
Terms of use
Acknowledgement of Source
The Data User agrees to explicitly acknowledge the Pan Prostate Cancer Group in any oral presentation, abstract, peer-reviewed manuscript, or preprint that utilises this dataset.
Standard Citation Format
Any publication utilising this data must include the following standard citation text in the "Acknowledgements" or "Methods" section:
Additionally, the Data User shall cite the primary descriptor publication: [Insert Data Release Citation].
Data Versioning Disclosure
Because variant annotations and other data are updated dynamically, the Data User must specify the exact data release version and access date within their publication's supplementary methods (e.g., "PPCG Data Release vX, accessed October 2026").
Research Use Only Limitation
The genomic variant data provided under this Agreement is intended strictly for research purposes. The data has not been cleared, approved, or certified by the FDA, EMA, or any other regulatory body for clinical diagnostics or therapeutic decision-making.
Alignment with Original Patient Consent
The Data User acknowledges that the genomic data was gathered under specific Institutional Review Board (IRB) or Independent Ethics Committee (IEC) approved protocols and informed consent forms signed by the clinical participants. The Data User agrees to restrict their use of the data strictly to the bounds of those protocols and consent forms. See the data release publication for more details.
Local Ethics Approval Requirements
The Data User certifies that their specific research protocol utilising this dataset has been reviewed and approved (or granted an explicit waiver) by their own institution’s IRB, IEC, or equivalent human subjects protection committee. Documentation of this local approval must be maintained by the Data User and provided to the Data Provider immediately upon request.
Compliance with International Ethical Frameworks
The Data User agrees to conduct all research utilising this dataset in strict accordance with recognised international ethical standards, including but not limited to the Declaration of Helsinki, the CIOMS International Ethical Guidelines, and the local data privacy regulations governing genetic data (e.g., GDPR, HIPAA, or equivalent national legislation).
Non-Re-identification
The Data User explicitly agrees not to attempt to identify, contact, or re-identify any individual tissue donor or participant from whom the cancer variant data was derived, or any of their blood (consanguineous) relatives. This includes, but is not limited to, the cross-referencing of somatic or germline variant calls or sequence data (e.g., VCF or BAM files) with public genealogical databases, voter registries, or external clinical data repositories.
Prohibition of Stigmatisation
The Data User shall not use the data to generate claims, algorithms, or publications that promote racial, ethnic, or geographic stigmatisation. Research findings must accurately differentiate between somatic mutations (acquired in tissue) and germline variants (inherited) to prevent unintended demographic discrimination.