Published August 7, 2024 | Version v1
Dataset Restricted

BIMCV-Prostate-Dataset V1

  • 1. Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana

Contributors

  • 1. Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana

Description

⚠️ Data Availability Notice

The BIMCV Prostate Dataset is currently subject to a regulatory reassessment process conducted by the competent authorities and collaborating healthcare institutions in the Valencian Community (Spain). This process aims to ensure full compliance with updated legal and ethical frameworks governing the sharing of medical data.

Consequently, although the dataset is publicly registered and associated with a peer-reviewed publication, access to the data is temporarily suspended until the review process is completed and all requirements are fulfilled.

Access will be reinstated as soon as authorization is granted under the revised regulations. We remain committed to transparency and to enabling data access in accordance with applicable standards.

---
The BIMCV Prostate Dataset is a comprehensive and diverse dataset that includes a total of 9,341 prostate MRI sessions, distributed among 8,441 subjects, collected from 16 healthcare centers in the Valencian Community, Spain. This dataset is structured according to the MIDS (Medical Imaging Data Structure) standard, ensuring consistent and accessible organization for researchers, facilitating data use and analysis.

The first version of the dataset focuses on sessions that contain all three imaging modalities (T2W, DWI, and ADC), resulting in 1,730 complete sessions and 4,663 training samples, of which 2,594 are positive for clinically significant prostate cancer (csPCa) and 2,069 are csPCa negative. This information can be found in the table available on GitHub.
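
Selecting sessions that contain all three modalities can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the session identifiers and the `sessions` mapping are hypothetical, and only the modality names T2W, DWI, and ADC come from this record.

```python
# Modalities a session must contain to count as "complete" (from the record).
REQUIRED_MODALITIES = frozenset({"T2W", "DWI", "ADC"})


def complete_sessions(sessions):
    """Return the sorted IDs of sessions that contain every required modality.

    `sessions` maps a session ID to an iterable of modality names.
    """
    return sorted(
        session_id
        for session_id, modalities in sessions.items()
        if REQUIRED_MODALITIES.issubset(modalities)
    )


# Hypothetical example data; these IDs do not come from the dataset.
example = {
    "ses-A": ["T2W", "DWI", "ADC"],
    "ses-B": ["T2W", "DWI"],
    "ses-C": ["T2W", "ADC", "DWI", "T2W"],
}
print(complete_sessions(example))
```

Repeated series within a session (as in the hypothetical `ses-C`) do not affect the result, because only the presence of each modality is checked.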

The dataset includes MRI images in three modalities: T2-weighted images (T2W), diffusion-weighted images (DWI), and apparent diffusion coefficient (ADC) maps. In total, the dataset includes 32,662 T2W images (62.97%), 8,036 DWI images (15.49%), and 11,167 ADC maps (21.53%), including both the original maps and those calculated from the available DWI images. This additional calculation process was carried out to ensure the dataset's integrity and consistency, allowing for comprehensive analysis in the field of prostate oncology.
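
This record does not state how ADC maps were calculated from the DWI images. A common approach, shown here purely as an assumption, is the two-point mono-exponential model S_b = S_0 · exp(-b · ADC), which gives ADC = ln(S_0 / S_b) / b. The function name and the sample values below are illustrative only.

```python
import math


def adc_two_point(s0, sb, b_value):
    """Estimate ADC (mm^2/s) for one voxel from two DWI signal intensities.

    s0      -- signal at b = 0 s/mm^2
    sb      -- signal at b = b_value
    b_value -- diffusion weighting in s/mm^2 (must be > 0)
    Assumes mono-exponential decay: sb = s0 * exp(-b_value * ADC).
    """
    if b_value <= 0:
        raise ValueError("b_value must be positive")
    if s0 <= 0 or sb <= 0:
        # The logarithm is undefined for non-positive signals (e.g. background).
        return 0.0
    return math.log(s0 / sb) / b_value


# Illustrative voxel: the signal drops from 1000 to 400 at b = 800 s/mm^2.
print(f"{adc_two_point(1000.0, 400.0, 800.0):.6f} mm^2/s")
```

In practice the same formula would be applied voxel-wise to whole volumes, and acquisitions with more than two b-values are usually fitted by regression rather than with a two-point estimate.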

The exploratory data analysis (EDA) performed on this dataset has provided insights into the characteristics and distribution of the images, ensuring the dataset's representativeness and diversity. For example, it was found that Health Center 5 contributed the highest proportion of sessions (15.6%), followed by Health Center 7 (12.3%) and Health Center 17 (10.5%). This level of diversity in data sources ensures that the dataset encompasses a wide range of imaging acquisition practices and patient demographics, improving the generalization of artificial intelligence models developed with this data.

Additionally, the analysis of the distribution by MRI equipment manufacturer revealed that most images were acquired with General Electric equipment (66.7%), followed by Philips (25.1%) and Siemens (8.13%). Similarly, most sessions were conducted with 1.5 Tesla machines (63%), followed by 3.0 Tesla machines (36.5%), reflecting standard clinical practices in the region.

Regarding the label distribution, 4,871 cases (approximately 52%) are labeled csPCa positive and 3,514 cases (approximately 37%) are labeled csPCa negative.
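
The record does not say which total these percentages refer to. As a quick check, and assuming the denominator is the 9,341 sessions reported in the description, the shares can be recomputed as follows. The helper name is illustrative.

```python
def share(count, total):
    """Return `count` as a percentage of `total`, rounded to one decimal."""
    return round(100 * count / total, 1)


TOTAL_SESSIONS = 9341  # assumed denominator (sessions reported in the record)
labels = {"csPCa positive": 4871, "csPCa negative": 3514}

for name, count in labels.items():
    print(f"{name}: {share(count, TOTAL_SESSIONS)}%")

# Cases not covered by either label under this assumption.
unlabeled = TOTAL_SESSIONS - sum(labels.values())
print(f"neither label: {share(unlabeled, TOTAL_SESSIONS)}%")
```

Under this assumption the two labels cover about 52.1% and 37.6% of sessions, which leaves roughly 10% with neither label.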

Files

Restricted

The record is publicly accessible, but files are restricted.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

To access the dataset, please fill out the following survey: https://forms.office.com/e/frV3A5dT6r

After completing the survey, submit an access request on Zenodo, and we will review the information provided.

Additional details

Funding

Ministerio de Asuntos Económicos y Transformación Digital
Red Federada de Inteligencia Artificial para acelerar la Investigación Sanitaria MIA.2021.M02.0005

Software

Repository URL
https://github.com/BIMCV-CSUSP/BIMCV-Prostate-Classification
Programming language
Python
Development Status
Active