Published February 9, 2025 | Version V1
Dataset Open

Dataset of Spanish Mammographic Reports with BI-RADS Classification

Description

This dataset contains 4,357 Spanish-language reports of mammographic studies, collected from several medical units in Paraguay. It aims to help address the shortage of public datasets for natural language processing applied to radiological reports.

The dataset captures key information from each mammographic report through 15 variables. The full text of each report is included, and each section of the report is also provided separately: clinical observations, diagnostic conclusions, and follow-up recommendations. Each report also carries the BI-RADS classification assigned to it. Finally, the dataset includes report metadata such as a unique identifier, year, and month, and patient information such as age, the patient's reason for the study, last menstrual period, type of hormonal therapy received, family history, and number of children.

Because the data were not generated artificially, the dataset represents a real-world scenario. Researchers can use it to replicate results from articles in this area and to develop and test new models and algorithms, specifically for BI-RADS classification.
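As a starting point for classification work, it can help to inspect how the reports are distributed across BI-RADS categories. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row and is UTF-8 encoded; the column name `birads` in the usage comment is hypothetical, since the actual column names are not listed here and should be checked against the file's header.

```python
import csv
from collections import Counter


def birads_distribution(csv_path, label_column, encoding="utf-8"):
    """Count how many reports fall into each value of the label column.

    `label_column` must match a name in the CSV header; the encoding
    default is an assumption and may need adjusting.
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        if label_column not in columns:
            raise KeyError(
                f"{label_column!r} not found; available columns: {columns}"
            )
        # Treat missing values as empty strings so they are counted too.
        return Counter((row[label_column] or "").strip() for row in reader)


# Example usage (the column name "birads" is a placeholder):
# counts = birads_distribution("BIRADS_radiology_reports.csv", "birads")
# for category, n in sorted(counts.items()):
#     print(category, n)
```

Checking this distribution first is useful because clinical label sets are often imbalanced, which affects how a model should be trained and evaluated.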

Files

BIRADS_radiology_reports.csv

Files (7.3 MB)

Name Size
md5:5c86e1318d4cae3d260560f77a131e2b
7.3 MB
md5:3b28ca7a2d75552e475889f41810092a
2.0 kB
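The md5 values in the file listing can be used to confirm that a download is intact. The sketch below computes a file's MD5 digest with Python's `hashlib` and compares it with a listed value. It assumes the checksum shown next to the 7.3 MB entry belongs to `BIRADS_radiology_reports.csv`; the listing does not state this explicitly.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed value such as 'md5:abc...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example usage (assumes the 7.3 MB checksum refers to the CSV file):
# ok = matches_listed_md5(
#     "BIRADS_radiology_reports.csv",
#     "md5:5c86e1318d4cae3d260560f77a131e2b",
# )
# print("checksum matches" if ok else "checksum mismatch")
```

Reading in chunks keeps memory use small regardless of file size. MD5 is adequate here for detecting corrupted downloads, but not as a security guarantee.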

Additional details

Dates

Collected
2019-01
Start of the data collection
Collected
2024-08
End of the data collection