Published May 27, 2026 | Version 3.0.1
Dataset Open

MultiCaRe: An open-source clinical case dataset for medical image classification and multimodal AI applications

  • 1. Universidad Nacional del Sur

Description

The dataset contains multimodal data from over 76,000 open-access, de-identified case reports, including metadata, clinical cases, image captions and more than 139,000 images. The images and clinical cases span different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map each image to its corresponding article metadata, clinical case, caption and image labels. Details of the data structure can be found in the file data_dictionary.csv.
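
As a rough sketch of the mapping described above, the snippet below joins image-level rows to case-level rows on a shared key, using only the Python standard library. The column names (`case_id`, `file`, `caption`, `case_text`) and the sample values are hypothetical placeholders, not taken from the dataset; the actual names are documented in data_dictionary.csv.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file object into a list of dicts, one per row."""
    return list(csv.DictReader(handle))


def attach_cases(image_rows, case_rows, key="case_id"):
    """Return image rows enriched with the fields of their matching case row.

    `key` is a hypothetical shared identifier column; check
    data_dictionary.csv for the real column names.
    """
    cases_by_key = {row[key]: row for row in case_rows}
    merged = []
    for image in image_rows:
        case = cases_by_key.get(image[key], {})
        # Image fields win on name clashes so the image row stays intact.
        merged.append({**case, **image})
    return merged


# Tiny in-memory stand-ins for the real files (made-up values).
images_csv = io.StringIO("file,case_id,caption\nimg_1.jpg,c1,Chest X-ray\n")
cases_csv = io.StringIO("case_id,case_text\nc1,A 54-year-old patient...\n")

merged_rows = attach_cases(read_rows(images_csv), read_rows(cases_csv))
print(merged_rows[0]["file"], "->", merged_rows[0]["case_text"])
```

With the real files, the same functions could be fed `open(path, newline="", encoding="utf-8")` handles once the key column has been confirmed in the data dictionary.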

More than 98,000 patients and 397,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples in the GitHub repository (https://github.com/mauro-nievoff/MultiCaRe_Dataset) to see how to make the most of this dataset. The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may carry less restrictive licenses (CC BY, CC BY-NC, CC0). For instance, among the image files, 72K are CC BY, 33K are CC BY-NC, 32K are CC BY-NC-SA, and 20 are CC0.
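
Because license terms vary per item, it may be useful to filter records by license before reuse. The following is a minimal sketch under stated assumptions: the `license` column name and the exact license strings are hypothetical and should be checked against data_dictionary.csv. It keeps only the two license types named above that do not carry a non-commercial restriction.

```python
# Licenses named above that do not carry a non-commercial (NC) restriction.
COMMERCIAL_OK = {"CC BY", "CC0"}


def filter_by_license(rows, allowed=COMMERCIAL_OK, column="license"):
    """Keep only rows whose license value is in `allowed`.

    `column` is a hypothetical column name; check data_dictionary.csv.
    """
    return [row for row in rows if row.get(column, "").strip() in allowed]


# Made-up sample rows for illustration only.
sample = [
    {"file": "a.jpg", "license": "CC BY"},
    {"file": "b.jpg", "license": "CC BY-NC"},
    {"file": "c.jpg", "license": "CC BY-NC-SA"},
    {"file": "d.jpg", "license": "CC0"},
]
usable = filter_by_license(sample)
print([row["file"] for row in usable])
```

Passing a different `allowed` set (for example, adding "CC BY-NC") adapts the filter to non-commercial research use.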

Files

captions_and_labels.csv

Files (3.0 GB)

Checksum                                Size
md5:988b4070c25eb04175901f397f89989e    47.7 MB
md5:4e78919b2a052022022473802d6b5d80    53.6 MB
md5:b2d8a946cc15d54ae2feb811fab5aaec    55.3 MB
md5:3d19b916147bc16fd3bdb556546c52a4    168.1 MB
md5:723cb7557a939fe878c60be79c10f7a5    6.3 kB
md5:112b1cc572dfd9045faae1a302c40a81    20.4 MB
md5:b219bbf447d2df6ce8df003055b79b0b    917.6 MB
md5:d9744261dfcda559d19fcb76e7219372    56.4 MB
md5:a3494c94ab31292d3820f0ec278d616c    306.9 MB
md5:9b39f1c7d52121a2673d4303b3f75f32    323.3 MB
md5:b5906c65d5bc1eb66aae4e56498968f6    263.7 MB
md5:9c0f3f46f3309fca52ffd04dd422402f    275.0 MB
md5:6948625e8dccfd35d03aaab838c8f596    224.3 MB
md5:6e0628f7259dd47e5edf2adac3cddd08    237.9 MB
md5:1d5106766d923d6b9f307849ec0c19a9    56.3 MB
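
The MD5 values listed above can be used to confirm that a download completed intact. Below is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches_checksum` are illustrative, and the demo runs against a small temporary file rather than a real dataset file.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:988b...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex


# Demo on a throwaway file whose MD5 is well known (the bytes b"hello").
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    demo_ok = matches_checksum(demo, "md5:5d41402abc4b2a76b9719d911017c592")
print(demo_ok)
```

For an actual download, the call would take the local path of the saved file (a placeholder such as "path/to/downloaded_file") together with the matching entry from the list, for example "md5:988b4070c25eb04175901f397f89989e" for the 47.7 MB file.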

Additional details

Related works

Is published in
Data paper: 10.3390/data10080123 (DOI)

Dates

Updated
2025-07-06
New cases and images

Software

Repository URL
https://github.com/mauro-nievoff/MultiCaRe_Dataset
Programming language
Python
Development Status
Active