MultiCaRe: An open-source clinical case dataset for medical image classification and multimodal AI applications
Contributors
Data curator:
Description
The dataset contains multi-modal data from over 76,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 139,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical cases, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.
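The mapping between images and clinical cases can be sketched as a simple key-based grouping. This is a minimal illustration using toy in-memory rows, not the dataset's own loading code: the column names `case_id`, `file`, `caption` and `case_text` are hypothetical placeholders, and the real column names should be taken from data_dictionary.csv.

```python
from collections import defaultdict


def group_images_by_case(image_rows, case_rows, key="case_id"):
    """Attach every image row to the case row that shares the same key value.

    `key` is a hypothetical column name; check data_dictionary.csv for the
    identifier that the real files actually share.
    """
    cases = {row[key]: row for row in case_rows}
    images_by_case = defaultdict(list)
    for image in image_rows:
        # Skip images whose case is not present in case_rows.
        if image[key] in cases:
            images_by_case[image[key]].append(image)
    return {
        case_id: {"case": cases[case_id], "images": images}
        for case_id, images in images_by_case.items()
    }


# Toy rows standing in for the real tables (all values are made up).
toy_cases = [
    {"case_id": "A1", "case_text": "Example case text one."},
    {"case_id": "B2", "case_text": "Example case text two."},
]
toy_images = [
    {"case_id": "A1", "file": "a1_1.jpg", "caption": "First image"},
    {"case_id": "A1", "file": "a1_2.jpg", "caption": "Second image"},
    {"case_id": "B2", "file": "b2_1.jpg", "caption": "Third image"},
    {"case_id": "Z9", "file": "z9_1.jpg", "caption": "Orphan image"},
]

grouped = group_images_by_case(toy_images, toy_cases)
```

With real data, the rows could come from `csv.DictReader` or a dataframe library, and the same grouping idea applies once the shared identifier column is known.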
More than 98,000 patients and 397,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.
Refer to the examples in the dataset's GitHub repository (listed under Software below) to learn how to get the most out of this dataset. The license of the dataset as a whole is CC BY-NC-SA. However, individual items may carry less restrictive licenses (CC BY, CC BY-NC, CC0). For instance, among the image files, 72K are CC BY, 33K are CC BY-NC, 32K are CC BY-NC-SA, and 20 are CC0.
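Because individual items can carry different licenses, a downstream project may want to keep only the items whose license fits its intended use. The sketch below assumes each row exposes its license under a hypothetical `license` key (the real column name should be checked in data_dictionary.csv) and uses the four license names listed above. The toy rows are made up for illustration.

```python
# License names as listed in the dataset description.
KNOWN_LICENSES = {"CC0", "CC BY", "CC BY-NC", "CC BY-NC-SA"}
NON_COMMERCIAL = {"CC BY-NC", "CC BY-NC-SA"}  # NC: no commercial use
SHARE_ALIKE = {"CC BY-NC-SA"}                 # SA: derivatives keep the license


def is_usable(license_name, commercial=False, allow_share_alike=True):
    """Return True if an item under `license_name` fits the intended use."""
    if license_name not in KNOWN_LICENSES:
        return False  # be conservative with anything unrecognised
    if commercial and license_name in NON_COMMERCIAL:
        return False
    if not allow_share_alike and license_name in SHARE_ALIKE:
        return False
    return True


def filter_by_license(rows, license_key="license", **use):
    """Keep only the rows whose license passes is_usable().

    `license_key` is a hypothetical column name.
    """
    return [row for row in rows if is_usable(row[license_key], **use)]


# Toy rows (file names and licenses are made up).
toy_rows = [
    {"file": "img_1.jpg", "license": "CC BY"},
    {"file": "img_2.jpg", "license": "CC BY-NC"},
    {"file": "img_3.jpg", "license": "CC BY-NC-SA"},
    {"file": "img_4.jpg", "license": "CC0"},
]

commercial_ok = filter_by_license(toy_rows, commercial=True)
no_share_alike = filter_by_license(toy_rows, allow_share_alike=False)
```

Filtering per item only matters when items are used individually; the dataset as a whole remains under CC BY-NC-SA.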
Files
Total size: 3.0 GB. The files include captions_and_labels.csv.
| Checksum | Size |
|---|---|
| md5:988b4070c25eb04175901f397f89989e | 47.7 MB |
| md5:4e78919b2a052022022473802d6b5d80 | 53.6 MB |
| md5:b2d8a946cc15d54ae2feb811fab5aaec | 55.3 MB |
| md5:3d19b916147bc16fd3bdb556546c52a4 | 168.1 MB |
| md5:723cb7557a939fe878c60be79c10f7a5 | 6.3 kB |
| md5:112b1cc572dfd9045faae1a302c40a81 | 20.4 MB |
| md5:b219bbf447d2df6ce8df003055b79b0b | 917.6 MB |
| md5:d9744261dfcda559d19fcb76e7219372 | 56.4 MB |
| md5:a3494c94ab31292d3820f0ec278d616c | 306.9 MB |
| md5:9b39f1c7d52121a2673d4303b3f75f32 | 323.3 MB |
| md5:b5906c65d5bc1eb66aae4e56498968f6 | 263.7 MB |
| md5:9c0f3f46f3309fca52ffd04dd422402f | 275.0 MB |
| md5:6948625e8dccfd35d03aaab838c8f596 | 224.3 MB |
| md5:6e0628f7259dd47e5edf2adac3cddd08 | 237.9 MB |
| md5:1d5106766d923d6b9f307849ec0c19a9 | 56.3 MB |
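Each file is published with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are illustrative, and the path in the usage comment is a placeholder, since the file list above does not say which file each checksum belongs to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a published checksum.

    Accepts the value with or without the leading 'md5:' prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder path; pair each checksum with the file it belongs to):
# ok = matches_checksum("path/to/downloaded_file",
#                       "md5:988b4070c25eb04175901f397f89989e")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.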
Additional details
Related works
- Is published in
- Data paper: 10.3390/data10080123 (DOI)
Dates
- Updated
- 2025-07-06: New cases and images
Software
- Repository URL
- https://github.com/mauro-nievoff/MultiCaRe_Dataset
- Programming language
- Python
- Development Status
- Active