There is a newer version of the record available.

Published March 8, 2025 | Version 2.1
Dataset | Open access

MultiCaRe: An open-source clinical case dataset for medical image classification and multimodal AI applications

  • 1. Universidad Nacional del Sur

Description

The dataset contains multimodal data from over 70,000 open-access, de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. The images and clinical cases span several medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map each image to its corresponding article metadata, clinical case, caption and image labels. Details of the data structure can be found in the file data_dictionary.csv.
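As a minimal sketch of that mapping, the Python below joins image records to their article metadata on a shared article identifier, using only the standard library. The record shapes, the field names (`article_id`, `file`, `caption`, `title`) and the toy values are hypothetical placeholders, not the dataset's actual schema; check data_dictionary.csv for the real column names before adapting it.

```python
from dataclasses import dataclass


# Hypothetical record shapes; the real column names are listed in data_dictionary.csv.
@dataclass
class Article:
    article_id: str
    title: str


@dataclass
class Image:
    file: str
    article_id: str
    caption: str


def map_images_to_articles(images, articles):
    """Return (image, article) pairs joined on the shared article_id key.

    Images whose article_id has no matching article are skipped.
    """
    by_id = {a.article_id: a for a in articles}  # index articles for O(1) lookup
    return [(img, by_id[img.article_id]) for img in images if img.article_id in by_id]


articles = [Article("article-1", "Case report A"), Article("article-2", "Case report B")]
images = [
    Image("img_1.jpg", "article-1", "CT scan of the chest"),
    Image("img_2.jpg", "article-2", "Histopathology slide"),
    Image("img_3.jpg", "article-9", "Image without a matching article"),
]

for img, art in map_images_to_articles(images, articles):
    print(img.file, "->", art.title)
```

Once the tabular files are loaded into dataframes, the same key-based join is usually expressed with a library such as pandas (`DataFrame.merge`).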

More than 90,000 patients and 280,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples in the GitHub repository (https://github.com/mauro-nievoff/MultiCaRe_Dataset) to learn how to make the most of this dataset.

The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may carry less restrictive licenses (CC BY, CC BY-NC, CC0). For instance, of the image files, 66K are CC BY, 32K are CC BY-NC-SA, 32K are CC BY-NC, and 20 are CC0.
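Because licenses differ per item, a common preprocessing step is to keep only the images whose license fits the intended use. The sketch below assumes each image record exposes its license as a string such as "CC BY" or "CC BY-NC-SA" in a hypothetical `license` field; the real field name and label format should be checked in data_dictionary.csv. It treats any label containing the NonCommercial (NC) element as unsuitable for commercial use, which follows the Creative Commons license definitions.

```python
from collections import Counter

# Hypothetical image records; the real license field name and label format may differ.
images = [
    {"file": "a.jpg", "license": "CC BY"},
    {"file": "b.jpg", "license": "CC BY-NC-SA"},
    {"file": "c.jpg", "license": "CC BY-NC"},
    {"file": "d.jpg", "license": "CC0"},
]


def allows_commercial_use(license_label: str) -> bool:
    """True unless the Creative Commons label carries the NonCommercial (NC) element."""
    # "CC BY-NC-SA" -> ["CC", "BY", "NC", "SA"]; "CC0" -> ["CC0"]
    parts = license_label.upper().replace(" ", "-").split("-")
    return "NC" not in parts


license_counts = Counter(img["license"] for img in images)  # tally images per license
commercial_ok = [img["file"] for img in images if allows_commercial_use(img["license"])]

print(license_counts)
print(commercial_ok)  # ['a.jpg', 'd.jpg']
```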

Files (2.9 GB)

Size        Checksum
45.0 MB     md5:6147674303929e5acc9e8986e747ea34
49.2 MB     md5:a5f8921be1eadc0072795385e3b6180e
52.0 MB     md5:e9800f71512a2cfc6cabc659ab5ba725
158.7 MB    md5:4574baaecf1ab2e5441353bdc8d09f51
6.4 kB      md5:2de670f0f631189192835ee17830c4e3
19.3 MB     md5:59f484571767752f3546c1b575bfffda
779.5 MB    md5:39bdafd6e078d48231d72418530ee5a6
57.1 MB     md5:a4fb23b0677937cbbe21fc88d1cc66f3
310.0 MB    md5:1987c585d627330a22209ef86bf51c5a
327.0 MB    md5:a5e9363ecd5e74eae0ff515cfa9bcc71
266.9 MB    md5:0885722cb61926835d43a6927bdc273f
278.1 MB    md5:161233a58744c46c328702374ae8827d
226.9 MB    md5:55fd574b7324c77c14d5a6e80df0ecfa
241.1 MB    md5:6290db3092376daf83e1ce9b6e5493a8
57.0 MB     md5:42683f0550d8ae98e1a901b4bdc336cc
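Each file is published with an MD5 checksum, so a download can be checked for corruption before use. A minimal sketch using Python's standard `hashlib` module; it reads the file in 1 MiB chunks so that the larger files do not have to fit in memory, and it accepts the checksum with or without the "md5:" prefix shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until fh.read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected` (an optional 'md5:' prefix is ignored)."""
    expected = expected.strip().lower().removeprefix("md5:")  # removeprefix needs Python 3.9+
    return md5_of_file(path) == expected
```

For example, `verify_md5("downloaded_file", "md5:42683f0550d8ae98e1a901b4bdc336cc")` returns True only if the local copy matches the listed checksum; `downloaded_file` is a placeholder for the path of the file you actually downloaded.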

Additional details

Related works

Is published in
Data paper: 10.1016/j.dib.2023.110008 (DOI)

Software

Repository URL: https://github.com/mauro-nievoff/MultiCaRe_Dataset
Programming language: Python
Development Status: Active