There is a newer version of the record available.

Published November 10, 2023 | Version 2.0
Dataset Open

ROCOv2: Radiology Objects in COntext Version 2, An Updated Multimodal Image Dataset

  • 1. Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
  • 2. Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany
  • 3. Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
  • 4. Microsoft, Redmond, Washington, USA
  • 5. University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
  • 6. University of Applied Sciences Western Switzerland (HES-SO), Switzerland

Description

Recent advances in deep learning techniques have enabled the development of systems for automatic analysis of medical images. These systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate.

Here, we introduce Radiology Objects in COntext Version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PubMed Open Access subset. Concepts for clinical modality, anatomy (X-ray), and directionality (X-ray) were manually curated and additionally evaluated by a radiologist. Unlike MIMIC-CXR, ROCOv2 includes seven different clinical modalities.

It is an updated version of the ROCO dataset published in 2018, and includes 35,852 new images added to PubMed since 2018, as well as manually curated medical concepts for modality, body region (X-ray) and directionality (X-ray). The dataset consists of 80,080 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical 2023. The participants had access to the training and validation sets after signing a user agreement.

The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the UMLS concepts provided with each image, e.g., to build systems to support structured medical reporting.
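As a rough illustration of the multi-label setup mentioned above, the following Python sketch turns per-image concept lists into multi-hot target vectors. The column names (`ID`, `CUIs`), the `;` separator, and the sample rows are assumptions made for this example only; they are not taken from the dataset files, so check the actual CSV layout before adapting it.

```python
import csv
import io


def load_concepts(csv_text, id_col="ID", cui_col="CUIs", sep=";"):
    """Parse CSV text into {image_id: [concept_id, ...]}.

    The column names and separator are hypothetical defaults.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[id_col]: [c for c in row[cui_col].split(sep) if c]
        for row in reader
    }


def multi_hot(labels_by_image):
    """Build a sorted concept vocabulary and one 0/1 vector per image."""
    vocab = sorted({c for cuis in labels_by_image.values() for c in cuis})
    index = {concept: i for i, concept in enumerate(vocab)}
    vectors = {}
    for image_id, cuis in labels_by_image.items():
        vec = [0] * len(vocab)
        for concept in cuis:
            vec[index[concept]] = 1
        vectors[image_id] = vec
    return vocab, vectors


# Made-up placeholder rows, not real dataset entries.
sample = "ID,CUIs\nimg_a,C0000001;C0000002\nimg_b,C0000002\n"
vocab, vectors = multi_hot(load_concepts(sample))
```

The resulting vectors can serve as targets for a classifier with one sigmoid output per concept, which is one common way to frame multi-label image classification.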

Additional possible use cases for the ROCOv2 dataset include the pre-training of models for the medical domain and the evaluation of deep learning models for multi-task learning.

Please do not use this version of the dataset; use the most recent version instead.

Files

license_information.csv

Files (6.4 GB)

Checksum Size
md5:b12e7e3627b599b18c68dd3091294ef6 62.6 kB
md5:e5cfab59e9a4c254b477adecd83fb3e7 8.6 MB
md5:a32753de44602bf84aa2f069d703bfc7 1.8 MB
md5:305fae1d1ed17f6ce3883813ebee5b89 555.9 kB
md5:c82e33dffc862354a962242abc7686a4 379.9 kB
md5:0fcfa015bfcc67a84e132286deb7e095 861.2 MB
md5:f38cf22d258ddf74cf88c341884ac75a 10.2 MB
md5:4e56db97e17269e9c5fe3ac72ccdaa96 3.4 MB
md5:f623d3026e7fa3f930d418a1f4ceda08 2.4 MB
md5:3738a29aa8eb6529aef22c189cc853c3 4.6 GB
md5:69d6c5de7c1d8151881ae479b27254e6 1.8 MB
md5:db50674251bc1561036e55c945284562 561.5 kB
md5:e6d1834c29afdd399db8220a9ea44229 387.8 kB
md5:4d3970667d777fe52b9176a64c7898ee 863.6 MB
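The MD5 values listed above can be used to check that a download completed without corruption. The sketch below is one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` and the file path in the usage comment are hypothetical, chosen for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pair it with the checksum of that file):
# ok = matches_checksum("downloaded_file.zip", "md5:<value from the list>")
```

A mismatch usually means the transfer was interrupted or the file belongs to a different version of the record, in which case downloading it again is the simplest remedy.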

Additional details

Related works

Is new version of
Conference paper: 10.1007/978-3-030-01364-6_20 (DOI)

Dates

Submitted
2023-11-10
Submitted to Scientific Data