Published July 2, 2024 | Version v1
Dataset Open

MedPix-2.0

  • 1. University of Palermo

Description

MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications.

If you use MedPix 2.0, please cite our work as follows:
```
@misc{siragusa2025medpix20comprehensivemultimodal,
      title={MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs}, 
      author={Irene Siragusa and Salvatore Contino and Massimo La Ciura and Rosario Alicata and Roberto Pirrone},
      year={2025},
      eprint={2407.02994},
      archivePrefix={arXiv},
      primaryClass={cs.DB},
      url={https://arxiv.org/abs/2407.02994}, 
}
```

 

Case_topic.json and Descriptions.json are described below. The images folder contains all the images of the dataset, while the splitted_dataset folder provides a split of the dataset; please refer to /splitted_dataset/README.md for further information.

Case_topic.json

Contains a list of JSON objects, each of which provides the information for a single clinical case. The structure of each element is reported below:

  • U_id the UID code that identifies a clinical case

  • TAC list of names of the .png files containing the CT scans (if present). Images are under the images folder.

  • MRI list of names of the .png files containing the MR scans (if present). Images are under the images folder.

  • Case dictionary with the information of the clinical case. It contains the following information:

    • Title the diagnosis
    • History patient's history
    • Exam
    • Findings
    • Differential Diagnosis
    • Case Diagnosis
    • Diagnosis By

  • Topic dictionary with the general information about the disease. It contains the following information:

    • Title the diagnosis
    • Disease Discussion
    • ACR Code
    • Category
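
The record layout above can be read with the standard json module. The following is a minimal sketch, not the authors' own loader: the key spellings are copied from the list above, the function names load_cases and summarize_case are hypothetical, and the code assumes that a missing or empty TAC or MRI entry simply means no images of that modality.

```python
import json
from pathlib import Path


def load_cases(path):
    """Read Case_topic.json and return its list of case records."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize_case(case):
    """Collect a few fields from one record, tolerating missing keys."""
    ct_images = case.get("TAC") or []  # CT image names, assumed optional
    mr_images = case.get("MRI") or []  # MR image names, assumed optional
    return {
        "U_id": case.get("U_id"),
        "diagnosis": (case.get("Case") or {}).get("Title"),
        "category": (case.get("Topic") or {}).get("Category"),
        "n_ct": len(ct_images),
        "n_mr": len(mr_images),
    }
```

Assuming the file sits in the working directory, summarize_case(load_cases("Case_topic.json")[0]) would return a small dictionary for the first case.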

Descriptions.json

Contains a list of JSON objects, each of which provides the textual information about a single image stored in the images folder. The structure of each element is reported below:

  • Type can be CT or MR; identifies the scanning modality of the image
  • U_id the UID code of the clinical case the image belongs to
  • image name of the image file
  • location fine-grained information about the body part shown in the given image
  • location category macro-location of the body part shown in the given image
  • Description dictionary with the descriptive information of the image. It contains the following information:
    • ACR codes
    • Age age of the patient
    • Sex sex of the patient
    • Caption refers to the specific caption of the image
    • Figure part
    • Modality scanning modality of the image
    • Plane
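
Because every image record carries the U_id of its clinical case, the image descriptions can be grouped per case. The sketch below shows one way to do that on an already parsed list of records (for example the result of json.load on Descriptions.json). The function names group_by_case and captions_for are hypothetical, and the key spellings are copied from the list above.

```python
from collections import defaultdict


def group_by_case(descriptions):
    """Map each U_id to the list of image records that belong to it."""
    grouped = defaultdict(list)
    for record in descriptions:
        grouped[record.get("U_id")].append(record)
    return dict(grouped)


def captions_for(records, modality=None):
    """Return (image, caption) pairs, optionally filtered by Type (CT or MR)."""
    pairs = []
    for record in records:
        if modality is not None and record.get("Type") != modality:
            continue
        caption = (record.get("Description") or {}).get("Caption")
        pairs.append((record.get("image"), caption))
    return pairs
```

Looking up a U_id from Case_topic.json in the grouped dictionary then gives all image-level records for that case.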

Files

Files (328.2 MB)

Size      MD5 checksum
3.1 MB    md5:3a76f27c243867dcf84e0357207c20d8
1.3 MB    md5:1868b4914d35a9172d7104b575331c7e
321.9 MB  md5:e2d1f2663ccd703ca64629faf5fe5a29
1.9 MB    md5:f56e3d24050daa3d5b0454d6284529a4
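
The MD5 checksums listed above can be used to check a download for corruption. The following is a minimal sketch using Python's hashlib; the function names md5_of_file and matches_checksum are hypothetical. The listing does not say which file each checksum belongs to, so that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum(local_path, "md5:...") returns True only when the local file matches the published digest.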

Additional details

Dates

Submitted
2024-06-30

Software

Repository URL
https://github.com/CHILab1/MedPix-2.0.git
Development Status
Active