Published September 27, 2023 | Version 1.1
Dataset Open

Dataset for Detection and Segmentation of the Radiographic Features of Pulmonary Edema

  • 1. Politecnico di Milano
  • 2. Universitat Pompeu Fabra

Description

Objectives: This dataset is intended for training and evaluating machine learning models that detect, segment, and analyze radiological features associated with pulmonary edema in chest X-ray images.

Description: The dataset consists of chest X-rays extracted from the MIMIC database, which was collected at the Beth Israel Deaconess Medical Center. It comprises 1000 chest X-rays obtained from 741 patients with features suggestive of edema, selected for manual annotation. The annotations cover five radiological features commonly associated with pulmonary edema: cephalization, Kerley lines, pleural effusions, bat wings, and infiltrates, for a total of 4263 annotations (Table 1). In addition, each chest radiograph is assigned one of four severity categories: "no edema", "vascular congestion", "interstitial edema", or "alveolar edema".

Annotation Method: The annotation was performed by a clinician with over 10 years of radiology experience, using both the frontal and lateral views of each chest X-ray study. Cephalization and Kerley lines were delineated using polylines, while the other features were delineated using binary masks. This approach was chosen to support accurate subsequent analyses and label assignments.

All features are represented as bounding boxes defined by their upper-left (x1; y1) and lower-right (x2; y2) corners. In addition, selected features are provided with masks encoded in base64 format. To decode them, we provide a conversion script, "mask_converter.py", that transforms the encoded masks into numpy arrays for use in analysis and deep learning applications.
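As a rough illustration of the two representations described above, the sketch below derives an (x1, y1, x2, y2) box from polyline vertices and unwraps an encoded mask string. It assumes the masks follow the Supervisely bitmap convention (a base64 string wrapping a zlib-compressed PNG), which this description does not confirm. The function names and the sample coordinates are hypothetical, and this is not the code in "mask_converter.py". Only the PNG header is read here; converting the pixels into a numpy array would additionally require an image library.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def polyline_to_bbox(points):
    """Return (x1, y1, x2, y2): the upper-left and lower-right corners enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def unwrap_mask_payload(encoded):
    """Strip the base64 and zlib layers; return (png_bytes, width, height).

    ASSUMPTION: the payload is a zlib-compressed PNG (Supervisely-style bitmap).
    """
    png_bytes = zlib.decompress(base64.b64decode(encoded))
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG; the encoding assumption does not hold")
    # The IHDR chunk follows the 8-byte signature: 4-byte length, 4-byte type,
    # then width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", png_bytes[16:24])
    return png_bytes, width, height


# Hypothetical Kerley-line polyline given as (x, y) pixel coordinates.
kerley_line = [(412, 655), (430, 648), (451, 650)]
print(polyline_to_bbox(kerley_line))  # (412, 648, 451, 655)
```

If the signature check fails on real data, the assumed encoding is wrong and the provided conversion script should be consulted instead.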

Datasets:

  1. SLY dataset: Chest X-ray images labeled by clinicians, with the frontal and lateral images stacked together. The images were annotated on the Supervisely platform and are stored in JSON and PNG formats.
  2. Source dataset: A transformed version of the SLY dataset in which all annotations are consolidated into a single spreadsheet and only frontal view images are included.
  3. Processed dataset: A version restricted to the lung area, since the regions surrounding the lungs typically contain extraneous information that clinicians do not use in their decision-making.
  4. COCO dataset: A collection of subsets prepared in the COCO format and suitable for training and testing. It includes one subset per feature and one subset covering all features evaluated in this study.
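For readers unfamiliar with the COCO layout, the following sketch shows how a corner-based box can be converted to the COCO [x, y, width, height] convention and how objects can be counted per category. The in-memory dictionary stands in for an annotation file; its file name, category ids, and coordinates are made up for illustration and do not come from the dataset.

```python
from collections import Counter


def corners_to_coco(x1, y1, x2, y2):
    """Convert (x1, y1, x2, y2) corners to a COCO-style [x, y, width, height] box."""
    return [x1, y1, x2 - x1, y2 - y1]


def count_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


# Hypothetical miniature annotation set; in practice this dictionary would come
# from json.load() on one of the annotation files in a COCO subset.
coco = {
    "images": [{"id": 1, "file_name": "example.png", "width": 1024, "height": 1024}],
    "categories": [
        {"id": 1, "name": "Kerley line"},
        {"id": 2, "name": "Pleural effusion"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": corners_to_coco(412, 648, 451, 655)},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": corners_to_coco(300, 700, 340, 706)},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": corners_to_coco(100, 800, 400, 950)},
    ],
}

print(coco["annotations"][0]["bbox"])  # [412, 648, 39, 7]
print(count_per_category(coco))
```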

Access to the Study: For more information about this study, see our GitHub repository at https://github.com/ViacheslavDanilov/edema_quantification and our Zenodo model repository at https://zenodo.org/doi/10.5281/zenodo.8393565.

 

Table 1. Summary of annotated radiological features and severity labels

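| Radiological feature | Number of objects | Severity label      | Number of cases |
|----------------------|-------------------|---------------------|-----------------|
| Cephalization        | 1656              | No edema            | 21              |
| Kerley line          | 609               | Vascular congestion | 74              |
| Pleural effusion     | 317               | Interstitial edema  | 51              |
| Bat wing             | 1604              | Alveolar edema      | 595             |
| Infiltrate           | 77                |                     |                 |
| TOTAL                | 4263              | TOTAL               | 741             |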
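The totals in Table 1 can be checked with a few lines of arithmetic. The sketch below copies the counts from the table, confirms both totals, and prints the share of each severity label; the variable names are illustrative only.

```python
# Counts copied from Table 1.
feature_counts = {
    "Cephalization": 1656,
    "Kerley line": 609,
    "Pleural effusion": 317,
    "Bat wing": 1604,
    "Infiltrate": 77,
}
severity_counts = {
    "No edema": 21,
    "Vascular congestion": 74,
    "Interstitial edema": 51,
    "Alveolar edema": 595,
}

# Both column sums should match the TOTAL row of the table.
assert sum(feature_counts.values()) == 4263
assert sum(severity_counts.values()) == 741

# Share of each severity label among all labeled cases.
total_cases = sum(severity_counts.values())
for label, n in severity_counts.items():
    print(f"{label}: {n} ({100 * n / total_cases:.1f}%)")
```

The severity labels are strongly imbalanced: alveolar edema accounts for roughly 80% of the labeled cases, which may be worth taking into account when splitting the data or choosing evaluation metrics.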

 

Files

coco_dataset.zip

Files (16.7 GB)

| Checksum                             | Size   |
|--------------------------------------|--------|
| md5:ba207c224c60c8e31a82e2fb6547fb13 | 5.1 GB |
| md5:e4cdb01892519a2b9bb5bfd54d65d049 | 9.2 MB |
| md5:f916a955cac2089e4edb4f6a14c3adbc | 1.5 kB |
| md5:95ddd2a4854b5945f6f019e9682aa0d4 | 1.9 GB |
| md5:0aefc705c1f2153ecf4d9f32409af450 | 6.3 GB |
| md5:1e42f667f18b939f6f8ba5a0768c509f | 3.4 GB |
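Because the archives are several gigabytes each, it can be useful to verify a download against its published MD5 checksum. The sketch below is one way to do this with the Python standard library; the function names and the file path in the usage comment are hypothetical, and the listing above does not state which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True when the file's MD5 matches the published checksum."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pair it with the checksum shown for that file):
# verify_download("downloads/archive.zip", "md5:ba207c224c60c8e31a82e2fb6547fb13")
```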

Additional details

Software

Repository URL: https://github.com/ViacheslavDanilov/edema_quantification
Programming language: Python
Development Status: Active