Dataset for Detection and Segmentation of the Radiographic Features of Pulmonary Edema
Description
Objectives: This dataset is intended for training and evaluating machine learning models that detect, segment, and analyze radiological features associated with pulmonary edema in chest X-ray images.
Description: This dataset consists of chest X-rays extracted from the MIMIC database, which was collected at the Beth Israel Deaconess Medical Center. It comprises 1000 chest X-rays obtained from 741 patients with features suggestive of edema, selected for manual annotation. The annotations cover radiological features commonly associated with pulmonary edema: cephalization, Kerley lines, pleural effusions, bat wings, and infiltrates. In total, the dataset contains 4263 annotations (Table 1). In addition, each chest radiograph is assigned one of four severity categories: "no edema", "vascular congestion", "interstitial edema", or "alveolar edema".
Annotation Method: The annotation was performed by a clinician with over 10 years of radiology experience, using both the frontal and lateral views of each chest X-ray study. Cephalization and Kerley lines were delineated with polylines, while the other features were delineated with binary masks. This approach was chosen to support accurate subsequent analyses and label assignments.
All features are also represented as bounding boxes, defined by their upper-left (x1, y1) and lower-right (x2, y2) corners. In addition, selected features are provided with masks encoded in Base64 format. To decode them, we provide a conversion script, "mask_converter.py", which converts the encoded masks into NumPy arrays for use in analysis and deep learning applications.
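To illustrate the bounding-box convention described above, the following minimal sketch converts a box given by its upper-left and lower-right corners into the [x, y, width, height] form used by the COCO format. The function names and the example coordinates are hypothetical and are not taken from the dataset or its repository. The sketch also assumes that width and height are simple coordinate differences, which may differ by one pixel from other conventions.

```python
from typing import List


def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> List[float]:
    """Convert corner coordinates (x1, y1, x2, y2) to [x, y, width, height].

    Assumes (x1, y1) is the upper-left corner and (x2, y2) the lower-right
    corner, with width and height taken as plain coordinate differences.
    """
    if x2 < x1 or y2 < y1:
        raise ValueError("expected the upper-left corner first, then the lower-right")
    return [x1, y1, x2 - x1, y2 - y1]


def box_area(x1: float, y1: float, x2: float, y2: float) -> float:
    """Area of the box in squared pixels, under the same assumptions."""
    _, _, width, height = corners_to_xywh(x1, y1, x2, y2)
    return width * height


# Hypothetical box, for illustration only.
print(corners_to_xywh(100, 150, 300, 400))  # [100, 150, 200, 250]
print(box_area(100, 150, 300, 400))         # 50000
```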
Datasets:
- SLY dataset: Chest X-ray images labeled by clinicians, with the frontal and lateral images stacked together. The images were annotated on the Supervisely platform, and the dataset is stored in JSON and PNG formats.
- Source dataset: A transformed version of the SLY dataset in which all annotations are consolidated into a single spreadsheet and only frontal-view images are included.
- Processed dataset: A version restricted to the lung area, since the regions surrounding the lungs typically contain information that clinicians do not use in their decision-making.
- COCO dataset: A collection of subsets prepared in the COCO format and suitable for training and testing. It includes subsets for each feature and for all features evaluated in this study.
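Annotation files in the COCO format are JSON documents with "images", "annotations", and "categories" lists. As a minimal sketch of how such a subset could be inspected, the code below counts annotated objects per category. The in-memory example and its category names are illustrative assumptions. The actual file names and folder layout inside the archive are not described here and should be checked after download.

```python
import json
from collections import Counter
from typing import Dict


def load_coco(path: str) -> dict:
    """Load a COCO-style annotation file (the path is supplied by the user)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_objects_per_category(coco: dict) -> Dict[str, int]:
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    # Report categories with no annotations as zero instead of omitting them.
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Tiny hypothetical example in COCO layout; not real data from the dataset.
example = {
    "images": [{"id": 1, "file_name": "example.png", "width": 1024, "height": 1024}],
    "categories": [
        {"id": 1, "name": "Cephalization"},
        {"id": 2, "name": "Kerley line"},
        {"id": 3, "name": "Pleural effusion"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 150, 200, 250]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [400, 120, 80, 60]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [50, 600, 30, 10]},
    ],
}

print(count_objects_per_category(example))
```

For a downloaded subset, the same function can be applied to the result of load_coco with the path to that subset's annotation file.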
Access to the Study: For more comprehensive information about this study, please visit our GitHub repository at https://github.com/ViacheslavDanilov/edema_quantification and our Zenodo model repository at https://zenodo.org/doi/10.5281/zenodo.8393565.
Table 1. Summary of annotated radiological features and severity labels
Radiological feature | Number of objects | Severity label | Number of cases |
---|---|---|---|
Cephalization | 1656 | No edema | 21 |
Kerley line | 609 | Vascular congestion | 74 |
Pleural effusion | 317 | Interstitial edema | 51 |
Bat wing | 1604 | Alveolar edema | 595 |
Infiltrate | 77 | | |
TOTAL | 4263 | TOTAL | 741 |
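The counts in Table 1 are strongly imbalanced across features and severity labels. The short sketch below re-enters the table values by hand, checks that they add up to the stated totals, and prints each class's share. The variable and function names are illustrative only.

```python
from typing import Dict

# Values copied from Table 1.
feature_counts: Dict[str, int] = {
    "Cephalization": 1656,
    "Kerley line": 609,
    "Pleural effusion": 317,
    "Bat wing": 1604,
    "Infiltrate": 77,
}
severity_counts: Dict[str, int] = {
    "No edema": 21,
    "Vascular congestion": 74,
    "Interstitial edema": 51,
    "Alveolar edema": 595,
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's fraction of the total count."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


# The per-class counts should reproduce the totals given in the table.
assert sum(feature_counts.values()) == 4263
assert sum(severity_counts.values()) == 741

for table in (feature_counts, severity_counts):
    for name, share in shares(table).items():
        print(f"{name}: {share:.1%}")
```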
Files
Total size: 16.7 GB
Checksum | Size |
---|---|
md5:ba207c224c60c8e31a82e2fb6547fb13 | 5.1 GB |
md5:e4cdb01892519a2b9bb5bfd54d65d049 | 9.2 MB |
md5:f916a955cac2089e4edb4f6a14c3adbc | 1.5 kB |
md5:95ddd2a4854b5945f6f019e9682aa0d4 | 1.9 GB |
md5:0aefc705c1f2153ecf4d9f32409af450 | 6.3 GB |
md5:1e42f667f18b939f6f8ba5a0768c509f | 3.4 GB |
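Because several of the archives are multiple gigabytes in size, it can be worth verifying a download against its listed MD5 checksum. The following is a minimal sketch using Python's standard hashlib module. The function names and the file path in the usage comment are hypothetical, and the checksum to compare against must be taken from the entry that corresponds to the downloaded file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (hypothetical local path; substitute the checksum listed for that file):
# ok = matches_checksum("downloads/archive.zip", "md5:<hex digest from the list>")
```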
Additional details
Software
- Repository URL: https://github.com/ViacheslavDanilov/edema_quantification
- Programming language: Python
- Development status: Active