Published April 18, 2024 | Version v1
Other Open

CXR-LT 2024: Long-tailed, multi-label, and zero-shot classification on chest X-rays

  • 1. Weill Cornell Medicine
  • 2. The University of Texas at Austin
  • 3. Hong Kong University of Science and Technology
  • 4. Thomas Jefferson University
  • 5. MIT/Harvard
  • 6. FACMI, NIH
  • 7. NIH

Description

Chest radiography, like many diagnostic medical exams, produces a long-tailed distribution of clinical findings; while a small subset of diseases is routinely observed, the vast majority of diseases are relatively rare. This poses a challenge for standard deep learning methods, which exhibit bias toward the most common classes at the expense of the important but rare "tail" classes. Many existing methods have been proposed to tackle this specific type of imbalance, though only recently has attention been given to long-tailed medical image recognition problems. Diagnosis on chest X-rays (CXRs) is also a multi-label problem, as patients often present with multiple disease findings simultaneously; however, only a select few studies incorporate knowledge of label co-occurrence into the learning process. Since most large-scale image classification benchmarks contain single-label images with a mostly balanced distribution of labels, many standard deep learning methods fail to accommodate the class imbalance and co-occurrence problems posed by the long-tailed, multi-label nature of tasks like disease diagnosis on CXRs.
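To make the two properties above concrete, here is a minimal sketch that computes per-class frequency (the long tail) and pairwise label co-occurrence from a toy multi-hot label matrix. The class names and label values are hypothetical and chosen only for illustration; they are not drawn from MIMIC-CXR or CXR-LT.

```python
from itertools import combinations

# Hypothetical toy data: each row is one image, each column one finding.
# Names and values are illustrative only, not taken from CXR-LT.
CLASSES = ["common_a", "common_b", "rare_c", "rare_d"]
LABELS = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]


def class_counts(labels):
    """Number of positive images per class (column sums)."""
    return [sum(col) for col in zip(*labels)]


def cooccurrence(labels):
    """Count images in which each pair of classes is jointly positive."""
    n_classes = len(labels[0])
    counts = {}
    for i, j in combinations(range(n_classes), 2):
        counts[(i, j)] = sum(1 for row in labels if row[i] and row[j])
    return counts


counts = class_counts(LABELS)
pairs = cooccurrence(LABELS)

# Sort classes from most to least frequent to expose the head and the tail.
for name, c in sorted(zip(CLASSES, counts), key=lambda t: -t[1]):
    print(f"{name}: {c} positives")

# Report only the pairs that actually co-occur in at least one image.
for (i, j), c in pairs.items():
    if c > 0:
        print(f"{CLASSES[i]} + {CLASSES[j]}: {c} images")
```

Even in this toy matrix, two classes account for most positives while the other two appear once each, and a single image can carry several labels at once.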
 
In the first iteration of CXR-LT, held in 2023, we expanded upon the MIMIC-CXR dataset by enlarging the set of target classes from 14 to 26, generating labels for 12 new rare disease findings by parsing radiology reports. This made for a challenging long-tailed, multi-label disease classification task that attracted 59 teams, who contributed over 500 unique submissions. However, the Radiology Gamuts Ontology documents over 4,500 unique radiological image findings. That is, the "true" distribution of all clinical findings on CXR is at least two orders of magnitude longer than what our (or any existing) dataset can offer. For this reason, we argue that the only way to truly tackle the long tail of radiological image findings is to develop a model that can readily generalize to new classes in a "zero-shot" fashion.
 
For this year's version of CXR-LT, we extract labels for an additional 19 rare disease findings (for a total of 377,110 CXR images, each with 45 disease labels) and introduce two new challenge tracks, including a zero-shot classification task. Our tasks include (i) long-tailed classification on a large, noisy test set, (ii) long-tailed classification on a small, manually annotated test set, and (iii) zero-shot generalization to previously unseen disease findings. For all tracks, participants will be provided with a large, automatically labeled training set of >250,000 CXR images with 40 binary disease labels. Task (i) will be evaluated on a large, automatically labeled test set of >75,000 CXRs with these same 40 labels; task (ii) will be evaluated on a "gold standard" subset of the test set, containing 409 CXRs with 26 of the 40 labels manually annotated by human readers; task (iii) will be evaluated on the same large test set of images as task (i), but for 5 "held-out" disease findings that are not encountered during training. While last year's CXR-LT was a success, we hope that CXR-LT 2024 can drive further meaningful methodological advances toward clinically realistic multi-label, long-tailed, and zero-shot disease classification on CXR.
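The label bookkeeping across the three tracks can be summarized in a short sketch. The counts below are the ones stated in the description; the `Track` class, its field names, and the task identifiers are hypothetical and exist only to organize those numbers, not to mirror any official challenge code or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical container for one evaluation track."""
    name: str
    test_set: str
    n_test_labels: int
    labels_seen_in_training: bool


# Counts taken from the description above.
LABELS_2023 = 26        # target classes in CXR-LT 2023
NEW_LABELS_2024 = 19    # additional rare findings extracted for 2024
TRAIN_LABELS = 40       # binary labels provided in the training set
HELD_OUT_LABELS = 5     # findings reserved for the zero-shot track

TRACKS = [
    Track("task_i", "large, automatically labeled (>75,000 CXRs)", 40, True),
    Track("task_ii", "gold standard subset (409 CXRs)", 26, True),
    Track("task_iii", "same images as task_i", 5, False),
]


def total_labels():
    """Total number of disease labels per image in the 2024 dataset."""
    return LABELS_2023 + NEW_LABELS_2024


# The 45 labels split into 40 training labels plus 5 held-out labels.
assert total_labels() == TRAIN_LABELS + HELD_OUT_LABELS == 45

for track in TRACKS:
    mode = "supervised" if track.labels_seen_in_training else "zero-shot"
    print(f"{track.name}: {track.n_test_labels} labels, {mode}, {track.test_set}")
```

Only the zero-shot track is scored on labels absent from training, which is why the 40 training labels and 5 held-out labels together account for all 45.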

Files

CXR-LT 2024_ Long-tailed, multi-label, and zero-sh.pdf

Files (174.2 kB)