Published August 11, 2025 | Version v1
Dataset Open

Longitudinal Dataset of Indeterminate Lung Nodules from CT Scans

  • 1. Universidad Politécnica de Madrid

Contributors

Data curator:

  • 1. Universidad Politécnica de Madrid

Description

This dataset is derived from the National Lung Screening Trial (NLST) database, accessed under a data transfer agreement with the National Cancer Institute (Bethesda, MD, USA). The NLST includes a large repository of lung cancer screening data with imaging from over 50,000 high-risk individuals. Participants were current or former smokers aged 55 to 74 years, with a smoking history of at least 30 pack-years and no prior lung cancer.

The original NLST dataset contains radiologist annotations that mark a screening exam as positive if a non-calcified nodule or mass ≥4 mm was detected or another suspicious abnormality was identified.

Our dataset focuses on indeterminate pulmonary nodules observed during longitudinal low-dose CT (LDCT) screening. We selected nodules that underwent follow-up with at least two LDCT scans during the 3-year screening period, excluding nodules with malignancy diagnosed beyond this period or nodules confirmed malignant at baseline.

The dataset contains nodules from 443 subjects (235 non-cancer and 208 cancer cases), with up to three annual LDCT scans per subject (baseline T0, follow-ups T1 and T2). Non-cancer nodules were confirmed benign by follow-up, while malignant nodules were biopsy-confirmed with anatomical locations annotated from NLST records. To maintain consistency, we used images reconstructed with soft kernel filters, which are standard for nodule detection and assessment.

A total of 703 nodules are included in this dataset, with longitudinal position data (x, y, z) and malignancy status, enabling research on temporal modeling and malignancy prediction.
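As a rough illustration of how the longitudinal position data might be organized for temporal modeling, the sketch below groups rows into one series per nodule and keeps only nodules seen in at least two scans. The column names (`subject_id`, `nodule_id`, `timepoint`, `x`, `y`, `z`, `malignant`) and the sample rows are hypothetical placeholders, not the actual schema or contents of the CSV; check the real header before adapting it.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows with HYPOTHETICAL column names, for illustration only.
# The real CSV header may differ; inspect it before reusing this code.
SAMPLE_CSV = """subject_id,nodule_id,timepoint,x,y,z,malignant
S001,1,T0,120,210,45,0
S001,1,T1,122,208,47,0
S002,1,T0,300,150,60,1
S002,1,T1,301,152,61,1
S002,1,T2,303,151,63,1
"""


def group_by_nodule(rows):
    """Map (subject_id, nodule_id) -> {timepoint: (x, y, z)}."""
    series = defaultdict(dict)
    for row in rows:
        key = (row["subject_id"], row["nodule_id"])
        position = (float(row["x"]), float(row["y"]), float(row["z"]))
        series[key][row["timepoint"]] = position
    return dict(series)


def has_followup(timepoints, minimum=2):
    """True if the nodule appears in at least `minimum` scans."""
    return len(timepoints) >= minimum


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = group_by_nodule(rows)

# Keep nodules with at least two scans, mirroring the selection
# criterion described above.
eligible = {key: tp for key, tp in series.items() if has_followup(tp)}
```

To run this on the downloaded file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match the actual header.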

Usage note:
Please cite the related publication when using this dataset:
Farina, B., Bermejo Peláez, D., Montalvo García, D., Carbajo Benito, R., Seijo Maceiras, L., & Ledesma Carbayo, M. J. (2025). Spatio-Temporal Deep Learning with Temporal Attention for Indeterminate Lung Nodule Classification. Computers in Biology and Medicine. 10.1016/j.compbiomed.2025.110813.

Files

nodule_location_globAttCRNN.csv
Size: 189.7 kB
md5:4aa7f52179cbe9c87e4b9dfcc2c8918b
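The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download`, and the assumption that the file is saved locally under its published name, are illustrative only.

```python
import hashlib

# Checksum as published on this record.
EXPECTED_MD5 = "4aa7f52179cbe9c87e4b9dfcc2c8918b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("nodule_location_globAttCRNN.csv")` should return `True` for an uncorrupted copy of the file.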