Published November 2025 | Version v1
Dataset Open

NLST-Sybil: Expert annotations of tumor regions in the NLST CT images

  • 1. Brigham and Women's Hospital
  • 2. Brigham and Women's Hospital Department of Radiology

Description

This dataset contains expert annotations of the suspicious lesions identified in the CT images of the NLST collection [1]. The annotations are stored as planar bounding boxes.

The annotations in this dataset were collected and shared by the authors as part of the activities described in [2]. They were originally stored in JSON format and shared via GitHub [3]. This dataset contains the same annotations harmonized into a DICOM representation. It is available from the NCI Imaging Data Commons (IDC) and can be explored interactively in the IDC Portal using this link: https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=NLST-Sybil

To create the original annotations, two fellowship-trained thoracic radiologists jointly annotated suspicious lesions on NLST low-dose CT (LDCT) scans using the MD.AI interface for all participants who developed cancer within 1 year after an LDCT. Each lesion’s volume was marked with bounding boxes on contiguous thin-cut axial images. The “ground truth” annotations were informed by the imaging appearance and the clinical data provided by the NLST, i.e., the series and image number of cancerous nodules and the anatomical location of biopsy-confirmed lung cancers. For these participants, lesions in the location of subsequently diagnosed cancers were also annotated, even if the precursor lesion lacked imaging features specific for cancer.
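Because each lesion is marked with one planar box per axial image, a common first step when working with such annotations is to combine the boxes on contiguous slices into a single enclosing extent. The sketch below illustrates that idea only. The `PlanarBox` fields, the pixel units, and the helper names are hypothetical and do not reflect the JSON or DICOM layout actually used in this dataset. The grouping also assumes the boxes passed in belong to one lesion; two lesions on overlapping slices would be merged.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PlanarBox:
    """One axis-aligned box on a single axial slice (hypothetical layout)."""
    slice_index: int  # position of the axial image within the series
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float
    height: float


def contiguous_runs(boxes: Iterable[PlanarBox]) -> List[List[PlanarBox]]:
    """Group boxes into runs whose slice indices are consecutive."""
    runs: List[List[PlanarBox]] = []
    for box in sorted(boxes, key=lambda b: b.slice_index):
        # Extend the current run if this slice is the same or the next one.
        if runs and box.slice_index - runs[-1][-1].slice_index <= 1:
            runs[-1].append(box)
        else:
            runs.append([box])
    return runs


def enclosing_extent(run: List[PlanarBox]) -> Tuple[float, float, float, float, int, int]:
    """Return (x_min, y_min, x_max, y_max, first_slice, last_slice) for a sorted run."""
    return (
        min(b.x for b in run),
        min(b.y for b in run),
        max(b.x + b.width for b in run),
        max(b.y + b.height for b in run),
        run[0].slice_index,
        run[-1].slice_index,
    )
```

Calling `contiguous_runs` on the boxes of one annotated lesion and then `enclosing_extent` on each run gives a rough per-lesion volume in pixel and slice coordinates. Converting it to millimetres would require the pixel spacing and slice positions from the referenced CT series.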

Specific files included in the record can be downloaded using the attached manifests. Each manifest lists pointers to the files of the collection in the public Google Cloud Storage (GCS) or Amazon Web Services (AWS) buckets, and the suffix of the manifest name indicates which:

  1. -gcs.s5cmd: GCS-based manifest (to download the files described in the manifest, execute this command, replacing manifest with the path to the manifest file: pip install --upgrade idc-index && idc download manifest).

  2. -aws.s5cmd: AWS-based manifest (to download the files described in the manifest, execute this command, replacing manifest with the path to the manifest file: pip install --upgrade idc-index && idc download manifest).

  3. -dcf.dcf: Gen3-based manifest (see details at https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids).
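Before downloading, it can be useful to inspect what a manifest points to. The following is a minimal sketch that assumes the .s5cmd manifests are plain text with optional `#` comment lines and one `cp <source> <destination>` command per line, with sources given as `s3://` or `gs://` URLs. That line format is an assumption about the manifests, and `manifest_sources` is a hypothetical helper, not part of idc-index or s5cmd.

```python
from typing import List


def manifest_sources(manifest_text: str) -> List[str]:
    """Extract bucket URLs from s5cmd-style `cp <source> <destination>` lines."""
    sources: List[str] = []
    for line in manifest_text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and anything that is not a copy command.
        if len(parts) < 3 or parts[0] != "cp":
            continue
        # Take the first bucket URL on the line, ignoring any flags before it.
        urls = [p for p in parts[1:] if p.startswith(("s3://", "gs://"))]
        if urls:
            sources.append(urls[0])
    return sources
```

For example, `len(manifest_sources(open(path).read()))` would give the number of copy commands in a manifest stored at `path`, which is a quick sanity check before starting a download.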

[1] National Lung Screening Trial Research Team. (2013). Data from the National Lung Screening Trial (NLST) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.HMQ8-J677

[2] Mikhael, P. G., Wohlwend, J., Yala, A., Karstens, L., Xiang, J., Takigami, A. K., Bourgouin, P. P., Chan, P., Mrah, S., Amayri, W., Juan, Y.-H., Yang, C.-T., Wan, Y.-L., Lin, G., Sequist, L. V., Fintelmann, F. J. & Barzilay, R. Sybil: A validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography. J. Clin. Oncol. JCO2201345 (2023). https://doi.org/10.1200/JCO.22.01345

[3] https://github.com/reginabarzilaygroup/Sybil

Files

Files (168.9 kB)

Checksum                                  Size
md5:b78c9bbd4ce0fd586e2bff60db29bcaf      62.4 kB
md5:f5f37ae70ac0a1557d0dd964f020a42f      44.1 kB
md5:b17a543ee82a06a852a97ff3a3bb5128      62.4 kB
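The MD5 checksums listed for the files can be used to confirm that a downloaded manifest is intact. The sketch below shows one way to do this with the Python standard library. The function name `md5_matches` is hypothetical, and the file path passed to it is whatever local name the downloaded file was saved under.

```python
import hashlib


def md5_matches(path: str, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's MD5 digest with a value such as 'md5:<hex digest>'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

A result of `False` suggests the file was truncated or altered in transit and should be downloaded again.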

Additional details

Related works

Is derived from
Dataset: 10.7937/TCIA.HMQ8-J677 (DOI)
Is described by
Publication: 10.1200/JCO.22.01345 (DOI)
Is published in
Publication: 10.25504/FAIRsharing.0b5a1d (DOI)