NLST-Sybil: Expert annotations of tumor regions in the NLST CT images
Description
This dataset contains expert annotations of the suspicious lesions identified in the CT images of the NLST collection [1]. The annotations are stored as planar bounding boxes.
The annotations in this dataset were collected and shared by the authors as part of the activities described in [2]. They were originally stored in JSON format and shared via GitHub [3]. Here they are harmonized into a DICOM representation. The dataset is available from the NCI Imaging Data Commons (IDC) and can be explored interactively in the IDC Portal using this link: https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=NLST-Sybil
To create the original annotations, two fellowship-trained thoracic radiologists jointly annotated suspicious lesions on NLST low-dose CT scans (LDCTs), using the MD.AI interface, for all participants who developed cancer within 1 year after an LDCT. Each lesion’s volume was marked with bounding boxes on contiguous thin-cut axial images. The “ground truth” annotations were informed by the imaging appearance and by the clinical data provided by the NLST, i.e., the series and image number of cancerous nodules and the anatomical location of biopsy-confirmed lung cancers. For these participants, lesions in the location of subsequently diagnosed cancers were also annotated, even if the precursor lesion lacked imaging features specific for cancer.
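To make the annotation layout concrete, the following is a minimal sketch of how per-slice planar boxes for one lesion could be grouped into a 3D extent. The `PlanarBox` class, its field names, and the pixel-based coordinates are hypothetical and chosen for illustration only; they are not the schema of the original JSON files or of the DICOM objects in this dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlanarBox:
    """One planar bounding box on a single axial slice (hypothetical layout)."""
    slice_index: int  # position of the axial image within the series
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float      # box width, in pixels
    height: float     # box height, in pixels


def is_contiguous(boxes: List[PlanarBox]) -> bool:
    """Return True when the boxes cover consecutive slices with no gaps."""
    indices = sorted({b.slice_index for b in boxes})
    return all(nxt - cur == 1 for cur, nxt in zip(indices, indices[1:]))


def enclosing_extent(boxes: List[PlanarBox]) -> Tuple[float, float, int, float, float, int]:
    """Smallest (x_min, y_min, z_min, x_max, y_max, z_max) containing all boxes."""
    if not boxes:
        raise ValueError("at least one box is required")
    return (
        min(b.x for b in boxes),
        min(b.y for b in boxes),
        min(b.slice_index for b in boxes),
        max(b.x + b.width for b in boxes),
        max(b.y + b.height for b in boxes),
        max(b.slice_index for b in boxes),
    )


# Made-up boxes for a single lesion spanning three adjacent slices.
lesion = [
    PlanarBox(40, 100.0, 120.0, 20.0, 18.0),
    PlanarBox(41, 98.0, 118.0, 24.0, 22.0),
    PlanarBox(42, 101.0, 121.0, 19.0, 17.0),
]
print(is_contiguous(lesion))
print(enclosing_extent(lesion))
```

The enclosing extent is only a coarse summary; the per-slice boxes remain the actual annotation.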
Specific files included in the record can be downloaded using the attached manifests. The suffix of each manifest indicates its content: a list of pointers to the files of the collection in the public Google Cloud Storage (GCS) or Amazon Web Services (AWS) buckets:
- -gcs.s5cmd: GCS-based manifest (to download the files described in the manifest, execute this command, replacing manifest with the path to the manifest file: pip install --upgrade idc-index && idc download manifest).
- -aws.s5cmd: AWS-based manifest (to download the files described in the manifest, execute this command, replacing manifest with the path to the manifest file: pip install --upgrade idc-index && idc download manifest).
- -dcf.dcf: Gen3-based manifest (see details in https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids).
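Before downloading, it can be useful to inspect what an s5cmd manifest points to. The sketch below assumes that each non-comment line has the form `cp <source URL> <destination>` and that comment lines start with `#`; this layout is an assumption, not something stated in this record, and the bucket and folder names in the sample are made up.

```python
from typing import List
from urllib.parse import urlparse


def manifest_sources(text: str) -> List[str]:
    """Collect the source URL of every `cp` line in an s5cmd-style manifest."""
    sources = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Assumed layout: cp [flags] <source> <destination>
        if len(parts) >= 3 and parts[0] == "cp":
            sources.append(parts[-2])
    return sources


def bucket_of(url: str) -> str:
    """Return the bucket name, i.e. the host part of an s3:// or gs:// URL."""
    return urlparse(url).netloc


# Hypothetical manifest contents, for illustration only.
sample = """\
# hypothetical manifest contents
cp s3://example-bucket/series-a/* .
cp s3://example-bucket/series-b/* .
"""

for url in manifest_sources(sample):
    print(bucket_of(url), url)
```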
[1] National Lung Screening Trial Research Team. (2013). Data from the National Lung Screening Trial (NLST) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.HMQ8-J677
[2] Mikhael, P. G., Wohlwend, J., Yala, A., Karstens, L., Xiang, J., Takigami, A. K., Bourgouin, P. P., Chan, P., Mrah, S., Amayri, W., Juan, Y.-H., Yang, C.-T., Wan, Y.-L., Lin, G., Sequist, L. V., Fintelmann, F. J. & Barzilay, R. Sybil: A validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography. J. Clin. Oncol. JCO2201345 (2023). https://doi.org/10.1200/JCO.22.01345
Files
Total size: 168.9 kB
| MD5 checksum | Size |
|---|---|
| b78c9bbd4ce0fd586e2bff60db29bcaf | 62.4 kB |
| f5f37ae70ac0a1557d0dd964f020a42f | 44.1 kB |
| b17a543ee82a06a852a97ff3a3bb5128 | 62.4 kB |
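The MD5 checksums listed above can be used to confirm that a manifest was downloaded intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `matches_checksum` and the file name in the usage comment are hypothetical.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the published value, ignoring case."""
    return md5_of(path) == expected_md5.strip().lower()


# Usage (the file name is a placeholder for a downloaded manifest):
# matches_checksum("downloaded-manifest.s5cmd", "b78c9bbd4ce0fd586e2bff60db29bcaf")
```

A mismatch usually indicates an incomplete or altered download, in which case the file should be fetched again.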
Additional details
Related works
- Is derived from
  - Dataset: 10.7937/TCIA.HMQ8-J677 (DOI)
- Is described by
  - Publication: 10.1200/JCO.22.01345 (DOI)
- Is published in
  - Publication: 10.25504/FAIRsharing.0b5a1d (DOI)