TotalSegmentator segmentations and radiomics features for NCI Imaging Data Commons CT images
Description
This dataset contributes volumetric segmentations of anatomic regions in a subset of the CT images available from NCI Imaging Data Commons (IDC) [1] (https://imaging.datacommons.cancer.gov/), automatically generated using the TotalSegmentator model v1.5.6 [2]. The initial release includes segmentations for the majority of the CT scans in the National Lung Screening Trial (NLST) collection [3], [4] already available in IDC. Direct link to open this analysis result dataset in IDC (available after release of IDC v18): https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=TotalSegmentator-CT-Segmentations.
Specifically, for each of the CT series analyzed, we include the segmentations generated by TotalSegmentator, converted into DICOM Segmentation object format using dcmqi v1.3.0 [5], and the first order and shape features for each of the segmented regions, as produced by pyradiomics v3.0.1 [6]. Radiomics features were converted to DICOM Structured Reporting documents following template TID1500 using dcmqi. The TotalSegmentator analysis of the NLST cohort was executed on the Terra platform [7]. The implementation of the workflow used for the analysis is available at https://github.com/ImagingDataCommons/CloudSegmentator [8].
Due to the large size of the files, they are stored in the cloud buckets maintained by IDC, and the attached files are the manifests that can be used to download the actual files.
If you use the files referenced in the attached manifests, we ask you to cite this dataset and the preprint describing how it was generated [9].
Download instructions
Each of the manifests includes instructions in its header on how to download the files it references.
To download the TotalSegmentator segmentations (in DICOM SEG format) and pyradiomics measurements (in DICOM SR format) using the .s5cmd manifests:
- install the idc-index package:
pip install --upgrade idc-index
- download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file to idc download, e.g.:
idc download totalsegmentator_ct_segmentations_aws.s5cmd
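Before starting a large download, it can help to check what a manifest refers to. The following is a minimal Python sketch, assuming the manifest follows the usual s5cmd layout: comment lines starting with `#` and one `cp <source> <destination>` command per line. The helper name `list_manifest_sources` and the sample bucket paths are hypothetical and used for illustration only; consult the header of the actual manifest for its exact format.

```python
import shlex
from typing import Iterable, List


def list_manifest_sources(lines: Iterable[str]) -> List[str]:
    """Return the source URLs of s5cmd `cp` commands, skipping comments and blank lines."""
    sources = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)
        # Assumed layout: cp [flags] <source> <destination>
        if len(parts) >= 3 and parts[0] == "cp":
            sources.append(parts[-2])
    return sources


# Made-up manifest content; the bucket and paths are placeholders.
sample = [
    "# header with download instructions",
    "cp s3://example-bucket/series-0001/* .",
    "",
    "cp s3://example-bucket/series-0002/* .",
]
print(list_manifest_sources(sample))
```

To apply the same function to a downloaded manifest, pass an open file handle, for example `list_manifest_sources(open("totalsegmentator_ct_segmentations_aws.s5cmd"))`.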
Other files included in the record are:
- firstorder and shape radiomics features extracted using pyradiomics, organized as one file per segmented structure (see the README file in each zip file for details on the organization):
- pyradiomics_features_csv.zip: saved in CSV format
- pyradiomics_features_parquet.zip: saved in Parquet format
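A quick way to get oriented in the CSV archive is to preview each table without unpacking it. The sketch below is an illustration under assumptions: it assumes the archive contains `.csv` members with a header row, and the function name `preview_csv_members`, the member name `liver.csv`, and the column names in the demo data are all made up. The README inside the zip file remains the authoritative description of the layout.

```python
import csv
import io
import zipfile


def preview_csv_members(zip_source, max_rows=2):
    """Map each .csv member of the archive to (header, first data rows)."""
    previews = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip README and other non-table members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                rows = [row for _, row in zip(range(max_rows), reader)]
            previews[name] = (header, rows)
    return previews


# Build a tiny in-memory archive with made-up content to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("liver.csv", "SeriesInstanceUID,feature,value\n1.2.3,Mean,42.0\n")
    archive.writestr("README.md", "not a table")
buffer.seek(0)

demo = preview_csv_members(buffer)
print(demo)
```

With the real file, pass its path instead of the in-memory buffer, for example `preview_csv_members("pyradiomics_features_csv.zip")`.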
Support
If you have any questions about this dataset, or if you experience any issues, please reach out to Imaging Data Commons support via support@canceridc.dev or (preferred) IDC Forum at https://discourse.canceridc.dev.
Additional details
Related works
- Is derived from
- Dataset: 10.7937/TCIA.HMQ8-J677 (DOI)
- Is described by
- Other: 10.21203/rs.3.rs-4351526/v1 (DOI)
- Is published in
- Other: 10.25504/FAIRsharing.0b5a1d (DOI)
- Other: https://portal.imaging.datacommons.cancer.gov/ (URL)
References
- [1] A. Fedorov et al., "National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence," Radiographics, vol. 43, no. 12, Dec. 2023, doi: 10.1148/rg.230180.
- [2] J. Wasserthal et al., "TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images," Radiol. Artif. Intell., Jul. 2023, doi: 10.1148/ryai.230024.
- [3] National Lung Screening Trial Research Team et al., "The National Lung Screening Trial: overview and study design," Radiology, vol. 258, no. 1, pp. 243–253, Jan. 2011, doi: 10.1148/radiol.10091808.
- [4] National Lung Screening Trial Research Team, "Data from the National Lung Screening Trial (NLST) (Version 3) [dataset]." 2013. doi: 10.7937/TCIA.HMQ8-J677.
- [5] C. Herz et al., "dcmqi: An Open Source Library for Standardized Communication of Quantitative Image Analysis Results Using DICOM," Cancer Res., vol. 77, no. 21, pp. e87–e90, Nov. 2017, doi: 10.1158/0008-5472.CAN-17-0336.
- [6] J. J. M. van Griethuysen et al., "Computational Radiomics System to Decode the Radiographic Phenotype," Cancer Res., vol. 77, no. 21, pp. e104–e107, Nov. 2017, doi: 10.1158/0008-5472.CAN-17-0339.
- [7] C. Birger et al., "FireCloud, a scalable cloud-based platform for collaborative genome analysis: Strategies for reducing and controlling costs," bioRxiv, p. 209494, Nov. 03, 2017. doi: 10.1101/209494.
- [8] V. Thiriveedhi and A. Fedorov, ImagingDataCommons/CloudSegmentator: v1.2.0. Zenodo, 2024. doi: 10.5281/ZENODO.10712897.
- [9] V. K. Thiriveedhi, D. Krishnaswamy, D. Clunie, S. Pieper, R. Kikinis, and A. Fedorov, "Cloud-based large-scale curation of medical imaging data using AI segmentation," Research Square, 2024, doi: 10.21203/rs.3.rs-4351526/v1.