LiverHccSeg: A Publicly Available Multiphasic MRI Dataset with Liver and HCC Tumor Segmentations and Inter-Rater Agreement Analysis
Authors/Creators
- 1. Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- 2. Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, Connecticut, United States of America
- 3. Department of Biomedical Engineering, Yale University, New Haven, Connecticut, United States of America
Description
Please cite our data paper published in "Data in Brief": https://www.sciencedirect.com/science/article/pii/S2352340923007473
Background
Liver cancer ranks as the third leading cause of cancer-related mortality worldwide [1], and, alarmingly, both its incidence and its mortality rate are increasing [2; 3]. Among the various types of primary liver cancer, hepatocellular carcinoma (HCC) is the most prevalent, accounting for approximately 70-85% of liver cancer cases [4]. With magnetic resonance (MR) imaging, HCC can be reliably detected and diagnosed without an invasive biopsy [5]. MR imaging offers high tissue contrast, which can be further enhanced through contrast-enhanced multiphasic magnetic resonance imaging (mpMRI) techniques. This enables accurate identification and non-invasive diagnosis of HCC [6].
Objective
Precise segmentation of the liver plays a crucial role in volumetry assessment and serves as a vital pre-processing step for subsequent tumor detection algorithms [7]. However, accurate liver segmentation can be particularly challenging in patients with cancer-related tissue alterations and deformations in shape [8]. Accurate HCC tumor segmentation is essential for the extraction of quantitative imaging biomarkers such as radiomics, supports studies on treatment response assessment and prognosis evaluation, and provides critical information about tumor biology. To enhance the reproducibility of liver and tumor segmentation, automated methods based on image analysis techniques and machine learning have been developed. These methods have demonstrated promising results [7; 8]; however, most algorithms were tested only on small internal test sets and therefore do not guarantee generalizable and consistent performance on external data.
Publicly available datasets allow for fair and objective comparisons between different algorithms, techniques, or approaches. Researchers can evaluate the strengths and weaknesses of their methods in relation to existing solutions and establish benchmarks for performance evaluation. In addition to providing a benchmark with this dataset, we also assess the inter-rater variability between two different sets of tumor segmentations. This analysis serves as a measure of reproducibility for human segmentations, highlighting the consistency or variability that may exist among different human raters. Understanding the reproducibility of human segmentations is essential in assessing the reliability of manual annotations and establishing a baseline for algorithm performance comparison. By introducing LiverHccSeg, we aim to address the lack of publicly available mpMRI HCC datasets and offer researchers and developers a valuable resource for algorithmic evaluation on external data and imaging biomarker analyses.
Materials and Methods
Inclusion of Patients
All available scans from The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC) (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=6885436) were downloaded [9]. One multiphasic MRI scan (pre and triphasic post contrast) per patient was included. Patients who did not exhibit a tumor or residual tumor were excluded from the tumor segmentation dataset; however, they were included in the liver segmentation dataset.
MR Imaging Data
Subsequently, all imaging data were converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format with the dcm2nii (v2.1.53) package [10], and available header information was extracted using the pydicom (v2.1.2) package [11]. Multiparametric MR sequences were labeled with a consistent syntax ('pre', 'art', 'pv', and 'del' for the pre-contrast, arterial, portal-venous, and delayed contrast phases, respectively). All images were already de-identified by the TCIA website. Images were acquired between the years 1993 and 2007 on Philips and Siemens scanners with field strengths of 1.5 and 3 Tesla. Full details of the imaging parameters can be found in Table 5. Briefly, the median repetition time (TR) and median echo time (TE) were 365.8 ms and 26.4 ms, respectively. The median slice thickness was 9.5 mm, and the median bandwidth was 536.9 Hz.
Scientific Reading
After conversion, all images were read in a scientific reading by two board-certified abdominal radiologists (S.A. and S.H., with 9 and 10 years of experience, respectively). Any disagreement between the two raters was discussed in a consensus meeting. All HCC lesions were classified according to LI-RADS criteria [6].
Image Registration
The co-registration of pre-contrast, portal-venous, and delayed-phase images with arterial phase images was performed using the software BioImage Suite (v3.5) [12]. A non-rigid intensity-based registration approach was applied, employing a parameterized free-form deformation (FFD) with 3D B-splines [13]. The optimal FFD transformation was estimated by maximizing the normalized mutual information similarity metric [14] through gradient descent optimization. To enhance the optimization process, a multi-resolution image pyramid with three levels was utilized. The final B-spline control point spacing was set to 80 mm. The estimated transformation was then employed to warp the moving images (pre-contrast, portal-venous, and delayed-phase) into the reference image space, specifically the arterial phase image.
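The similarity metric named above can be illustrated with a minimal sketch. It assumes the overlap-invariant form of normalized mutual information, (H(A) + H(B)) / H(A, B), estimated from a joint intensity histogram. This is not the BioImage Suite implementation; the function name, the bin count, and the random test volumes are hypothetical and serve only to show how the metric behaves.

```python
import numpy as np


def normalized_mutual_information(fixed, moving, bins=32):
    """Estimate NMI = (H(A) + H(B)) / H(A, B) from a joint histogram.

    Illustrative only: `bins` is an arbitrary choice, and the function
    assumes the images are not constant (otherwise H(A, B) is zero).
    """
    joint, _, _ = np.histogram2d(np.ravel(fixed), np.ravel(moving), bins=bins)
    p_ab = joint / joint.sum()   # joint intensity distribution
    p_a = p_ab.sum(axis=1)       # marginal distribution of the fixed image
    p_b = p_ab.sum(axis=0)       # marginal distribution of the moving image

    def entropy(p):
        p = p[p > 0]             # empty bins contribute nothing
        return float(-np.sum(p * np.log(p)))

    return (entropy(p_a) + entropy(p_b)) / entropy(p_ab)


# Hypothetical volumes standing in for a reference image and a moving image.
rng = np.random.default_rng(0)
image = rng.random((16, 16, 8))
unrelated = rng.random((16, 16, 8))

nmi_identical = normalized_mutual_information(image, image)
nmi_unrelated = normalized_mutual_information(image, unrelated)
```

In this form the metric is largest when the two images are perfectly aligned copies of each other and approaches 1 as their intensities become statistically independent, which is why a registration procedure maximizes it.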
Liver and Tumor Segmentation and Statistical Analysis
All livers and tumors were manually segmented under the supervision of two board-certified abdominal radiologists using the software 3D Slicer (v4.10.2) [15]. To compare the segmentation agreement between the two sets of liver and tumor segmentations, we calculated segmentation metrics using the Python package seg-metrics (v1.0.0) [16]. All segmentation metrics and statistics were calculated in Python (v3.7).
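As an illustration of what a segmentation agreement metric measures, the following minimal sketch computes the Dice similarity coefficient between two binary masks with NumPy. It is not the seg-metrics API, and Dice is used here only as one common overlap metric; the function name and the toy masks are hypothetical.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    a = np.asarray(mask_a).astype(bool)
    b = np.asarray(mask_b).astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy masks standing in for two raters' segmentations of the same region.
rater1 = np.zeros((4, 4, 4), dtype=np.uint8)
rater1[1:3, 1:3, 1:3] = 1   # 8 voxels
rater2 = np.zeros((4, 4, 4), dtype=np.uint8)
rater2[1:3, 1:3, 1:4] = 1   # 12 voxels, 8 of them shared with rater1

dice = dice_coefficient(rater1, rater2)   # 2 * 8 / (8 + 12) = 0.8
```

For real data, the two masks would be loaded from the corresponding rater files of one patient and compared voxel by voxel in the same image space.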
Data description
The data described in this article include:
- dicoms.zip: This zip file contains all the raw MR images from The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC) [9] in the Digital Imaging and Communications in Medicine (DICOM) format used for the curation of this dataset. The data is structured as Patient-ID/DATE/SEQUENCE where Patient-ID is the unique de-identified patient ID, DATE is the date of the image acquisition, and SEQUENCE is the name of the MR sequence.
- LiverHccSeg_MetaData.xlsx: This spreadsheet contains all the metadata from the DICOM headers along with the data from the scientific image readings.
- nifti_and_segms.zip: This zip file contains all MR images along with the liver and tumor segmentations in the Neuroimaging Informatics Technology Initiative (NIfTI) format.
The data is structured as Patient-ID/DATE/SEQUENCE where Patient-ID is the unique anonymized patient identifier, DATE is the date of the image acquisition, and SEQUENCE is the name of the MRI sequence or segmentation image.
The NIfTI files are named as follows:
pre.nii.gz: Pre-contrast T1-weighted MRI
art.nii.gz: Arterial-phase T1-weighted MRI
pv.nii.gz: Portal-venous-phase T1-weighted MRI
del.nii.gz: Delayed-phase T1-weighted MRI
art_pre.nii.gz: Pre-contrast T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
art_pv.nii.gz: Portal-venous-phase T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
art_del.nii.gz: Delayed-phase T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
The corresponding manual segmentations are named after the rater and the type of segmentation and follow the format 'RATER_ROI.nii.gz', where RATER denotes the human rater and ROI denotes the region of interest that was segmented, for example, 'rater1_liver.nii.gz' and 'rater2_liver.nii.gz'. For tumor segmentations, an integer indicates the tumor identification number that distinguishes different tumor ROIs, for example, 'rater1_tumor1.nii.gz' and 'rater2_tumor1.nii.gz'. The segmentations can be used with the arterial-phase NIfTI file (art.nii.gz) as well as with the corresponding co-registered pre-contrast (art_pre.nii.gz), portal-venous (art_pv.nii.gz), and delayed-phase (art_del.nii.gz) images.
- segm_metrics.xlsx: This spreadsheet summarizes the segmentation agreement between the two sets of liver and tumor segmentations by the two board-certified abdominal radiologists.
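The naming scheme of the NIfTI archive described above can be handled programmatically. The following sketch uses only the Python standard library to interpret a relative path of the form Patient-ID/DATE/SEQUENCE. The function name, the example patient ID, and the example date are hypothetical, and the regular expression assumes that rater names always follow the 'rater1', 'rater2' pattern shown in the examples.

```python
import re
from pathlib import PurePosixPath

# File name -> (contrast phase, registered to the arterial phase?)
IMAGES = {
    "pre.nii.gz": ("pre", False),
    "art.nii.gz": ("art", False),
    "pv.nii.gz": ("pv", False),
    "del.nii.gz": ("del", False),
    "art_pre.nii.gz": ("pre", True),
    "art_pv.nii.gz": ("pv", True),
    "art_del.nii.gz": ("del", True),
}

# Assumed pattern for 'RATER_ROI.nii.gz', e.g. rater1_liver or rater2_tumor1.
SEGM_RE = re.compile(
    r"^(?P<rater>rater\d+)_(?P<roi>liver|tumor)(?P<index>\d*)\.nii\.gz$"
)


def describe(relative_path):
    """Interpret a path of the form Patient-ID/DATE/SEQUENCE."""
    patient_id, date, name = PurePosixPath(relative_path).parts[-3:]
    entry = {"patient_id": patient_id, "date": date}
    if name in IMAGES:
        phase, registered = IMAGES[name]
        entry.update(kind="image", phase=phase, registered_to_arterial=registered)
        return entry
    match = SEGM_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognized file name: {name}")
    index = match["index"]
    entry.update(
        kind="segmentation",
        rater=match["rater"],
        roi=match["roi"],
        tumor_id=int(index) if index else None,
    )
    return entry


# Hypothetical patient ID and date, for illustration only.
image_info = describe("PATIENT-0001/1999-01-01/art_pv.nii.gz")
segm_info = describe("PATIENT-0001/1999-01-01/rater2_tumor1.nii.gz")
```

Keeping the phase labels and the rater pattern in one place makes it straightforward to pair each segmentation with the arterial-phase image and its co-registered counterparts from the same Patient-ID/DATE folder.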
References
1 Sung H, Ferlay J, Siegel RL et al (2021) Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71:209-249
2 Siegel RL, Miller KD, Jemal A (2019) Cancer statistics, 2019. CA Cancer J Clin 69:7-34
3 White DL, Thrift AP, Kanwal F, Davila J, El-Serag HB (2017) Incidence of Hepatocellular Carcinoma in All 50 United States, From 2000 Through 2012. Gastroenterology 152:812-820.e815
4 Perz JF, Armstrong GL, Farrington LA, Hutin YJ, Bell BP (2006) The contributions of hepatitis B virus and hepatitis C virus infections to cirrhosis and primary liver cancer worldwide. J Hepatol 45:529-538
5 Hamer OW, Schlottmann K, Sirlin CB, Feuerbach S (2007) Technology insight: advances in liver imaging. Nat Clin Pract Gastroenterol Hepatol 4:215-228
6 Chernyak V, Fowler KJ, Kamaya A et al (2018) Liver Imaging Reporting and Data System (LI-RADS) Version 2018: Imaging of Hepatocellular Carcinoma in At-Risk Patients. Radiology 289:816-830
7 Bousabarah K, Letzen B, Tefera J et al (2020) Automated detection and delineation of hepatocellular carcinoma on multiphasic contrast-enhanced MRI using deep learning. Abdom Radiol. 10.1007/s00261-020-02604-5
8 Gross M, Spektor M, Jaffe A et al (2021) Improved performance and consistency of deep learning 3D liver segmentation with heterogeneous cancer stages in magnetic resonance imaging. PLoS One 16:e0260630
9 Erickson BJ, Kirk S, Lee Y et al (2016) Radiology Data from The Cancer Genome Atlas Liver Hepatocellular Carcinoma [TCGA-LIHC] collection. The Cancer Imaging Archive. 10.7937/K9/TCIA.2016.IMMQW8UQ
10 dcm2nii DICOM to NIfTI converter. https://github.com/rordenlab/dcm2niix Accessed: 2021-12-07.
11 Mason D, scaramallion, rhaxton et al (2020) pydicom/pydicom: pydicom 2.1.2, v2.1.2. Zenodo
12 Papademetris X, Jackowski M, Rajeevan N, Okuda H, Constable RT, Staib LH. BioImage Suite: An integrated medical image analysis suite. Section of Bioimaging Sciences, Dept. of Diagnostic Radiology, Yale School of Medicine. http://www.bioimagesuite.org
13 Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ (1999) Nonrigid Registration Using Free-Form Deformations: Application to Breast MR Images. IEEE Trans Med Imaging 18:712–721
14 Studholme C, Hill DL, Hawkes DJ (1999) An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32:71-86
15 Fedorov A, Beichel R, Kalpathy-Cramer J et al (2012) 3D Slicer as an Image Computing Platform for the Quantitative Imaging Network. Magn Reson Imaging 30:1323-1341
16 Ordgod (2020) Ordgod/segmentation_metrics: seg-metrics, v1.0.0. Zenodo
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ChangeLog
Version 1.1: Fixed incorrect liver segmentation mask.