Published July 15, 2024 | Version v2
Dataset Open

WAW-TACE: A Hepatocellular Carcinoma Multiphase CT Dataset with Segmentations, Radiomics Features, and Clinical Data

Description

The WAW-TACE dataset contains multiphase abdominal CT images from N=233 treatment-naive patients with HCC treated with TACE in monotherapy, annotated with N=377 hand-crafted liver tumor masks, automated segmentations of multiple internal organs, extracted radiomics features, and corresponding extensive clinical data.

Abstract

Hepatocellular carcinoma (HCC) is the most prevalent primary liver tumor and requires a multidisciplinary management approach. Transarterial chemoembolization (TACE) emerges as a first-line therapeutic option for patients ineligible for surgical treatment. Managing patients and predicting outcomes remains challenging, necessitating the development of novel prediction models. Artificial intelligence (AI) shows promise as a prognostic tool, but deploying models demands a substantial amount of training data. Significant investments are directed towards providing more datasets specifically tailored for AI research. Here, we provide the WAW-TACE dataset, which features comprehensive data from N=233 treatment-naive HCC patients treated with TACE. This includes pre-TACE multiphase CT images annotated with N=377 hand-crafted HCC segmentations and unsupervised multi-organ masks. Furthermore, the collection includes correspondingly extracted radiomics features and extensive clinical data, including critical outcome measures (such as progression-free survival, overall survival and TACE technical success).

Technical info

The WAW-TACE dataset includes four major components, outlined in Figure 1 of the manuscript: 1) clinical data; 2) imaging; 3) segmentations; and 4) radiomics features. This section provides details and additional information about the files uploaded to the dataset:

1. Clinical data (tabular form)

All clinical variables are provided in tabular format in a single sheet, “clinical_data”. Every patient is assigned a distinct ID number, stored in the “patpri” variable, which links the clinical record to that patient's CT imaging, segmentations, and radiomics data. A separate sheet within the “supplementary_table_s1_definitions” file provides detailed descriptions of each variable.

2. CT imaging (NIfTI files)

All CT scans are stored in four separate ZIP files named "ct_scans". Each patient's CT scans are placed in an individual folder whose name corresponds to the patient's ID. Different exam phases are saved as distinct NIfTI files, labeled with the numbers 0, 1, 2, or 3 to represent the native, arterial, portal, and delayed CT phases, respectively. For example, the arterial phase CT of patient “35” is named “35_1_scan” and stored in folder “35”. The CT acquisition parameters and the exam phases available in the dataset are detailed in Table 2 of the manuscript.
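The naming scheme above can be sketched as a small path helper. This is a minimal illustration, not part of the dataset tooling: the function names, the `root` directory (wherever the "ct_scans" archives were extracted), and the `.nii.gz` extension are assumptions and should be checked against the extracted files.

```python
from pathlib import Path

# Phase codes as described in the dataset documentation.
PHASES = {0: "native", 1: "arterial", 2: "portal", 3: "delayed"}


def scan_path(root, patient_id, phase, ext=".nii.gz"):
    """Build the expected path of one CT phase for one patient.

    `root` and `ext` are assumptions (hypothetical extraction directory and
    file extension); only the "<id>/<id>_<phase>_scan" pattern comes from
    the dataset description.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase code: {phase}")
    return Path(root) / str(patient_id) / f"{patient_id}_{phase}_scan{ext}"


def available_phases(root, patient_id, ext=".nii.gz"):
    """Return the phase codes whose scan file actually exists on disk."""
    return [p for p in PHASES if scan_path(root, patient_id, p, ext).exists()]
```

For instance, `scan_path("ct_scans", 35, 1)` points at the arterial phase of patient 35. Because not every phase was acquired for every patient, checking `available_phases` before loading avoids assuming that all four files exist.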

3. Segmentations (NIfTI and NRRD files)

3.1. Unsupervised organ masks

Unsupervised segmentations of internal organs were generated separately for each CT phase using the TotalSegmentator deep learning model [1], and each mask is named in a manner similar to the corresponding examination phase. The masks for each patient are stored in separate folders named according to the “patpri” identifiers, inside the “organ_masks” file.

3.2. Hand-crafted tumor masks

Tumor masks are included in the “tumor_masks” file and were hand-crafted by radiologists. Manual segmentation was performed on the exam phase in which the tumors were most distinctly visible, and the tumor segmentation files were named accordingly. Importantly, because each CT phase was acquired during a separate breath hold, the diaphragm, and consequently the internal organs, may occasionally lie at slightly different axial slices from one phase to another. Thus, we advise that a tumor mask be repositioned (spatially aligned) to fit the exact study phase before it is applied to a phase other than the one on which it was drawn. The specific phase used for tumor segmentation is indicated in the name of the tumor mask (0, 1, 2, or 3 for the native, arterial, portal, and delayed CT phases, respectively). In subjects with more than one tumor, the tumor masks are stored as separate segmentation files and marked with consecutive numbers (0, 1, 2, …). For example, the file “35_1_0_tumor_seg” contains an HCC mask for patient 35, fits exam phase 1 (arterial), and refers to that patient's first tumor (number 0). The file "ct_hcc_metadata" contains a list of patients' IDs, the CT phases available in the dataset, the names of the included CT files, and the number of tumors segmented in each CT phase.
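As an illustration of the tumor mask naming convention, the sketch below splits a mask name into patient ID, phase code, and tumor number. The function and class names are hypothetical, and the pattern assumes numeric patient IDs and ignores whatever file extension follows “tumor_seg”.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Phase codes as described in the dataset documentation.
PHASES = {0: "native", 1: "arterial", 2: "portal", 3: "delayed"}


class TumorMaskName(NamedTuple):
    patient_id: str   # value of the "patpri" identifier
    phase: int        # 0-3, see PHASES
    tumor_index: int  # consecutive tumor number, starting at 0


# "<patient>_<phase>_<tumor>_tumor_seg", followed by any extension.
_PATTERN = re.compile(r"^(?P<pid>\d+)_(?P<phase>[0-3])_(?P<idx>\d+)_tumor_seg")


def parse_tumor_mask_name(filename):
    """Parse a tumor mask file name; raise ValueError if it does not match."""
    match = _PATTERN.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a tumor mask name: {filename!r}")
    return TumorMaskName(match["pid"], int(match["phase"]), int(match["idx"]))
```

Parsing “35_1_0_tumor_seg” yields patient "35", phase 1 (arterial), and tumor number 0, which identifies the CT phase file the mask was drawn on.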

4. Radiomics (tabular form)

The radiomics variables are summarized in tabular form and included in a separate file: “radiomics_data”. Importantly, all radiomics features were extracted from raw masks generated by TotalSegmentator, without manual correction of the volumes of interest (VOIs). Each record contains the patient's ID and the specific examination phase to which it relates. Entries exist only for the examination phases that were actually performed for each patient, as specified in the manuscript. Each record has 3339 columns of calculated radiomics variables: 32 variables (18 first-order statistics and 14 shape-based features) for each of the 102 analyzed volumes of interest, plus 75 additional gray-level variables calculated only for HCC volumes (32 × 102 + 75 = 3339). The values are not normalized, and missing data are not imputed.
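The column count can be reproduced from the figures quoted above. The snippet below is only a sanity check that could be compared against the width of a loaded table; the constant and function names are illustrative, and it assumes that ID and phase columns are counted separately from the 3339 feature columns.

```python
# Figures taken from the dataset description.
FIRST_ORDER_FEATURES = 18      # first-order statistics per volume of interest
SHAPE_FEATURES = 14            # shape-based features per volume of interest
VOLUMES_OF_INTEREST = 102      # analyzed volumes of interest
HCC_GRAY_LEVEL_FEATURES = 75   # gray-level features, HCC volumes only


def expected_radiomics_columns():
    """Return the number of radiomics feature columns per record."""
    per_volume = FIRST_ORDER_FEATURES + SHAPE_FEATURES
    return per_volume * VOLUMES_OF_INTEREST + HCC_GRAY_LEVEL_FEATURES


print(expected_radiomics_columns())  # 3339
```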

Access privileges: The WAW-TACE dataset is publicly available to the scientific community and can be accessed via Zenodo. It may be used for scientific research purposes only. Access to the dataset does not require co-authorship or the establishment of a research collaboration.

References:

[1] Jakob Wasserthal et al. "TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images", Radiology: Artificial Intelligence, 2023, https://doi.org/10.1148/ryai.230024.

Files

ct_hcc_metadata_v2.csv

Files (48.2 GB)

Checksum Size
md5:fb7aa2803eae6d75745203602b6d385a 49.2 kB
md5:c23daef027541e77b3f1fb66c3d8d1f7 16.2 kB
md5:f099086c3b2f8f3c4c1397a9487da91f 10.4 GB
md5:011421e5ab94c40a86930fddfe869187 10.5 GB
md5:c1ce003a5b1a3a1c8fb2be82adc6d419 12.8 GB
md5:beffa045d5e81edc03b5c9fd9b27c687 13.5 GB
md5:701218bd108621d81596132921da1ed6 1.0 GB
md5:b65e8169d0f4fe1f4e112cdb058c8c5b 22.4 MB
md5:fd9fb1cd1c7279882ad7edf6563b1a8d 10.6 kB
md5:4a835d7a90aa952284d02e16baf4beff 2.3 MB
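
Given the size of the archives, it may be worth verifying each download against the md5 checksums listed above. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and because the listing here does not pair checksums with file names, the caller has to supply the matching checksum for each file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte CT archives.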

Additional details

Funding

Integra WUM-PW 1W12/ INTEGRA.1.6/N/23
Warsaw University of Technology
Integra WUM-PW 1W12/ INTEGRA.1.6/N/23
Medical University of Warsaw
