There is a newer version of the record available.

Published April 25, 2024 | Version v1
Dataset Restricted

WAW-TACE: A Dataset of Hepatocellular Carcinoma Patients Treated with Transarterial Chemoembolization, Featuring Annotated Multiphase CT Images, Radiomics Features, and Comprehensive Clinical Data

Description

The WAW-TACE dataset contains multiphase abdominal CT images from N=233 treatment-naive patients with HCC treated with TACE as monotherapy. The images are annotated with N=377 hand-crafted liver tumor masks and automated segmentations of multiple internal organs, and are accompanied by extracted radiomics features and extensive corresponding clinical data.

 

Abstract

Hepatocellular carcinoma (HCC) is the most prevalent primary liver tumor and requires a multidisciplinary management approach. Transarterial chemoembolization (TACE) is a first-line therapeutic option for patients who are ineligible for surgical treatment. Managing these patients and predicting their outcomes remains challenging, which motivates the development of novel prediction models. Artificial intelligence (AI) shows promise as a prognostic tool, but developing such models demands substantial amounts of training data, and significant investment is therefore being directed toward datasets tailored specifically for AI research. Here, we provide the WAW-TACE dataset, which features comprehensive data from N=233 treatment-naive HCC patients treated with TACE. It includes pre-TACE multiphase CT images annotated with N=377 hand-crafted HCC segmentations and unsupervised multi-organ masks. The collection also includes the corresponding extracted radiomics features and extensive clinical data, including key outcome measures (such as progression-free survival, overall survival, and TACE technical success).

Technical info

The WAW-TACE dataset includes four major components, outlined in Figure 1 of the manuscript: 1) clinical data; 2) imaging; 3) segmentations; and 4) radiomics features. This section provides details and additional information about the files uploaded to the dataset:

1. Clinical data (tabular form)

All clinical variables are provided in tabular format in a single sheet, “clinical_data”. Each patient is assigned a unique ID number, stored in the “patpri” variable, which links the patient's clinical record to their CT imaging, segmentations, and radiomics data. A separate sheet in the “supplementary_table_s1_definitions” file provides a detailed description of each variable.
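Because “patpri” is the shared key, clinical rows and per-phase radiomics records can be linked with a simple join. The following is a minimal sketch in plain Python, assuming both tables have already been loaded as lists of dicts (for example, after exporting the sheets to CSV). Apart from “patpri”, the column names used here (“age”, “phase”) are hypothetical, and the toy values are for illustration only.

```python
from collections import defaultdict


def link_by_patpri(clinical_rows, radiomics_rows):
    """Attach each patient's radiomics records (one per CT phase) to their clinical row.

    IDs are compared as strings so that 35 (read as a number) and "35"
    (read as text) refer to the same patient.
    """
    radiomics_by_patient = defaultdict(list)
    for record in radiomics_rows:
        radiomics_by_patient[str(record["patpri"])].append(record)

    linked = {}
    for row in clinical_rows:
        pid = str(row["patpri"])
        # .get() avoids inserting empty entries into the defaultdict
        linked[pid] = {"clinical": row, "radiomics": radiomics_by_patient.get(pid, [])}
    return linked


# Toy rows; "age" and "phase" are hypothetical column names.
clinical = [{"patpri": 35, "age": 70}, {"patpri": 36, "age": 65}]
radiomics = [{"patpri": "35", "phase": 1}, {"patpri": 35, "phase": 2}]
linked = link_by_patpri(clinical, radiomics)
```

Patients without any radiomics record get an empty list rather than being dropped, which makes such gaps easy to spot.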

2. CT imaging (nifti files)

All CT scans are stored in four separate ZIP files named “ct_scans”. Each patient's scans are placed in an individual folder named after the patient's ID. Each exam phase is saved as a separate NIfTI file labeled 0, 1, 2, or 3 for the native, arterial, portal, and delayed CT phases, respectively. For example, the arterial-phase CT of patient 35 is named “35_1_scan” and stored in folder “35”. The CT acquisition parameters and the exam phases available in the dataset are detailed in Table 2 of the manuscript.
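The naming convention above can be turned into a small path helper. This is a sketch under two assumptions: the ZIP files have been extracted into a single root directory, and the scan files carry a “.nii.gz” suffix (the description does not state the exact extension, so it is a parameter).

```python
from pathlib import Path

# Phase codes used in the file names (per the dataset description).
PHASES = {0: "native", 1: "arterial", 2: "portal", 3: "delayed"}


def ct_scan_path(root, patpri, phase, suffix=".nii.gz"):
    """Build the expected path <root>/<patpri>/<patpri>_<phase>_scan<suffix>.

    `root` is wherever the ct_scans archives were extracted (assumption), and
    ".nii.gz" is an assumed default; pass the extension the files actually use.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase code {phase!r}; expected one of {sorted(PHASES)}")
    return Path(root) / str(patpri) / f"{patpri}_{phase}_scan{suffix}"


# Arterial-phase scan of patient 35, matching the example in the text.
arterial_35 = ct_scan_path("ct_scans", 35, 1)
```

Not every patient has all four phases, so check `path.exists()` (or consult “ct_hcc_metadata”) before loading.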

3. Segmentations (nifti and nrrd files)

3.1. Unsupervised organ masks

Unsupervised segmentations of internal organs were generated with the TotalSegmentator deep learning model [1], separately for each CT phase, and each mask is named analogously to the CT file of the corresponding examination phase. The masks for each patient are stored in separate folders named by “patpri” identifier within the “organ_masks” file.

3.2. Hand-crafted tumor masks

Tumor masks were hand-crafted by radiologists and are included in the “tumor_masks” file. Each tumor was segmented manually on the exam phase in which it was most clearly visible, and the mask file is named after that phase (0, 1, 2, or 3 for the native, arterial, portal, and delayed CT phases, respectively). Importantly, each CT phase was acquired during a separate breath hold, so the diaphragm, and consequently the internal organs, may occasionally lie at slightly different axial positions between phases. We therefore advise that the tumor masks ideally be aligned (e.g., by image registration) to the exact phase being analyzed.

In patients with more than one tumor, each tumor mask is stored as a separate segmentation file, numbered consecutively (0, 1, 2, …). For example, the file “35_1_0_tumor_seg” contains the HCC mask for patient 35, matches exam phase 1 (arterial), and refers to the first tumor (index 0). The file “ct_hcc_metadata” lists the patient IDs, the CT phases available in the dataset, the names of the included CT files, and the number of tumors segmented in each CT phase.
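The tumor mask names encode patient, phase, and tumor index, so they can be parsed rather than hard-coded. This is a minimal sketch assuming every file follows the “35_1_0_tumor_seg” pattern shown above, optionally followed by an extension (e.g. “.nrrd” or “.nii.gz”, both assumptions here). The `count_tumors` helper is hypothetical; it only mirrors the per-phase tumor counts that “ct_hcc_metadata” already provides.

```python
import re
from collections import Counter
from pathlib import Path
from typing import NamedTuple

PHASES = {0: "native", 1: "arterial", 2: "portal", 3: "delayed"}

# <patpri>_<phase>_<tumor index>_tumor_seg, optionally followed by file extension(s)
TUMOR_MASK_RE = re.compile(
    r"^(?P<patpri>\d+)_(?P<phase>[0-3])_(?P<tumor>\d+)_tumor_seg(?:\..*)?$"
)


class TumorMask(NamedTuple):
    patpri: str
    phase: str
    tumor_index: int  # 0 = first tumor, 1 = second, ...


def parse_tumor_mask_name(filename):
    """Split a tumor mask file name into patient ID, phase name, and tumor index."""
    m = TUMOR_MASK_RE.match(Path(filename).name)
    if m is None:
        raise ValueError(f"not a tumor mask file name: {filename!r}")
    return TumorMask(m["patpri"], PHASES[int(m["phase"])], int(m["tumor"]))


def count_tumors(filenames):
    """Count tumor masks per (patient, phase) pair."""
    return Counter((t.patpri, t.phase) for t in map(parse_tumor_mask_name, filenames))
```

Comparing these counts against “ct_hcc_metadata” is a cheap consistency check after extracting the archive.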

4. Radiomics (tabular form)

The radiomics variables are summarized in tabular form in a separate file, “radiomics_data”. Importantly, all radiomics features were extracted from the raw masks generated by TotalSegmentator, without manual correction of the volumes of interest (VOIs). Each record contains the patient's ID and the examination phase to which it relates. Records exist only for the examination phases that were actually acquired for each patient, as specified in the manuscript. Each record has 3339 columns of computed radiomics variables: 32 variables (18 first-order statistics and 14 shape-based features) for each of the 102 analyzed VOIs, plus 75 additional gray-level-based variables calculated only for HCC volumes (32 × 102 + 75 = 3339). The values are not normalized, and missing data are not imputed.
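Since the features ship unnormalized and with missing values left in place, users typically need to check the table shape and normalize themselves. The sketch below encodes the column arithmetic stated above (counting radiomics columns only, not the ID and phase columns) and shows one possible missing-aware z-score. This normalization is an illustrative choice, not the authors' preprocessing.

```python
import math
import statistics

# Counts stated in the dataset description.
FEATURES_PER_VOI = 18 + 14  # first-order statistics + shape-based features
N_VOIS = 102
N_HCC_GRAY_LEVEL = 75       # gray-level-based features computed for HCC volumes only
EXPECTED_COLUMNS = FEATURES_PER_VOI * N_VOIS + N_HCC_GRAY_LEVEL  # 32 * 102 + 75


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def zscore_ignoring_missing(values):
    """Z-score one feature column; missing entries stay missing (returned as NaN)."""
    observed = [float(v) for v in values if not is_missing(v)]
    if len(observed) < 2:
        return [math.nan] * len(values)
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    if sd == 0:
        # constant feature: every observed value sits exactly at the mean
        return [math.nan if is_missing(v) else 0.0 for v in values]
    return [math.nan if is_missing(v) else (float(v) - mean) / sd for v in values]
```

When training a model, the mean and standard deviation should be estimated on the training split only and then applied to held-out data, to avoid information leakage.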

Access privileges: The WAW-TACE dataset is publicly available to the scientific community via Zenodo. It may be used for scientific research purposes only. Access does not require co-authorship or the establishment of a research collaboration.

References:

[1] Wasserthal J, et al. “TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images.” Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.230024

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access


You need to satisfy the following condition for your request to be accepted:

The files are available on request for peer review in Radiology: Artificial Intelligence.


Additional details

Funding

Warsaw University of Technology
Integra WUM-PW 1W12/ INTEGRA.1.6/N/23
Medical University of Warsaw
Integra WUM-PW 1W12/ INTEGRA.1.6/N/23
