Clinical Text Imbalance Benchmark—Results (336 configurations), v2.

H, H

doi:10.5281/zenodo.17009030

Published August 30, 2025 | Version v2

Dataset Open

This dataset contains per-configuration test metrics for a large-scale benchmark of clinical text classification under extreme class imbalance (49,035 French breast radiology reports; minority-class prevalence ≈0.33%). A factorial design varied two vectorisers (bag-of-words (BoW) and TF-IDF), 12 resampling methods plus a baseline, and 15 classifiers. The file ml_experiment_results.csv reports one row per executed configuration (n=336) with the following columns: Vectorizer, Sampler, Classifier, Accuracy, Balanced_Accuracy, ROC_AUC, PR_AUC, Precision_male, Recall_male, F1_male, Precision_female, Recall_female, F1_female, F1_macro, F1_weighted, TP, FP, TN, FN.
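As a rough illustration of how the columns described above relate to one another, the following Python sketch checks a CSV header against the documented column list and recomputes the threshold-based metrics from the four confusion-matrix counts. It is not the authors' code. The helper names (`missing_columns`, `metrics_from_counts`) are hypothetical, and the description does not say which class (male or female) the TP/FP/TN/FN counts treat as positive, so the sketch reports both as generic "pos" and "neg" values. Note also that the full grid would hold 2 × 13 × 15 = 390 cells, while only 336 configurations were executed; the description does not say which ones are absent.

```python
# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Vectorizer", "Sampler", "Classifier", "Accuracy", "Balanced_Accuracy",
    "ROC_AUC", "PR_AUC", "Precision_male", "Recall_male", "F1_male",
    "Precision_female", "Recall_female", "F1_female", "F1_macro",
    "F1_weighted", "TP", "FP", "TN", "FN",
]

# Size of the full factorial grid: 2 vectorisers x (12 samplers + baseline) x 15 classifiers.
FULL_GRID = 2 * (12 + 1) * 15
EXECUTED = 336  # rows reported in the description


def missing_columns(header):
    """Return the documented columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def _safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics_from_counts(tp, fp, tn, fn):
    """Recompute threshold-based metrics from confusion-matrix counts.

    Assumption: 'pos' is whichever class the TP/FN counts refer to;
    the dataset description does not state which class that is.
    """
    total = tp + fp + tn + fn
    precision_pos = _safe_div(tp, tp + fp)
    recall_pos = _safe_div(tp, tp + fn)
    precision_neg = _safe_div(tn, tn + fn)
    recall_neg = _safe_div(tn, tn + fp)
    f1_pos = _safe_div(2 * precision_pos * recall_pos, precision_pos + recall_pos)
    f1_neg = _safe_div(2 * precision_neg * recall_neg, precision_neg + recall_neg)
    return {
        "accuracy": _safe_div(tp + tn, total),
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
        "precision_pos": precision_pos,
        "recall_pos": recall_pos,
        "f1_pos": f1_pos,
        "precision_neg": precision_neg,
        "recall_neg": recall_neg,
        "f1_neg": f1_neg,
        "f1_macro": (f1_pos + f1_neg) / 2,
        # Support-weighted average of the two per-class F1 scores.
        "f1_weighted": _safe_div(f1_pos * (tp + fn) + f1_neg * (tn + fp), total),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; they are not taken from the dataset.
    for name, value in metrics_from_counts(tp=8, fp=2, tn=980, fn=10).items():
        print(f"{name}: {value:.4f}")
    print("grid cells not executed:", FULL_GRID - EXECUTED)
```

With counts this skewed, accuracy stays high even when recall on the rare class is poor, which is why the file reports balanced accuracy, per-class scores, and PR_AUC alongside it.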

Files (388.7 kB)

Name	MD5	Size
ml_experiment_results 2gram.csv	a2af396683c8a696cc9e02d03e98de85	93.0 kB
results_v2_336-configs.csv	a2af396683c8a696cc9e02d03e98de85	93.0 kB
supp_figures_v2_pdf.pdf	2c989ca26ed6d6257747fd53c938c403	202.7 kB
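The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below is one way to do that in Python with the standard library; the function names (`md5_of`, `verify_downloads`) and the assumption that the files sit in a single local directory are illustrative, not part of the record. The two CSV files carry the same checksum, which suggests they are byte-identical copies under different names.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests as listed on the record page.
EXPECTED_MD5 = {
    "ml_experiment_results 2gram.csv": "a2af396683c8a696cc9e02d03e98de85",
    "results_v2_336-configs.csv": "a2af396683c8a696cc9e02d03e98de85",
    "supp_figures_v2_pdf.pdf": "2c989ca26ed6d6257747fd53c938c403",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if present with a matching digest."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

A missing file is reported the same way as a checksum mismatch, so a failed entry means the file should be downloaded again.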

	All versions	This version
Views	63	33
Downloads	85	65
Data volume	10.6 MB	8.5 MB

DOI: 10.5281/zenodo.17009030
Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 30, 2025
Modified: October 10, 2025
