Human-Annotated Topic Model Evaluation and Topic Resources for Comparative Analysis

Couto Pintos, Manuel; Losada, David E.; Parapar, Javier

doi:10.5281/zenodo.15081947

Published March 25, 2025 | Version v1

Dataset Open

1. CITIUS
2. Universidad de Santiago de Compostela

This dataset contains the topics extracted by three topic modeling techniques (LDA, BERTopic, and TopClus), together with human evaluations of their coherence. Expert annotators provided qualitative assessments of each topic's interpretability and coherence. The dataset is structured to support further research on topic model evaluation, including studies of how human judgments correlate with automated coherence metrics.

It includes:

The extracted topics from different models.
Human-annotated coherence scores.
Aggregated comparisons between models.
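
One use the description mentions is correlating human judgments with automated coherence metrics. The sketch below shows one common way to do that, a Spearman rank correlation, using only the Python standard library. It is an illustration under assumptions, not the authors' procedure: the record does not document the CSV column names, so the function takes two plain lists of scores, and the numbers in the usage lines are toy values that do not come from the dataset.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(human_scores, metric_scores):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(human_scores) != len(metric_scores) or len(human_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(human_scores), average_ranks(metric_scores))


if __name__ == "__main__":
    # Toy values for illustration only; NOT taken from the dataset.
    human = [3.0, 1.5, 2.0, 2.5]
    metric = [0.61, 0.20, 0.35, 0.33]
    print(f"Spearman rho = {spearman(human, metric):.3f}")
```

With the real files, the two lists would be filled from whichever columns hold the per-topic human score and the automated metric, once those columns have been identified by inspecting the CSV headers.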

Files

Files (1.0 MB)

Name	MD5	Size
aggregated_hummans.csv	ccaffe90cde6029c0cb7d46a50b032ed	982.5 kB
bertopic_results.csv	02fb8ddc5254443124be3cf9167bd2d6	9.4 kB
bertopic_topics.csv	ff89b2e9d823d0fef2a8268d6a89eb44	3.6 kB
lda_results.csv	d8195022b6cf44c8b8d7d4a7fdb40db3	9.1 kB
lda_topics.csv	66990a60ed37b1072ef8134b077c32e5	3.0 kB
topclus_results.csv	0c755e0480ac41a83bfc6f6d45d26979	10.1 kB
topclus_topics.csv	7ec5a6aec160245f1e67d28c95d53a05	4.2 kB
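
The MD5 digests in the file listing can be used to confirm that downloads arrived intact. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of` and `verify_downloads` are illustrative, and the script assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "aggregated_hummans.csv": "ccaffe90cde6029c0cb7d46a50b032ed",
    "bertopic_results.csv": "02fb8ddc5254443124be3cf9167bd2d6",
    "bertopic_topics.csv": "ff89b2e9d823d0fef2a8268d6a89eb44",
    "lda_results.csv": "d8195022b6cf44c8b8d7d4a7fdb40db3",
    "lda_topics.csv": "66990a60ed37b1072ef8134b077c32e5",
    "topclus_results.csv": "0c755e0480ac41a83bfc6f6d45d26979",
    "topclus_topics.csv": "7ec5a6aec160245f1e67d28c95d53a05",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed filename to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the CSV files sit in the current working directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:9s} {name}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a defense against deliberate tampering.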
