Published March 25, 2025 | Version v1
Dataset Open

Human-Annotated Topic Model Evaluation and Topic Resources for Comparative Analysis

  • 1. CITIUS
  • 2. Universidad de Santiago de Compostela

Description

This dataset contains the topics extracted by different topic modeling techniques (LDA, BERTopic, and TopClust), together with human evaluations of their coherence. Expert annotators provided qualitative assessments of topic interpretability and coherence. The dataset is structured to support further research on topic model evaluation, including the study of correlations between human judgments and automated coherence metrics.

It includes:

  • Topics extracted from the different models.
  • Human-annotated coherence scores.
  • Aggregated comparisons between models.
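
As an illustration of the correlation analysis mentioned in the description, the following is a minimal sketch of a Spearman rank correlation between human coherence scores and an automated coherence metric. It uses only the Python standard library. The names `human_scores` and `metric_scores` and the values in them are made up for illustration; they are not taken from the dataset, whose column layout is not documented here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, one entry per topic (illustration only).
human_scores = [3.0, 1.5, 2.5, 1.0, 2.0]
metric_scores = [0.62, 0.31, 0.40, 0.45, 0.28]
rho = spearman(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based correlation is a common choice for this kind of comparison because human ratings are typically ordinal, but the appropriate statistic depends on how the scores in the files are actually encoded.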

Files

aggregated_hummans.csv

Files (1.0 MB)

Size        Checksum
982.5 kB    md5:ccaffe90cde6029c0cb7d46a50b032ed
9.4 kB      md5:02fb8ddc5254443124be3cf9167bd2d6
3.6 kB      md5:ff89b2e9d823d0fef2a8268d6a89eb44
9.1 kB      md5:d8195022b6cf44c8b8d7d4a7fdb40db3
3.0 kB      md5:66990a60ed37b1072ef8134b077c32e5
10.1 kB     md5:0c755e0480ac41a83bfc6f6d45d26979
4.2 kB      md5:7ec5a6aec160245f1e67d28c95d53a05
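
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the listing does not show which file name belongs to which checksum, any pairing of a path with a checksum is an assumption the reader has to verify.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("some_downloaded_file.csv", "md5:ccaffe90cde6029c0cb7d46a50b032ed")` would return `True` only if that hypothetical local file is the 982.5 kB file from the listing. MD5 is adequate for detecting transfer corruption, not for security guarantees.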