Published March 25, 2025 | Version v1 | Open dataset
Human-Annotated Topic Model Evaluation and Topic Resources for Comparative Analysis
Creators
- CITIUS
- Universidad de Santiago de Compostela
Description
This dataset contains the topics extracted by several topic modeling techniques (LDA, BERTopic, and TopClust), together with human evaluations of their coherence. Expert annotators provided qualitative assessments of topic interpretability and coherence. The dataset is structured to support further research on topic model evaluation, including the study of correlations between human judgments and automated coherence metrics.
It includes:
- The extracted topics from different models.
- Human-annotated coherence scores.
- Aggregated comparisons between models.
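As a sketch of the kind of analysis this dataset supports, the snippet below computes a Spearman rank correlation between human coherence scores and an automated coherence metric, and aggregates mean human scores per model. The column names and sample values are illustrative assumptions, not taken from the dataset; in practice the rows would be loaded from aggregated_hummans.csv.

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative stand-in for aggregated_hummans.csv; the column names
# and values here are assumptions, not the dataset's actual schema.
df = pd.DataFrame({
    "model": ["LDA", "LDA", "BERTopic", "BERTopic", "TopClust", "TopClust"],
    "human_coherence": [2.1, 3.4, 4.0, 3.8, 2.9, 3.2],  # annotator scores
    "npmi": [0.02, 0.15, 0.18, 0.11, 0.07, 0.09],       # automated metric
})

# How well does the automated metric track human judgments?
rho, p = spearmanr(df["human_coherence"], df["npmi"])
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")

# Per-model mean human coherence, mirroring the aggregated comparisons
print(df.groupby("model")["human_coherence"].mean())
```

Spearman's rho is a common choice here because human coherence ratings are ordinal, so a rank correlation is more appropriate than Pearson's r.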
Files (1.0 MB)

| Name | MD5 checksum | Size |
|---|---|---|
| aggregated_hummans.csv | md5:ccaffe90cde6029c0cb7d46a50b032ed | 982.5 kB |
| | md5:02fb8ddc5254443124be3cf9167bd2d6 | 9.4 kB |
| | md5:ff89b2e9d823d0fef2a8268d6a89eb44 | 3.6 kB |
| | md5:d8195022b6cf44c8b8d7d4a7fdb40db3 | 9.1 kB |
| | md5:66990a60ed37b1072ef8134b077c32e5 | 3.0 kB |
| | md5:0c755e0480ac41a83bfc6f6d45d26979 | 10.1 kB |
| | md5:7ec5a6aec160245f1e67d28c95d53a05 | 4.2 kB |
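The MD5 checksums above can be used to verify downloaded files. A minimal sketch using Python's standard library (the filename-to-checksum pairing in the trailing comment is an assumption, since most filenames are not preserved in this export):

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage against a checksum from the table above:
# assert md5_of_file("aggregated_hummans.csv") == "ccaffe90cde6029c0cb7d46a50b032ed"
```

Reading in chunks keeps memory use constant, which matters little for files this small but is good practice for larger downloads.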