Published October 1, 2020 | Version v2
Other
Open
MITAO: a tool for enabling scholars in the Humanities to use Topic Modelling in their studies (data and results of MITAO)
Description
This repository contains the data and the results obtained with MITAO (https://github.com/catarsi/mitao) in a Topic Modelling analysis of the English abstracts of the articles published in “Umanistica Digitale” (https://umanisticadigitale.unibo.it/). It accompanies the paper "MITAO: a tool for enabling scholars in the Humanities to use Topic Modelling in their studies", submitted to AIUCD 2021 (http://www.aiucd.it/convegno-aiucd-2021/).
This repository contains the following files:
- "data/": this directory contains the abstracts of the articles published in “Umanistica Digitale” (up to the date of submission of this work).
  Note: all the files in this directory are released under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), following the journal specifications.
- "mitao_workflow/": this directory contains all the MITAO workflows.
  - "corpus_mitao.json": used for building the dictionary and the corpus starting from the collection of abstracts.
  - "coherence_mitao.json": used for calculating the coherence score of several LDA topic models.
  - "analysis_mitao.json": used for creating the LDA topic model, its tabular datasets, and web-based visualizations.
- "result/": this directory contains all the results generated using MITAO.
  - "coherence.csv": a tabular dataset containing the coherence score of several LDA topic models (obtained from the "coherence_mitao.json" workflow).
  - "corpus(vectorized).json" and "dictionary.gdict": the topic modelling corpus and dictionary (obtained from the "corpus_mitao.json" workflow).
  - "tables/": this directory contains the tabular results. It includes two files: "doc_topics.csv", which lists all the documents of the corpus with their representativeness for each of the generated topics, and "word_topics.csv", which lists the 30 most probable terms for interpreting each topic.
  - "visualizations/": this directory contains two dynamic, web-based visualizations. "ldavis.html" visualizes the topic modelling results using LDAvis (https://github.com/cpsievert/LDAvis), while "t-0005_chart_(year)_view.html" uses MTMvis, which we developed specifically for MITAO.
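As an illustration of how a table such as "coherence.csv" might be used, the following minimal Python sketch picks the row with the highest coherence score. The column names ("num_topics", "coherence") and the sample values are assumptions made up for this example, not the actual header or contents of the file; check the real header before reusing it. The sketch also assumes that a higher coherence score is preferable.

```python
import csv
import io


def best_model(csv_text, score_column):
    """Return the CSV row (as a dict) with the highest value in score_column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return max(rows, key=lambda row: float(row[score_column]))


# Hypothetical sample data: the real coherence.csv may use other column
# names and will contain different values.
sample = "num_topics,coherence\n5,0.41\n10,0.47\n15,0.44\n"

best = best_model(sample, "coherence")
print(best["num_topics"], best["coherence"])
```

To apply the same function to the real file, read its text (for example with `open("result/coherence.csv", encoding="utf-8").read()`, where the path is assumed from the directory layout described above) and pass the actual name of the score column.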
Files
Name | Size | Checksum |
---|---|---|
topic_modelling.zip | 104.5 kB | md5:f4ea2d228767d71fcef25e53a0f8ded7 |
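The MD5 value listed for "topic_modelling.zip" can be used to check that a download is intact. A minimal Python sketch follows; the function names are hypothetical helpers introduced for this example, and the local file path is an assumption.

```python
import hashlib

# MD5 checksum listed in this record for topic_modelling.zip.
EXPECTED_MD5 = "f4ea2d228767d71fcef25e53a0f8ded7"


def md5_of_stream(handle, chunk_size=65536):
    """Compute the MD5 hex digest of a binary file-like object, chunk by chunk."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file at path has the checksum listed in this record."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == EXPECTED_MD5
```

For example, `matches_record("topic_modelling.zip")` should return `True` for an uncorrupted download saved under that name in the current directory.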