Published October 1, 2020 | Version v2
Other Open

MITAO: a tool for enabling scholars in the Humanities to use Topic Modelling in their studies (data and results of MITAO)

  • 1. University of Bologna

Description

This repository contains the data and the results obtained using MITAO (https://github.com/catarsi/mitao) for a Topic Modelling analysis of a collection of the abstracts (in English) of the articles published in “Umanistica Digitale” (https://umanisticadigitale.unibo.it/). The contents of this repository accompany the paper "MITAO: a tool for enabling scholars in the Humanities to use Topic Modelling in their studies", submitted to AIUCD 2021 (http://www.aiucd.it/convegno-aiucd-2021/).

This repository contains the following files: 

  • "data/": this directory contains the abstracts of the articles published in “Umanistica Digitale” (up to the date of submission of this work).
    Note: all the files in this directory are under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) following the journal specifications. 
     
  • "mitao_workflow/": this directory contains all the MITAO workflows.
    • "corpus_mitao.json": used for building the dictionary and the corpus starting from the collection of abstracts.
    • "coherence_mitao.json": used for calculating the coherence score of several LDA topic models.
    • "analysis_mitao.json": used for creating the LDA topic model, its tabular datasets, and web-based visualizations.
       
  • "result/": this directory contains all the results generated using MITAO.
    • "coherence.csv": a tabular dataset containing the coherence score of several LDA topic models (obtained from the "coherence_mitao.json" workflow).
    • "corpus(vectorized).json" and "dictionary.gdict": the topic modelling corpus and dictionary (obtained from the "corpus_mitao.json" workflow).
    • "tables/": this directory contains the tabular results in two files: "doc_topics.csv", which lists all the documents of the corpus with their representativeness for each of the generated topics, and "word_topics.csv", which lists the 30 most probable terms for interpreting each topic.
    • "visualizations/": this directory contains two dynamic, web-based visualizations. "ldavis.html" visualizes the topic modelling results using LDAvis (https://github.com/cpsievert/LDAvis), while "t-0005_chart_(year)_view.html" uses MTMvis, which we developed specifically as part of MITAO.
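
As an illustration of how "coherence.csv" might be used to choose the number of topics, the following minimal Python sketch picks the row with the highest coherence score. The column names "num_topics" and "coherence" are assumptions, not taken from the file; check the actual header and pass the real names as arguments. The sample values are made up for the example.

```python
import csv
import io


def best_model(csv_text, topics_col="num_topics", score_col="coherence"):
    """Return (number of topics, score) for the row with the highest score.

    The default column names are hypothetical; adjust them to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(row[topics_col]), float(row[score_col])) for row in reader]
    if not rows:
        raise ValueError("no data rows found")
    return max(rows, key=lambda pair: pair[1])


# Made-up values for illustration only; not taken from result/coherence.csv.
sample = "num_topics,coherence\n2,0.31\n4,0.42\n6,0.38\n"
print(best_model(sample))
```

To run it on the real file, read "result/coherence.csv" into a string (for example with open(...).read()) and pass the column names found in its header.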

Files

topic_modelling.zip (104.5 kB)
md5:f4ea2d228767d71fcef25e53a0f8ded7
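
The md5 value listed for topic_modelling.zip can be used to check that a download is intact. The sketch below uses Python's standard hashlib module; the helper names md5_of_stream and matches_record are hypothetical, and the local path in the usage note is an assumption about where the archive was saved.

```python
import hashlib
import io

# Checksum as listed in this record for topic_modelling.zip.
EXPECTED_MD5 = "f4ea2d228767d71fcef25e53a0f8ded7"


def md5_of_stream(stream, chunk_size=65536):
    """Compute the MD5 hex digest of a binary file-like object, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 checksum."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected


# Self-check on in-memory data, so the block runs without the archive present.
print(md5_of_stream(io.BytesIO(b"")))
```

With the archive saved in the current directory, matches_record("topic_modelling.zip") should return True for an uncorrupted download.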