Inputs and results of "A quantitative and qualitative citation analysis to retracted articles in the humanities domain"
Description
This repository contains the datasets and visualizations generated in our work: "A quantitative and qualitative citation analysis to retracted articles in the humanities domain".
Note: the data are all contained inside the data.zip file. You need to unzip the container to get access to all the files and directories listed below.
The gathered data (citations), together with their annotated characteristics, are stored in data/:
- cits.csv: a dataset containing all the entities (rows in the CSV) which have cited a retracted article in the humanities domain. Each citing entity (row) is accompanied by a set of features (columns) that characterizes it.
  Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).
- content.csv: a dataset containing the abstracts and the in-text citation contexts of all the citing entities gathered.
  Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.
- excluded_hum_retractions.csv: a list of the 12 humanities retracted articles with a humanities affinity score < 2, therefore excluded from the analysis.
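As a starting point for exploring these files, the following minimal sketch lists the column names of every CSV inside the archive without extracting it. The function name csv_headers is hypothetical, and the UTF-8 encoding and comma delimiter are assumptions, since this description does not document the file format details or the column names.

```python
import csv
import io
import sys
import zipfile


def csv_headers(zip_path):
    """Return {archive member name: list of column names} for each CSV in the zip."""
    headers = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: the CSV files are UTF-8 encoded and comma-delimited.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers


if __name__ == "__main__":
    # Usage: python inspect_data.py data.zip
    if len(sys.argv) > 1:
        for member, columns in csv_headers(sys.argv[1]).items():
            print(member, columns)
```

Reading only the header row keeps the check cheap; the full tables can then be loaded with any CSV-capable tool once the column names are known.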
Topic modeling
We ran a topic modeling analysis on the textual features gathered (i.e. abstracts and citation contexts). The results are stored inside the topic_model/ directory. The topic modeling was done using MITAO, a tool for mashing up automatic text analysis tools and creating a completely customizable visual workflow [1]. The directory workflow/ contains the workflows used in MITAO. The topic modeling results for each textual feature are separated into two different folders: abstract/ for the abstracts, and cits_context/ for the in-text citation contexts. Both directories contain the following directories/files:
- datasets_and_views/: the datasets and visualizations generated using MITAO.
- ldamodel_corpus_dict/: it contains the dictionary, the LDA topic model, and the tokenized and vectorized corpus.
- rawdata/: the textual collection, metadata, and stopwords used as input in the workflow of MITAO.
References
[1] Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3
Files
data.zip (5.4 MB)
md5:b6306ae67375741c403cf2b5a69bc473
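The MD5 checksum listed for data.zip can be used to confirm that a download is intact before unzipping it. The sketch below is one way to do that with the Python standard library; the function name md5_of_file and the local path data.zip are assumptions for illustration.

```python
import hashlib
import sys

# Checksum published alongside data.zip in this record.
EXPECTED_MD5 = "b6306ae67375741c403cf2b5a69bc473"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python verify_md5.py data.zip
    if len(sys.argv) > 1:
        actual = md5_of_file(sys.argv[1])
        print("OK" if actual == EXPECTED_MD5 else f"MISMATCH: {actual}")
```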