Pooja
2020-11-18
<p>Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"</p>
<pre><code class="language-markdown"># RP-Makar-tool
Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"
## Structure
```
Data/
    stackoverfow_questions_with_answers_by_tags.csv
    apache_mailing_list.csv
    mailing_lists_ASF_@dev_@users_1.csv
    mailing_lists_ASF_@dev_@users_2.csv
    quora.csv
    sample_stackoverfow_questions_with_answers_by_tags.csv
Schemas/
    apache_mailing_lists.json
    quora.json
    stackoverfow_questions_answers_by_tag.json
    stackoverfow_tag_count.json
LDA-analysis
    LDA_input/
        stackoverfow_raw_dataset.csv
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
Background-Study.pdf
Similar-Tools.md
```
## Contents of the Replication Package
This package contains the data processed with Makar for the study.
- **Data/**
  - `stackoverfow_questions_with_answers_by_tags.csv` - all StackOverflow questions used in the study, as stored in Makar
  - `apache_mailing_list.csv` - statistically significant sample of `mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv` used in the study
  - `mailing_lists_ASF_@dev_@users_1.csv` - mailing list data used in the study, as stored in Makar (part 1)
  - `mailing_lists_ASF_@dev_@users_2.csv` - mailing list data used in the study, as stored in Makar (part 2)
  - `quora.csv` - all Quora questions used in the study, as stored in Makar
  - `sample_stackoverfow_questions_with_answers_by_tags.csv` - statistically significant sample of `stackoverfow_questions_with_answers_by_tags.csv` used in the study
- **Schemas/**
  - `apache_mailing_lists.json` - data schema used in Makar to store mailing list data
  - `quora.json` - data schema used in Makar to store Quora data
  - `stackoverfow_questions_answers_by_tag.json` - data schema used in Makar to store StackOverflow question data
  - `stackoverfow_tag_count.json` - data schema used in Makar to look up the number of questions available per tag on StackOverflow
- **LDA_input/** - input data used for the LDA analysis
  - `stackoverfow_raw_dataset.csv` - StackOverflow questions used to perform the LDA analysis
- **LDA_output/**
  - **Mallet/** - LDA output generated by the MALLET tool
    - **output_csv/**
      - `docs-in-topics.csv` - documents per topic
      - `topic-words.csv` - most relevant words per topic
      - `topics-in-docs.csv` - topic probability per document
      - `topics-metadata.csv` - metadata per document and topic probability
    - **output_html/** - browsable results of the MALLET output
      - `all_topics.html`
      - `Docs/`
      - `Topics/`
- **Background-Study.pdf** - literature survey of the challenges researchers face in mining studies that investigate developer information needs during program comprehension tasks
- **Similar-Tools.md** - links to the similar state-of-the-art tools used for comparison
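
The files listed above can be inspected before any analysis. The following is a minimal sketch, not part of the original package: it assumes the package has been unpacked to a local folder (the path `RP-Makar-tool` is hypothetical), that the CSV files are UTF-8 encoded, and that each file starts with a header row. It makes no assumption about column names; the helper names `summarize_csv` and `summarize_package` are illustrative only.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file without assuming its schema."""
    # Text fields such as question bodies may be long, so raise the csv field limit.
    csv.field_size_limit(2**31 - 1)
    # Assumption: files are UTF-8 encoded and the first row is a header.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


def summarize_package(root):
    """Summarize every CSV file found anywhere below the package root."""
    root = Path(root)
    return {
        str(path.relative_to(root)): summarize_csv(path)
        for path in sorted(root.rglob("*.csv"))
    }


if __name__ == "__main__":
    # "RP-Makar-tool" is an assumed local path to the unpacked package.
    for name, (header, rows) in summarize_package("RP-Makar-tool").items():
        print(name, rows, header)
```

Using the `csv` module instead of splitting lines on commas keeps quoted fields that contain commas or line breaks intact, which is likely for question and mailing list text.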
</code></pre>
https://doi.org/10.5281/zenodo.4434822
oai:zenodo.org:4434822
eng
Zenodo
https://doi.org/10.5281/zenodo.4279452
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
International Conference on Software Analysis, Evolution and Reengineering (SANER), Virtual, 2021
Analysis of Multi-source unstructured data
Replication package for Makar
info:eu-repo/semantics/other