Published November 18, 2020 | Version v1.0.0 | Software | Open
Replication package for Makar
Description
Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"
# RP-Makar-tool
## Structure
```
Data/
    stackoverfow_questions_with_answers_by_tags.csv
    stackoverfow_tags_metrics.csv
    apache_mailing_list.csv
    mailing_lists_ASF_@dev_@users_1.csv
    mailing_lists_ASF_@dev_@users_2.csv
    quora.csv
    sample_stackoverfow_questions_with_answers_by_tags.csv
    Resultant-SO-Quora-taxonomy.xlsx
Schemas/
    apache_mailing_lists.json
    quora.json
    stackoverfow_questions_answers_by_tag.json
    stackoverfow_tag_count.json
    stackoverfow_tag_metrics.json
LDA-analysis
    LDA_input/
        stackoverfow_raw_dataset.csv
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
Background-Study.pdf
```
## Contents of the Replication Package
This package contains the data processed with the tool for the study.
- **Data/**
  - `stackoverfow_questions_with_answers_by_tags.csv` - all Stack Overflow questions used in the study, as stored in Makar
  - `stackoverfow_tags_metrics.csv` - the metrics calculated for Stack Overflow tag selection
  - `apache_mailing_list.csv` - statistically significant sample of `mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv` used in the study
  - `mailing_lists_ASF_@dev_@users_1.csv` - mailing list data used in the study, as stored in Makar (part 1)
  - `mailing_lists_ASF_@dev_@users_2.csv` - mailing list data used in the study, as stored in Makar (part 2)
  - `quora.csv` - all Quora questions used in the study, as stored in Makar
  - `sample_stackoverfow_questions_with_answers_by_tags.csv` - statistically significant sample of `stackoverfow_questions_with_answers_by_tags.csv` used in the study
  - `Resultant-SO-Quora-taxonomy.xlsx` - result of a manual analysis of the Stack Overflow and Quora sample sets
- **Schemas/**
  - `apache_mailing_lists.json` - data schema used in Makar to store mailing list data
  - `quora.json` - data schema used in Makar to store Quora data
  - `stackoverfow_questions_answers_by_tag.json` - data schema used in Makar to store Stack Overflow question data
  - `stackoverfow_tag_count.json` - data schema used in Makar to look up the number of questions per tag available on Stack Overflow
  - `stackoverfow_tag_metrics.json` - data schema used in Makar to store Stack Overflow tag metrics data
- **LDA_input/** - input data used for the LDA analysis
  - `stackoverfow_raw_dataset.csv` - Stack Overflow questions used to perform the LDA analysis
- **LDA_output/**
  - **Mallet/** - LDA output generated by the MALLET tool
    - **output_csv/**
      - `docs-in-topics.csv` - documents per topic
      - `topic-words.csv` - most relevant topic words
      - `topics-in-docs.csv` - topic probability per document
      - `topics-metadata.csv` - metadata per document and topic probability
    - **output_html/** - browsable results of the MALLET output
      - `all_topics.html`
      - `Docs/`
      - `Topics/`
- **Background-Study.pdf** - literature survey of the challenges researchers face in mining the studies that investigate developer information needs during program comprehension tasks
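
For a first look at any of the CSV files listed above, the following minimal sketch returns the header row and a few records using only the Python standard library. It makes several assumptions that the package does not document: that the archive was extracted into the current directory (hence the relative path `Data/quora.csv`), that the files are UTF-8 encoded, and that they are comma-delimited. `preview_csv` is a hypothetical helper name, not part of Makar.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=5, encoding="utf-8"):
    """Return (header, first n_rows records) without reading the whole file.

    Assumes a comma-delimited file with a header row; both are assumptions
    about the replication package, not documented facts.
    """
    with Path(path).open(newline="", encoding=encoding) as fh:
        reader = csv.reader(fh)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # Assumed location after extracting RP-Makar-tool.zip here.
    sample = Path("Data") / "quora.csv"
    if sample.exists():
        header, rows = preview_csv(sample, n_rows=3)
        print("columns:", header)
        for row in rows:
            print(row)
```

Reading only the first few records keeps the check cheap even for the larger mailing list files; once the column names are known, the same files can be loaded with whatever analysis library is preferred.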
Files

| Name | Size |
|---|---|
| RP-Makar-tool.zip (md5:d7231ce04f17e4150112b08bc6dd6a69) | 65.2 MB |
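
The MD5 checksum listed for `RP-Makar-tool.zip` can be used to confirm that a download is complete and unmodified. The sketch below is one way to do that with Python's standard `hashlib` module; the local file name and location are assumptions, and `md5_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# Checksum as listed for RP-Makar-tool.zip in this record.
EXPECTED_MD5 = "d7231ce04f17e4150112b08bc6dd6a69"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("RP-Makar-tool.zip")
    if archive.exists():
        actual = md5_of_file(archive)
        status = "OK" if actual == EXPECTED_MD5 else "MISMATCH"
        print(f"{archive}: {actual} ({status})")
```

Reading in chunks avoids holding the whole 65.2 MB archive in memory. MD5 is adequate here for detecting a corrupted download, though it is not a defense against deliberate tampering.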