There is a newer version of the record available.

Published November 18, 2020 | Version v1.0.0
Software Open

Replication package for Makar

Creators

  • Anonymous

Description

Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"

# RP-Makar-tool

## Structure
```
Data/
	stackoverfow_questions_with_answers_by_tags.csv
	stackoverfow_tags_metrics.csv
	apache_mailing_list.csv
	mailing_lists_ASF_@dev_@users_1.csv
	mailing_lists_ASF_@dev_@users_2.csv
	quora.csv
	sample_stackoverfow_questions_with_answers_by_tags.csv
	Resultant-SO-Quora-taxonomy.xlsx
Schemas/
	apache_mailing_lists.json
	quora.json
	stackoverfow_questions_answers_by_tag.json
	stackoverfow_tag_count.json
	stackoverfow_tag_metrics.json

LDA-analysis/
    LDA_input/
        stackoverfow_raw_dataset.csv

    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/

Background-Study.pdf
```
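After extracting the archive, the layout above can be checked programmatically. The following is a minimal sketch, not part of the package: the function name `missing_paths` and the `root` argument (the folder the archive was extracted into) are hypothetical, and the path list is copied from the listing above with the file names spelled as shipped.

```python
from pathlib import Path

# Relative paths copied from the structure listing (spelling as shipped).
EXPECTED_PATHS = [
    "Data/stackoverfow_questions_with_answers_by_tags.csv",
    "Data/stackoverfow_tags_metrics.csv",
    "Data/apache_mailing_list.csv",
    "Data/mailing_lists_ASF_@dev_@users_1.csv",
    "Data/mailing_lists_ASF_@dev_@users_2.csv",
    "Data/quora.csv",
    "Data/sample_stackoverfow_questions_with_answers_by_tags.csv",
    "Data/Resultant-SO-Quora-taxonomy.xlsx",
    "Schemas/apache_mailing_lists.json",
    "Schemas/quora.json",
    "Schemas/stackoverfow_questions_answers_by_tag.json",
    "Schemas/stackoverfow_tag_count.json",
    "Schemas/stackoverfow_tag_metrics.json",
    "LDA-analysis/LDA_input/stackoverfow_raw_dataset.csv",
    "LDA-analysis/LDA_output/Mallet/output_csv/docs-in-topics.csv",
    "LDA-analysis/LDA_output/Mallet/output_csv/topic-words.csv",
    "LDA-analysis/LDA_output/Mallet/output_csv/topics-in-docs.csv",
    "LDA-analysis/LDA_output/Mallet/output_csv/topics-metadata.csv",
    "LDA-analysis/LDA_output/Mallet/output_html/all_topics.html",
    "LDA-analysis/LDA_output/Mallet/output_html/Docs",
    "LDA-analysis/LDA_output/Mallet/output_html/Topics",
    "Background-Study.pdf",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths("path/to/extracted/package")` would indicate that every listed file and folder is present. If the archive unpacks into an extra top-level folder, pass that folder as `root`.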

## Contents of the Replication Package
This package contains the data processed with the Makar tool for the study.

- **Data/**
    - `stackoverfow_questions_with_answers_by_tags.csv` - all StackOverflow questions used in the study as stored in Makar
    - `stackoverfow_tags_metrics.csv` - the metrics calculated for StackOverflow tag selection
    - `apache_mailing_list.csv` - statistically significant sample of `mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv` used in the study
    - `mailing_lists_ASF_@dev_@users_1.csv` - mailing list data used in the study as stored in Makar (part 1)
    - `mailing_lists_ASF_@dev_@users_2.csv` - mailing list data used in the study as stored in Makar (part 2)
    - `quora.csv` - all Quora questions used in the study as stored in Makar
    - `sample_stackoverfow_questions_with_answers_by_tags.csv` - statistically significant sample of `stackoverfow_questions_with_answers_by_tags.csv` used in the study
    - `Resultant-SO-Quora-taxonomy.xlsx` - result of a manual analysis of the StackOverflow and Quora sample sets

- **Schemas/**
    - `apache_mailing_lists.json` - data schema used in Makar to store mailing list data
    - `quora.json` - data schema used in Makar to store Quora data
    - `stackoverfow_questions_answers_by_tag.json` - data schema used in Makar to store StackOverflow questions data
    - `stackoverfow_tag_count.json` - data schema used in Makar to look up the number of questions per tag available on StackOverflow
    - `stackoverfow_tag_metrics.json` - data schema used in Makar to store StackOverflow tag metrics data

- **LDA-analysis/**
    - **LDA_input/** - input data used for the LDA analysis
        - `stackoverfow_raw_dataset.csv` - StackOverflow questions used to perform the LDA analysis
    - **LDA_output/**
        - **Mallet/** - LDA output generated by the MALLET tool
            - **output_csv/**
                - `docs-in-topics.csv` - documents per topic
                - `topic-words.csv` - most relevant topic words
                - `topics-in-docs.csv` - topic probability per document
                - `topics-metadata.csv` - metadata per document and topic probability
            - **output_html/** - browsable results of the MALLET output
                - `all_topics.html`
                - `Docs/`
                - `Topics/`

- **Background-Study.pdf** - Literature survey of challenges researchers face in mining the studies that investigate developer information needs during program comprehension tasks.
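
The mailing list data is shipped in two parts (`mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv`). For analyses that need the full set, the parts can be joined into one file. The sketch below is not part of the package and rests on assumptions that should be checked against the files: that both parts are UTF-8 encoded, that each begins with a header row, and that the headers are identical. The function name `merge_csv_parts` and the output path are hypothetical.

```python
import csv


def merge_csv_parts(part_paths, out_path):
    """Concatenate CSV files that share one header row.

    Returns the number of data rows written (the header is not counted).
    Raises ValueError if a part's header differs from the first part's.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip a completely empty part
                if header is None:
                    header = part_header
                    writer.writerow(header)  # write the header only once
                elif part_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call might look like `merge_csv_parts(["Data/mailing_lists_ASF_@dev_@users_1.csv", "Data/mailing_lists_ASF_@dev_@users_2.csv"], "mailing_lists_all.csv")`. If individual messages exceed the `csv` module's default field size limit, `csv.field_size_limit` can be used to raise it before merging.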

 

Files

RP-Makar-tool.zip (65.2 MB)
md5:d7231ce04f17e4150112b08bc6dd6a69
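
The MD5 checksum listed for `RP-Makar-tool.zip` can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names `md5_of_file` and `matches_expected` and the local file path are hypothetical, and only the checksum value comes from the record.

```python
import hashlib

# Checksum as listed for RP-Makar-tool.zip on the record page.
EXPECTED_MD5 = "d7231ce04f17e4150112b08bc6dd6a69"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("RP-Makar-tool.zip")` should return `True` for an uncorrupted copy of the v1.0.0 archive. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.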