Software Open Access

Replication package for Makar

Pooja

Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"

# RP-Makar-tool

## Structure
```
Data/
	stackoverfow_questions_with_answers_by_tags.csv
	apache_mailing_list.csv
	mailing_lists_ASF_@dev_@users_1.csv
	mailing_lists_ASF_@dev_@users_2.csv
	quora.csv
	sample_stackoverfow_questions_with_answers_by_tags.csv

Schemas/
	apache_mailing_lists.json
	quora.json
	stackoverfow_questions_answers_by_tag.json
	stackoverfow_tag_count.json

LDA-analysis/
    LDA_input/
        stackoverfow_raw_dataset.csv

    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/

Background-Study.pdf
Similar-Tools.md
```

## Contents of the Replication Package
This package contains the data collected and processed with Makar for the study.

- **Data/**
    - `stackoverfow_questions_with_answers_by_tags.csv` - all StackOverflow questions used in the study, as stored in Makar
    - `apache_mailing_list.csv` - statistically significant sample of `mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv` used in the study
    - `mailing_lists_ASF_@dev_@users_1.csv` - mailing list data used in the study, as stored in Makar (part 1)
    - `mailing_lists_ASF_@dev_@users_2.csv` - mailing list data used in the study, as stored in Makar (part 2)
    - `quora.csv` - all Quora questions used in the study, as stored in Makar
    - `sample_stackoverfow_questions_with_answers_by_tags.csv` - statistically significant sample of `stackoverfow_questions_with_answers_by_tags.csv` used in the study

- **Schemas/**
    - `apache_mailing_lists.json` - data schema used in Makar to store mailing list data
    - `quora.json` - data schema used in Makar to store Quora data
    - `stackoverfow_questions_answers_by_tag.json` - data schema used in Makar to store StackOverflow questions data
    - `stackoverfow_tag_count.json` - data schema used in Makar to look up the number of questions available per tag on StackOverflow

- **LDA-analysis/**
    - **LDA_input/** - input data used for the LDA analysis
        - `stackoverfow_raw_dataset.csv` - StackOverflow questions used as input for the LDA analysis
    - **LDA_output/**
        - **Mallet/** - LDA output generated by the MALLET tool
            - **output_csv/**
                - `docs-in-topics.csv` - documents per topic
                - `topic-words.csv` - most relevant words per topic
                - `topics-in-docs.csv` - topic probabilities per document
                - `topics-metadata.csv` - document metadata together with topic probabilities
            - **output_html/** - browsable HTML version of the MALLET output
                - `all_topics.html`
                - `Docs/`
                - `Topics/`

- **Background-Study.pdf** - Literature survey of the challenges researchers face when mining data for studies that investigate developers' information needs during program comprehension tasks.
- **Similar-Tools.md** - Links to the similar state-of-the-art tools that Makar was compared against.
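The README does not state how the "statistically significant" samples in `Data/` were sized. One common approach, shown here purely as an illustration and not as the authors' documented procedure, is Cochran's formula with a finite population correction. The 95% confidence level (z = 1.96), 5% margin of error, and p = 0.5 below are assumptions, and `sample_size` is a hypothetical helper.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with a finite population correction.

    ASSUMED defaults (not stated in this package): 95% confidence (z = 1.96),
    5% margin of error, and p = 0.5 (maximum variability).
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of `population` items.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up so the margin of error is not exceeded.
    return math.ceil(n)
```

For example, under these assumptions a population of 10,000 questions would call for a sample of 370, while very large populations approach 385.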
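A typical first step with `topics-in-docs.csv` is to find each document's dominant topic. The sketch below assumes a layout that should be checked against the file's header before use: a header row, a document id in the first column, and one probability column per topic (topic 0, 1, 2, ...) starting at `first_prop_col`. The function name `dominant_topics` is hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple


def dominant_topics(lines: Iterable[str], first_prop_col: int = 1) -> Dict[str, Tuple[int, float]]:
    """Map each document id to (topic index, probability) of its highest-weighted topic.

    ASSUMPTION: column 0 holds the document id and every column from
    `first_prop_col` onward is the probability of topic 0, 1, 2, ... in order.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    result: Dict[str, Tuple[int, float]] = {}
    for row in reader:
        if len(row) <= first_prop_col:
            continue  # skip blank or malformed rows
        probs = [float(value) for value in row[first_prop_col:]]
        best = max(range(len(probs)), key=probs.__getitem__)
        result[row[0]] = (best, probs[best])
    return result
```

Because `csv.reader` accepts any iterable of strings, the function can be passed a file opened with `open(path, newline="")` on `LDA-analysis/LDA_output/Mallet/output_csv/topics-in-docs.csv`, or a plain list of lines for testing.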

 

Files (65.0 MB):
- `RP-Makar-tool.zip` (65.0 MB, md5: `7b18d80f6ef5092753f35306b4278eb7`)
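To check that a downloaded copy of `RP-Makar-tool.zip` is intact, its MD5 digest can be compared with the published checksum `7b18d80f6ef5092753f35306b4278eb7`. This is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are illustrative, not part of the package.

```python
import hashlib
from pathlib import Path

# Checksum published for RP-Makar-tool.zip on the record page.
EXPECTED_MD5 = "7b18d80f6ef5092753f35306b4278eb7"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

Calling `verify(Path("RP-Makar-tool.zip"))` in the download directory should return `True` for an uncorrupted archive. MD5 is adequate for detecting accidental corruption but not for guarding against deliberate tampering.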