Published August 16, 2020 | Version v.1.0.1
Dataset
Open
poojaruhal/RP-commenting-practices-social-media: RP-commenting-practices-social-media: RP_TOSEM_2020 v.1.0.1 Second release of the replication package for the paper "What do Developers Discuss about Code Comment Conventions on Social Media?"
Description
RP-commenting-practices-social-media
Replication Package for the paper "What do Developers Discuss about Code Comment Conventions on Social Media?"
Structure
Paper-presenation.pdf
Makar_tool/
Data/
stackoverfow_questions_with_answers_by_tags.csv
stackoverfow_tags_metrics.csv
apache_mailing_list.csv
mailing_lists_ASF_@dev_@users_1.csv
mailing_lists_ASF_@dev_@users_2.csv
quora.csv
sample_stackoverfow_questions_with_answers_by_tags.csv
Schemas/
apache_mailing_lists.json
quora.json
stackoverfow_questions_answers_by_tag.json
stackoverfow_tag_count.json
stackoverfow_tag_metrics.json
RQ1/
LDA_input/
stackoverfow_raw_dataset.csv
LDA_output/
Mallet/
output_csv/
docs-in-topics.csv
topic-words.csv
topics-in-docs.csv
topics-metadata.csv
output_html/
all_topics.html
Docs/
Topics/
RQ2/
datasource_rawdata/
mailing_lists_selection_criteria.csv
quora.csv
stackoverflow.csv
manual_analysis_output/
stackoverflow_quora_taxonomy.xlsx
Contents of the Replication Package
Paper-presenation.pdf presents the highlights of the work in a presentation.
Makar_tool/ contains the data processed using the tool for the study
- Data/
  - stackoverfow_questions_with_answers_by_tags.csv - all stackoverflow questions used in the study as stored in Makar
  - stackoverfow_tags_metrics.csv - all data containing the calculations done for stackoverflow tag selection
  - apache_mailing_list.csv - statistically significant sample of mailing_lists_ASF_@dev_@users_1.csv and mailing_lists_ASF_@dev_@users_2.csv used in the study
  - mailing_lists_ASF_@dev_@users_1.csv - mailing list data used in the study as stored in Makar (part 1)
  - mailing_lists_ASF_@dev_@users_2.csv - mailing list data used in the study as stored in Makar (part 2)
  - quora.csv - all quora questions used in the study as stored in Makar
  - sample_stackoverfow_questions_with_answers_by_tags.csv - statistically significant sample of stackoverfow_questions_with_answers_by_tags.csv used in the study
- Schemas/
  - apache_mailing_lists.json - data schema used in Makar to store mailing list data
  - quora.json - data schema used in Makar to store quora data
  - stackoverfow_questions_answers_by_tag.json - data schema used in Makar to store stackoverflow questions data
  - stackoverfow_tag_count.json - data schema used in Makar to look up the number of questions per tag available on stackoverflow
  - stackoverfow_tag_metrics.json - data schema used in Makar to store stackoverflow tag metrics data
RQ1/ - contains the data used to answer RQ1
- LDA_input/ - input data used for LDA analysis
  - stackoverfow_raw_dataset.csv - stackoverflow questions used to perform LDA analysis
- LDA_output/
  - Mallet/ - contains the LDA output generated by MALLET tool
    - output_csv/
      - docs-in-topics.csv - documents per topic
      - topic-words.csv - most relevant topic words
      - topics-in-docs.csv - topic probability per document
      - topics-metadata.csv - metadata per document and topic probability
    - output_html/ - browsable results of MALLET output
      - all_topics.html
      - Docs/
      - Topics/
RQ2/ - contains the data used to answer RQ2
- datasource_rawdata/ - contains the raw data for each source
  - mailing_lists_selection_criteria.csv - criteria used to select mailing lists.
  - quora.csv - contains the processed quora dataset (for example, with HTML tags removed). For the preprocessing steps, see the reproducibility section in the paper. The data is preprocessed using the Makar tool.
  - stackoverflow.csv - contains the processed stackoverflow dataset. For the preprocessing steps, see the reproducibility section in the paper. The data is preprocessed using the Makar tool.
- manual_analysis_output/
  - stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of stackoverflow and quora and a description of the taxonomy.
    - Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
    - stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
    - quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
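Because the second dimension categories in the taxonomy are divided into levels separated by the | symbol, a label usually has to be split before it can be analysed per level. The following is a minimal sketch of that step, not part of the replication package: the function name `split_levels` and the example label are hypothetical and are not taken from the dataset.

```python
def split_levels(category):
    """Split a second-dimension category label into its levels.

    Assumes levels are separated by the "|" symbol, as described for the
    Taxonomy sheet; surrounding whitespace and empty parts are dropped.
    """
    return [level.strip() for level in category.split("|") if level.strip()]


# Hypothetical label used only for illustration; real labels come from
# stackoverflow_quora_taxonomy.xlsx.
example_label = "Level A | Level B | Level C"
print(split_levels(example_label))
```

Stripping whitespace around each level makes the result robust to labels written with or without spaces around the separator.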
Files
Name | Size | Checksum |
---|---|---|
poojaruhal/RP-commenting-practices-social-media-v.1.0.1.zip | 50.1 MB | md5:5746f749b347f7094067f432710abdc3 |
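The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The local file name is an assumption about where the archive was saved, and the helper names `md5_of_file` and `verify_archive` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum published for the archive in the Files section.
EXPECTED_MD5 = "5746f749b347f7094067f432710abdc3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path of the downloaded archive; adjust as needed.
    archive = Path("RP-commenting-practices-social-media-v.1.0.1.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low for the roughly 50 MB archive.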
Additional details
Related works
- Is supplement to
- https://github.com/poojaruhal/RP-commenting-practices-social-media/tree/v.1.0.1 (URL)