Published January 26, 2021 | Version v1.0.0
Dataset
Open
Replication package for the paper "What do Developers Discuss about Code Comment Conventions?"
Description
# RP-commenting-conventions-multiple-sources
Replication Package for the paper "What do Developers Discuss about Code Comment Conventions?"
## Structure
```
Appendix.pdf
RQ1/
    LDA_input/
        stackoverfow_raw_dataset.csv
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
RQ2/
    datasource_rawdata/
        mailing_lists_selection_criteria.csv
        quora.csv
        stackoverflow.csv
    manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx
```
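After extracting the archive, the layout above can be checked with a short script. This is a minimal sketch, not part of the package: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, the nesting of `output_csv/` and `output_html/` under `Mallet/` is inferred from the listing, and the name of the top-level folder produced by extraction is not stated in the package description, so it is passed in as `root`.

```python
from pathlib import Path

# Files listed in the Structure section, relative to the extracted root.
# Assumption: output_csv/ and output_html/ both sit under Mallet/.
EXPECTED_PATHS = [
    "Appendix.pdf",
    "RQ1/LDA_input/stackoverfow_raw_dataset.csv",
    "RQ1/LDA_output/Mallet/output_csv/docs-in-topics.csv",
    "RQ1/LDA_output/Mallet/output_csv/topic-words.csv",
    "RQ1/LDA_output/Mallet/output_csv/topics-in-docs.csv",
    "RQ1/LDA_output/Mallet/output_csv/topics-metadata.csv",
    "RQ1/LDA_output/Mallet/output_html/all_topics.html",
    "RQ2/datasource_rawdata/mailing_lists_selection_criteria.csv",
    "RQ2/datasource_rawdata/quora.csv",
    "RQ2/datasource_rawdata/stackoverflow.csv",
    "RQ2/manual_analysis_output/stackoverflow_quora_taxonomy.xlsx",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_paths("path/to/extracted/package")` returns an empty list when every listed file is present.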
## Contents of the Replication Package
---
- **Appendix.pdf** - appendix of the paper, containing supplementary tables
- **RQ1/** - contains the data used to answer RQ1
  - **LDA_input/** - input data used for the LDA analysis
    - `stackoverfow_raw_dataset.csv` - Stack Overflow questions used to perform the LDA analysis
  - **LDA_output/**
    - **Mallet/** - contains the LDA output generated by the MALLET tool
      - **output_csv/**
        - `docs-in-topics.csv` - documents per topic
        - `topic-words.csv` - most relevant topic words
        - `topics-in-docs.csv` - topic probability per document
        - `topics-metadata.csv` - metadata per document and topic probability
      - **output_html/** - browsable results of the MALLET output
        - `all_topics.html`
        - `Docs/`
        - `Topics/`
- **RQ2/** - contains the data used to answer RQ2
  - **datasource_rawdata/** - contains the raw data for each source
    - `mailing_lists_selection_criteria.csv` - criteria used to select the mailing lists
    - `quora.csv` - contains the processed Quora dataset (for example, with HTML tags removed). For details on the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using the [Makar](https://github.com/maethub/makar) tool.
    - `stackoverflow.csv` - contains the processed Stack Overflow dataset. For details on the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using the [Makar](https://github.com/maethub/makar) tool.
  - **manual_analysis_output/**
    - `stackoverflow_quora_taxonomy.xlsx` - contains the classified Stack Overflow and Quora datasets and a description of the taxonomy
      - `Taxonomy` - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the `|` symbol.
      - `stackoverflow-posts` - the Stack Overflow questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories
      - `quota-posts` - the Quora questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories
---
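Because second dimension categories encode their levels in a single string separated by `|`, a small helper can split a label into its levels before further analysis. This is a minimal sketch: `split_category_levels` is a hypothetical helper, and the example label is invented for illustration, not taken from the taxonomy.

```python
def split_category_levels(label):
    """Split a second dimension category label into its levels.

    Levels are assumed to be separated by the `|` symbol, as described
    for the Taxonomy sheet; surrounding whitespace is stripped and
    empty segments are dropped.
    """
    return [part.strip() for part in label.split("|") if part.strip()]


# Hypothetical label, for illustration only.
levels = split_category_levels("Level one | Level two | Level three")
```

Stripping whitespace keeps the helper tolerant of labels written with or without spaces around the separator.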
Files

| Name | Size | Checksum |
|---|---|---|
| ICPC-developer-comment-convention-discussions.zip | 36.0 MB | md5:507eb3ab6494def87852b28ddedd520c |
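The md5 value listed for the archive can be used to confirm that a download is complete. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path are assumptions, and only the checksum itself comes from the listing above.

```python
import hashlib

# Checksum published for ICPC-developer-comment-convention-discussions.zip.
EXPECTED_MD5 = "507eb3ab6494def87852b28ddedd520c"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `archive_is_intact("ICPC-developer-comment-convention-discussions.zip")` should return `True` for an uncorrupted download saved under that name in the working directory.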