Dataset - What are the Machine Learning best practices reported by practitioners on Stack Exchange?
Authors/Creators
- 1. University of Passau, Universidad de los Andes
- 2. Universidad de los Andes
- 3. University of Passau
- 4. Universidad Nacional
Description
The data correspond to the posts (questions and answers) retrieved by querying for posts related to the tag 'machine learning' and the phrase 'best practice(s)'. The data were used as the basis for a study, currently under review, on machine learning best practices as discussed by practitioners in question-and-answer communities such as Stack Exchange. The information for each type of post (i.e., questions and answers) is provided in multiple formats (i.e., .txt, .csv, and .xlsx).
Answers - Variables
- AID: Unique identification of the answer in the Q&A website.
- ParentId: Unique identification of the question associated with the answer in the Q&A website.
- AcceptedAnswerId: When an answer is the most voted answer associated with the ParentId and it is different from the accepted answer, an identifier different from the AID is available. When the accepted answer had a score lower than 1, a -1 is assigned.
- ABody: HTML text of the answer.
- Score: Number of upvotes minus number of downvotes of the answer.
- url_Answer: URL of the answer. The URL can be from different websites.
- type: best or accepted. Accepted when the information belongs to the accepted answer of the ParentId question, and best when it is the most voted answer of the ParentId question.
- Date: Creation date of the answer.
Questions - Variables
- QID: Unique identification of the question in the Q&A website.
- AcceptedAnswerId: Unique identification of the accepted answer for a specific question in the Q&A website. In the case in which a question had a most-voted answer different from the accepted one, and the accepted one had a negative score, a -1 was assigned to the AcceptedAnswerId.
- BestAnswerId: Unique identification of the most voted answer for a specific question in the Q&A website. In the case in which the most voted and accepted answers were the same, a -1 was assigned to the BestAnswerId.
- Qtitle: Title of the question.
- QBody: HTML text of the question.
- Score: Number of upvotes minus number of downvotes of the question.
- QTags: Tags that are associated with each question.
- url_question: URL of the question. The question URL can be from different websites.
- Date: Creation date of the question.
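The -1 conventions described for AcceptedAnswerId and BestAnswerId can be easy to misread, so the following is a minimal sketch of how one might resolve them when reading the questions data. The rows below are made up purely for illustration and only mimic the documented column names; the helper names (`parse_id`, `answer_ids`, `load_answer_ids`) are hypothetical, and treating a blank cell like -1 is an assumption, not something the dataset description states.

```python
import csv
import io

# Made-up rows that only mimic the documented columns; NOT real dataset content.
QUESTIONS_CSV = """QID,AcceptedAnswerId,BestAnswerId,Qtitle
10,101,-1,Accepted answer is also the most voted
20,201,202,Accepted and most voted answers differ
30,-1,301,Accepted answer had a negative score
"""

MISSING = -1  # sentinel value described in the variable list


def parse_id(value):
    """Convert an id cell to int, or None for the -1 sentinel.

    Assumption: a blank cell is treated the same way as -1.
    """
    value = (value or "").strip()
    if not value or int(value) == MISSING:
        return None
    return int(value)


def answer_ids(row):
    """Return (accepted_id, best_id) for one question row."""
    accepted = parse_id(row["AcceptedAnswerId"])
    best = parse_id(row["BestAnswerId"])
    # BestAnswerId is -1 when the most voted answer is the accepted one,
    # so fall back to the accepted id in that case.
    if best is None:
        best = accepted
    return accepted, best


def load_answer_ids(text):
    """Map each QID to its resolved (accepted_id, best_id) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["QID"]): answer_ids(row) for row in reader}


resolved = load_answer_ids(QUESTIONS_CSV)
for qid, (accepted, best) in resolved.items():
    print(qid, accepted, best)
```

To apply the same logic to the published files, the in-memory string would be replaced by the contents of the questions file in .csv format; its exact file name and delimiter are not stated here and would need to be checked against the download.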
This dataset is a subset of the Stack Exchange data dump of 03.2021 (https://archive.org/details/stackexchange_20210301), to which a series of filters was applied to obtain the data used in the study.
Files
(1.1 MB)
| MD5 checksum | Size |
|---|---|
| d51a8afd67d29818cda3d3331f3e613f | 217.9 kB |
| da1e86afc8651765c3dfd819997fdda2 | 217.9 kB |
| 265313f69365a1d149f0a6f6dacd428f | 84.9 kB |
| c7c4e524e1c869de9ad1ff03bbbabf66 | 220.2 kB |
| 95ec6438a554dd8e60abcb1d9e712aef | 220.1 kB |
| aec528ca0e11f7b36267cc1ab8f27c88 | 97.0 kB |
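The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_checksum` are hypothetical helpers introduced for illustration, and the path passed in would be wherever the file was saved locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 16):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling `matches_checksum(local_path, checksum)` with the checksum copied from the row of the downloaded file returns `True` when the digests agree. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.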