Published February 22, 2023
Version 1.0 | Dataset | Open access
Reddit and StackOverflow dataset (Programming languages)
Description
This dataset contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from a StackOverflow dataset published on Kaggle).
Each folder contains the data split by trimester. The StackOverflow and Reddit files follow the schemas below:
- Fields from StackOverflow
  - question_id - ID of the question
  - answer_id - ID of the answer
  - creation_date - creation date of the answer
  - score - score of the question/answer
  - tags - all tags assigned to the question
  - answer_count - number of answers to the question
  - start_question - creation time of the question
  - last_activity_date - date of the last update to the question
  - new_id - hashed ID of the answerer
  - q_new_id - hashed ID of the questioner
- Fields from Reddit
  - comment_id - ID of the comment
  - submission_id - ID of the submission
  - score - score of the question/submission
  - subreddit - name of the subreddit
  - created_utc - creation time (UTC); it does not reflect later edits to comments
  - new_id - hashed user ID
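As an illustration of the trimester split, the sketch below maps a `created_utc` value to a trimester label. It assumes that `created_utc` is a Unix timestamp in seconds (as returned by the Pushshift API) and that a trimester is a calendar three-month period (January to March, April to June, and so on). The `trimester_label` function and the `YYYY-Tn` label format are hypothetical and may not match the folder names inside `data.zip`.

```python
from datetime import datetime, timezone


def trimester_label(created_utc: float) -> str:
    """Map a Unix timestamp (seconds, UTC) to a hypothetical 'YYYY-Tn' label.

    Assumes trimesters are calendar three-month periods:
    T1 = Jan-Mar, T2 = Apr-Jun, T3 = Jul-Sep, T4 = Oct-Dec.
    """
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    trimester = (dt.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
    return f"{dt.year}-T{trimester}"


if __name__ == "__main__":
    # 1672531200 is 2023-01-01T00:00:00Z
    print(trimester_label(1672531200))  # 2023-T1
```

Grouping rows by this label would give one bucket per three-month period; the actual split used by the authors may use different boundaries.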
The .txt files represent the structure of the corresponding hypergraphs.
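One plausible way to turn the StackOverflow fields into hyperedges is to group, for each `question_id`, the questioner (`q_new_id`) with every answerer (`new_id`). This is a hypothetical sketch, not necessarily how the .txt files were produced; the row dictionaries, the sample values, and the `question_hyperedges` name are assumptions.

```python
from collections import defaultdict


def question_hyperedges(rows):
    """Build one hyperedge (a set of hashed user IDs) per StackOverflow question.

    Each row is assumed to be a dict using the field names from the schema:
    'question_id', 'q_new_id' (questioner) and 'new_id' (answerer).
    """
    edges = defaultdict(set)
    for row in rows:
        qid = row["question_id"]
        edges[qid].add(row["q_new_id"])  # the questioner is always a member
        answerer = row.get("new_id")
        if answerer:  # skip rows with no recorded answerer
            edges[qid].add(answerer)
    # Sorted tuples give a deterministic node order within each hyperedge
    return {qid: tuple(sorted(users)) for qid, users in edges.items()}


if __name__ == "__main__":
    # Hypothetical sample rows; real values would be hashed IDs from the dataset
    sample = [
        {"question_id": 1, "q_new_id": "u_a", "new_id": "u_b"},
        {"question_id": 1, "q_new_id": "u_a", "new_id": "u_c"},
        {"question_id": 2, "q_new_id": "u_b", "new_id": "u_a"},
    ]
    for qid, users in question_hyperedges(sample).items():
        print(qid, " ".join(users))
```

Each printed line lists a question ID followed by the members of its hyperedge (here `1 u_a u_b u_c` and `2 u_a u_b`). The same grouping could be applied to Reddit rows by keying on `submission_id`.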
Files
Name | Size | Checksum |
---|---|---|
data.zip | 134.1 MB | md5:405c95a36c527c85d3708fe3a473386c |
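To check that the download is intact, the archive's MD5 digest can be compared with the checksum listed above. A minimal sketch using Python's standard `hashlib` module, assuming `data.zip` was saved to the current directory (the `md5_of_file` helper name is illustrative):

```python
import hashlib
import os

# Checksum published for data.zip on this page
EXPECTED_MD5 = "405c95a36c527c85d3708fe3a473386c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    path = "data.zip"  # assumed download location
    if os.path.exists(path):
        ok = md5_of_file(path) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum mismatch")
    else:
        print(f"{path} not found")
```

Reading in 1 MiB chunks keeps memory use constant regardless of the archive's 134.1 MB size.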