Published November 19, 2023 | Version v0.0
Dataset Open

tags-stack-overflow

  • 1. CENTAI Institute

Description

Overview

This dataset is derived from the tags on Stack Overflow posts. Each node is a tag, and each hyperedge is the set of all tags used in a single post. The post timestamps have millisecond resolution, are shifted so that the earliest timestamp is at 0, and are given in ISO8601 format.
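As a rough illustration of the construction described above, the following Python sketch turns a few posts into time-shifted hyperedges. The toy posts, the `to_hyperedges` helper, and the choice to express shifted times as integer millisecond offsets are all assumptions made for this example; they do not describe the actual layout of tags-stack-overflow.json.

```python
from datetime import datetime, timedelta

# Hypothetical toy posts as (ISO 8601 timestamp, tags); not taken from the dataset.
posts = [
    ("2008-08-01T12:00:00.250", ["python", "list"]),
    ("2008-08-01T12:00:00.000", ["java"]),
    ("2008-08-01T12:00:01.500", ["list", "python"]),
]

def to_hyperedges(posts):
    """Return (offset_ms, hyperedge) pairs sorted by time, with the earliest at 0."""
    # One hyperedge per post: the set of its tags (order and duplicates are ignored).
    parsed = [(datetime.fromisoformat(ts), frozenset(tags)) for ts, tags in posts]
    start = min(t for t, _ in parsed)
    # Integer millisecond offset from the earliest post.
    shifted = [((t - start) // timedelta(milliseconds=1), edge) for t, edge in parsed]
    return sorted(shifted, key=lambda pair: pair[0])

hyperedges = to_hyperedges(posts)
```

In this sketch two posts with the same tags yield the same hyperedge at different times, which is why a dataset of this kind can contain more timestamped hyperedges than unique ones.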

Statistics

Some basic statistics of this dataset are:

  • number of nodes: 49,998
  • number of timestamped hyperedges: 14,458,875
  • number of unique hyperedges: 5,675,497
  • component sizes (component size, number of components):

      • 49,931, 1
      • 2, 7
      • 1, 53
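The statistics above can be reproduced in principle from a list of hyperedges. The sketch below is a minimal example on hypothetical toy data, assuming that two tags belong to the same component whenever they share a hyperedge; the `component_size_counts` helper and its union-find approach are illustrative choices, not the method used to produce the figures listed here.

```python
from collections import Counter

# Hypothetical toy hyperedges, one per post; not taken from the dataset.
hyperedges = [
    {"python", "list"},
    {"list", "python"},   # same tag set as the first post
    {"python", "numpy"},
    {"java"},
    {"css", "html"},
]

def component_size_counts(hyperedges):
    """Map component size -> number of components, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in hyperedges:
        nodes = list(edge)
        if not nodes:
            continue
        root = find(nodes[0])
        for other in nodes[1:]:
            parent[find(other)] = root  # merge every tag in the post into one component
    sizes = Counter(find(node) for node in list(parent))
    return Counter(sizes.values())

num_nodes = len(set().union(*hyperedges))
num_timestamped = len(hyperedges)
num_unique = len({frozenset(edge) for edge in hyperedges})
size_counts = component_size_counts(hyperedges)
```

A useful sanity check is that the component sizes weighted by their counts add up to the number of nodes. For the figures listed above, 49,931 × 1 + 2 × 7 + 1 × 53 = 49,998, which matches the node count.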

Source of original data

References

If you use this data, please cite the following paper:

Files

tags-stack-overflow.json

  • size: 2.0 GB
  • md5: bccd5752e09a7383f7c53a0fccb67d44