Published April 9, 2026 | Version v0.0
Dataset Open

cs-cocitations

  • 1. Tokyo Metropolitan University

Description

Summary
The cs-cocitations dataset is a co-citation hypergraph of highly cited Computer Science papers, constructed from the OpenAlex Snapshot (2024-09-27; https://developers.openalex.org/download/snapshot-format). 

Nodes and Hyperedges
Nodes represent highly cited Computer Science papers. For each Computer Science subfield, papers were ranked by citation count, and those accounting for the top 10% of cumulative citations were selected as nodes (minimum 100 papers per subfield), yielding 3,118 nodes in total.
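The node selection rule can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that "top 10% of cumulative citations" means taking papers in descending citation order until their running citation total reaches 10% of the subfield total, and that the 100-paper minimum is applied as a floor afterwards. The function name `select_top_papers` and the `(paper_id, subfield, citation_count)` input layout are hypothetical.

```python
from collections import defaultdict


def select_top_papers(papers, share=0.10, min_papers=100):
    """Pick highly cited papers per subfield.

    papers: iterable of (paper_id, subfield, citation_count) tuples
            (hypothetical layout, not the OpenAlex schema).
    share: fraction of a subfield's total citations to cover (assumed reading).
    min_papers: minimum number of papers kept per subfield.
    """
    by_subfield = defaultdict(list)
    for paper_id, subfield, cites in papers:
        by_subfield[subfield].append((paper_id, cites))

    selected = set()
    for items in by_subfield.values():
        # Rank papers within the subfield by citation count, highest first.
        items.sort(key=lambda item: item[1], reverse=True)
        total = sum(cites for _, cites in items)

        running = 0
        chosen = []
        for paper_id, cites in items:
            # Stop once both the citation share and the minimum size are met.
            if running >= share * total and len(chosen) >= min_papers:
                break
            chosen.append(paper_id)
            running += cites
        selected.update(chosen)
    return selected
```

Under this reading, a subfield with fewer than `min_papers` papers contributes all of its papers.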

Hyperedges represent co-citation relationships. For each paper in the OpenAlex corpus (excluding the selected top papers), the subset of top papers it cites was identified. If the same subset appeared in at least 3 citing papers, it was included as a hyperedge, yielding 53,886 hyperedges in total.
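The hyperedge rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function name `build_hyperedges`, the `citations` dictionary layout, and the `min_size` parameter are hypothetical, and the description does not say whether subsets containing a single top paper are kept, so the sketch leaves that as a parameter.

```python
from collections import Counter


def build_hyperedges(citations, top_papers, min_support=3, min_size=1):
    """Derive co-citation hyperedges.

    citations: dict mapping citing paper ID -> iterable of cited paper IDs
               (hypothetical layout).
    top_papers: IDs of the selected highly cited papers (the nodes).
    min_support: minimum number of citing papers sharing the same subset.
    min_size: smallest subset size to keep (assumption; not stated in the source).
    """
    top_papers = frozenset(top_papers)
    counts = Counter()
    for citing_id, cited in citations.items():
        # Citing papers that are themselves top papers are excluded.
        if citing_id in top_papers:
            continue
        # The subset of top papers cited by this paper.
        subset = frozenset(cited) & top_papers
        if len(subset) >= min_size:
            counts[subset] += 1
    # Keep subsets that appear in at least min_support citing papers.
    return [set(subset) for subset, n in counts.items() if n >= min_support]
```

Using `frozenset` as the counter key makes two citing papers match only when they cite exactly the same set of top papers, regardless of reference order.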
 
Each node carries the following attributes: OpenAlex work ID, paper title, publication date, primary topic, subfield, field, domain, and citation count. 

Basic statistics:
- Nodes: 3,118
- Hyperedges: 53,886

Source:
OpenAlex Snapshot (2024-09-27), https://developers.openalex.org/download/snapshot-format

Reference:
Kazuki Nakajima, Yuya Sasaki, Takeaki Uno, and Masaki Aida. (2025). Learning Multi-Order Block Structure in Higher-Order Networks. arXiv preprint arXiv:2511.21350.

Files

cs-cocitations.json

Files (9.4 MB)

Name: cs-cocitations.json
Size: 9.4 MB
md5:7000be7eaf8f589cf1ef8b17ddf04a6f
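After downloading, the file can be checked against the md5 checksum listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# Checksum as published in the file listing for cs-cocitations.json.
EXPECTED_MD5 = "7000be7eaf8f589cf1ef8b17ddf04a6f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected md5 digest."""
    return md5_of_file(path) == expected
```

For example, `matches_published_checksum("cs-cocitations.json")` returns `True` only if the local copy is byte-identical to the published file.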