CS Knowledge Graph Dataset Family (CS-KG)
Description
This repository contains the Computer Science Knowledge Graph (CS-KG) dataset family, a large-scale collection of heterogeneous knowledge graphs constructed from OpenAlex and Semantic Scholar metadata. The dataset captures relational structures among four key entity types—Papers, Authors, Venues, and Concepts—with five principal relation types:
- AUTHORED (Author → Paper, asymmetric)
- CITES (Paper → Paper, asymmetric and temporal)
- PUBLISHED_IN (Paper → Venue, functional)
- BELONGS_TO (Paper → Concept, hierarchical)
- COLLABORATES_WITH (Author ↔ Author, symmetric)
Together, these relations form a multi-relational scholarly network well-suited for evaluating knowledge graph embeddings, relational learning, and geometric deep learning models.
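The relation patterns listed above can be illustrated with a minimal Python sketch. It assumes the graph is available as (head, relation, tail) triples; that representation, the helper names, and the toy identifiers are illustrative assumptions and are not taken from the dataset files.

```python
from collections import defaultdict

# Relation signatures as listed in the description: (head type, tail type, pattern).
RELATIONS = {
    "AUTHORED": ("Author", "Paper", "asymmetric"),
    "CITES": ("Paper", "Paper", "asymmetric"),
    "PUBLISHED_IN": ("Paper", "Venue", "functional"),
    "BELONGS_TO": ("Paper", "Concept", "hierarchical"),
    "COLLABORATES_WITH": ("Author", "Author", "symmetric"),
}


def is_symmetric(triples, relation):
    """True if every (h, relation, t) has a matching (t, relation, h)."""
    pairs = {(h, t) for h, r, t in triples if r == relation}
    return all((t, h) in pairs for h, t in pairs)


def is_functional(triples, relation):
    """True if each head has at most one distinct tail for this relation."""
    tails = defaultdict(set)
    for h, r, t in triples:
        if r == relation:
            tails[h].add(t)
    return all(len(ts) <= 1 for ts in tails.values())


# Toy triples with hypothetical identifiers (not real dataset entries).
toy = [
    ("a1", "AUTHORED", "p1"),
    ("a2", "AUTHORED", "p1"),
    ("a1", "COLLABORATES_WITH", "a2"),
    ("a2", "COLLABORATES_WITH", "a1"),
    ("p1", "PUBLISHED_IN", "v1"),
    ("p2", "CITES", "p1"),
]

print(is_symmetric(toy, "COLLABORATES_WITH"))  # both directions are present
print(is_functional(toy, "PUBLISHED_IN"))      # each paper has one venue
```

Checks of this kind can serve as a sanity test after loading a subgraph, for example to confirm that each paper maps to a single venue before training an embedding model.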
The five subgraphs (CS-1K, CS-10K, CS-100K, CS-1M, and CS-10M) cover progressively larger scales, from thousands to millions of entities, and preserve real-world relational diversity across hierarchical, symmetric, and asymmetric patterns. All datasets have been preprocessed to remove duplicates and to maintain temporal consistency by publication year.
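The description does not spell out how the preprocessing works. As a rough sketch of one plausible reading, the Python below drops duplicate citation edges and keeps only citations where the citing paper is not older than the cited one. The function name, the input shapes, and the rule itself (including keeping same-year citations) are assumptions for illustration, not the authors' documented pipeline.

```python
def clean_citations(edges, year):
    """Deduplicate (citing, cited) pairs and keep temporally consistent ones.

    edges: iterable of (citing_id, cited_id) pairs (hypothetical format).
    year:  dict mapping paper id to publication year (hypothetical format).
    """
    seen = set()
    kept = []
    for citing, cited in edges:
        if (citing, cited) in seen:
            continue  # duplicate edge
        seen.add((citing, cited))
        if citing not in year or cited not in year:
            continue  # publication year unknown, cannot check consistency
        if year[citing] >= year[cited]:
            kept.append((citing, cited))
    return kept


# Toy data with hypothetical identifiers.
years = {"p1": 2010, "p2": 2015, "p3": 2020}
raw = [("p2", "p1"), ("p2", "p1"), ("p1", "p2"), ("p3", "p9")]
print(clean_citations(raw, years))
```

In this toy input the repeated edge, the edge that cites a later paper, and the edge with an unknown year are all discarded.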
Files
cs100k_openalex.db.zip
Total size: 3.6 GB
| MD5 checksum | Size |
|---|---|
| 27de2599f586a1273e795ec8aeceec1a | 70.9 MB |
| 55486b7b9534cd74be6cbccc4290d84c | 20.7 MB |
| 75c45d605987c5853334b09cd15cc09e | 8.0 MB |
| b23ddd72017047d1f0ca0feed525d8fb | 2.5 MB |
| 9870b476efa93b2ef499c1711e71c3dc | 2.2 GB |
| 139867b482b25c29384c1e32405edbbc | 505.1 MB |
| a10524fed43a710394662e49bad72c95 | 984.1 kB |
| 8324e9d053349b9264cc15ec152796db | 319.2 kB |
| 25a8edd5cef5a7627b1d6ac197aba43a | 626.3 MB |
| 427fe52b2c898697b5f1ae3fdf9868ee | 152.1 MB |
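The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are illustrative, and because this listing does not show which checksum belongs to which file, the pairing in the usage comment is a placeholder to be filled in from the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder pairing; take the checksum shown next to the file you downloaded):
# ok = matches_checksum("cs100k_openalex.db.zip", "md5:<checksum from the listing>")
```

Reading in fixed-size chunks keeps memory use flat even for the multi-gigabyte archives in this record.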
Additional details
Software
- Repository URL: https://github.com/JugalGajjar/HyperComplEx-Multi-Space-KG-Embeddings
- Programming language: Python