Published October 21, 2025 | Version v1.0
Dataset Open

CS Knowledge Graph Dataset Family (CS-KG)

Authors/Creators

  • 1. George Washington University

Description

This repository contains the Computer Science Knowledge Graph (CS-KG) dataset family, a large-scale collection of heterogeneous knowledge graphs constructed from OpenAlex and Semantic Scholar metadata. The dataset captures relational structures among four key entity types—Papers, Authors, Venues, and Concepts—with five principal relation types:

  • AUTHORED (Author → Paper, asymmetric)

  • CITES (Paper → Paper, asymmetric and temporal)

  • PUBLISHED_IN (Paper → Venue, functional)

  • BELONGS_TO (Paper → Concept, hierarchical)

  • COLLABORATES_WITH (Author ↔ Author, symmetric)

Together, these relations form a multi-relational scholarly network well-suited for evaluating knowledge graph embeddings, relational learning, and geometric deep learning models.
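The relation list above can be written down as a small typed schema. The following is a minimal sketch, not the dataset's own loader: the `Relation` class, the `RELATIONS` table, and both helper functions are hypothetical names introduced for illustration, and only the entity types, directions, and the symmetric flag are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One relation type: its name, endpoint entity types, and symmetry."""
    name: str
    head: str
    tail: str
    symmetric: bool


# Endpoint types and symmetry as stated in the dataset description.
RELATIONS = {
    r.name: r
    for r in [
        Relation("AUTHORED", "Author", "Paper", False),
        Relation("CITES", "Paper", "Paper", False),
        Relation("PUBLISHED_IN", "Paper", "Venue", False),
        Relation("BELONGS_TO", "Paper", "Concept", False),
        Relation("COLLABORATES_WITH", "Author", "Author", True),
    ]
}


def is_well_typed(head_type, relation, tail_type):
    """Return True if the triple's endpoint types match the schema."""
    rel = RELATIONS.get(relation)
    return rel is not None and (rel.head, rel.tail) == (head_type, tail_type)


def expand_symmetric(triples):
    """Return the triples plus the reverse edge of every symmetric one."""
    expanded = set(triples)
    for head, relation, tail in triples:
        if RELATIONS[relation].symmetric:
            expanded.add((tail, relation, head))
    return expanded
```

Storing a symmetric relation in one direction and expanding it on load is one common choice; whether the released files store COLLABORATES_WITH in one or both directions is not stated here.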

The five subgraphs (CS-1K, CS-10K, CS-100K, CS-1M, and CS-10M) span progressively larger scales, from thousands to millions of entities, and preserve real-world relational diversity across hierarchical, symmetric, and asymmetric patterns. All datasets have been preprocessed to remove duplicates and maintain temporal consistency by publication year.
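The preprocessing is not specified in detail here. As an illustration only, one plausible reading of "remove duplicates and maintain temporal consistency" for CITES edges is to drop repeated edges and any citation whose citing paper predates the cited paper. The function name `clean_citations` and the input layout (a list of pairs plus a year lookup) are assumptions, not the authors' pipeline.

```python
def clean_citations(edges, year):
    """Deduplicate (citing, cited) pairs and keep only temporally consistent ones.

    edges: iterable of (citing_id, cited_id) pairs
    year:  dict mapping paper id -> publication year
    An edge is kept if both years are known and the citing paper was
    published in the same year as, or later than, the cited paper.
    """
    seen = set()
    kept = []
    for citing, cited in edges:
        if (citing, cited) in seen:
            continue  # duplicate edge
        seen.add((citing, cited))
        if citing not in year or cited not in year:
            continue  # cannot check consistency without both years
        if year[citing] >= year[cited]:
            kept.append((citing, cited))
    return kept
```

Same-year citations are kept in this sketch because publication year alone cannot order two papers from the same year.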

Files

cs100k_openalex.db.zip

Files (3.6 GB)

Checksum                               Size
md5:27de2599f586a1273e795ec8aeceec1a   70.9 MB
md5:55486b7b9534cd74be6cbccc4290d84c   20.7 MB
md5:75c45d605987c5853334b09cd15cc09e   8.0 MB
md5:b23ddd72017047d1f0ca0feed525d8fb   2.5 MB
md5:9870b476efa93b2ef499c1711e71c3dc   2.2 GB
md5:139867b482b25c29384c1e32405edbbc   505.1 MB
md5:a10524fed43a710394662e49bad72c95   984.1 kB
md5:8324e9d053349b9264cc15ec152796db   319.2 kB
md5:25a8edd5cef5a7627b1d6ac197aba43a   626.3 MB
md5:427fe52b2c898697b5f1ae3fdf9868ee   152.1 MB
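The md5 values in the listing can be used to check a download for corruption. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of` and `matches_listing` are hypothetical. The listing does not show which checksum belongs to which file name, so the pairing has to be taken from the repository page itself.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum ('md5:<hex>' or bare hex)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte archives in this record.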

Additional details