Mathlib4 Theorem Dependency Graph (LeanDojo Benchmark 4, v10)
Authors/Creators
Description
This dataset contains a directed graph of theorem dependencies extracted from the Mathlib4 library, a formal mathematics ecosystem for the Lean 4 proof assistant. The graph was constructed from the LeanDojo Benchmark 4 (v10), originally published on Zenodo (DOI: 10.5281/zenodo.8040109).
Each node represents a theorem or lemma identified by its fully qualified name within Mathlib4. A directed edge from node A to node B indicates that theorem A was used as a tactic in the proof of theorem B. Nodes are annotated with the file path of their source file within the Mathlib4 repository, which encodes the mathematical area the theorem belongs to (e.g., Mathlib/Topology/Semicontinuous.lean).
The graph was exported from a Python pipeline using NetworkX and is provided in GraphML and GEXF formats for interoperability.
Basic statistics:
Nodes: 137,046
Edges: 304,433
Average degree: 4.44
Isolated nodes: 43,138 (~32%)
Strongly connected components: 137,046 (all trivial, confirming the graph is a DAG)
Weakly connected components: 43,936
Source data: LeanDojo Benchmark 4, v10 (random/train.json, random/test.json, random/val.json)
Related publication: Yang et al., LeanDojo; The mathlib Community, CPP 2020.
Files
Files
(61.1 MB)
| Name | Size | Download all |
|---|---|---|
|
md5:dd769fb0d228cfba3976d22cff4eef4e
|
61.1 MB | Download |
Additional details
Dates
- Submitted
-
2026-04-28