Dataset Open Access

Maven central dependency graph

Amine Benelallam; Nicolas Harrand; César Soto Valero; Benoit Baudry; Olivier Barais

The Maven dependency graph is an open dataset of Maven Central artifacts, their dependencies, and other relationships between them. Its main goal is to tame the Maven Central ecosystem in particular, and JVM-based libraries at large, making them easier to study and exploit for both academia and industry. It is intended to answer high-level research questions about artifact releases, evolution, and usage trends over time. It can also help researchers select relevant datasets, from the mass of existing software artifacts, for assessing particular empirical software engineering challenges. The complexity of these questions can range from simple pattern matching to advanced big data analysis and machine learning techniques. The accompanying paper to this dataset is publicly available on arXiv.

The Maven dependency graph is the fruit of a collaboration between the DiverSE team (Inria Rennes, France) and the CASTOR project (KTH, Sweden). Instructions on how to use and reproduce the dataset can be found in the dataset's repository on GitHub. A complete description of the dataset and its usages can be found in the accompanying paper.
Files (3.1 GB)
Size
72.0 MB
2.8 GB
282.9 MB
                  All versions   This version
Views             367            367
Downloads         81             81
Data volume       37.0 GB        37.0 GB
Unique views      320            320
Unique downloads  57             57

