There is a newer version of this record available.

Dataset (Open Access)

Open Source Repository and Dependency Metadata

Andrew Nesbitt; Benjamin Nickolls

What is in this release?

In this release you will find data about software distributed and/or crafted publicly on the Internet. You will find information about its development, its distribution and its relationship with other software included as a dependency. You will not find any information about the individuals who create and maintain these projects.

Further information and documentation on this data set can be found at

For enquiries please contact

This dataset contains seven CSV files:


Projects

A project is a piece of software available on any one of the 33 package managers supported by


Versions

A version is an immutable published version of a Project from a package manager. Not all package managers have a concept of publishing versions; some rely directly on tags or branches from a revision control tool.


Tags

A tag is equivalent to a tag in a revision control system. Tags are sometimes used instead of Versions where a package manager does not use the concept of versions. Tags are often semantic version numbers.
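Because tags are often, but not always, semantic version numbers, a consumer of the tags data may want to separate the two cases. The following is a minimal sketch, assuming tags arrive as plain strings; the function name parse_tag and the simplified pattern (optional "v" prefix, three numeric parts, no pre-release or build suffixes) are illustrative choices, not something the dataset defines.

```python
import re
from typing import Optional, Tuple

# Simplified pattern: optional "v" prefix, then MAJOR.MINOR.PATCH.
# Pre-release and build metadata suffixes are deliberately not handled.
SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) if the tag looks like a semantic version, else None."""
    match = SEMVER_TAG.match(tag.strip())
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


# Hypothetical sample tags, not taken from the dataset.
tags = ["v1.2.0", "1.10.3", "release-candidate", "v2.0"]
parsed = {tag: parse_tag(tag) for tag in tags}
```

Tags that do not match come back as None, so they can be kept as opaque labels rather than being dropped.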


Dependencies

Dependencies describe the relationship between a project and the software it builds upon. Dependencies belong to a Version, and each Version can have a different set of dependencies. Dependencies point at a specific Version or a range of versions of other projects.
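The relationship described above can be sketched as a small data model. This is an illustration only: the class names, field names and the tiny requirement syntax (an exact number, or a number prefixed by one comparison operator) are assumptions made for the example. Real package managers each define their own range syntax, and the dataset's actual columns may differ.

```python
import operator
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Version:
    project: str  # name of the Project this Version belongs to
    number: str   # e.g. "1.4.2"


@dataclass(frozen=True)
class Dependency:
    version: Version      # the Version that declares this dependency
    target_project: str   # the project being depended upon
    requirement: str      # a specific version ("1.4.2") or a range (">=1.4.0")


# Two-character operators are listed first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def as_tuple(number: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in number.strip().split("."))


def satisfies(number: str, requirement: str) -> bool:
    """Check a version number against the hypothetical requirement syntax."""
    for symbol, compare in OPERATORS:
        if requirement.startswith(symbol):
            return compare(as_tuple(number), as_tuple(requirement[len(symbol):]))
    # No operator: the requirement names one specific version.
    return as_tuple(number) == as_tuple(requirement)


# Hypothetical example: version 2.0.0 of "app" depends on "lib" at 1.4.0 or later.
dep = Dependency(Version("app", "2.0.0"), "lib", ">=1.4.0")
```

Comparing tuples of integers rather than raw strings avoids ordering "1.10.0" before "1.9.0".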


Repositories

A repository represents a publicly accessible source code repository from either, or. Repositories are distinct from Projects: they are not distributed via a package manager and are typically applications for end users rather than components to build upon.

Repository dependencies

A repository dependency is a dependency upon a Version from a package manager that has been specified in a manifest file, either as a manually added dependency committed by a user or as a generated dependency listed in a lockfile that was automatically generated by a package manager and committed.
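The distinction between manually declared and generated dependencies can be illustrated by classifying the file a dependency was found in. The sketch below assumes a file path is available for each repository dependency, which the description above does not confirm. The file names are well-known examples from npm, Yarn, Bundler and Cargo and do not represent the full set of files covered by the dataset; the function name dependency_origin is hypothetical.

```python
# Well-known manifest files, edited by hand and committed by a user.
MANIFESTS = {"package.json", "Gemfile", "Cargo.toml"}

# Well-known lockfiles, generated by a package manager and then committed.
LOCKFILES = {"package-lock.json", "yarn.lock", "Gemfile.lock", "Cargo.lock"}


def dependency_origin(path: str) -> str:
    """Classify a file path as 'manifest', 'lockfile' or 'unknown'."""
    filename = path.rsplit("/", 1)[-1]
    if filename in LOCKFILES:
        return "lockfile"
    if filename in MANIFESTS:
        return "manifest"
    return "unknown"
```

For example, dependency_origin("app/Gemfile.lock") returns "lockfile", while dependency_origin("package.json") returns "manifest".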

Projects with related Repository fields

This is an alternative projects export that denormalizes a project's related source code repository inline, reducing the need to join between two data sets.
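To show what the denormalized export saves, here is a minimal sketch of the join a user would otherwise perform between a projects file and a repositories file. The column names (id, name, repository_id, host, stars) and the sample rows are invented for the illustration and are not the dataset's actual schema or contents.

```python
import csv
import io

# Tiny in-memory stand-ins for the two CSV files (hypothetical columns and rows).
PROJECTS_CSV = """id,name,repository_id
1,example-a,10
2,example-b,
"""

REPOSITORIES_CSV = """id,host,stars
10,example-host,42
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def denormalize(projects, repositories):
    """Copy each project's related repository fields inline (a left join)."""
    repos_by_id = {repo["id"]: repo for repo in repositories}
    combined = []
    for project in projects:
        # Projects without a matching repository keep empty repository fields.
        repo = repos_by_id.get(project["repository_id"], {})
        combined.append({
            "name": project["name"],
            "repository_host": repo.get("host", ""),
            "repository_stars": repo.get("stars", ""),
        })
    return combined


rows = denormalize(read_rows(PROJECTS_CSV), read_rows(REPOSITORIES_CSV))
```

With the denormalized export, each row already has this combined shape, so the lookup table and the join step are not needed.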


This dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International Licence.

This licence provides the user with the freedom to use, adapt and redistribute this data. In return the user must publish any derivative work under a similarly open licence, attributing as a data source. The full text of the licence is included in the data.

Access, Attribution and Citation

The dataset is available to download from Zenodo at

Please attribute as a data source by including the words ‘Includes data from’ and referencing the Digital Object Identifier (DOI): 10.5281/Zenodo.808273.

Files (5.9 GB)

