Published September 10, 2025 | Version v2
Dataset Open

Secure Chain Open Dataset

Description

This repository contains the files and scripts needed to initialise and populate the two databases used by Secure Chain. The first database contains information on approximately 270,000 vulnerabilities, along with their associated Common Weakness Enumerations (CWEs) and approximately 260,000 exploits. The second database contains a dependency graph associated with those vulnerabilities, built with information extracted from the Node Package Manager (NPM), the Python Package Index (PyPI), Ruby Gems, and Cargo Crates. It is also currently being populated with information extracted from Maven. The graph currently contains:

  • Total: 4,489,424 packages and 64,662,397 versions.
  • NPM: 3,461,263 packages and 50,943,372 versions.
  • PyPI: 599,307 packages and 6,875,336 versions.
  • Ruby Gems: 208,165 packages and 1,738,773 versions.
  • Cargo Crates: 168,944 packages and 1,393,371 versions.
  • Maven: 51,745 packages and 3,711,545 versions.
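As a quick sanity check, the per-ecosystem figures above can be summed and compared with the reported totals. The sketch below only copies the numbers from the list; the names `COUNTS`, `totals`, and `versions_per_package` are illustrative and not part of the dataset.

```python
# Per-ecosystem (packages, versions) counts copied from the list above.
COUNTS = {
    "NPM": (3_461_263, 50_943_372),
    "PyPI": (599_307, 6_875_336),
    "Ruby Gems": (208_165, 1_738_773),
    "Cargo Crates": (168_944, 1_393_371),
    "Maven": (51_745, 3_711_545),
}


def totals(counts):
    """Sum packages and versions across all ecosystems."""
    packages = sum(p for p, _ in counts.values())
    versions = sum(v for _, v in counts.values())
    return packages, versions


def versions_per_package(counts):
    """Average number of versions per package, by ecosystem."""
    return {name: v / p for name, (p, v) in counts.items()}


if __name__ == "__main__":
    print(totals(COUNTS))
    for name, ratio in versions_per_package(COUNTS).items():
        print(f"{name}: {ratio:.1f} versions per package")
```

The sums match the "Total" line, so the five ecosystems account for every package and version currently reported.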

Folder Structure

- `.env` and `template.env`: Environment variable configuration files for Neo4j and MongoDB connections.
- `docker-compose.yml`: Docker Compose file to launch database services in containers.
- `seeds/`: Contains scripts and data for seeding the databases.
    - `mongo_seeder.sh`: Script to import data into MongoDB.
    - `neo4j_seeder.sh`: Script to restore the Neo4j dump.
    - `mongo/`: MongoDB data organized by collections.
        - `vulnerabilities/`: Data and metadata for the `vulnerabilities`, `exploits` and `CWEs` collections.
    - `neo4j/`: Neo4j database dump (`neo4j.dump`).
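After extracting the archive, a small script can confirm that the layout matches the structure listed above. This is a minimal sketch: the relative paths are taken from that list, while the helper name `missing_paths` and the idea of running such a check are assumptions, not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the folder structure described above.
EXPECTED = [
    ".env",
    "template.env",
    "docker-compose.yml",
    "seeds/mongo_seeder.sh",
    "seeds/neo4j_seeder.sh",
    "seeds/mongo/vulnerabilities",
    "seeds/neo4j/neo4j.dump",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```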

Graph Structure

Secure Chain represents the software supply chain as a directed graph, where each node and relationship encodes specific knowledge about packages, their versions, and their dependencies. This model allows deep reasoning about software composition, versioning, and risk propagation. The following documentation explains the components of this graph structure as illustrated in the architecture diagram.

Node: Version

Represents a specific version of a software package. A version has the following attributes:

  • mean: A general risk or severity score (e.g., average CVSS score).
  • name: Full name or label of the version (e.g., requests).
  • release_date: The exact date this version was published.
  • serial_number: A sortable serial representation of the version (used to compare versions efficiently).
  • vulnerabilities: A list of known vulnerability identifiers (e.g., CVEs or OSV IDs) that affect this version.
  • weighted_mean: A weighted severity score, which considers not just the severity of vulnerabilities but also their reach or impact.

This node is essential for understanding when a package version became available and whether it has known vulnerabilities or risks.
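To illustrate how `mean` and `weighted_mean` could differ, and how `serial_number` supports ordering, the sketch below models a Version node as a Python dataclass. The field types, the weights, and the weighting formula are assumptions made for this example; the description above does not specify how Secure Chain actually computes `weighted_mean`.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    # Attribute names follow the list above; the types are assumed.
    name: str
    release_date: str
    serial_number: int
    vulnerabilities: list = field(default_factory=list)
    mean: float = 0.0
    weighted_mean: float = 0.0


def plain_mean(scores):
    """Arithmetic mean of severity scores (0.0 when there are none)."""
    return sum(scores) / len(scores) if scores else 0.0


def weighted_mean(scores, weights):
    """Weighted mean; the weights are hypothetical 'reach' factors."""
    total = sum(weights)
    if not scores or total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total


def sort_versions(versions):
    """Order versions from oldest to newest using serial_number."""
    return sorted(versions, key=lambda v: v.serial_number)
```

With hypothetical scores `[9.8, 5.0]` and weights `[3, 1]`, the weighted mean is pulled toward the more heavily weighted vulnerability, while the plain mean treats both equally.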

Node: Dependency

Represents a software package, as it might appear in a dependency file (package.json, requirements.txt, pom.xml, sbom.json, etc.). A Dependency has the following attributes:

  • import_names: The list of keywords used to import or reference the package in code (e.g., lodash, express).
  • moment: The point in time when this dependency was recorded or declared.
  • name: Canonical name of the dependency.
  • repository_url: URL to the source repository (GitHub, GitLab, Bitbucket, etc.).
  • vendor: The organization, maintainer, or vendor responsible for the package.

Dependency nodes are the anchor points for packages in the ecosystem, and are linked to their actual released versions.
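One practical use of `import_names` is mapping an import seen in source code back to the package that provides it. The sketch below models a Dependency node as a dataclass and builds such an index. The field types, the `build_import_index` helper, and the sample records are assumptions for illustration and are not read from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    # Attribute names follow the list above; the types are assumed.
    name: str
    import_names: list = field(default_factory=list)
    repository_url: str = ""
    vendor: str = ""
    moment: str = ""


def build_import_index(dependencies):
    """Map each import name to the package names that declare it."""
    index = {}
    for dep in dependencies:
        for import_name in dep.import_names:
            index.setdefault(import_name, []).append(dep.name)
    return index


# Illustrative records: on PyPI, beautifulsoup4 is imported as `bs4`
# and PyYAML as `yaml`.
SAMPLE = [
    Dependency(name="beautifulsoup4", import_names=["bs4"]),
    Dependency(name="PyYAML", import_names=["yaml"]),
]

if __name__ == "__main__":
    print(build_import_index(SAMPLE))
```

The index maps to a list because more than one package may declare the same import name.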

Usage

1. Configure environment variables in a .env file using this template as a reference:

# For dockerized backend and database
GRAPH_DB_URI='bolt://neo4j:7687'
VULN_DB_URI='mongodb://mongoSecureChain:mongoSecureChain@mongo:27017/admin'

# Databases settings
GRAPH_DB_USER='neo4j'
GRAPH_DB_PASSWORD='neoSecureChain'
VULN_DB_USER='mongoSecureChain'
VULN_DB_PASSWORD='mongoSecureChain'

2. Start the services with Docker Compose: `docker-compose up --build`

3. The databases are populated automatically.
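Before starting the containers, it can help to check that the credentials embedded in `VULN_DB_URI` agree with `VULN_DB_USER` and `VULN_DB_PASSWORD`. The sketch below parses the template from step 1 using only the Python standard library; the helper names `parse_env` and `credentials_match` are hypothetical, and the parser handles only simple `KEY='value'` lines like those shown.

```python
from urllib.parse import urlsplit

# Template values copied from step 1 above.
TEMPLATE = """\
# For dockerized backend and database
GRAPH_DB_URI='bolt://neo4j:7687'
VULN_DB_URI='mongodb://mongoSecureChain:mongoSecureChain@mongo:27017/admin'

# Databases settings
GRAPH_DB_USER='neo4j'
GRAPH_DB_PASSWORD='neoSecureChain'
VULN_DB_USER='mongoSecureChain'
VULN_DB_PASSWORD='mongoSecureChain'
"""


def parse_env(text):
    """Parse simple KEY='value' lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def credentials_match(env):
    """True when the URI credentials equal VULN_DB_USER / VULN_DB_PASSWORD."""
    parts = urlsplit(env["VULN_DB_URI"])
    return (parts.username == env["VULN_DB_USER"]
            and parts.password == env["VULN_DB_PASSWORD"])


if __name__ == "__main__":
    env = parse_env(TEMPLATE)
    print("credentials consistent:", credentials_match(env))
```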

Notes

  • The seed scripts check if the databases already contain data before importing.
  • .bson.gz and .metadata.json.gz files contain data and metadata for MongoDB collections.
  • The neo4j.dump file is the Neo4j database dump.
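Since each MongoDB collection is shipped as a data file plus a metadata file, a short check can flag collections where one of the two is missing. This sketch assumes the files are named `<collection>.bson.gz` and `<collection>.metadata.json.gz`, which is the usual `mongodump --gzip` layout; the helper names are hypothetical.

```python
from pathlib import Path

DATA_SUFFIX = ".bson.gz"
META_SUFFIX = ".metadata.json.gz"


def collection_files(folder):
    """Map collection name -> {'data': path or None, 'metadata': path or None}."""
    result = {}
    for path in sorted(Path(folder).iterdir()):
        name = path.name
        if name.endswith(META_SUFFIX):
            key, kind = name[: -len(META_SUFFIX)], "metadata"
        elif name.endswith(DATA_SUFFIX):
            key, kind = name[: -len(DATA_SUFFIX)], "data"
        else:
            continue  # ignore unrelated files
        result.setdefault(key, {"data": None, "metadata": None})[kind] = path
    return result


def incomplete_collections(folder):
    """Names of collections missing either their data or metadata file."""
    return sorted(
        name
        for name, files in collection_files(folder).items()
        if files["data"] is None or files["metadata"] is None
    )
```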

---

For more information about database configuration, see the Secure Chain documentation.

Files

SecureChainData.zip (20.4 GB)
md5:797bf2af1407fdc643c15635d9738ef0
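Given the size of the archive, it is worth verifying the download against the published MD5 checksum before extracting it. The sketch below reads the file in chunks so it does not need to fit in memory. The checksum is the one listed above; the function names and the local file path are assumptions.

```python
import hashlib

# Checksum published for SecureChainData.zip in the file listing above.
EXPECTED_MD5 = "797bf2af1407fdc643c15635d9738ef0"


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True when the file's MD5 digest matches the expected value."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded to the current directory.
    print("checksum OK:", verify("SecureChainData.zip"))
```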

Additional details

Dates

Available
2025-08-04