Published January 10, 2024 | Version v1
Dataset Restricted

Cross-ecosystem categorization: A manual-curation protocol for the categorization of Java Maven libraries along Python PyPI Topics (dataset)

  • 1. University of Trento
  • 2. Vrije Universiteit Amsterdam


This dataset reports all information needed to implement a human-guided protocol for the categorisation of libraries, from any software ecosystem, along the 24 top-level PyPI Topic classifiers. It also contains the data produced in a demonstration, where the protocol was applied to 256 open-source Java libraries from Maven Central with high- or critical-severity CVEs. The dataset can be used as ground truth for cross-ecosystem studies in software engineering, especially from functional and security perspectives. It contains:

  • the protocol designed to interpret sources for category assessment, and arbitrate the results;
  • the sources and metadata, including CVEs, collected for the demonstration;
  • the set of categorised libraries and CVE statistics, including a higher-level classification into Local or Remote network functionalities.



The record is publicly accessible, but files are restricted to users with access.

Request access

Access to the files is granted on request. For a request to be accepted, the following condition must be satisfied:

Reviewing for the submission to Information and Software Technology.


Additional details


Funding

  • European Commission: ProSVED – Projection of Security Vulnerabilities caused by Exploits in Dependencies (grant 101067199)
  • European Commission: AssureMOSS – Assurance and certification in secure Multi-party Open Software and Services (grant 952647)
  • European Commission: Sec4AI4Sec – Cybersecurity for AI-Augmented Systems (grant 101120393)