Published January 10, 2024
Version v1
Dataset
Restricted
Cross-ecosystem categorization: A manual-curation protocol for the categorization of Java Maven libraries along Python PyPI Topics (dataset)
Description
This dataset provides all the information needed to apply a human-guided protocol for categorising libraries from any software ecosystem along the 24 top-level PyPI Topic classifiers. It also contains the data produced in a demonstration in which the protocol was applied to 256 open-source Java libraries from Maven Central with high- or critical-severity CVEs. The dataset can serve as ground truth for cross-ecosystem studies in software engineering, especially from functional and security perspectives. It contains:
- the protocol designed to interpret sources for category assessment and to arbitrate the results;
- the sources and metadata, including CVEs, collected for the demonstration;
- the set of categorised libraries and CVE statistics, including a higher-level classification into Local or Remote network functionalities.
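PyPI Topic classifiers are Trove classifier strings whose segments are separated by `::`, for example `Topic :: Security :: Cryptography`. The top-level category is the segment immediately after `Topic`. The following is a minimal sketch, assuming only that string format, of how a PyPI classifier reduces to its top-level Topic. The function names are hypothetical and this is not the dataset's protocol, which relies on human judgement rather than automatic extraction.

```python
from typing import Iterable, List, Optional


def top_level_topic(classifier: str) -> Optional[str]:
    """Return the top-level Topic of a Trove classifier, or None if it is not a Topic classifier.

    Hypothetical helper: assumes classifiers are '::'-separated paths such as
    'Topic :: Security :: Cryptography', whose top-level Topic is 'Security'.
    """
    parts = [part.strip() for part in classifier.split("::")]
    if len(parts) < 2 or parts[0] != "Topic":
        return None
    return parts[1]


def top_level_topics(classifiers: Iterable[str]) -> List[str]:
    """Collect the distinct top-level Topics from a list of classifiers, in first-seen order."""
    seen: List[str] = []
    for classifier in classifiers:
        topic = top_level_topic(classifier)
        if topic is not None and topic not in seen:
            seen.append(topic)
    return seen


if __name__ == "__main__":
    # Illustrative classifiers for a hypothetical library.
    example = [
        "Topic :: Security :: Cryptography",
        "Programming Language :: Java",  # not a Topic classifier, so it is ignored
        "Topic :: Internet :: WWW/HTTP",
        "Topic :: Security",
    ]
    print(top_level_topics(example))
```

With these inputs, the sketch prints `['Security', 'Internet']`. Non-Topic classifiers are ignored, and repeated top-level Topics are reported only once.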
Additional details
Funding
- European Commission: ProSVED – Projection of Security Vulnerabilities caused by Exploits in Dependencies (101067199)
- European Commission: AssureMOSS – Assurance and certification in secure Multi-party Open Software and Services (952647)
- European Commission: Sec4AI4Sec – Cybersecurity for AI-Augmented Systems (101120393)