Published November 18, 2025 | Version v1
Dataset | Open access

5G and Related Network Infrastructure CVE-Annotated Dataset: Distinguishing 5G Native, LTE, Auxiliary to 5G, and Non-5G Vulnerabilities

  • 1. Fondazione "Ugo Bordoni"

Description

Dataset Description

The dataset was generated using the source code available at https://doi.org/10.5281/zenodo.17572825 and was subsequently manually annotated.

Starting from CVEs selected through a whitelist of 5G-related keywords, the dataset includes 1,531 manually annotated CVE entries retrieved from the NIST National Vulnerability Database (NVD), covering the years 2019 through 2025. Each entry is assigned one of four labels:

  • 5G: if the vulnerability directly impacts 5G infrastructure, protocols, or specific 5G components, it receives the "5g" label, indicating direct relevance to 5G security.
  • auxiliary: if the vulnerability has indirect implications for 5G systems, for example because it affects shared infrastructure, common protocols, or components that bridge LTE and 5G networks, it is labeled "auxiliary".
  • lte: if the vulnerability does not directly affect 5G networks but is specific to LTE, it is labeled "lte"; this class represents legacy 4G vulnerabilities without 5G implications.
  • no5G: if the vulnerability has no relationship to 5G technology, either direct or indirect, it is labeled "no5g".

The following table reports the frequency of each label:

Label        Frequency   Share
5g           255         16.7%
auxiliary    169         11.0%
lte          95          6.2%
no5g         1012        66.1%
Total        1531        100.0%

The dataset is markedly imbalanced: no5g entries make up about two thirds of the records (1,012 of 1,531), whereas lte, the smallest class, accounts for only 95 entries (about 6%). This imbalance reflects the real-world distribution of vulnerabilities but may pose challenges for training and evaluating machine learning models.
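One common way to compensate for this imbalance when training a four-class model is inverse-frequency class weighting, w_c = N / (K · n_c), where N is the number of entries, K the number of classes, and n_c the size of class c (the heuristic scikit-learn calls "balanced"). The sketch below applies it to the frequencies reported in the table above; the weighting scheme is a suggestion and not part of the dataset.

```python
# Label frequencies as reported in the label-frequency table.
LABEL_COUNTS = {"5g": 255, "auxiliary": 169, "lte": 95, "no5g": 1012}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by N / (K * n_c), so rarer classes weigh more in the loss."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, weight in inverse_frequency_weights(LABEL_COUNTS).items():
        print(f"{label:<10} {weight:.3f}")
```

With these counts, lte receives a weight of about 4.03 and no5g about 0.38, so each lte sample contributes roughly ten times more to a weighted loss than a no5g sample.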

To address the class imbalance and support binary classification tasks, a balanced subset is also provided, marked in an additional column (Label) of the CSV file. It comprises 255 samples of the 5g class and 255 samples of the no5g class, 510 entries in total.
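A minimal sketch of extracting the balanced subset with Python's standard csv module. It assumes the header row uses the column names listed under Data Structure (CVE-ID, Multiclass, Label, ...) and that rows outside the subset carry "N/A" in the Label column; the label match is case-insensitive so that both "no5g" and "no5G" are accepted.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable

# Binary labels marking membership in the balanced subset; every other row is "N/A".
BALANCED_LABELS = {"5g", "no5g"}


def load_balanced_subset(lines: Iterable[str]) -> list[dict[str, str]]:
    """Return the CSV rows whose Label column is 5g or no5g (case-insensitive)."""
    reader = csv.DictReader(lines)
    return [row for row in reader if row["Label"].strip().lower() in BALANCED_LABELS]


if __name__ == "__main__":
    # File name from the dataset record; assumed to be in the working directory.
    path = Path("dataset_cve_5G_network.csv")
    if path.exists():
        # newline="" lets the csv module handle quoted fields that span lines.
        with path.open(encoding="utf-8", newline="") as f:
            subset = load_balanced_subset(f)
        print(len(subset), Counter(row["Label"] for row in subset))
```

If the file matches the description, the printed count should be 510, split evenly between the two labels.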

Technical info (English)

Dataset Characteristics

Size: 1,531 CVE vulnerability records
Time period: January 1, 2019 - July 1, 2025
Format: CSV (Comma-Separated Values)
Encoding: UTF-8
Data collection date: July 11, 2025

Data Structure

The dataset is organized into 8 columns:

1. CVE ID

  • Name: CVE-ID
  • Type: String
  • Description: Unique vulnerability identifier according to CVE standard
  • Format: CVE-YYYY-N..., with a four-digit year and a sequence number of at least four digits (e.g., CVE-2023-43239)
  • Purpose: Traceability and unique vulnerability reference
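A small sketch for validating the CVE-ID column and splitting it into its year and sequence parts. Note that the year in a CVE ID need not coincide with the NVD publication date, so it should not be used on its own to check the 2019-2025 window.

```python
import re

# "CVE-", a four-digit year, "-", then a sequence number of four or more digits.
CVE_ID_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(cve_id: str) -> tuple[int, int]:
    """Split a CVE ID into (year, sequence number); raise ValueError if malformed."""
    match = CVE_ID_PATTERN.fullmatch(cve_id.strip())
    if match is None:
        raise ValueError(f"not a valid CVE ID: {cve_id!r}")
    return int(match.group(1)), int(match.group(2))
```

For example, parse_cve_id("CVE-2023-43239") returns (2023, 43239).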

2. CVE Description

  • Name: Description
  • Type: String
  • Description: Detailed technical description of the vulnerability, providing information such as vulnerability type, affected components, and potential impact
  • Format: free text
  • Purpose: Provide a human-readable description of the vulnerability
  • Language: English, specialized technical terminology

3. CPE (Common Platform Enumeration)

  • Name: CPE
  • Type: String
  • Description: Standardized identifiers of the vulnerable platforms/products (there can be multiple values)
  • Format: cpe:2.3:part:vendor:product:version:..., cpe:2.3:...
  • Purpose: Precise identification of the affected system or component
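Because the CPE column may hold several identifiers, a parser has to split the field before reading each CPE 2.3 formatted string. The sketch below assumes, as the Format line suggests, that multiple CPEs are separated by commas, and it respects backslash-escaped characters (such as "\:") as allowed by the CPE 2.3 formatted-string binding. The helper names are hypothetical.

```python
def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on `sep`, ignoring separators escaped with a backslash."""
    parts: list[str] = []
    current: list[str] = []
    escaped = False
    for ch in text:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)  # keep the escape so the component stays valid CPE
            escaped = True
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_cpe_field(field: str) -> list[dict[str, str]]:
    """Parse a comma-separated list of CPE 2.3 formatted strings."""
    entries = []
    for raw in split_unescaped(field, ","):
        cpe = raw.strip()
        if not cpe:
            continue
        components = split_unescaped(cpe, ":")
        # A CPE 2.3 formatted string has 13 colon-separated components.
        if len(components) != 13 or components[:2] != ["cpe", "2.3"]:
            raise ValueError(f"unexpected CPE string: {cpe!r}")
        entries.append({
            "part": components[2],  # a = application, o = operating system, h = hardware
            "vendor": components[3],
            "product": components[4],
            "version": components[5],
        })
    return entries
```

The part component alone is already useful for this dataset, since it separates firmware and operating-system entries (o) from hardware (h) and applications (a).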

4. CWE (Common Weakness Enumeration)

  • Name: CWE
  • Type: String
  • Description: CWE (Common Weakness Enumeration) identifier that classifies the type of weakness related to the given vulnerability according to MITRE's standardized taxonomy
  • Format: CWE-[number] (e.g., CWE-79, CWE-89, CWE-787)
  • Purpose: Categorize and identify the nature of the vulnerability according to a standardized hierarchical classification, facilitating the identification of common patterns, searching for similar vulnerabilities, and implementing appropriate mitigations
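To look for common weakness patterns, the CWE column can be reduced to numeric identifiers and counted. The description does not say whether a row may list several CWEs or NVD placeholders such as "NVD-CWE-noinfo"; this sketch assumes either may occur and tolerates both.

```python
import re
from collections import Counter
from typing import Iterable

# "CWE-" followed by digits; NVD placeholders such as "NVD-CWE-noinfo" or
# "NVD-CWE-Other" carry no number and are therefore ignored.
CWE_PATTERN = re.compile(r"\bCWE-(\d+)\b")


def extract_cwe_ids(field: str) -> list[int]:
    """Return the distinct CWE numbers in a CWE field, in order of appearance."""
    return list(dict.fromkeys(int(n) for n in CWE_PATTERN.findall(field)))


def cwe_histogram(fields: Iterable[str]) -> Counter:
    """Count how many entries mention each CWE number."""
    counts: Counter = Counter()
    for field in fields:
        counts.update(extract_cwe_ids(field))
    return counts
```

Calling cwe_histogram on the CWE values of one label (for example, only the 5g rows) gives a quick view of which weakness types dominate that class.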

5. CVSS (Common Vulnerability Scoring System)

  • Name: CVSS
  • Type: Float in the range [0.0, 10.0]
  • Description: CVSS (Common Vulnerability Scoring System) score that provides a numerical assessment of the severity of a vulnerability based on its intrinsic characteristics.
  • Format: Decimal number with one decimal place precision
  • Purpose: Quantify the severity of the vulnerability; the score can then be mapped to a qualitative rating (none, low, medium, high, or critical)
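The qualitative ratings come from the CVSS v3.1 specification: None (0.0), Low (0.1-3.9), Medium (4.0-6.9), High (7.0-8.9), and Critical (9.0-10.0). A minimal mapping, assuming the CVSS column holds a CVSS v3.x base score:

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```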

6. CVSS Vector

  • Name: CVSS-Vector
  • Type: String
  • Description: CVSS (Common Vulnerability Scoring System) vector string that describes the metric characteristics of the vulnerability according to the CVSS v3.1 standard
  • Format: CVSS:3.1/AV:[N|A|L|P]/AC:[L|H]/PR:[N|L|H]/UI:[N|R]/S:[U|C]/C:[N|L|H]/I:[N|L|H]/A:[N|L|H]
  • Purpose: Provide a standardized and detailed representation of vulnerability characteristics to enable automatic CVSS score calculation and objective vulnerability comparison
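The sketch below parses the vector string and recomputes the base score with the equations and metric weights of the CVSS v3.1 specification. Only base metrics are handled. Whether the recomputed value matches the CVSS column for every row is an assumption worth checking, not a guarantee.

```python
import math

# CVSS v3.1 base-metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
BASE_METRICS = ("AV", "AC", "PR", "UI", "S", "C", "I", "A")


def parse_vector(vector: str) -> dict[str, str]:
    """Split 'CVSS:3.1/AV:N/...' into {'AV': 'N', ...} and check the base metrics."""
    prefix, *fields = vector.strip().split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    metrics = dict(field.split(":", 1) for field in fields)
    missing = [m for m in BASE_METRICS if m not in metrics]
    if missing:
        raise ValueError(f"missing base metrics: {missing}")
    if metrics["S"] not in ("U", "C"):
        raise ValueError(f"invalid scope: {metrics['S']!r}")
    return metrics


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, robust to float noise."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Recompute the CVSS v3.1 base score from a vector string."""
    m = parse_vector(vector)
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

For instance, base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H") yields 9.8. The custom roundup, rather than Python's round, follows the specification's rule of always rounding up to one decimal while tolerating floating-point error.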

7. Multiclass label for 5G, LTE, Auxiliary, Non-5G classes

  • Name: Multiclass
  • Type: String
  • Description: Final classification label
  • Format: [5g | lte | auxiliary | no5g]
  • Purpose: Enable multiclass classification of vulnerabilities according to the network technology they relate to

8. Binary label for 5G/no5G

  • Name: Label
  • Type: String
  • Description: Binary label for the entries of the balanced subset; entries outside the subset are marked N/A
  • Format: [5g | no5g | N/A], where "5g" is used for 5G network-related vulnerabilities, "no5g" for vulnerabilities not related to 5G networks, and "N/A" for entries not included in the balanced subset
  • Purpose: Binary (5g vs. no5g) classification with machine learning algorithms

Notes (English)

This work was partially supported by the SERICS project (PE00000014) under the NRRP MUR program, funded by the EU-NGEU.

Files

dataset_cve_5G_network.csv

  • Size: 17.1 MB
  • MD5 checksum: 87014c06645606f79b5a0bceb78de043
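After downloading, the file can be checked against the MD5 checksum published for this record. A minimal sketch using Python's hashlib, reading the file in chunks; the local path is an assumption.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "87014c06645606f79b5a0bceb78de043"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    csv_path = Path("dataset_cve_5G_network.csv")  # assumed download location
    if csv_path.exists():
        ok = md5_of_file(csv_path) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
```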

Additional details

Related works

Continues
Dataset: 10.5281/zenodo.16736495 (DOI)
Is derived from
Software: 10.5281/zenodo.17572825 (DOI)