There is a newer version of the record available.

Published January 29, 2026 | Version v4
Software Open

Code for the Paper "Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models"

  • 1. Université de Lorraine, CNRS, Inria, LORIA

Description

CVE-LMTune is a tool that scrapes and preprocesses vulnerability data from multiple sources and fine-tunes Hugging Face language models (LMs) for multi-label classification across several security taxonomies: CWE, CAPEC, and MITRE ATT&CK (Enterprise, ICS, and Mobile). It supports both standard classification and a hierarchical approach, in which multiple models are trained on subproblems at successive taxonomy levels, from broad categories (e.g., ATT&CK tactics) to fine-grained subcategories (e.g., ATT&CK techniques).
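The hierarchical approach described above can be illustrated with a minimal sketch. It is not taken from code.zip: the function `predict_hierarchical`, the keyword-based stub scorers, and the 0.5 threshold are all assumptions made for illustration. In the actual tool, each scorer would presumably be a fine-tuned LM. The sketch only shows the control flow: a coarse model selects tactics, and only the sub-models of the selected tactics are queried for techniques.

```python
from typing import Callable, Dict, List

# A scorer maps a vulnerability description to per-label probabilities.
# Assumption: in practice each scorer would wrap a fine-tuned LM.
Scorer = Callable[[str], Dict[str, float]]


def predict_hierarchical(
    text: str,
    coarse: Scorer,
    fine_by_parent: Dict[str, Scorer],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Select coarse labels first, then query only the matching fine-level models."""
    result: Dict[str, List[str]] = {}
    for parent, prob in coarse(text).items():
        if prob < threshold:
            continue  # pruned branch: its sub-model is never queried
        fine = fine_by_parent.get(parent)
        scores = fine(text) if fine else {}
        result[parent] = sorted(
            label for label, p in scores.items() if p >= threshold
        )
    return result


# Hypothetical stub scorers (keyword rules) standing in for fine-tuned LMs.
def coarse_stub(text: str) -> Dict[str, float]:
    t = text.lower()
    return {
        "Execution": 0.9 if "execute" in t else 0.1,
        "Credential Access": 0.8 if "password" in t else 0.2,
    }


def execution_stub(text: str) -> Dict[str, float]:
    t = text.lower()
    return {
        "Command and Scripting Interpreter": 0.7 if "shell" in t else 0.3,
        "Exploitation for Client Execution": 0.6 if "browser" in t else 0.2,
    }


def credential_stub(text: str) -> Dict[str, float]:
    t = text.lower()
    return {"Brute Force": 0.8 if "guess" in t else 0.2}


description = "Crafted input lets a remote attacker execute shell commands."
prediction = predict_hierarchical(
    description,
    coarse_stub,
    {"Execution": execution_stub, "Credential Access": credential_stub},
)
print(prediction)
```

One consequence of this design is that an error at the coarse level propagates: a tactic missed by the first model can never contribute techniques, so the coarse threshold trades recall against the number of sub-models that have to run.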

Key features include:

  • Automated data scraping and preprocessing for vulnerability entries
  • Support for multiple taxonomies at different levels of granularity
  • Fine-tuning of encoder-only LMs with user-defined configurations for multi-label classification tasks
  • The proposed hierarchical classification for improved multi-label classification performance
  • Hyperparameter optimization using Optuna
  • Support for classification with text-generation LMs and custom prompts
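To make the multi-label setting in the list above concrete, the following sketch shows one common way to encode and decode label sets. It is an assumption, not the tool's documented implementation: the record does not state the loss, activation, threshold, or metric that CVE-LMTune uses. The helper names `multi_hot`, `decode`, and `micro_f1`, the three-class label space, and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def multi_hot(labels: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode a set of labels as a 0/1 vector aligned with `classes`."""
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode(
    logits: Sequence[float], classes: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Keep every class whose independent sigmoid probability passes the threshold."""
    return [c for c, z in zip(classes, logits) if sigmoid(z) >= threshold]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1 over all (sample, label) pairs."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        for t, p in zip(truth, pred):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative label space; a real CWE label space is far larger.
classes = ["CWE-79", "CWE-89", "CWE-787"]

target = multi_hot(["CWE-89", "CWE-79"], classes)
predicted_labels = decode([2.0, -1.5, 0.4], classes)  # made-up logits
predicted = multi_hot(predicted_labels, classes)
score = micro_f1([target], [predicted])
print(target, predicted_labels, score)
```

Because each label is thresholded independently, a single entry can receive zero, one, or several labels, which is what distinguishes this setup from single-label (softmax) classification.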

The methodology is based on the paper "Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models" (accepted for publication at DIMVA 2026). A README file will provide guidance on code usage.

The tool will subsequently be released on GitHub for broader use and collaboration.

Files

Files (113.1 kB)

Name      Size      Checksum
code.zip  113.1 kB  md5:f78f6dae861a1f9fa5a4d9a61f606867