Code for the Paper "Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models"
Authors/Creators
- 1. Université de Lorraine, CNRS, Inria, LORIA
Contributors
Researcher (2):
Supervisor (2):
- 1. Université de Lorraine, CNRS, Inria, LORIA
Description
CVE-LMTune is a tool that scrapes and preprocesses vulnerability data from multiple sources and fine-tunes Hugging Face language models (LMs) for multi-label classification across several security taxonomies, including CWE, CAPEC, and MITRE ATT&CK (Enterprise, ICS, and Mobile). It supports both standard classification and a hierarchical approach, in which multiple models are trained to tackle subproblems across taxonomy levels, from broad categories (e.g., ATT&CK tactics) to fine-grained subcategories (e.g., ATT&CK techniques).
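The coarse-to-fine idea can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: the scoring functions are toy stand-ins for fine-tuned models, and the names `hierarchical_predict`, `select`, and `TAXONOMY`, the example scores, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# A scorer maps a vulnerability description to one score per label.
# In practice this would be a fine-tuned LM; here it is a toy stand-in.
Scorer = Callable[[str], Dict[str, float]]

# Hypothetical two-level slice of ATT&CK Enterprise: tactic -> techniques.
TAXONOMY: Dict[str, List[str]] = {
    "TA0001": ["T1190", "T1566"],  # Initial Access
    "TA0002": ["T1059", "T1203"],  # Execution
}


def select(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep every label whose score reaches the threshold (multi-label)."""
    return sorted(label for label, score in scores.items() if score >= threshold)


def hierarchical_predict(
    text: str,
    parent_model: Scorer,
    child_models: Dict[str, Scorer],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Predict broad categories first, then refine only within those."""
    result: Dict[str, List[str]] = {}
    for parent in select(parent_model(text), threshold):
        # Only the child model of a predicted parent is consulted.
        result[parent] = select(child_models[parent](text), threshold)
    return result


def toy_tactic_model(text: str) -> Dict[str, float]:
    """Keyword-based placeholder for a tactic-level classifier."""
    lowered = text.lower()
    return {
        "TA0001": 0.9 if "remote" in lowered else 0.1,
        "TA0002": 0.8 if "execute" in lowered else 0.2,
    }


# Placeholder technique-level classifiers with fixed scores.
toy_technique_models: Dict[str, Scorer] = {
    "TA0001": lambda text: {"T1190": 0.7, "T1566": 0.2},
    "TA0002": lambda text: {"T1059": 0.6, "T1203": 0.55},
}

prediction = hierarchical_predict(
    "A remote attacker can execute arbitrary code.",
    toy_tactic_model,
    toy_technique_models,
)
print(prediction)
```

Under these assumptions, each technique-level model only has to discriminate among the children of one tactic, which is the sense in which the larger problem is split into subproblems.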
Key features include:
- Automated data scraping and preprocessing for vulnerability entries
- Support for multiple taxonomies at different levels of granularity
- Fine-tuning of encoder-only LMs with custom configurations for multi-label, multi-class classification tasks
- The hierarchical classification approach proposed in the paper, intended to improve multi-label classification performance
- Hyperparameter optimization using Optuna
- Support for classification using text-generation LMs with custom prompts
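For readers unfamiliar with multi-label classification, the sketch below shows one common formulation: each entry gets a multi-hot target vector, and each label is decided independently by passing its logit through a sigmoid and comparing against a threshold. Whether CVE-LMTune uses exactly this scheme is an assumption; the label set, the logits, and the helper names `to_multi_hot` and `decode` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical label space; a real CWE label set is far larger.
LABELS: List[str] = ["CWE-79", "CWE-89", "CWE-787"]


def to_multi_hot(assigned: Sequence[str], labels: Sequence[str] = LABELS) -> List[float]:
    """Encode a set of assigned labels as a 0/1 vector over the label space."""
    assigned_set = set(assigned)
    return [1.0 if label in assigned_set else 0.0 for label in labels]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(
    logits: Sequence[float],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,
) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# One entry may carry several labels at once.
target = to_multi_hot(["CWE-89", "CWE-79"])
# Made-up logits standing in for a model's output.
predicted = decode([2.0, -1.0, 0.3])
print(target, predicted)
```

Because every label is thresholded on its own, an entry can receive zero, one, or several labels, unlike a softmax classifier that always picks exactly one.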
The methodology is based on the paper "Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models" (accepted for publication at DIMVA 2026). A README file will provide guidance on code usage.
The tool will subsequently be released on GitHub for broader use and collaboration.
Files
| Name | Size | Checksum |
|---|---|---|
| code.zip | 113.1 kB | md5:f78f6dae861a1f9fa5a4d9a61f606867 |
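The listed MD5 checksum can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch that assumes the archive was saved as `code.zip` in the current working directory; the helper names `md5_of_file` and `verify` are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksum as listed for code.zip in the file table.
EXPECTED_MD5 = "f78f6dae861a1f9fa5a4d9a61f606867"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("code.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum mismatch")
    else:
        print("code.zip not found in the current directory")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a defense against deliberate tampering.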