Automating Cybersecurity TTP Classification Based on Unstructured Attack Descriptions
Creators
Description
Cyber Threat Intelligence (CTI) sources help Security Operations Centers (SOCs) share important information about incidents and attacks. Because incident-related information appears in a wide range of sources, processing unstructured text is becoming increasingly important. However, existing datasets in the literature either contain texts that are too short or have too few samples per class. We therefore propose a method for semi-automatically building a dataset from CTI sources. Using it, we present a new dataset of unstructured CTI descriptions called Weakness, Attack, Vulnerabilities, and Events 27k (WAVE-27K). WAVE-27K covers 27 MITRE techniques and 7 tactics. It contains 22539 samples associated with a single technique and 5262 samples associated with two or more techniques. WAVE-27K is the largest of the datasets reported in the literature. A BERT-based model trained on WAVE-27K achieved a micro F1-score of 97.00%, which suggests that WAVE-27K is of sufficient quality for training machine learning models.
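Because 5262 of the samples carry two or more techniques, the task is multi-label, and the reported micro F1-score pools true positives (TP), false positives (FP) and false negatives (FN) across all technique labels before computing precision and recall, which is equivalent to F1_micro = 2·TP / (2·TP + FP + FN). The following is a minimal sketch of that metric in plain Python. It assumes that each sample's gold and predicted labels are given as sets of technique IDs. It is not the authors' evaluation code, and the technique IDs in the example are purely illustrative, not taken from WAVE-27K.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification.

    TP/FP/FN counts are pooled over every sample and every label
    before precision and recall are computed, so frequent
    techniques contribute more to the score than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of samples")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the technique IDs are illustrative only.
gold = [{"T1059"}, {"T1566", "T1204"}, {"T1190"}]
pred = [{"T1059"}, {"T1566"}, {"T1190", "T1078"}]
print(micro_f1(gold, pred))  # TP=3, FP=1, FN=1 -> 0.75
```

Pooling the counts (rather than averaging per-label F1 scores, as macro-F1 does) means the score reflects overall labeling accuracy on the dataset's actual label distribution.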
Files
Name | Size | MD5 |
---|---|---|
WAVE_JNIC.pdf | 201.4 kB | 6c3c9aced14deae2094e53dcf28f93d8 |
Additional details
Dates
- Accepted: 2024-05-29