Published May 29, 2024 | Version v1
Conference paper Open

Automating Cybersecurity TTP Classification Based on Unstructured Attack Descriptions

Description

Cyber threat intelligence (CTI) sources help security operations centers (SOCs) share important information about incidents and attacks. Because incident-related information appears in a wide range of sources, processing unstructured text is becoming increasingly important. The datasets available in the literature contain texts that are too short or too few samples per class. We therefore propose a semi-automatic method for building a dataset from CTI sources. As a result, we present a new dataset of unstructured CTI descriptions called Weakness, Attack, Vulnerabilities, and Events 27k (WAVE-27K). WAVE-27K covers 27 different MITRE techniques and 7 tactics, with 22539 samples associated with a single technique and 5262 samples associated with two or more techniques. WAVE-27K is larger than the comparable datasets in the literature. We trained a BERT-based model on WAVE-27K and obtained a micro F1-score of 97.00%, which suggests that the information in WAVE-27K is of sufficient quality for training machine learning models.
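For readers unfamiliar with the metric, the micro F1-score pools true positives, false positives, and false negatives across all labels before computing F1, so a sample tagged with two or more techniques contributes one count per label. The following is a minimal sketch, assuming a multi-label setup in which each sample carries a set of technique IDs. The function name `micro_f1` and the toy labels are hypothetical illustrations and are not taken from the paper or from WAVE-27K.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label predictions.

    y_true, y_pred: equal-length lists of sets of labels, one set per sample.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - true_labels)   # labels predicted but absent
        fn += len(true_labels - pred_labels)   # labels present but missed
    denom = 2 * tp + fp + fn
    # With no labels at all, F1 is undefined; return 0.0 by convention here.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels (technique IDs used only for illustration).
y_true = [{"T1566"}, {"T1059", "T1027"}, {"T1486"}]
y_pred = [{"T1566"}, {"T1059"}, {"T1486", "T1027"}]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.2f}")  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because counts are pooled before averaging, frequent techniques weigh more heavily in this score than rare ones.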

Files

WAVE_JNIC.pdf (201.4 kB)
md5:6c3c9aced14deae2094e53dcf28f93d8

Additional details

Dates

Accepted
2024-05-29