Analysing Technique Classification Performance of Cyber Threat Intelligence Extractors
Description
This thesis undertakes a comprehensive assessment of two state-of-the-art CTI
extraction programs in terms of technique classification. Motivated by the rapid
development of open-source Large Language Models (LLMs), we also evaluate four of
these models on the same classification task, to test their potential for
cross-validating CTI extractors. Our analysis shows that, despite recent progress in
technique classification, the performance of CTI extractors remains low. Owing to
their high number of false positives, open-source LLMs are not useful for this task
either.
In addition, we introduce a novel pipeline for extracting attack techniques from CTI
reports. The pipeline first summarizes each report with ChatGPT and then feeds the
summary into a SciBERT model retrained for the cybersecurity domain. Our approach
yields promising results: when applied to 10 CTI reports, we observe a
7-percentage-point increase in F1-score over the original SciBERT model applied to
the entire reports. Furthermore, this research outlines potential avenues for future
work aimed at further improving the precision of technique extraction.
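The two-stage pipeline described above can be sketched in Python. The function names, the keyword-based classifier, and the sentence-truncation summarizer are all hypothetical stand-ins: the thesis uses ChatGPT for summarization and a retrained SciBERT model for classification, neither of which is reproduced here.

```python
# Hypothetical sketch of the summarize-then-classify pipeline.
# Stage 1 stands in for ChatGPT summarization; stage 2 stands in
# for the retrained SciBERT technique classifier.

def summarize(report: str, max_sentences: int = 3) -> str:
    """Stand-in for the ChatGPT summarization step: keep the first
    few sentences of the report."""
    sentences = report.split(". ")
    return ". ".join(sentences[:max_sentences])

def classify_techniques(summary: str, keyword_map: dict) -> list:
    """Stand-in for the retrained SciBERT classifier: map indicator
    phrases to ATT&CK technique IDs via keyword lookup."""
    text = summary.lower()
    found = {tid for kw, tid in keyword_map.items() if kw in text}
    return sorted(found)

def extract_techniques(report: str, keyword_map: dict) -> list:
    """Full pipeline: summarize first, then classify the shorter text."""
    return classify_techniques(summarize(report), keyword_map)

if __name__ == "__main__":
    demo_map = {"spearphishing": "T1566.001", "powershell": "T1059.001"}
    report = ("The actor sent spearphishing emails with malicious attachments. "
              "Execution relied on PowerShell scripts. Later sections describe "
              "unrelated infrastructure details.")
    print(extract_techniques(report, demo_map))
```

The design point the sketch illustrates is that classification operates on the summary rather than the full report, which is what the thesis credits for the F1-score improvement.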
Files

| Name | Size |
|---|---|
| Cuong's Thesis (2).pdf (md5:e2211c2c81559f178add62001dbcb8dd) | 3.6 MB |
Additional details
Related works
- Is source of
- Conference paper: 10.1145/3701716.3715469 (DOI)
Dates
- Submitted: 2023-11