Dataset for A Robust Cybersecurity Topic Classification Tool
Description
Dataset for the paper titled A Robust Cybersecurity Topic Classification Tool
Paper is available at: https://arxiv.org/abs/2109.02473
The dataset consists of lists of text files, each labeled as either cybersecurity-related or not cybersecurity-related. The sources are Reddit, Stack Exchange sites, and arXiv documents, all of which were parsed to text files.
Note that this dataset was scraped from public APIs and has not been post-processed, filtered, or censored in any manner.
Also note that these datasets were not manually labeled, so some labels are very likely incorrect. The labeling was performed using community- and author-defined tags and metadata.
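To illustrate why tag-based labeling can produce incorrect labels, here is a minimal sketch of how such a rule might look. It is not the authors' labeling code: the tag set `SECURITY_TAGS`, the function `label_from_tags`, and the label strings are all hypothetical names chosen for illustration, and the tags actually used are not listed in this description.

```python
# Hypothetical tag set for illustration only; the real tags and metadata
# fields used to label this dataset are not given in this description.
SECURITY_TAGS = {"security", "cryptography", "malware"}


def label_from_tags(tags, positive_tags=SECURITY_TAGS):
    """Return a label based only on whether any tag is in positive_tags."""
    # Normalize case and whitespace so "Malware " matches "malware".
    normalized = {tag.strip().lower() for tag in tags}
    if normalized & positive_tags:
        return "cybersecurity"
    return "not_cybersecurity"
```

Under a rule like this, a post that discusses an exploit but is tagged only `python` would be labeled `not_cybersecurity`, which is the kind of label noise the note above warns about.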
Additional data is located at https://github.com/epelofske-student/CTC
Files (3.1 GB)
Name | Size |
---|---|
md5:b2440fe50af1c3694fd6d3b980acb139 | 3.1 GB |
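Because the download is about 3.1 GB, it can be worth checking the file against the MD5 checksum listed above before using it. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify_download` are illustrative, and the path passed to them is whatever location you saved the file to (the file name is not shown in this listing).

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "b2440fe50af1c3694fd6d3b980acb139"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path of the downloaded file returns `True` only when the digest matches; a `False` result usually indicates a truncated or corrupted download.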