Published February 14, 2024 | Version v1
Dataset | Open access

Dataset for A Robust Cybersecurity Topic Classification Tool

  • New Mexico Cybersecurity Center of Excellence

Description

This is the dataset for the paper "A Robust Cybersecurity Topic Classification Tool".

The paper is available at: https://arxiv.org/abs/2109.02473

The dataset consists of lists of text files, each labeled as either cybersecurity-related or not cybersecurity-related. The text comes from Reddit, Stack Exchange sites, and arXiv documents that were parsed into text files.

Note that this dataset was scraped from public APIs and has not been post-processed, filtered, or censored in any way.

Also note that the data were not labeled manually, so some labels are very likely incorrect. Labels were assigned using community- and author-defined tags and metadata.
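The description does not spell out the exact tag rules. Purely to illustrate how tag-derived labels can end up wrong, here is a minimal sketch of tag-based labeling in Python. It is not the authors' procedure: the tag set `CYBER_TAGS` and the function `label_from_tags` are hypothetical, and the tags actually used are documented in the paper and the CTC repository.

```python
from collections.abc import Iterable

# Hypothetical tag set for illustration only; NOT the list used to build this dataset.
CYBER_TAGS = {"security", "malware", "encryption", "netsec", "cryptography"}


def label_from_tags(tags: Iterable[str]) -> bool:
    """Return True if any tag (case-insensitive, whitespace-trimmed) is in CYBER_TAGS."""
    return any(tag.strip().lower() in CYBER_TAGS for tag in tags)


if __name__ == "__main__":
    # A post its author tagged "Encryption" is labeled cybersecurity-related.
    print(label_from_tags(["python", "Encryption"]))
    # A security question tagged only "linux" and "ssh" is labeled NOT
    # cybersecurity-related: one way an incorrect label can arise.
    print(label_from_tags(["linux", "ssh"]))
```

Any rule of this kind inherits the quality of the tags it reads, which is why labels derived from community and author tags should be treated as noisy.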

Additional data is located at https://github.com/epelofske-student/CTC

Files (3.1 GB)

Size: 3.1 GB
md5:b2440fe50af1c3694fd6d3b980acb139
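Because the archive is 3.1 GB, it is worth confirming that a download is complete before extracting it. The following is a minimal sketch in Python using the standard-library `hashlib` module. It assumes the archive has been saved locally and its path is passed on the command line; the script name `verify_md5.py` and the helper names are hypothetical. The expected value is the md5 checksum listed for this record.

```python
import hashlib
import sys

# MD5 checksum listed on this dataset page.
EXPECTED_MD5 = "b2440fe50af1c3694fd6d3b980acb139"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB chunks so the
    whole archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    ok = verify_download(sys.argv[1])
    print("checksum OK" if ok else "checksum MISMATCH")
    sys.exit(0 if ok else 1)
```

Usage: `python verify_md5.py path/to/downloaded-archive`. A mismatch usually means a truncated or corrupted download, so the file should be fetched again before use.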