Published September 29, 2025 | Version v3
Dataset | Open Access

Data Collection & Requirements

Description

Open-Source Cybersecurity and AI Security Datasets

This project collects open-source datasets for research on cybersecurity threats and AI security vulnerabilities. The datasets were selected to cover specific threat categories, including:

  • Data Exfiltration

  • Data Poisoning

  • Model Manipulation

  • Adversarial Examples

  • Model Inversion

  • Model Extraction

  • Spoofing Attacks

  • Unauthorized Access

  • Supply Chain Compromise

Dataset Collection

Each entry lists an access link, a short description, the data format, the update frequency, and typical use cases.

Cybersecurity & AI Security Datasets

Comprehensive, Multi-Source Cyber-Security Events
Access Here: https://csr.lanl.gov/data/
Description: 58 days of de-identified LANL network data (authentication, process events, DNS, network flow, red team).
Format: Text files
Update Frequency: Static
Use Cases: Cybersecurity Event Analysis
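Here is a minimal sketch for streaming the authentication events. It assumes the nine comma-separated fields that LANL documents for `auth.txt`: time, source user@domain, destination user@domain, source computer, destination computer, authentication type, logon type, authentication orientation, and success/failure. `AuthEvent`, `parse_auth_line`, and `failed_logons` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class AuthEvent:
    time: int            # relative time in seconds
    src_user: str        # source user@domain
    dst_user: str        # destination user@domain
    src_computer: str
    dst_computer: str
    auth_type: str       # e.g. Kerberos, NTLM, Negotiate
    logon_type: str      # e.g. Network, Batch, Interactive
    orientation: str     # e.g. LogOn, LogOff, TGS, TGT
    success: bool        # True when the last field is "Success"


def parse_auth_line(line: str) -> AuthEvent:
    """Split one auth.txt line into its nine comma-separated fields."""
    f = line.rstrip("\n").split(",")
    if len(f) != 9:
        raise ValueError(f"expected 9 fields, got {len(f)}: {line!r}")
    return AuthEvent(int(f[0]), f[1], f[2], f[3], f[4],
                     f[5], f[6], f[7], f[8] == "Success")


def failed_logons(lines: Iterable[str]) -> Iterator[AuthEvent]:
    """Yield failed authentication events, one line at a time."""
    for line in lines:
        if line.strip():
            event = parse_auth_line(line)
            if not event.success:
                yield event
```

`failed_logons` accepts any iterable of lines. That includes an open text file, or `gzip.open(path, "rt")` if the download is gzip-compressed, so the multi-gigabyte file never has to fit in memory.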

DARPA Intrusion Detection Data Sets
Access Here: https://archive.ll.mit.edu/ideval/data/
Description: Simulated network traffic with intrusion scenarios.
Format: PCAP files
Update Frequency: Static
Use Cases: IDS Training

MITRE ATT&CK Framework Data
Access Here: https://attack.mitre.org/
Description: Adversary TTPs (tactics, techniques, procedures) in a globally accessible knowledge base.
Format: JSON/STIX
Update Frequency: Semiannual (April and October releases)
Use Cases: Threat Intelligence
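The sketch below builds a technique index from an ATT&CK STIX 2.x bundle, such as the `enterprise-attack.json` file in MITRE's `attack-stix-data` repository. It assumes that techniques are `attack-pattern` objects and that each one carries its ATT&CK ID in an `external_references` entry whose `source_name` is `mitre-attack`. Revoked and deprecated entries are skipped. `technique_index` and `load_bundle` are hypothetical helpers.

```python
import json
from typing import Any


def technique_index(bundle: dict[str, Any]) -> dict[str, str]:
    """Map ATT&CK technique IDs (e.g. T1059) to technique names."""
    index: dict[str, str] = {}
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue  # retired techniques stay in the bundle for history
        for ref in obj.get("external_references", []):
            if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
                index[ref["external_id"]] = obj["name"]
                break
    return index


def load_bundle(path: str) -> dict[str, Any]:
    """Read a STIX bundle from a local JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```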

National Vulnerability Database (NVD)
Access Here: https://nvd.nist.gov/
Description: CVE records with CVSS severity scores and descriptions.
Format: JSON (via the NVD CVE API 2.0)
Update Frequency: Daily
Use Cases: Vulnerability Management
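The following sketch reads CVSS v3.1 base scores from a response of the NVD CVE API 2.0 (`https://services.nvd.nist.gov/rest/json/cves/2.0`). It then maps each score to the qualitative rating defined in the CVSS v3.x specification. The code assumes the API 2.0 layout (`vulnerabilities[].cve.metrics.cvssMetricV31[].cvssData.baseScore`). It leaves out fetching, paging, and API keys, and the helper names are hypothetical.

```python
from typing import Any, Optional


def cvss3_rating(score: float) -> str:
    """Qualitative severity rating from the CVSS v3.x specification."""
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


def v31_base_scores(response: dict[str, Any]) -> dict[str, Optional[float]]:
    """Map each CVE ID in an API 2.0 response to a CVSS v3.1 base score."""
    scores: dict[str, Optional[float]] = {}
    for item in response.get("vulnerabilities", []):
        cve = item["cve"]
        metrics = cve.get("metrics", {}).get("cvssMetricV31", [])
        # NVD and a CNA may both score a CVE; prefer the "Primary" entry.
        primary = [m for m in metrics if m.get("type") == "Primary"] or metrics
        scores[cve["id"]] = primary[0]["cvssData"]["baseScore"] if primary else None
    return scores
```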

LANL Unified Host and Network Dataset
Access Here: https://csr.lanl.gov/data/
Description: Enterprise-scale dataset with network and host logs, including real-world red-team attacks.
Format: Text files
Update Frequency: Static
Use Cases: Insider Threat Detection

CIC-IDS2017 (Intrusion Detection Dataset)
Access Here: https://www.unb.ca/cic/datasets/ids-2017.html
Description: Network traffic dataset with multiple attack types (DDoS, brute-force, infiltration).
Format: PCAP, CSV
Update Frequency: Static
Use Cases: Intrusion Detection
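This minimal sketch loads the CSV flow files using only the standard library. It assumes two quirks often reported for these files. First, header names carry stray leading spaces (for example ` Label`). Second, some rate features contain `Infinity` or `NaN`. It also assumes that every column except `Label` is numeric, as in the machine-learning CSV variant. Rows with non-finite values are dropped, and labels are collapsed to benign (0) versus attack (1). `load_flows` is a hypothetical helper.

```python
import csv
import math
from typing import Iterable


def load_flows(lines: Iterable[str]) -> tuple[list[list[float]], list[int]]:
    """Return (feature rows, binary labels) from CIC-IDS2017-style CSV lines."""
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]  # " Label" -> "Label"
    label_col = header.index("Label")
    features: list[list[float]] = []
    labels: list[int] = []
    for row in reader:
        if len(row) != len(header):
            continue  # skip truncated or malformed rows
        try:
            values = [float(v) for i, v in enumerate(row) if i != label_col]
        except ValueError:
            continue  # a non-numeric feature value
        if not all(math.isfinite(v) for v in values):
            continue  # drop rows containing Infinity or NaN
        features.append(values)
        labels.append(0 if row[label_col].strip() == "BENIGN" else 1)
    return features, labels
```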

CIC IoV CAN Bus Dataset 2024
Access Here: https://www.unb.ca/cic/datasets/
Description: Vehicle CAN bus data, including spoofing and DoS attack traces.
Format: CSV, PCAP
Update Frequency: Static
Use Cases: Automotive Security

ASVspoof 2019 (Voice Spoofing Dataset)
Access Here: https://datashare.ed.ac.uk/handle/10283/3336
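Description: Speech corpus for evaluating spoofing countermeasures for automatic speaker verification (ASV), covering logical-access (synthetic and converted speech) and physical-access (replay) attacks.
Format: FLAC audio files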
Update Frequency: Static
Use Cases: Voice Security
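Here is a minimal sketch for reading the countermeasure (CM) protocol files distributed with the corpus. It assumes five space-separated columns per line: speaker ID, utterance ID, an environment field (`-` in the logical-access protocols), the attack ID (`-` for bona fide speech), and the key (`bonafide` or `spoof`). `ProtocolEntry`, `read_cm_protocol`, and `attack_counts` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProtocolEntry:
    speaker: str
    utterance: str   # basename of the audio file
    env: str         # "-" in the logical-access (LA) protocols
    attack: str      # attack/system ID, "-" for bona fide speech
    is_spoof: bool


def read_cm_protocol(lines: Iterable[str]) -> list[ProtocolEntry]:
    """Parse protocol lines, ignoring blank or malformed ones."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue
        spk, utt, env, attack, key = parts
        entries.append(ProtocolEntry(spk, utt, env, attack, key == "spoof"))
    return entries


def attack_counts(entries: list[ProtocolEntry]) -> Counter:
    """Number of utterances per attack ID (bona fide counted under '-')."""
    return Counter(e.attack for e in entries)
```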

ToN_IoT Datasets
Access Here: https://research.unsw.edu.au/projects/toniot-datasets
Description: Heterogeneous IoT and IIoT data sources, including sensor telemetry, operating-system logs, and network traffic.
Format: CSV, JSON
Update Frequency: Ongoing
Use Cases: Threat Intelligence

ADFA Intrusion Detection Datasets
Access Here: https://research.unsw.edu.au/projects/adfa-ids-datasets
Description: Host-based intrusion detection datasets for Linux (ADFA-LD, system-call traces) and Windows (ADFA-WD).
Format: Text (trace files)
Update Frequency: Static
Use Cases: Host Intrusion Detection
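One common way to featurize the Linux traces is to count short system-call n-grams. The sketch below does this under the assumption that each trace file holds whitespace-separated integer system-call numbers. It then scores a trace by the fraction of its n-grams never seen in normal training traces. The window length `n=3`, the scoring rule, and the helper names are illustrative choices and are not part of the dataset.

```python
from collections import Counter


def read_trace(text: str) -> list[int]:
    """Parse one trace file: whitespace-separated system-call numbers."""
    return [int(tok) for tok in text.split()]


def ngram_counts(trace: list[int], n: int = 3) -> Counter:
    """Count every contiguous length-n window of system calls."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def unseen_ratio(trace: list[int], normal: Counter, n: int = 3) -> float:
    """Fraction of a trace's n-grams absent from the normal profile.

    Higher values mean the trace deviates more from normal behavior.
    """
    grams = ngram_counts(trace, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    unseen = sum(c for g, c in grams.items() if g not in normal)
    return unseen / total
```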

Security Datasets Project
Access Here: https://github.com/OTRF/Security-Datasets
Description: A community-driven initiative sharing security datasets for research.
Format: JSON, CSV
Update Frequency: Ongoing
Use Cases: Threat Intelligence

CIC-BCCC-NRC Tabular IoT Attack Dataset (2024)
Access Here: https://www.yorku.ca/research/bccc/ucs-technical/cybersecurity-datasets-cds/
Description: A comprehensive IoT network attack dataset for AI-based cybersecurity research.
Format: CSV
Update Frequency: Ongoing
Use Cases: IoT Security

Awesome Cybersecurity Datasets
Access Here: https://github.com/shramos/Awesome-Cybersecurity-Datasets
Description: A curated list of publicly available datasets for cybersecurity research.
Format: Varies
Update Frequency: Varies by listed dataset
Use Cases: General Cybersecurity Research

CSE-CIC-IDS2018
Access Here: https://www.unb.ca/cic/datasets/ids-2018.html
Description: Enterprise-scale dataset with 7 modern attack scenarios (Brute Force, Heartbleed, Botnet, DoS, DDoS, Web attacks, Infiltration).
Format: PCAP, CSV
Update Frequency: Static
Use Cases: Intrusion Detection

UNSW-NB15
Access Here: https://research.unsw.edu.au/projects/unsw-nb15-dataset
Description: Realistic lab traffic with normal and malicious flows across 9 attack families.
Format: PCAP, CSV, Bro/Argus
Update Frequency: Static
Use Cases: IDS Development, Adversarial Testing
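A quick sanity check for the pre-split training and testing CSV files is to tally records per attack family. This sketch assumes the `attack_cat` and `label` columns used in those files, with normal traffic labelled `Normal` (or left blank) and `label` 0. It raises an error when the two columns disagree. `family_counts` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def family_counts(lines: Iterable[str]) -> Counter:
    """Count records per attack_cat and check it agrees with the binary label."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        family = row["attack_cat"].strip() or "Normal"
        is_attack = row["label"].strip() == "1"
        if is_attack == (family == "Normal"):
            raise ValueError(f"label/attack_cat mismatch in row {row}")
        counts[family] += 1
    return counts
```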

Bot-IoT Dataset
Access Here: https://research.unsw.edu.au/projects/bot-iot-dataset
Description: IoT botnet traffic dataset generated in a realistic testbed.
Format: PCAP, Argus, CSV
Update Frequency: Static
Use Cases: IoT Security, Botnet Detection

CTU-13 Botnet Dataset
Access Here: https://www.stratosphereips.org/datasets-ctu13
Description: 13 labeled scenarios of botnet, normal, and background traffic.
Format: PCAP, NetFlows, WebLogs
Update Frequency: Static
Use Cases: Botnet Detection, Anomaly Detection
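In the bidirectional NetFlow (`.binetflow`) files, each flow's label is a free-text string such as `flow=From-Botnet-V42-UDP-DNS`. A common convention collapses these strings into the three classes named above by substring match. The sketch below follows that convention, assuming labels follow this pattern and sit in a `Label` CSV column. Both helper names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def ctu13_class(label: str) -> str:
    """Collapse a CTU-13 NetFlow label into botnet / normal / background."""
    if "Botnet" in label:
        return "botnet"
    if "Normal" in label:
        return "normal"
    return "background"  # Background-* flows and anything else


def class_counts(lines: Iterable[str]) -> Counter:
    """Tally classes from .binetflow lines with a header row."""
    return Counter(ctu13_class(row["Label"]) for row in csv.DictReader(lines))
```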

CERT Insider Threat Dataset (CMU/SEI)
Access Here: https://kilthub.cmu.edu/articles/dataset/Insider_Threat_Test_Dataset/12841247
Description: Synthetic enterprise insider-threat traces (background + malicious users).
Format: CSV, Logs
Update Frequency: Static
Use Cases: Insider Threat Detection
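Many insider-threat baselines start from simple behavioral features. This sketch counts after-hours logons per user. It assumes the `logon.csv` layout of the CERT releases: columns `id,date,user,pc,activity`, dates formatted as `MM/DD/YYYY HH:MM:SS`, and activity values of `Logon` or `Logoff`. The 08:00 to 18:00 working window is an arbitrary illustrative choice.

```python
import csv
from collections import Counter
from datetime import datetime
from typing import Iterable

WORK_START, WORK_END = 8, 18  # illustrative working hours, 08:00 to 18:00


def after_hours_logons(lines: Iterable[str]) -> Counter:
    """Count Logon events outside working hours or on weekends, per user."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        if row["activity"] != "Logon":
            continue
        ts = datetime.strptime(row["date"], "%m/%d/%Y %H:%M:%S")
        weekend = ts.weekday() >= 5  # Saturday=5, Sunday=6
        if weekend or not (WORK_START <= ts.hour < WORK_END):
            counts[row["user"]] += 1
    return counts
```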

TrojAI Dataset (NIST)
Access Here: https://pages.nist.gov/trojai/
Description: Dataset of AI models with and without injected backdoors.
Format: ML Models + metadata
Update Frequency: By round
Use Cases: Backdoor (Trojan) Detection, AI Supply-Chain Attack Research

ImageNet-A
Access Here: https://www.tensorflow.org/datasets/catalog/imagenet_a
Description: 7,500 natural adversarial examples (real, unmodified photos) from 200 ImageNet classes that standard ImageNet classifiers consistently misclassify.
Format: JPEG
Update Frequency: Static
Use Cases: Adversarial Robustness Evaluation
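Because ImageNet-A covers only 200 of the 1,000 ImageNet classes, a standard 1,000-way classifier is usually scored by taking the argmax over just those 200 logits. The sketch below shows that masking step in plain Python. The `subset` list stands in for the real 200-class index mapping, and the helper names are hypothetical.

```python
from typing import Sequence


def subset_argmax(logits: Sequence[float], subset: Sequence[int]) -> int:
    """Return the 1k-class index with the highest logit among `subset`."""
    return max(subset, key=lambda idx: logits[idx])


def subset_accuracy(all_logits: Sequence[Sequence[float]],
                    targets: Sequence[int],
                    subset: Sequence[int]) -> float:
    """Top-1 accuracy with predictions restricted to the subset classes.

    `targets` are 1k-class indices, so each should be a member of `subset`.
    """
    correct = sum(subset_argmax(lg, subset) == t
                  for lg, t in zip(all_logits, targets))
    return correct / len(targets)
```

Without the restriction, a prediction for one of the 800 absent classes would always count as an error, even when the model ranks the correct class highest among the classes that actually occur.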

FGI-SpoofRepo (GNSS Spoofing Repository)
Access Here: https://etsin.fairdata.fi/dataset/367379a8-7d78-4b08-91f0-8027ce7a621b
Description: GNSS I/Q recordings under spoofing scenarios (synchronous, asynchronous, meaconing).
Format: I/Q binary
Update Frequency: Static
Use Cases: GPS Spoofing Detection
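Raw I/Q recordings are plain sample streams whose layout (sample type, interleaving, sample rate) is defined in each dataset's documentation. This minimal sketch assumes interleaved little-endian signed 16-bit I and Q samples, so check the actual format before applying it to these recordings. It decodes the complex samples and computes mean power per block. Power monitoring of this kind is a crude indicator sometimes used to flag the power jump that can accompany a spoofing attack.

```python
import struct
from typing import List


def decode_iq_int16(raw: bytes) -> List[complex]:
    """Decode interleaved little-endian int16 I/Q pairs (assumed layout)."""
    if len(raw) % 4:
        raise ValueError("byte length must be a multiple of 4 (one I/Q pair)")
    values = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]


def block_power(samples: List[complex], block: int) -> List[float]:
    """Mean |x|^2 over consecutive, non-overlapping blocks of samples."""
    return [sum(abs(s) ** 2 for s in samples[k:k + block]) / block
            for k in range(0, len(samples) - block + 1, block)]
```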

TEXBAT (Texas Spoofing Test Battery)
Access Here: https://radionavlab.ae.utexas.edu/texbat/
Description: Recordings of live static and dynamic GPS L1 C/A spoofing scenarios, widely used as a benchmark for evaluating receiver spoofing defenses.
Format: RF recordings
Update Frequency: Static
Use Cases: GNSS Spoofing Detection

LiDAR Shadow Attack Dataset
Access Here: https://zenodo.org/records/15120571
Description: Experimental LiDAR point clouds with adversarial shadow perturbations.
Format: PCD
Update Frequency: Static
Use Cases: LiDAR Adversarial Attack Evaluation
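PCD is the Point Cloud Library's file format. Each file starts with a short text header (`FIELDS`, `SIZE`, `TYPE`, `COUNT`, `WIDTH`, `HEIGHT`, `VIEWPOINT`, `POINTS`, `DATA`) followed by the points. This minimal sketch handles only the `DATA ascii` variant and assumes `COUNT 1` for every field. Binary and compressed files need a dedicated reader such as PCL or Open3D. `read_ascii_pcd` is a hypothetical name.

```python
from typing import Dict, List


def read_ascii_pcd(text: str) -> List[Dict[str, float]]:
    """Parse an ASCII PCD file into one dict per point, keyed by field name."""
    lines = iter(text.splitlines())
    header: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments such as "# .PCD v0.7"
        key, *values = line.split()
        header[key] = values
        if key == "DATA":
            break  # the header ends at the DATA line
    if header.get("DATA") != ["ascii"]:
        raise ValueError("only 'DATA ascii' PCD files are handled here")
    fields = header["FIELDS"]  # assumes COUNT is 1 for every field
    points = [dict(zip(fields, map(float, line.split())))
              for line in lines if line.strip()]
    expected = int(header["POINTS"][0])
    if len(points) != expected:
        raise ValueError(f"header says {expected} points, found {len(points)}")
    return points
```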

TartanAir Dataset
Access Here: https://theairlab.org/tartanair-dataset/
Description: Synthetic SLAM dataset with stereo images, depth, optical flow, segmentation, and LiDAR.
Format: PNG, PCD, JSON
Update Frequency: Ongoing
Use Cases: SLAM robustness, AI sensor fusion

ANISENSE Datasets

CarlaScenes Dataset
Access Here: https://github.com/CarlaScenes/CarlaSence
Description: Synthetic dataset for odometry in autonomous driving using the CARLA simulator.
Format: PNG, PLY, BAG
Update Frequency: Ongoing
Use Cases: Autonomous Driving, SLAM, Odometry, AI Security

Realistic Vehicle Trajectories using CARLA
Access Here: https://ieee-dataport.org/documents/realistic-vehicle-trajectories-and-driving-parameters-carla-autonomous-driving-simulator
Description: Realistic vehicle trajectories in CARLA’s simulated urban environments.
Format: CSV, JSON
Update Frequency: Ongoing
Use Cases: Autonomous Driving, Traffic Simulation, AI Security

KITTI Vision Benchmark Suite
Access Here: http://www.cvlibs.net/datasets/kitti/
Description: Real-world driving recordings with benchmarks for stereo, optical flow, visual odometry, 3D object detection, and semantic segmentation.
Format: PNG, BIN, TXT
Update Frequency: Ongoing
Use Cases: Autonomous Driving, 3D Object Detection, Semantic Segmentation
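In the KITTI releases, each Velodyne scan is stored as a `.bin` file of consecutive `float32` quadruples (x, y, z, reflectance). Here is a minimal standard-library reader, sketched under the assumption of little-endian byte order. It adds a hypothetical range filter as a usage example.

```python
import struct
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # x, y, z (metres), reflectance


def read_velodyne_bin(path: str) -> List[Point]:
    """Read one Velodyne scan stored as packed float32 quadruples."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if len(raw) % 16:
        raise ValueError("file size is not a multiple of 16 bytes (4 x float32)")
    return list(struct.iter_unpack("<4f", raw))


def within_range(points: List[Point], max_dist: float) -> List[Point]:
    """Keep points whose horizontal distance from the sensor is <= max_dist."""
    return [p for p in points if (p[0] ** 2 + p[1] ** 2) ** 0.5 <= max_dist]
```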

OUSTER Dataset
Access Here: https://ouster.com/downloads/sample-lidar-data
Description: LiDAR data captured by Ouster sensors for super-resolution and 3D mapping.
Format: Point Clouds
Update Frequency: Ongoing
Use Cases: LiDAR Super-Resolution, 3D Mapping, AI Security

SemanticPOSS
Access Here: http://www.poss.pku.edu.cn/
Description: LiDAR dataset with instance-level annotations for dynamic objects (pedestrians, riders, vehicles).
Format: PCD, PNG
Update Frequency: Static
Use Cases: 3D Semantic Segmentation, Autonomous Driving, AI Security

Siemens Dataset

IPIN23 Indoor Positioning Challenge Dataset
Access Here: https://ipin-conference.org/2023/competition/
Description: Multi-sensor dataset (WiFi, BLE, IMU, magnetometer) for indoor positioning and navigation.
Format: CSV, Sensor logs
Update Frequency: Static
Use Cases: Indoor Localization, Positioning Security

Files (20.1 kB)

md5:5ac6bf93ab6743b6687ed5eb295fbd27
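The MD5 checksum listed for the record's file can confirm that a download is complete and uncorrupted. Here is a minimal sketch using Python's `hashlib`. The path argument is whatever local name the file was saved under.

```python
import hashlib

EXPECTED_MD5 = "5ac6bf93ab6743b6687ed5eb295fbd27"  # from the Files listing


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True when the file's MD5 matches the expected hex digest."""
    return md5_of(path) == expected.lower()
```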