Published March 10, 2025 | Version v1
Dataset
Open
IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification
Description
Overview
This dataset accompanies the paper:
"Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats"
Published in TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing, colocated with NAACL 2025.
This dataset is designed to facilitate research in:
- Text-to-SQL: Generating SQL queries from natural language.
- IoT Network Traffic Analysis: Detecting malicious activity in IoT environments.
- Multimodal Learning: Combining structured database queries with network security classification.
Dataset Contents
The dataset consists of three main components:
1. IoT Database (iot_database.sql.gz)
- SQL schema and data drawn from IoT-23 logs and Smart Building Sensor datasets, ready to import into a database.
2. Text-to-SQL Data (text-to-SQL-data.zip)
- Includes queries with joins, aggregations, temporal conditions, and nested clauses.
- Data split into training (6,591), validation (2,197), and test (2,197) sets.
3. Network Traffic Data (network_traffic_data.zip)
- Each record labeled as benign or malicious.
- Features include timestamps, IPs, ports, protocols, byte counts, and connection history.
- Malicious traffic includes DDoS, C&C, and botnet-related activity.
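The split sizes listed for the Text-to-SQL data can be sanity-checked with a little arithmetic. The sketch below uses only the three counts reported above; the dictionary keys and the helper name `split_fractions` are illustrative and are not taken from the dataset files.

```python
# Split sizes as reported in the dataset description.
split_sizes = {"train": 6591, "validation": 2197, "test": 2197}


def split_fractions(sizes):
    """Return each split's share of the total as a fraction between 0 and 1."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(split_sizes.values())
fractions = split_fractions(split_sizes)

print(f"total items: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {split_sizes[name]} ({fraction:.1%})")
```

The counts sum to 10,985, which corresponds to a 60/20/20 partition into training, validation, and test sets.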
Usage Instructions
Setting Up the Database
- Extract the database file:
    gunzip iot_database.sql.gz
- Import into MySQL:
    mysql -u <username> -p <database_name> < iot_database.sql
- Verify the schema (from within the MySQL client):
    SHOW TABLES;
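Where gunzip is not available (for example on some Windows setups), the extraction step can be approximated with the Python standard library. This is a minimal sketch, not part of the dataset's tooling: the function name `decompress_gzip` is hypothetical, and unlike gunzip it leaves the original .gz archive in place.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src, dst=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src = Path(src)
    if dst is None:
        # "iot_database.sql.gz" -> "iot_database.sql"
        dst = src.with_suffix("")
    dst = Path(dst)
    # Stream in chunks so a large SQL dump is never loaded fully into memory.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


if __name__ == "__main__":
    archive = Path("iot_database.sql.gz")
    if archive.exists():
        print(f"Wrote {decompress_gzip(archive)}")
    else:
        print(f"{archive} not found in the current directory")
```

The resulting iot_database.sql file can then be imported with the mysql command shown above.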
Citation
If you use this dataset, please cite:
@inproceedings{pavlich2025beyond,
author = {Ryan Pavlich and Nima Ebadi and Richard Tarbell and Billy Linares and Adrian Tan and Rachael Humphreys and Jayanta Kumar Das and Rambod Ghandiparsi and Hannah Haley and Jerris George and Rocky Slavin and Kim-Kwang Raymond Choo and Glenn Dietrich and Anthony Rios},
title = {Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats},
booktitle = {TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing},
year = {2025},
organization = {NAACL}
}
Contact
For questions or collaborations, contact Anthony Rios at Anthony.Rios@utsa.edu.