Published March 10, 2025 | Version v1
Dataset Open

IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification

Description

Overview

This dataset accompanies the paper:

"Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats"

Published in TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing, colocated with NAACL 2025.

This dataset is designed to facilitate research in:

  • Text-to-SQL: Generating SQL queries from natural language.
  • IoT Network Traffic Analysis: Detecting malicious activity in IoT environments.
  • Multimodal Learning: Combining structured database queries with network security classification.

Dataset Contents

The dataset consists of three main components:

1. IoT Database (iot_database.sql.gz)

  • SQL schema and data derived from IoT-23 logs and Smart Building Sensor datasets, ready to import into a database.

2. Text-to-SQL Data (text-to-SQL-data.zip)

  • Includes queries with joins, aggregations, temporal conditions, and nested clauses.
  • Data split into training (6,591), validation (2,197), and test (2,197) sets.
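The split sizes above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the counts stated in this description; the dictionary keys and the helper name `split_fractions` are illustrative choices, not names taken from the dataset files.

```python
# Split sizes as stated in the dataset description.
SPLITS = {"train": 6591, "validation": 2197, "test": 2197}


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

print(f"total examples: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%}")
```

Under these counts the data amounts to 10,985 examples in a 60/20/20 split.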

3. Network Traffic Data (network_traffic_data.zip)

  • Each record is labeled as benign or malicious.
  • Features include timestamps, IPs, ports, protocols, byte counts, and connection history.
  • Malicious traffic includes DDoS, C&C, and botnet-related activity.
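To make the record layout concrete, here is a minimal sketch of how one labeled connection might be represented and how the label balance could be checked. The field names, the `ConnectionRecord` class, and the sample rows are all assumptions for illustration; the actual column names and file format in network_traffic_data.zip may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    """One labeled connection. Field names are hypothetical, not the dataset's columns."""
    timestamp: float   # seconds since the epoch
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str      # e.g. "tcp" or "udp"
    orig_bytes: int    # bytes sent by the originator
    resp_bytes: int    # bytes sent by the responder
    history: str       # connection history string
    label: str         # "benign" or "malicious"


def label_counts(records):
    """Count how many records carry each label."""
    return Counter(record.label for record in records)


# Made-up rows with documentation-range IP addresses (not real dataset rows).
sample = [
    ConnectionRecord(1.0, "192.0.2.10", 40000, "198.51.100.5", 80, "tcp", 120, 300, "ShADadFf", "benign"),
    ConnectionRecord(2.0, "192.0.2.11", 40001, "198.51.100.6", 23, "tcp", 0, 0, "S", "malicious"),
    ConnectionRecord(3.0, "192.0.2.11", 40002, "198.51.100.7", 23, "tcp", 0, 0, "S", "malicious"),
]

print(label_counts(sample))
```

Checking the label distribution first is a common step before training a classifier, since benign and malicious traffic are often imbalanced.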

Usage Instructions

Setting Up the Database

  1. Extract the database file:
    gunzip iot_database.sql.gz
    
  2. Import into MySQL (the target database must already exist):
    mysql -u <username> -p <database_name> < iot_database.sql
    
  3. Verify the schema from a MySQL session connected to that database:
    SHOW TABLES;
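
On systems without the gunzip command, the extraction step can be approximated with Python's standard library. This is a minimal sketch, not part of the dataset's tooling; the helper name `decompress_gzip` and its parameters are illustrative.

```python
import gzip
import shutil


def decompress_gzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress src_path to dst_path without loading the whole file into memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path
```

Calling `decompress_gzip("iot_database.sql.gz", "iot_database.sql")` would produce the file expected by the import step. Unlike gunzip, this leaves the original .gz file in place.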
    

Citation

If you use this dataset, please cite:

@inproceedings{pavlich2025beyond,
  author = {Ryan Pavlich and Nima Ebadi and Richard Tarbell and Billy Linares and Adrian Tan and Rachael Humphreys and Jayanta Kumar Das and Rambod Ghandiparsi and Hannah Haley and Jerris George and Rocky Slavin and Kim-Kwang Raymond Choo and Glenn Dietrich and Anthony Rios},
  title = {Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats},
  booktitle = {TrustNLP: Fifth Workshop on Trustworthy Natural Language Processing},
  year = {2025},
  organization = {NAACL}
}

Contact

For questions or collaborations, contact Anthony Rios at Anthony.Rios@utsa.edu.

Files (315.0 MB)

  • md5:3e2a09c71693dfc4d818b2200fe8aee2 (303.1 MB)
  • md5:01947f71247fd92b7a965e5db30d13af (11.1 MB)
  • md5:e9592b05834aa41601fe27abd5a159a7 (782.2 kB)
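
After downloading, each file can be checked against the MD5 checksums listed above. The sketch below is one way to do that with Python's standard library; the helper names `file_md5` and `matches_published_checksum` are illustrative. Because the listing does not pair checksums with file names, the check only tests whether a file's digest is one of the three published values.

```python
import hashlib

# MD5 checksums exactly as listed on the record page.
PUBLISHED_MD5 = {
    "3e2a09c71693dfc4d818b2200fe8aee2",
    "01947f71247fd92b7a965e5db30d13af",
    "e9592b05834aa41601fe27abd5a159a7",
}


def file_md5(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 is one of the published checksums."""
    return file_md5(path) in PUBLISHED_MD5
```

For example, `matches_published_checksum("iot_database.sql.gz")` should return True for an intact download of that archive.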