TCAB: Text Classification Attack Benchmark Dataset
Creators
- University of California, Irvine
- University of California, San Diego
- University of Oregon
Description
TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on datasets from the sentiment and abuse domains.
The dataset is split into two files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 of which are "clean", i.e., unperturbed) and the validation set contains 482,914 instances (178,607 clean). Each instance contains the following attributes (a loading sketch follows the list):
scenario: Domain, either abuse or sentiment.
target_model_dataset: Dataset being attacked.
target_model_train_dataset: Dataset the target model was trained on.
target_model: Type of victim model (e.g., bert, roberta, xlnet).
attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.
attack_name: Name of the attack method.
original_text: Original input text.
original_output: Prediction probabilities of the target model on the original text.
ground_truth: Encoded label for the domain dataset's original task. For abuse datasets, 1 means toxic and 0 means non-toxic. For sentiment datasets, 1 means positive and 0 means negative; if a neutral class exists, then 2, 1, and 0 mean positive, neutral, and negative, respectively.
status: "clean" for unperturbed examples; "success" for successful adversarial attacks.
perturbed_text: Text after it has been perturbed by an attack.
perturbed_output: Prediction probabilities of the target model on the perturbed text.
attack_time: Time taken to execute the attack.
num_queries: Number of queries performed while attacking.
frac_words_changed: Fraction of words changed due to an attack.
test_index: Index of each unique source example (original instance); legacy field kept for backwards compatibility.
original_text_identifier: Index of each unique source example (original instance).
unique_src_instance_identifier: Primary key that uniquely identifies every source instance; composed of (target_model_dataset, test_index, original_text_identifier).
pk: Primary key that uniquely identifies every attack instance; composed of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
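For reference, here is a minimal sketch of loading the two splits and working with the schema above. It assumes pandas and that train.csv and val.csv sit in the working directory; the "_"-joined string form of the composite keys is an illustrative assumption showing only how the component columns combine, not necessarily the dataset's exact key format.

```python
import pandas as pd

# Load the two splits (assumes the CSVs are in the current directory).
train = pd.read_csv("train.csv")
val = pd.read_csv("val.csv")

# Separate clean (unperturbed) instances from successful attacks via `status`.
clean = train[train["status"] == "clean"]
attacked = train[train["status"] == "success"]
print(f"train: {len(train):,} total, {len(clean):,} clean, {len(attacked):,} attacked")

# Example: count successful attacks per toolchain and attack method.
print(attacked.groupby(["attack_toolchain", "attack_name"]).size())

# Rebuild the composite keys from their component columns. The column
# combinations come from the field descriptions above; joining with "_"
# is an assumption made for illustration.
SRC_KEY = ["target_model_dataset", "test_index", "original_text_identifier"]
ATTACK_KEY = ["attack_name", "attack_toolchain", "original_text_identifier",
              "scenario", "target_model", "target_model_dataset", "test_index"]

train["src_key"] = train[SRC_KEY].astype(str).agg("_".join, axis=1)
train["attack_key"] = train[ATTACK_KEY].astype(str).agg("_".join, axis=1)

# Each attack key should identify exactly one row.
assert train["attack_key"].is_unique
```

Grouping rows by src_key recovers every attack launched against a single original example, which is useful for comparing attack methods on the same input.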