TCAB: Text Classification Attack Benchmark Dataset
Creators
- University of California, Irvine
- University of California, San Diego
- University of Oregon
Description
TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on multiple sentiment and abuse domain datasets.
The dataset is split into two files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 of which are "clean", i.e., unperturbed), while the validation set contains 482,914 instances (178,607 "clean"). Each instance in both files has the following attributes (a short loading sketch follows the list):
scenario: Domain, either abuse or sentiment.
target_model_dataset: Dataset being attacked.
target_model_train_dataset: Dataset the target model trained on.
target_model: Type of victim model (e.g., bert, roberta, xlnet).
attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.
attack_name: Name of the attack method.
original_text: Original input text.
original_output: Prediction probabilities of the target model on the original text.
ground_truth: Encoded label for the original task of the domain dataset. For abuse datasets, 1 and 0 mean toxic and non-toxic, respectively. For sentiment datasets, 1 and 0 mean positive and negative sentiment; if a neutral class exists, 2, 1, and 0 mean positive, neutral, and negative sentiment, respectively.
status: "clean" for an unperturbed example; "success" for a successful adversarial attack.
perturbed_text: Text after it has been perturbed by an attack.
perturbed_output: Prediction probabilities of the target model on the perturbed text.
attack_time: Time taken to execute the attack.
num_queries: Number of queries performed while attacking.
frac_words_changed: Fraction of words changed due to an attack.
test_index: Index of each unique source example (original instance). LEGACY, kept for backwards compatibility.
original_text_identifier: Index of each unique source example (original instance).
unique_src_instance_identifier: Primary key that uniquely identifies every source instance; composed of (target_model_dataset, test_index, original_text_identifier).
pk: Primary key that uniquely identifies every attack instance; composed of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
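As a quick orientation, here is a minimal sketch of loading both splits with pandas and separating clean instances from successful attacks. File names, column names, and counts are taken from the description above; the file paths are assumptions and depend on where the CSVs are stored.

```python
import pandas as pd

# Load both splits (paths are hypothetical; adjust to your download location).
train = pd.read_csv("train.csv")
val = pd.read_csv("val.csv")

# "clean" marks unperturbed source examples; "success" marks
# successful adversarial attacks.
clean_train = train[train["status"] == "clean"]
attacks_train = train[train["status"] == "success"]

print(len(train), len(clean_train))                   # expected: 1,448,751 and 552,364
print(len(val), len(val[val["status"] == "clean"]))   # expected: 482,914 and 178,607
```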
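The two key fields can be sanity-checked the same way. The sketch below assumes the column names listed above and verifies that pk behaves as a primary key, i.e., that its seven component fields identify each attack instance exactly once.

```python
# Per the description, pk should uniquely identify every attack instance.
assert not train["pk"].duplicated().any()

# pk is composed of these seven fields, so grouping by them should
# yield exactly one row per group.
key_cols = ["attack_name", "attack_toolchain", "original_text_identifier",
            "scenario", "target_model", "target_model_dataset", "test_index"]
assert (train.groupby(key_cols).size() == 1).all()
```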