Published October 19, 2022 | Version 0.0.2
Dataset Open

TCAB: Text Classification Attack Benchmark Dataset

  • 1. University of California Irvine
  • 2. University of California San Diego
  • 3. University of Oregon

Description

TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on multiple sentiment and abuse domain datasets.

The dataset is split into two files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 of which are "clean", unperturbed instances) and the validation set contains 482,914 instances (178,607 of which are "clean"). Each instance contains the following attributes:

scenario: Domain, either abuse or sentiment.

target_model_dataset: Dataset being attacked.

target_model_train_dataset: Dataset the target model trained on.

target_model: Type of victim model (e.g., bert, roberta, xlnet).

attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.

attack_name: Name of the attack method.

original_text: Original input text.

original_output: Prediction probabilities of the target model on the original text.

ground_truth: Encoded label for the original task of the domain dataset. For abuse datasets, 1 and 0 mean toxic and non-toxic, respectively. For sentiment datasets, 1 and 0 mean positive and negative sentiment, respectively. If a sentiment dataset includes a neutral class, then 2, 1, and 0 mean positive, neutral, and negative sentiment.

status: Unperturbed example if "clean"; successful adversarial attack if "success".

perturbed_text: Text after it has been perturbed by an attack.

perturbed_output: Prediction probabilities of the target model on the perturbed text.

attack_time: Time taken to execute the attack.

num_queries: Number of queries performed while attacking.

frac_words_changed: Fraction of words changed due to an attack.

test_index: Index of each unique source example (original instance) (LEGACY - necessary for backwards compatibility).

original_text_identifier: Index of each unique source example (original instance).

unique_src_instance_identifier: Primary key that uniquely identifies every source instance; composed of (target_model_dataset, test_index, original_text_identifier).

pk: Primary key that uniquely identifies every attack instance; composed of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
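As an illustration of how the attributes above might be used, the following minimal sketch counts instances by status and groups successful attacks by source instance. It uses only the Python standard library and a tiny made-up sample in place of the real files; the sample values (demo_dataset, attack_a, attack_b) and the helper names count_by_status and source_key are hypothetical, and the sample carries only a subset of the columns.

```python
import csv
import io
from collections import Counter


def count_by_status(rows):
    """Count instances per value of the `status` column."""
    return Counter(row["status"] for row in rows)


def source_key(row):
    """Tuple of the columns that together identify a source instance."""
    return (
        row["target_model_dataset"],
        row["test_index"],
        row["original_text_identifier"],
    )


# Made-up sample with a subset of the documented columns (not real TCAB data).
SAMPLE = """status,target_model_dataset,test_index,original_text_identifier,attack_name
clean,demo_dataset,0,0,
success,demo_dataset,0,0,attack_a
success,demo_dataset,0,0,attack_b
success,demo_dataset,1,1,attack_a
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Number of clean vs. successfully attacked instances.
counts = count_by_status(rows)

# Number of successful attacks recorded for each source instance.
attacks_per_source = Counter(
    source_key(row) for row in rows if row["status"] == "success"
)
```

To run the same counts on the real data, the in-memory sample could be replaced with open("train.csv", newline="", encoding="utf-8"), iterating over the reader instead of building a list, since the files are large.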

Files

Files (1.8 GB)

md5:cfd575a61fe4e764962b560d2dc2ce15 (1.3 GB)
md5:245e155ec16a7f4dc535f15561464099 (448.2 MB)
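A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's hashlib; the helper names md5_of_file and matches_listed_checksum are hypothetical, and the caller must decide which listed checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listed_checksum("train.csv", "md5:<hex digest from the list above>") returns True only if the local copy is byte-for-byte identical to the published file.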