TCAB: Text Classification Attack Benchmark Dataset
Creators
- University of California, Irvine
- University of California, San Diego
- University of Oregon
Description
TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on multiple sentiment and abuse domain datasets.
The dataset is split into two files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 of which are "clean", i.e., unperturbed), while the validation set contains 482,914 instances (178,607 "clean"). Each instance in both files has the following attributes (a short loading sketch follows the list):
scenario: Domain, either abuse or sentiment.
target_model_dataset: Dataset being attacked.
target_model_train_dataset: Dataset the target model trained on.
target_model: Type of victim model (e.g., bert, roberta, xlnet).
attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.
attack_name: Name of the attack method.
original_text: Original input text.
original_output: Prediction probabilities of the target model on the original text.
ground_truth: Encoded label for the original task of the domain dataset. For abuse datasets, 1 and 0 mean toxic and non-toxic, respectively. For sentiment datasets, 1 and 0 mean positive and negative sentiment; if a neutral class exists, 2, 1, and 0 mean positive, neutral, and negative sentiment, respectively.
status: "clean" for an unperturbed example; "success" for a successful adversarial attack.
perturbed_text: Text after it has been perturbed by an attack.
perturbed_output: Prediction probabilities of the target model on the perturbed text.
attack_time: Time taken to execute the attack.
num_queries: Number of queries performed while attacking.
frac_words_changed: Fraction of words changed due to an attack.
test_index: Index of each unique source example (original instance). LEGACY, kept for backwards compatibility.
original_text_identifier: Index of each unique source example (original instance).
unique_src_instance_identifier: Primary key that uniquely identifies every source instance; composed of (target_model_dataset, test_index, original_text_identifier).
pk: Primary key that uniquely identifies every attack instance; composed of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
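As a quick orientation, here is a minimal sketch of loading both splits with pandas and separating clean instances from successful attacks. File names, column names, and counts are taken from the description above; the file paths are assumptions and depend on where the CSVs are stored.

```python
import pandas as pd

# Load both splits (paths are hypothetical; adjust to your download location).
train = pd.read_csv("train.csv")
val = pd.read_csv("val.csv")

# "clean" marks unperturbed source examples; "success" marks
# successful adversarial attacks.
clean_train = train[train["status"] == "clean"]
attacks_train = train[train["status"] == "success"]

print(len(train), len(clean_train))                   # expected: 1,448,751 and 552,364
print(len(val), len(val[val["status"] == "clean"]))   # expected: 482,914 and 178,607
```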
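The two key fields can be sanity-checked the same way. The sketch below assumes the column names listed above and verifies that pk behaves as a primary key, i.e., that its seven component fields identify each attack instance exactly once.

```python
# Per the description, pk should uniquely identify every attack instance.
assert not train["pk"].duplicated().any()

# pk is composed of these seven fields, so grouping by them should
# yield exactly one row per group.
key_cols = ["attack_name", "attack_toolchain", "original_text_identifier",
            "scenario", "target_model", "target_model_dataset", "test_index"]
assert (train.groupby(key_cols).size() == 1).all()
```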