Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses

Kabir, Ehsanul; Craig, Lucas; Mehnaz, Shagufta

doi:10.5281/zenodo.14732956

Published January 24, 2025 | Version v1

Conference paper Open

1. Pennsylvania State University

The accompanying artifacts provide a comprehensive framework for understanding and replicating the experiments described in our paper, Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses. These artifacts are designed to support open science practices, ensuring reproducibility and transparency in research. They include datasets, source code, and Jupyter notebooks used to implement and evaluate the proposed attacks and defenses. The datasets encompass both publicly available and partitioned subsets to replicate training and testing scenarios. The codebase is modular, with dedicated scripts for data preprocessing, model setup, experiment configuration, and attack implementation. Additionally, the notebooks encapsulate the step-by-step execution of the experiments, enabling users to validate key results such as the relationship between correlation and attack performance, the efficacy of disparity inference and targeted inference attacks, and the impact of proposed defenses like BCorr. These resources are shared in compliance with the conference’s open science policy and hosted on a platform ensuring permanent access.

Environment Setup:

To set up the environment, create a virtual environment using Conda or Python’s venv, then activate it. Install dependencies with pip install -r requirements.txt. To run Jupyter notebooks, install notebook using pip install notebook. Once done, the environment is ready for running the provided scripts and notebooks.

Datasets:

census19.csv contains the Census-19 dataset.
Adult_35222.csv and Adult_10000.csv contain the Adult dataset partitions for training and testing, respectively.
Instructions for downloading and preprocessing the Texas-100X dataset:
1. Download the PUDF_base1q2006_tab.txt, PUDF_base2q2006_tab.txt, PUDF_base3q2006_tab.txt and PUDF_base4q2006_tab.txt files from https://www.dshs.texas.gov/center-health-statistics/texas-health-care-information-collection/health-data-researcher-information/texas-hospital-emergency-department-research-data-file-ed-rdf/hospital-discharge-data-public-use-data-file and place them inside the dataset/texas_100_v2 folder.
2. Run the following command to preprocess the dataset:
  python preprocess_dataset.py texas_100_v2 --preprocess 1
3. After that, use the texas_100_v2.csv file for running the experiments.
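
Before running step 2, it can help to confirm that all four quarterly files are in place. The following is a minimal sketch, not part of the artifact: the function names missing_texas_files and preprocess_texas are hypothetical, while the file names, folder, and command are taken from the steps above.

```python
import subprocess
import sys
from pathlib import Path

# File names and target folder come from step 1 of the instructions.
REQUIRED_FILES = [f"PUDF_base{q}q2006_tab.txt" for q in range(1, 5)]


def missing_texas_files(dataset_dir="dataset/texas_100_v2"):
    """Return the required PUDF files that are not present in dataset_dir."""
    folder = Path(dataset_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def preprocess_texas(dataset_dir="dataset/texas_100_v2"):
    """Run the documented preprocessing command once all inputs exist.

    Hypothetical helper: it only wraps the command shown in step 2.
    """
    missing = missing_texas_files(dataset_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {dataset_dir}: {', '.join(missing)}")
    # Same command as step 2, invoked with the current interpreter.
    subprocess.run(
        [sys.executable, "preprocess_dataset.py", "texas_100_v2", "--preprocess", "1"],
        check=True,
    )
```

Calling preprocess_texas() from the folder that contains preprocess_dataset.py either raises an error naming the missing files or runs the preprocessing command.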

Codebase:

Source Files:

data_utils.py contains code to load and preprocess the datasets. It also contains code to sample data matching a target correlation at the subgroup level, as described in section 6.1, paragraph 'Sampling Technique'.
model_utils.py contains code to define the model architecture and training hyperparameters.
experiment_utils.py contains code to set up an experiment for a particular scenario.
attack_utils.py contains the implementation of existing attacks, including CSMIA, LOMIA, the imputation attack, and the neuron importance attack.
whitebox_attack.py contains helper functions needed to perform the neuron importance attack.
disparity_inference_utils.py contains the implementation of Confidence Matrix generation (Algorithm 1) and Angular Difference computation (Algorithm 2).
targeted_inference.py contains the implementation of the targeted attribute inference attacks (section 5.3).
bcorr_utils.py contains the implementation of the sampling stage of the BCorr defense (section 7.2).
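
The sampling technique in data_utils.py is specified in section 6.1 of the paper and is not reproduced here. As a rough illustration only of what sampling to a target correlation can mean, the sketch below handles the simplest case: two binary variables (a sensitive attribute s and a label y) with balanced marginals, where placing a fraction (1 + phi) / 4 of the sample in each agreeing cell yields a phi coefficient of phi. The function names and the balanced-marginals assumption are ours and are not taken from the artifact; applying it per subgroup would mean calling it on each subgroup's records separately.

```python
import math
import random


def phi_coefficient(pairs):
    """Phi (Pearson) correlation between two binary variables, given (s, y) pairs."""
    n11 = sum(1 for s, y in pairs if s == 1 and y == 1)
    n10 = sum(1 for s, y in pairs if s == 1 and y == 0)
    n01 = sum(1 for s, y in pairs if s == 0 and y == 1)
    n00 = len(pairs) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def sample_with_target_phi(records, target_phi, n, seed=0):
    """Subsample roughly n (s, y) records whose phi is close to target_phi.

    Illustrative assumption: the sample has balanced marginals, so each
    agreeing cell (s == y) gets n * (1 + phi) / 4 records and each
    disagreeing cell gets n * (1 - phi) / 4 records.
    """
    rng = random.Random(seed)
    cells = {(s, y): [] for s in (0, 1) for y in (0, 1)}
    for rec in records:
        cells[(rec[0], rec[1])].append(rec)
    agree = round(n * (1 + target_phi) / 4)
    disagree = round(n * (1 - target_phi) / 4)
    sample = []
    for (s, y), pool in cells.items():
        k = agree if s == y else disagree
        if k > len(pool):
            raise ValueError(f"cell {(s, y)} has {len(pool)} records, needs {k}")
        sample.extend(rng.sample(pool, k))
    rng.shuffle(sample)
    return sample
```

Because the cell counts are rounded, the achieved correlation matches the target exactly only when n * (1 + phi) / 4 and n * (1 - phi) / 4 are integers; phi_coefficient can be used to check the result.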

Notebooks:

correlation_vs_attack_performance.ipynb contains the code to run the experiment described in section 4.1 which shows the strong connection between correlation and attack performance.
angular_difference_by_sex.ipynb contains the code to run the experiment described in section 4.2 which shows how the angular difference can be used to identify vulnerable groups.
imputation_vs_ai_aux_size_and_distrib_diff.ipynb contains the code to run the experiment described in section 6.2, which compares the performance between ideal imputation and practical imputation attacks.
disparity_inference.ipynb contains the code to run the experiment described in section 6.3 which shows how the disparity inference attack can be used to rank groups based on their vulnerability.
targeted_attribute_inference.ipynb contains the code to run the experiments in section 6.4, which show how the targeted attribute inference attacks outperform their untargeted counterparts and practical imputation attacks.
mutual_info_reg.ipynb contains the code to run the experiments in section 7.1 which shows the ineffectiveness of existing defenses in mitigating disparate vulnerability.
balancing_corr_defense.ipynb contains the code to run the experiments in section 7.2, which show how the BCorr defense can be used to mitigate the vulnerability of the model to the targeted attribute inference attack.
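
The notebooks can also be executed without opening the Jupyter interface. The sketch below assumes that the jupyter nbconvert command is available in the environment (it is normally installed alongside notebook); the helper names nbconvert_command and run_section and the output folder are hypothetical, while the section-to-notebook mapping follows the list above.

```python
import subprocess
from pathlib import Path

# Notebook names as they appear in the record's file listing,
# keyed by the paper section each one reproduces.
NOTEBOOKS = {
    "4.1": "correlation_vs_attack_performance.ipynb",
    "4.2": "angular_difference_by_sex.ipynb",
    "6.2": "imputation_vs_ai_aux_size_and_distrib_diff.ipynb",
    "6.3": "disparity_inference.ipynb",
    "6.4": "targeted_attribute_inference.ipynb",
    "7.1": "mutual_info_reg.ipynb",
    "7.2": "balancing_corr_defense.ipynb",
}


def nbconvert_command(section, output_dir="executed"):
    """Build the nbconvert command that executes the notebook for a section."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", output_dir,
        NOTEBOOKS[section],
    ]


def run_section(section, output_dir="executed"):
    """Execute one notebook and write the executed copy to output_dir."""
    Path(output_dir).mkdir(exist_ok=True)
    subprocess.run(nbconvert_command(section, output_dir), check=True)
```

For example, run_section("6.3") would execute disparity_inference.ipynb and leave the original notebook untouched, since the executed copy is written to a separate folder.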

Files (72.7 MB)

Name	MD5	Size
Adult_10000.csv	d479f94becded4798e44c6b16bf45647	688.5 kB
Adult_35222.csv	c72ba293b6e73ce714b824711a377310	2.4 MB
angular_difference_by_sex.ipynb	d9635caf3b1d5ea447d7c3bcb47e244c	782.1 kB
attack_utils.py	cc6aabbb6ab137487846472427d582d0	28.7 kB
balancing_corr_defense.ipynb	27049d49c5b689433b967231b8da9f89	13.0 kB
bcorr_utils.py	0ca80addf5bbaedb8685120fdcc6670e	5.2 kB
census19.csv	ffff2525fb0ee34ffa6f723475a7411b	67.9 MB
correlation_vs_attack_performance.ipynb	2872b4353a525b7ea68660a2b489ebf0	278.3 kB
data_utils.py	f059c22a02951ecc56c782c7defcef03	53.1 kB
disparate_vulnerability_utils.py	f3b20ee8b72ea601aa474e157f689ad9	17.0 kB
disparity_inference.ipynb	e7cb5b86d1d02aefc94e3391641b2104	291.3 kB
disparity_inference_utils.py	ecdbc35bd816369f88734e600c09db1b	10.6 kB
experiment_utils.py	02c5680a66fcd8e3b3282f3d08b45363	5.0 kB
imputation_vs_ai_aux_size_and_distrib_diff.ipynb	9b5ab5d7c7cbb4c07f358b9cbdc474bb	133.2 kB
model_utils.py	a0e25bf84041581150ddbd9c3b3fe121	25.8 kB
mutual_info_reg.ipynb	8a2e703deb22721a00e79bd03cef3f0a	21.3 kB
preprocess_dataset.py	57ce1aa54bf44a9a779108deb8864141	28.2 kB
requirements.txt	61ad7a4f7624d24d93238d8de00b7c3c	84 Bytes
targeted_attribute_inference.ipynb	883a5ff2ef7c193545bef850a9eb8af0	19.1 kB
targeted_inference.py	9745c42fd4a16583ffc2db768f92dc52	13.9 kB
whitebox_attack.py	c2047d14d4e2a6f81d0b650bd088402f	4.8 kB
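
The MD5 checksums in the listing can be used to confirm that downloaded files are intact. The following is a minimal sketch using only the Python standard library; the helper names md5_of and verify are hypothetical, and only three of the listed checksums are copied in as examples.

```python
import hashlib
from pathlib import Path

# A few (name, md5) pairs copied from the file listing; extend as needed.
EXPECTED_MD5 = {
    "requirements.txt": "61ad7a4f7624d24d93238d8de00b7c3c",
    "bcorr_utils.py": "0ca80addf5bbaedb8685120fdcc6670e",
    "census19.csv": "ffff2525fb0ee34ffa6f723475a7411b",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(folder=".", expected=EXPECTED_MD5):
    """Return {name: status}, where status is 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(folder) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Running verify() in the download folder reports the status of each listed file; reading in chunks keeps memory use low for the 67.9 MB census19.csv.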
