Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses
Description
The accompanying artifacts support understanding and replicating the experiments described in our paper, "Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses". They are designed to support open science practices, ensuring reproducibility and transparency in research. They include the datasets, source code, and Jupyter notebooks used to implement and evaluate the proposed attacks and defenses. The datasets include publicly available data as well as partitioned subsets that replicate the training and testing scenarios. The codebase is modular, with dedicated scripts for data preprocessing, model setup, experiment configuration, and attack implementation. The notebooks walk through the experiments step by step, enabling users to validate key results such as the relationship between correlation and attack performance, the efficacy of the disparity inference and targeted inference attacks, and the impact of proposed defenses such as BCorr. These resources are shared in compliance with the conference’s open science policy and hosted on a platform ensuring permanent access.
Environment Setup:
To set up the environment, create a virtual environment using Conda or Python’s venv, then activate it. Install dependencies with pip install -r requirements.txt. To run Jupyter notebooks, install notebook using pip install notebook. Once done, the environment is ready for running the provided scripts and notebooks.
Datasets:
- census19.csv contains the Census-19 dataset.
- Adult_35222.csv and Adult_10000.csv contain the Adult dataset partitions for training and testing, respectively.
- Instructions for downloading and preprocessing the Texas-100X dataset:
  - Download the PUDF_base1q2006_tab.txt, PUDF_base2q2006_tab.txt, PUDF_base3q2006_tab.txt and PUDF_base4q2006_tab.txt files from https://www.dshs.texas.gov/center-health-statistics/texas-health-care-information-collection/health-data-researcher-information/texas-hospital-emergency-department-research-data-file-ed-rdf/hospital-discharge-data-public-use-data-file and place them inside the dataset/texas_100_v2 folder.
  - Run the following command to preprocess the dataset:
    python preprocess_dataset.py texas_100_v2 --preprocess 1
  - After that, use the texas_100_v2.csv file for running the experiments.
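Before running the preprocessing command, it can help to confirm that all four raw Texas files are in place. The following is a minimal sketch, not part of the released codebase: the helper name missing_texas_files is hypothetical, while the file names and the dataset/texas_100_v2 folder are taken from the instructions above.

```python
from pathlib import Path

# The four quarterly raw files named in the download instructions.
TEXAS_RAW_FILES = [f"PUDF_base{q}q2006_tab.txt" for q in range(1, 5)]


def missing_texas_files(folder="dataset/texas_100_v2"):
    """Return the raw Texas file names that are not present in `folder`.

    Hypothetical helper for illustration only; an empty list means all
    four files were found.
    """
    folder = Path(folder)
    return [name for name in TEXAS_RAW_FILES if not (folder / name).is_file()]
```

Calling missing_texas_files() from the repository root returns an empty list once all four files have been placed in the folder, at which point the preprocessing command can be run.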
Codebase:
Source Files:
- data_utils.py contains code to load the datasets and preprocess the data. It also contains code to sample data matching a target correlation at the subgroup level, as described in section 6.1, paragraph 'Sampling Technique'.
- model_utils.py contains code to define the model architecture and training hyperparameters.
- experiment_utils.py contains code to set up the experiment for a particular scenario.
- attack_utils.py contains the implementation of existing attacks, including CSMIA, LOMIA, the imputation attack, and the Neuron Importance Attack.
- whitebox_attack.py contains helper functions needed to perform the neuron importance attack.
- disparity_inference_utils.py contains the implementation of Confidence Matrix generation (Algorithm 1) and Angular Difference computation (Algorithm 2).
- targeted_inference.py contains the implementation of the targeted attribute inference attacks (section 5.3).
- bcorr_utils.py contains the implementation of the sampling stage of the BCorr defense (section 7.2).
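To make the notion of correlation at the subgroup level concrete, the sketch below measures the correlation between a sensitive attribute and the label separately for each subgroup. It is an illustration under stated assumptions and does not reproduce data_utils.py: the use of the Pearson coefficient, the function names, and the column names (sex, marital, income) are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns NaN when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_group(rows, group_key, sensitive_key, label_key):
    """Map each subgroup value to the sensitive-attribute/label correlation."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row[group_key]]
        xs.append(float(row[sensitive_key]))
        ys.append(float(row[label_key]))
    return {g: pearson(xs, ys) for g, (xs, ys) in groups.items()}


# Toy records with hypothetical column names (not the real dataset schema).
toy_rows = [
    {"sex": "F", "marital": 0, "income": 0},
    {"sex": "F", "marital": 1, "income": 1},
    {"sex": "F", "marital": 0, "income": 0},
    {"sex": "F", "marital": 1, "income": 1},
    {"sex": "M", "marital": 0, "income": 1},
    {"sex": "M", "marital": 1, "income": 0},
    {"sex": "M", "marital": 0, "income": 1},
    {"sex": "M", "marital": 1, "income": 0},
]
per_group = correlation_by_group(toy_rows, "sex", "marital", "income")
```

In this toy data the two subgroups have opposite correlations, which is the kind of subgroup-level difference the sampling code is described as controlling.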
Notebooks:
- correlation_vs_attack_performance.ipynb contains the code to run the experiment described in section 4.1, which shows the strong connection between correlation and attack performance.
- angular_difference_by_sex.ipynb contains the code to run the experiment described in section 4.2, which shows how the angular difference can be used to identify vulnerable groups.
- imputation_vs_ai_aux_size_and_distrib_diff.ipynb contains the code to run the experiment described in section 6.2, which compares the performance of ideal imputation and practical imputation attacks.
- disparity_inference.ipynb contains the code to run the experiment described in section 6.3, which shows how the disparity inference attack can be used to rank groups based on their vulnerability.
- targeted_attribute_inference.ipynb contains the code to run the experiments in section 6.4, which show how the targeted attribute inference attacks outperform their untargeted counterparts and practical imputation attacks.
- mutual_info_reg.ipynb contains the code to run the experiments in section 7.1, which show the ineffectiveness of existing defenses in mitigating disparate vulnerability.
- bcorr_defense.ipynb contains the code to run the experiments in section 7.2, which show how the BCorr defense can be used to mitigate the vulnerability of the model to the targeted attribute inference attack.