Published June 21, 2024 | Version v1.0
Computational notebook
Restricted
Topics API Analysis
Description
This repository provides the experimental results of the paper "The Privacy-Utility Trade-off in the Topics API".
Usage
The notebooks were run using:
- Python v3.11.8
- bvmlib v1.0.0
- matplotlib 3.8.0
- numpy 1.24.3
- pandas 2.0.1
- qif 1.2.3
- requests 2.31.0
- scipy 1.11.3
- tldextract 5.1.2
- tqdm 4.66.1
- urllib3 1.26.16
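The pinned versions above can be compared against a local environment before running the notebooks. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` and the `EXPECTED` table are hypothetical, and it assumes each package is installed under the distribution name shown in the list.

```python
from importlib import metadata

# Versions copied from the list above (Python itself is not checked here).
EXPECTED = {
    "bvmlib": "1.0.0",
    "matplotlib": "3.8.0",
    "numpy": "1.24.3",
    "pandas": "2.0.1",
    "qif": "1.2.3",
    "requests": "2.31.0",
    "scipy": "1.11.3",
    "tldextract": "5.1.2",
    "tqdm": "4.66.1",
    "urllib3": "1.26.16",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed or None)} for every mismatch.

    `lookup` is injectable so the function can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

A different installed version does not necessarily break the notebooks; the check only flags where the environment departs from the one used for the published results.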
The datasets produced for the experiments can be found on Zenodo: AOL Dataset for Browsing History and Topics of Interest (DOI: 10.5281/zenodo.11029572).
Notebooks
- Data treatment:
  - AOL-data-treatment.ipynb:
    - Converts the original AOL dataset.
    - Treats inconsistencies; Randomly remaps AnonID to RandID; Defines domains from URLs; and Filters domains by eTLD using tldextract and Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
    - Generates the datasets AOL-treated.csv and AOL-treated-unique-domains.csv.
      - The dataset AOL-treated.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies.
      - This dataset contains singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months) that are dropped in some analyses.
  - Citizen-Lab-Classification-data-treatment.ipynb:
    - Converts the Citizen Lab Classification data, as of commit ebd0ee8.
    - Treats inconsistencies; Defines domains from URLs; Filters domains by eTLD using tldextract and Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an; and Merges classifications by domain.
    - Generates the dataset Citizen-Lab-Classification.csv.
  - AOL-treated-Citizen-Lab-Classification-domain-matching.ipynb:
    - Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
    - Generates the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.
  - AOL-treated-Google-Topics-Classification-v1-domain-matching.ipynb:
    - Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt, as provided by Google with the Chrome browser.
    - Generates the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.
  - AOL-reduced-Citizen-Lab-Classification.ipynb:
    - Converts the dataset AOL-treated.csv.
    - Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.
    - Generates the dataset AOL-reduced-Citizen-Lab-Classification.csv.
      - The dataset AOL-reduced-Citizen-Lab-Classification.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API.
      - This dataset contains singletons and the outlier that are dropped in some analyses.
      - This dataset can be used for analyses including the (data-dependent) randomness of trimming-down or filling-up the top-s sets of topics for each individual so each set has s topics.
      - Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to slightly vary with each run of the analyses over this dataset.
  - AOL-reduced-Google-Topics-Classification-v1.ipynb:
    - Converts the dataset AOL-treated.csv.
    - Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.
    - Generates the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
      - The dataset AOL-reduced-Google-Topics-Classification-v1.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API.
      - This dataset contains singletons and the outlier that are dropped in some analyses.
      - This dataset can be used for analyses including the (data-dependent) randomness of trimming-down or filling-up the top-s sets of topics for each individual so each set has s topics.
      - Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to slightly vary with each run of the analyses over this dataset.
  - AOL-experimental.ipynb:
    - Converts the dataset AOL-treated.csv.
    - Drops singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months); and Defines browsing histories.
    - Generates the dataset AOL-experimental.csv.
      - The dataset AOL-experimental.csv can be used to empirically verify code correctness.
      - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  - AOL-experimental-Citizen-Lab-Classification.ipynb:
    - Converts the dataset AOL-reduced-Citizen-Lab-Classification.csv.
    - Generates the dataset AOL-experimental-Citizen-Lab-Classification.csv.
      - The dataset AOL-experimental-Citizen-Lab-Classification.csv can be used to empirically verify code correctness.
      - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  - AOL-experimental-Google-Topics-Classification-v1.ipynb:
    - Converts the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
    - Generates the dataset AOL-experimental-Google-Topics-Classification-v1.csv.
      - The dataset AOL-experimental-Google-Topics-Classification-v1.csv can be used to empirically verify code correctness.
      - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
- Analyses:
  - QIF-analyses-AOL-treated.ipynb:
    - QIF analyses based on the dataset AOL-treated.csv.
    - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  - QIF-analyses-AOL-reduced-Citizen-Lab.ipynb:
    - QIF analyses based on the dataset AOL-reduced-Citizen-Lab-Classification.csv.
    - Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to slightly vary with each run of the analyses over this dataset.
  - QIF-analyses-AOL-reduced-Google-Topics-v1.ipynb:
    - QIF analyses based on the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
    - Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to slightly vary with each run of the analyses over this dataset.
  - QIF-analyses-counting-experiment.ipynb:
    - QIF analysis for counting topics popularity using the binomial distribution.
  - QIF-analyses-AOL-experimental.ipynb:
    - QIF analyses based on the dataset AOL-experimental.csv.
    - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  - QIF-analyses-AOL-experimental-Citizen-Lab.ipynb:
    - QIF analyses based on the dataset AOL-experimental-Citizen-Lab-Classification.csv.
    - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  - QIF-analyses-AOL-experimental-Google-Topics-v1.ipynb:
    - QIF analyses based on the dataset AOL-experimental-Google-Topics-Classification-v1.csv.
    - All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
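Several notes above attribute run-to-run variation to the randomness of trimming down or filling up each individual's top-s set of topics. The sketch below illustrates one way such a step could behave and why results may differ between runs. It is not taken from the notebooks: the function name `top_s_topics`, the random tie-breaking rule, and the uniform fill-up from unused taxonomy topics are all assumptions made for illustration.

```python
import random
from collections import Counter


def top_s_topics(visited_topics, taxonomy, s, rng):
    """Return exactly s distinct topics for one individual.

    Assumed rule (hypothetical, for illustration only): rank topics by
    how often they were observed, break ties at random when trimming
    down, and fill up with random unobserved taxonomy topics when fewer
    than s distinct topics were observed. Requires len(taxonomy) >= s.
    """
    counts = Counter(visited_topics)
    ranked = list(counts)
    # Shuffle first so the stable sort below breaks ties at random.
    rng.shuffle(ranked)
    ranked.sort(key=lambda topic: counts[topic], reverse=True)
    top = ranked[:s]  # trimming down
    if len(top) < s:  # filling up
        unused = [topic for topic in taxonomy if topic not in counts]
        top += rng.sample(unused, s - len(top))
    return top


if __name__ == "__main__":
    taxonomy = ["arts", "autos", "news", "sports", "travel"]
    history = ["news", "news", "sports"]
    # Different seeds may fill the third slot with a different topic.
    print(top_s_topics(history, taxonomy, 3, random.Random(0)))
    print(top_s_topics(history, taxonomy, 3, random.Random(1)))
```

Under these assumptions, the output depends on the random generator only when there are ties at the cut-off or fewer than s observed topics, which is consistent with the variation being described as data-dependent. Passing a seeded `random.Random` makes a single run repeatable.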
License
To understand how the various GNU licenses are compatible with each other, please refer to the GNU licenses FAQ.
Additional details
Related works
- Has part
  - Dataset: 10.5281/zenodo.11029572 (DOI)
- Is derived from
  - Computational notebook: https://github.com/nunesgh/topics-api-analysis (URL)
Software
- Repository URL
  - https://github.com/nunesgh/topics-api-analysis
- Programming language
  - Python