There is a newer version of the record available.

Published June 21, 2024 | Version v1.0
Computational notebook Restricted

Topics API Analysis

  • 1. Macquarie University
  • 2. Universidade Federal de Minas Gerais

Description

This repository provides the experimental results of the paper "The Privacy-Utility Trade-off in the Topics API".

Usage

The notebooks were run using:

  • Python 3.11.8
  • bvmlib 1.0.0
  • matplotlib 3.8.0
  • numpy 1.24.3
  • pandas 2.0.1
  • qif 1.2.3
  • requests 2.31.0
  • scipy 1.11.3
  • tldextract 5.1.2
  • tqdm 4.66.1
  • urllib3 1.26.16
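
To compare a local environment against the versions listed above, a minimal sketch such as the following may help. It assumes that each listed name is also the name of the installed distribution (as pip would report it), which is not confirmed by this record; the helper name check_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this record. Assumption: each key is also the
# distribution name under which the package is installed.
EXPECTED = {
    "bvmlib": "1.0.0",
    "matplotlib": "3.8.0",
    "numpy": "1.24.3",
    "pandas": "2.0.1",
    "qif": "1.2.3",
    "requests": "2.31.0",
    "scipy": "1.11.3",
    "tldextract": "5.1.2",
    "tqdm": "4.66.1",
    "urllib3": "1.26.16",
}


def check_versions(expected):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A differing version does not necessarily break the notebooks; the listed versions are only those with which the notebooks were run.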

The datasets produced for the experiments can be found on Zenodo: AOL Dataset for Browsing History and Topics of Interest (DOI: 10.5281/zenodo.11029572).

Notebooks

  • Data treatment:
    • AOL-data-treatment.ipynb:
      • Converts the original AOL dataset.
      • Treats inconsistencies; randomly remaps AnonID to RandID; defines domains from URLs; and filters domains by eTLD using tldextract and Mozilla's Public Suffix List (as of commit 5e6ac3a), extended with the discontinued TLDs .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
      • Generates the datasets AOL-treated.csv and AOL-treated-unique-domains.csv.
        • The dataset AOL-treated.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies.
        • This dataset contains singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months), which are dropped in some analyses.
    • Citizen-Lab-Classification-data-treatment.ipynb:
      • Converts the Citizen Lab Classification data, as of commit ebd0ee8.
      • Treats inconsistencies; defines domains from URLs; filters domains by eTLD using tldextract and Mozilla's Public Suffix List (as of commit 5e6ac3a), extended with the discontinued TLDs .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an; and merges classifications by domain.
      • Generates the dataset Citizen-Lab-Classification.csv.
    • AOL-treated-Citizen-Lab-Classification-domain-matching.ipynb:
      • Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
      • Generates the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.
    • AOL-treated-Google-Topics-Classification-v1-domain-matching.ipynb:
      • Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt, as provided by Google with the Chrome browser.
      • Generates the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.
    • AOL-reduced-Citizen-Lab-Classification.ipynb:
      • Converts the dataset AOL-treated.csv.
      • Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.
      • Generates the dataset AOL-reduced-Citizen-Lab-Classification.csv.
        • The dataset AOL-reduced-Citizen-Lab-Classification.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API.
        • This dataset contains the singletons and the outlier, which are dropped in some analyses.
        • This dataset can be used for analyses that include the (data-dependent) randomness of trimming down or filling up each individual's top-s set of topics so that every set has exactly s topics.
        • Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
    • AOL-reduced-Google-Topics-Classification-v1.ipynb:
      • Converts the dataset AOL-treated.csv.
      • Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.
      • Generates the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
        • The dataset AOL-reduced-Google-Topics-Classification-v1.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API.
        • This dataset contains the singletons and the outlier, which are dropped in some analyses.
        • This dataset can be used for analyses that include the (data-dependent) randomness of trimming down or filling up each individual's top-s set of topics so that every set has exactly s topics.
        • Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
    • AOL-experimental.ipynb:
      • Converts the dataset AOL-treated.csv.
      • Drops singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months), and defines browsing histories.
      • Generates the dataset AOL-experimental.csv.
        • The dataset AOL-experimental.csv can be used to empirically verify code correctness.
        • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
    • AOL-experimental-Citizen-Lab-Classification.ipynb:
      • Converts the dataset AOL-reduced-Citizen-Lab-Classification.csv.
      • Generates the dataset AOL-experimental-Citizen-Lab-Classification.csv.
        • The dataset AOL-experimental-Citizen-Lab-Classification.csv can be used to empirically verify code correctness.
        • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
    • AOL-experimental-Google-Topics-Classification-v1.ipynb:
      • Converts the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
      • Generates the dataset AOL-experimental-Google-Topics-Classification-v1.csv.
        • The dataset AOL-experimental-Google-Topics-Classification-v1.csv can be used to empirically verify code correctness.
        • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  • Analyses:
    • QIF-analyses-AOL-treated.ipynb:
      • QIF analyses based on the dataset AOL-treated.csv.
      • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
    • QIF-analyses-AOL-reduced-Citizen-Lab.ipynb:
      • QIF analyses based on the dataset AOL-reduced-Citizen-Lab-Classification.csv.
      • Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
    • QIF-analyses-AOL-reduced-Google-Topics-v1.ipynb:
      • QIF analyses based on the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
      • Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
    • QIF-analyses-counting-experiment.ipynb:
      • QIF analysis of counting topic popularity using the binomial distribution.
    • QIF-analyses-AOL-experimental.ipynb:
      • QIF analyses based on the dataset AOL-experimental.csv.
      • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
    • QIF-analyses-AOL-experimental-Citizen-Lab.ipynb:
      • QIF analyses based on the dataset AOL-experimental-Citizen-Lab-Classification.csv.
      • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
    • QIF-analyses-AOL-experimental-Google-Topics-v1.ipynb:
      • QIF analyses based on the dataset AOL-experimental-Google-Topics-Classification-v1.csv.
      • All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
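
As a rough illustration of three of the data-treatment steps listed above (randomly remapping identifiers, defining browsing histories, and dropping singletons and the outlier), the following standard-library sketch may be useful. It is not the notebooks' code: the function names, the (user, domain) record layout, the max_visits threshold, and the reading of "singleton" as "one distinct domain" are all assumptions made for this example.

```python
import random
from collections import defaultdict


def remap_ids(records, seed=None):
    """Replace each original user ID with a random, unique integer ID."""
    rng = random.Random(seed)
    original_ids = sorted({user for user, _ in records})
    new_ids = list(range(1, len(original_ids) + 1))
    rng.shuffle(new_ids)  # random one-to-one assignment of new IDs
    mapping = dict(zip(original_ids, new_ids))
    return [(mapping[user], domain) for user, domain in records]


def browsing_histories(records):
    """Group visited domains by user, preserving visit order."""
    histories = defaultdict(list)
    for user, domain in records:
        histories[user].append(domain)
    return dict(histories)


def drop_singletons_and_outliers(histories, max_visits):
    """Keep users with more than one distinct domain and at most max_visits visits.

    Assumptions: a singleton has exactly one distinct domain, and
    max_visits is a hypothetical threshold chosen by the caller.
    """
    return {
        user: visits
        for user, visits in histories.items()
        if len(set(visits)) > 1 and len(visits) <= max_visits
    }


# Toy records (user ID, domain); the values are made up for illustration.
records = [
    (101, "a.com"), (101, "b.com"),
    (202, "a.com"),
    (303, "a.com"), (303, "c.com"), (303, "c.com"), (303, "d.com"),
]
remapped = remap_ids(records, seed=0)
kept = drop_singletons_and_outliers(browsing_histories(remapped), max_visits=3)
print(kept)
```

In this toy input, the user with a single domain is dropped as a singleton and the user with four visits exceeds the chosen threshold, so only the user who visited a.com and b.com remains.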

License

GNU GPLv3.

To understand how the various GNU licenses are compatible with each other, please refer to the GNU licenses FAQ.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Related works

Has part
Dataset: 10.5281/zenodo.11029572 (DOI)
Is derived from
Computational notebook: https://github.com/nunesgh/topics-api-analysis (URL)

Software

Repository URL
https://github.com/nunesgh/topics-api-analysis
Programming language
Python