Published June 21, 2024 | Version v2.0
Dataset Open

AOL Dataset for Browsing History and Topics of Interest

  • 1. Macquarie University
  • 2. Universidade Federal de Minas Gerais

Description

This record provides the datasets for the paper The Privacy-Utility Trade-off in the Topics API (DOI: 10.1145/3658644.3670368; arXiv: 2406.15309).

The dataset-generating code and the experimental results are available at 10.5281/zenodo.11229402 (github.com/nunesgh/topics-api-analysis).

Files

  1. AOL-treated.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies. It contains singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months) that are dropped in some analyses.
  2. AOL-treated-unique-domains.csv: Auxiliary dataset containing all the unique domains from AOL-treated.csv.
  3. Citizen-Lab-Classification.csv: Auxiliary dataset containing the Citizen Lab Classification data, as of commit ebd0ee8, treated for inconsistencies and filtered according to Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
  4. AOL-treated-Citizen-Lab-Classification-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
  5. Google-Topics-Classification-v1.txt: Auxiliary dataset containing the Google Topics API taxonomy v1 data as provided by Google with the Chrome browser.
  6. AOL-treated-Google-Topics-Classification-v1-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt.
  7. AOL-reduced-Citizen-Lab-Classification.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. It contains singletons and the outlier that are dropped in some analyses.
    This dataset can be used for analyses that include the (data-dependent) randomness of trimming down or filling up the top-s sets of topics for each individual so that each set has exactly s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
  8. AOL-reduced-Google-Topics-Classification-v1.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. It contains singletons and the outlier that are dropped in some analyses.
    This dataset can be used for analyses that include the (data-dependent) randomness of trimming down or filling up the top-s sets of topics for each individual so that each set has exactly s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
  9. AOL-experimental.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  10. AOL-experimental-Citizen-Lab-Classification.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
  11. AOL-experimental-Google-Topics-Classification-v1.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
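
The two preprocessing steps mentioned in the list above (dropping singletons and the outlier, and trimming down or filling up each individual's top-s set of topics) could be sketched roughly as follows. This is an illustration only, not the authors' code: the (user, domain) record layout, the function names, the `max_visits` threshold, the reading of "singleton" as "one distinct domain", and the rule of breaking ties and filling gaps uniformly at random are all assumptions. The actual procedure is in 10.5281/zenodo.11229402.

```python
import random
from collections import Counter


def drop_singletons_and_outlier(visits, max_visits=150_000):
    """Filter a list of (user, domain) visit records.

    Assumptions (not taken from the dataset itself):
    - a singleton is a user with exactly one distinct domain;
    - the outlier is any user with at least `max_visits` visits.
    """
    domains = {}
    counts = Counter()
    for user, domain in visits:
        domains.setdefault(user, set()).add(domain)
        counts[user] += 1
    keep = {
        user
        for user in domains
        if len(domains[user]) > 1 and counts[user] < max_visits
    }
    return [(user, domain) for user, domain in visits if user in keep]


def top_s_topics(topic_counts, taxonomy, s, rng):
    """Return exactly s topics for one individual.

    Hypothetical rule: rank topics by count, break ties at random,
    trim to s, and fill any shortfall with random unused taxonomy topics.
    Assumes the taxonomy contains at least s topics.
    """
    topics = list(topic_counts)
    rng.shuffle(topics)  # random order first, so the stable sort breaks ties at random
    topics.sort(key=lambda topic: topic_counts[topic], reverse=True)
    top = topics[:s]
    if len(top) < s:
        pool = [topic for topic in taxonomy if topic not in top]
        top += rng.sample(pool, s - len(top))
    return top


# Usage with made-up data; passing a seeded generator makes a run repeatable.
rng = random.Random(2024)
example = top_s_topics({"News": 3, "Sports": 1}, ["News", "Sports", "Travel", "Games"], 3, rng)
```

Under these assumptions the random tie-breaking and filling are the only sources of run-to-run variation, which is consistent with the note that some results vary slightly between runs over the reduced datasets.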

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Files

Files (2.9 GB)

Checksum Size
md5:ad08ef04bca48bdc4d51de650c7c0f17 312.3 MB
md5:1e1fd3aafd215011dc0a7ba91184b388 165.6 MB
md5:3b17f35695310613ca6ec07641f2ac61 1.0 GB
md5:bc6c015c1a9a1f15f205bfe48260117e 259.4 MB
md5:6027acaf952756d4c9cb8cae3ddc4f56 138.2 MB
md5:a3c5d33bc734142ae1adc6f85d1968bf 191.7 kB
md5:82aae4ea4f61de3423e2e599140d8e41 112.0 kB
md5:9e67cfa8081c13db5f1e2102e2460a87 31.5 MB
md5:d2e3239cab4b4d2a20d142449d856415 965.3 MB
md5:bd931cd4339e628021e128535e9c4ce9 779.3 kB
md5:ffab2e0f46433d00da5363cb7a556128 258.9 kB
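
A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses only Python's standard `hashlib` module; the helper names and the local file path in the usage note are hypothetical, and since this listing does not show which checksum belongs to which file, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloads/AOL-treated.csv", "md5:<checksum from the record>")` returns `True` only if the local copy is byte-for-byte identical to the published file. Reading in chunks keeps memory use flat even for the files of around 1 GB.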

Additional details

Related works

Has part
Computational notebook: 10.5281/zenodo.11229402 (DOI)
Is supplement to
Preprint: arXiv:2406.15309 (arXiv)
Conference paper: 10.1145/3658644.3670368 (DOI)