Published June 21, 2024 | Version v2.0
Dataset
Open
AOL Dataset for Browsing History and Topics of Interest
Creators
Description
This record provides the datasets of the paper The Privacy-Utility Trade-off in the Topics API (DOI: 10.1145/3658644.3670368; arXiv: 2406.15309).
The code that generates the datasets, together with the experimental results, can be found at 10.5281/zenodo.11229402 (github.com/nunesgh/topics-api-analysis).
Files
- AOL-treated.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies. It contains singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months) that are dropped in some analyses.
- AOL-treated-unique-domains.csv: Auxiliary dataset containing all the unique domains from AOL-treated.csv.
- Citizen-Lab-Classification.csv: Auxiliary dataset containing the Citizen Lab Classification data, as of commit ebd0ee8, treated for inconsistencies and filtered according to Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
- AOL-treated-Citizen-Lab-Classification-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
- Google-Topics-Classification-v1.txt: Auxiliary dataset containing the Google Topics API taxonomy v1 data as provided by Google with the Chrome browser.
- AOL-treated-Google-Topics-Classification-v1-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt.
- AOL-reduced-Citizen-Lab-Classification.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. It contains singletons and the outlier that are dropped in some analyses.
  This dataset can also be used for analyses that include the (data-dependent) randomness of trimming down or filling up each individual's top-s set of topics so that every set has exactly s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
- AOL-reduced-Google-Topics-Classification-v1.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. It contains singletons and the outlier that are dropped in some analyses.
  This dataset can also be used for analyses that include the (data-dependent) randomness of trimming down or filling up each individual's top-s set of topics so that every set has exactly s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
- AOL-experimental.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
- AOL-experimental-Citizen-Lab-Classification.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
- AOL-experimental-Google-Topics-Classification-v1.csv: This dataset can be used to empirically verify code correctness for 10.5281/zenodo.11229402. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.
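The trimming-down and filling-up of top-s sets mentioned for the reduced datasets can be illustrated with a minimal sketch. This is not the code from 10.5281/zenodo.11229402. The function name `top_s_topics`, the tie-breaking rule, the padding rule, and the toy taxonomy are all assumptions made for illustration. The sketch only shows why results over the reduced datasets may differ from run to run: whenever ties have to be broken or a set has to be padded, the outcome depends on the random source.

```python
import random
from collections import Counter


def top_s_topics(visited_topics, taxonomy, s, rng):
    """Return exactly s topics for one individual (illustrative only).

    Assumed procedure, not the authors' implementation:
    - rank topics by how often they occur in the individual's history;
    - break ties at random, which trims the set down when too many topics tie;
    - fill up with random unused taxonomy topics when fewer than s are known.
    Assumes the taxonomy holds at least s distinct topics.
    """
    counts = Counter(visited_topics)
    ranked = list(counts)
    # Shuffle first: the stable sort below then breaks ties in random order.
    rng.shuffle(ranked)
    ranked.sort(key=lambda topic: counts[topic], reverse=True)
    chosen = ranked[:s]
    if len(chosen) < s:
        unused = [topic for topic in taxonomy if topic not in counts]
        chosen += rng.sample(unused, s - len(chosen))
    return chosen


# Toy taxonomy and history (hypothetical values, not taken from the datasets).
taxonomy = ["Arts", "Sports", "News", "Travel", "Games"]
history = ["News", "News", "Sports"]

# Different seeds may pad the set with different topics.
print(top_s_topics(history, taxonomy, 3, random.Random(0)))
print(top_s_topics(history, taxonomy, 3, random.Random(1)))
```

Passing an explicit `random.Random` instance keeps the randomness under the caller's control, so a fixed seed reproduces one particular run.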
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Files (2.9 GB)

| Checksum | Size |
|---|---|
| md5:ad08ef04bca48bdc4d51de650c7c0f17 | 312.3 MB |
| md5:1e1fd3aafd215011dc0a7ba91184b388 | 165.6 MB |
| md5:3b17f35695310613ca6ec07641f2ac61 | 1.0 GB |
| md5:bc6c015c1a9a1f15f205bfe48260117e | 259.4 MB |
| md5:6027acaf952756d4c9cb8cae3ddc4f56 | 138.2 MB |
| md5:a3c5d33bc734142ae1adc6f85d1968bf | 191.7 kB |
| md5:82aae4ea4f61de3423e2e599140d8e41 | 112.0 kB |
| md5:9e67cfa8081c13db5f1e2102e2460a87 | 31.5 MB |
| md5:d2e3239cab4b4d2a20d142449d856415 | 965.3 MB |
| md5:bd931cd4339e628021e128535e9c4ce9 | 779.3 kB |
| md5:ffab2e0f46433d00da5363cb7a556128 | 258.9 kB |
Additional details
Related works
- Has part
  - Computational notebook: 10.5281/zenodo.11229402 (DOI)
- Is supplement to
  - Preprint: arXiv:2406.15309 (arXiv)
  - Conference paper: 10.1145/3658644.3670368 (DOI)