Published January 12, 2022 | Version v2
Dataset | Open access

Longitudinal corpus of privacy policies

Authors/Creators

  • 1. University of Basel

Description

This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

  • policy-texts.zip contains a directory of text files, one per policy text. Each file is named after the hash of the policy text it contains.
  • policy-metadata.zip contains two CSV files with policy metadata, including readability measures for each policy text. Both files can be imported into a pandas DataFrame.
  • labeled-policies.zip contains CSV files with content labels for each policy. The labels were assigned by a BERT classifier.
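The archives can be read without extracting them first. The sketch below uses Python's standard zipfile module and pandas. It is illustrative, not part of the dataset. The function names load_metadata and iter_policy_texts are hypothetical. Four further points are assumptions rather than facts from this record: every .csv member has a header row, the policy texts are UTF-8, the file extension of the text files is unknown (so it is stripped), and the hash algorithm behind the file names is unspecified (so the names are treated as opaque identifiers).

```python
import zipfile

import pandas as pd


def load_metadata(zip_path):
    """Return a dict mapping each CSV member name to a pandas DataFrame.

    Assumption: every member ending in .csv is a standalone table with a
    header row. Member names are discovered, not hard-coded.
    """
    frames = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                # pandas reads directly from the binary handle zipfile provides
                with zf.open(name) as fh:
                    frames[name] = pd.read_csv(fh)
    return frames


def iter_policy_texts(zip_path, encoding="utf-8"):
    """Yield (hash_name, text) pairs for every file in the texts archive.

    The hash name is the file name without directory or extension.
    Assumptions: UTF-8 encoding; undecodable bytes are replaced, not raised.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            base = info.filename.rsplit("/", 1)[-1]
            stem = base.rsplit(".", 1)[0]
            yield stem, zf.read(info).decode(encoding, errors="replace")
```

For example, `load_metadata("policy-metadata.zip")` would return both metadata tables keyed by their member names. `iter_policy_texts("policy-texts.zip")` streams one policy at a time, which avoids holding all 56,416 texts in memory at once.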

Details on the methodology can be found in the accompanying paper:

Isabel Wagner. 2023. Privacy Policies across the Ages: Content of Privacy Policies 1996–2021. ACM Trans. Priv. Secur. 26, 3, Article 32 (August 2023), 32 pages. https://doi.org/10.1145/3590152

Files


Files (4.4 GB)

Size       MD5 checksum
4.0 GB     a3ad0819a4a68f343a72e0fe78e9452d
25.4 MB    a881eb7ed602a5da032ea52e610cffea
375.1 MB   6e85572bd9224e8d77ea3bb0583b97ed
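A download can be checked against the published MD5 checksums before use. The sketch below uses Python's standard hashlib module. It reads the file in 1 MiB chunks so the 4.0 GB archive never has to fit in memory. The names PUBLISHED_MD5, md5_of and verify_download are hypothetical. This copy of the record does not say which checksum belongs to which archive, so the sketch only tests membership in the whole set.

```python
import hashlib

# Checksums as listed on the record page. The file-to-checksum mapping is
# not recoverable here, so a file passes if it matches any of them.
PUBLISHED_MD5 = {
    "a3ad0819a4a68f343a72e0fe78e9452d",  # 4.0 GB file
    "a881eb7ed602a5da032ea52e610cffea",  # 25.4 MB file
    "6e85572bd9224e8d77ea3bb0583b97ed",  # 375.1 MB file
}


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=PUBLISHED_MD5):
    """Return True if the file's MD5 digest is one of the expected checksums."""
    return md5_of(path) in expected
```

For example, `verify_download("policy-texts.zip")` should return True for an intact download. If it returns False, the file was most likely truncated or corrupted in transit.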

Additional details

Related works

Is described by
Journal: 10.1145/3590152 (DOI)