Published March 12, 2025 | Version v1
Dataset Open

ToS;DR policies dataset (training)

Description

Dataset Overview

This dataset is derived from Terms of Service; Didn't Read (ToS;DR), a project that analyzes and categorizes the terms of service of online services. The data has been cleaned and organized into two CSV files for reproducibility and ease of use: the full dataset, and a privacy dataset that is the subset of the full dataset restricted to privacy-related terms.

File Descriptions

1. training_tosdr_all_data.csv

This file contains the complete collection of terms of service data after cleaning and preprocessing. Each row represents a statement (or "point") extracted from a service's terms of service.

Key Columns:

  • case_id: Unique identifier for the case.
  • case_title: Brief description of the case.
  • topic_id: Unique identifier for the topic.
  • topic_title: Broad category the case falls under (e.g., Transparency, Copyright License).
  • sentence: The extracted text from the terms of service.
  • seq_case_id: Sequential identifier for the case, used for mapping.
  • seq_topic_id: Sequential identifier for the topic, used for mapping.
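As a rough illustration of the schema above, the following sketch reads the file with Python's standard csv module and checks that each original identifier is paired with a single sequential identifier. The helper names load_points and inconsistent_ids are hypothetical, and the one-to-one pairing between case_id and seq_case_id is an assumption based on the "used for mapping" wording, not something the dataset description states.

```python
import csv

# Column names as listed under "Key Columns".
EXPECTED_COLUMNS = (
    "case_id", "case_title", "topic_id", "topic_title",
    "sentence", "seq_case_id", "seq_topic_id",
)


def load_points(path):
    """Read a CSV file into a list of dicts, one per extracted statement.

    Hypothetical helper: raises ValueError if a listed column is absent.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)


def inconsistent_ids(rows, id_column, seq_column):
    """Return original ids that appear with more than one sequential id.

    Assumption: each id should map to exactly one sequential id.
    """
    seen = {}
    conflicts = set()
    for row in rows:
        original, sequential = row[id_column], row[seq_column]
        # setdefault stores the first pairing and returns it on later rows.
        if seen.setdefault(original, sequential) != sequential:
            conflicts.add(original)
    return sorted(conflicts)


# Example use, assuming the file has been downloaded locally:
#   rows = load_points("training_tosdr_all_data.csv")
#   print(inconsistent_ids(rows, "case_id", "seq_case_id"))
#   print(inconsistent_ids(rows, "topic_id", "seq_topic_id"))
```

All values are read as strings; convert the identifier columns to integers if numeric labels are needed.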

2. training_tosdr_privacy_data.csv

This file is the privacy-related subset of the full dataset. It covers cases about tracking, data collection, account deletion policies, and other privacy topics.

Key Columns:

  • case_id: Unique identifier for the case.
  • case_title: Brief description of the case.
  • topic_id: Unique identifier for the topic.
  • topic_title: Broad category the case falls under (e.g., Privacy, Data Collection).
  • sentence: The extracted text from the terms of service.
  • seq_case_id: Sequential identifier for the case, used for mapping.
  • seq_topic_id: Sequential identifier for the topic, used for mapping.
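Because the privacy file is described as a subset of the full file, one quick sanity check is to confirm that every privacy row also appears in the full data. The sketch below works on rows already loaded as dicts (for example with csv.DictReader). Identifying a row by the pair (case_id, sentence) is an assumption, as is the expectation that sentences were carried over unchanged; the function names are hypothetical.

```python
from collections import Counter


def row_key(row):
    """Identify a statement by its case and text (an assumed key)."""
    return (row["case_id"], row["sentence"])


def rows_missing_from_full(privacy_rows, all_rows):
    """Return privacy rows whose key does not occur in the full dataset."""
    full_keys = {row_key(row) for row in all_rows}
    return [row for row in privacy_rows if row_key(row) not in full_keys]


def topic_counts(rows):
    """Count statements per topic_title, e.g. to see which topics were kept."""
    return Counter(row["topic_title"] for row in rows)


# Example use, assuming both files were loaded into lists of dicts:
#   missing = rows_missing_from_full(privacy_rows, all_rows)
#   print(len(missing), "privacy rows not found in the full data")
#   print(topic_counts(privacy_rows).most_common())
```

An empty result from rows_missing_from_full is consistent with the subset relationship; a non-empty one may only mean the assumed key is too strict.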

Files (8.0 MB)

MD5 checksum and size of each file:

  • md5:ed84b376db4af28929a4113191901bf6 (16.7 kB)
  • md5:f6809e358b3fcf1ad574e9a290d93d2a (588 Bytes)
  • md5:8177880f5956d57319afd9325bf5c1a2 (5.4 kB)
  • md5:8a482a16214aed01aaf093f13e191b3c (6.6 kB)
  • md5:1f98067f7ac4afd8f3780748fd3adc3e (402 Bytes)
  • md5:8d4d618bce46037fbed213f4f109516c (6.1 MB)
  • md5:34ab5625903adc5873231c5d5d91e6cc (1.9 MB)
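The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below uses Python's standard hashlib module; the function names are hypothetical, and which checksum belongs to which file must be taken from the record page, since the pairing is not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example use, with the checksum copied from the entry for that file:
#   ok = matches_listed_checksum("training_tosdr_all_data.csv", "md5:<digest>")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it is not a safeguard against deliberate tampering.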