ToS;DR policies dataset (training)
Creators
Description
Dataset Overview
This dataset is derived from Terms of Service; Didn't Read (ToS;DR), a project that analyzes and categorizes the terms of service of online services. The data has been cleaned and organized into two CSV files, with a focus on reproducibility and usability: the full dataset, and a privacy dataset that is a subset of the full dataset restricted to privacy-related terms.
File Descriptions
1. training_tosdr_all_data.csv
This file contains the complete collection of terms of service data after cleaning and preprocessing. Each row represents a statement (or "point") extracted from a service's terms of service.
Key Columns:
- case_id: Unique identifier for the case.
- case_title: Brief description of the case.
- topic_id: Unique identifier for the topic.
- topic_title: Broad category the case falls under (e.g., Transparency, Copyright License).
- sentence: The extracted text from the terms of service.
- seq_case_id: Sequential identifier for the case, used for mapping.
- seq_topic_id: Sequential identifier for the topic, used for mapping.
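As a minimal sketch of how the columns above might be used, the following reads the CSV with Python's standard `csv` module, checks that the documented columns are present, and builds a lookup from `case_id` to `seq_case_id`. The helper names `read_points` and `case_id_map` are hypothetical, and the sketch assumes a comma-delimited file with a header row and a one-to-one correspondence between `case_id` and `seq_case_id`; neither assumption is stated in the description.

```python
import csv

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "case_id", "case_title", "topic_id", "topic_title",
    "sentence", "seq_case_id", "seq_topic_id",
]


def read_points(lines):
    """Parse CSV lines (an open file or a list of strings) into row dicts.

    Raises ValueError if any documented column is missing from the header.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def case_id_map(rows):
    """Map each case_id to its seq_case_id.

    Assumption (not confirmed by the description): every case_id
    corresponds to exactly one seq_case_id. A conflict raises ValueError.
    """
    mapping = {}
    for row in rows:
        seq = mapping.setdefault(row["case_id"], row["seq_case_id"])
        if seq != row["seq_case_id"]:
            raise ValueError(f"case_id {row['case_id']} maps to several seq_case_id values")
    return mapping
```

To apply this to the real file, one would open it with `open("training_tosdr_all_data.csv", newline="", encoding="utf-8")` and pass the file object to `read_points`; the UTF-8 encoding is also an assumption.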
2. training_tosdr_privacy_data.csv
This file is a subset of the full dataset, focusing exclusively on privacy-related terms. It includes cases related to tracking, data collection, account deletion policies, and other privacy-related topics.
Key Columns:
- case_id: Unique identifier for the case.
- case_title: Brief description of the case.
- topic_id: Unique identifier for the topic.
- topic_title: Broad category the case falls under (e.g., Privacy, Data Collection).
- sentence: The extracted text from the terms of service.
- seq_case_id: Sequential identifier for the case, used for mapping.
- seq_topic_id: Sequential identifier for the topic, used for mapping.
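Because the privacy file is described as a subset of the full file, a quick consistency check is possible. The sketch below is one way to do it, not the authors' procedure: it loads both files with the standard `csv` module, reports privacy rows that have no counterpart in the full file, and tallies rows per `topic_title`. The function names are hypothetical, and identifying a row by the pair (`case_id`, `sentence`) is an assumption made for illustration.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts (UTF-8 assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def rows_missing_from_full(full_rows, privacy_rows, key=("case_id", "sentence")):
    """Return privacy rows whose key does not occur in the full dataset.

    Assumption: a row is identified by the (case_id, sentence) pair.
    An empty result is consistent with the privacy file being a subset.
    """
    full_keys = {tuple(row[k] for k in key) for row in full_rows}
    return [row for row in privacy_rows
            if tuple(row[k] for k in key) not in full_keys]


def topic_counts(rows):
    """Count rows per topic_title, e.g. to see which topics the subset keeps."""
    return Counter(row["topic_title"] for row in rows)
```

A typical call would be `rows_missing_from_full(load_rows("training_tosdr_all_data.csv"), load_rows("training_tosdr_privacy_data.csv"))`, with both files downloaded to the working directory.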
Files
all_case_mapper.csv
Total size: 8.0 MB

Checksum | Size |
---|---|
md5:ed84b376db4af28929a4113191901bf6 | 16.7 kB |
md5:f6809e358b3fcf1ad574e9a290d93d2a | 588 Bytes |
md5:8177880f5956d57319afd9325bf5c1a2 | 5.4 kB |
md5:8a482a16214aed01aaf093f13e191b3c | 6.6 kB |
md5:1f98067f7ac4afd8f3780748fd3adc3e | 402 Bytes |
md5:8d4d618bce46037fbed213f4f109516c | 6.1 MB |
md5:34ab5625903adc5873231c5d5d91e6cc | 1.9 MB |
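The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file has to be taken from the record itself, since the listing here does not pair them.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:ed84b3...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat for the larger files. `str.removeprefix` requires Python 3.9 or later.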