Published August 25, 2024 | Version v1
Dataset Open

Claudinha Data Protection Law (LGPD) Corpus

Description

This dataset contains paragraphs from privacy policies written in Portuguese. Each paragraph was annotated by an expert annotator following a guideline (DOI: 10.5281/zenodo.13371432). Two kinds of annotation were made: the category of the Brazilian General Data Protection Law (LGPD) that fits the text, and the level of compliance with the LGPD.

The categories are divided into three blocks: omission of data required by law (block 1), data processing (block 2), and unclear language and others (block 3). There are three levels of compliance: level 1 is in compliance with the law, level 2 is potential partial non-compliance, and level 3 is potential total non-compliance.
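When filtering records it can help to encode the three compliance levels as a small lookup. The following is a minimal sketch; the names `COMPLIANCE_LEVELS` and `is_potentially_non_compliant` are illustrative and are not part of the dataset.

```python
# Compliance levels as described above.
# The identifiers below are hypothetical helpers, not dataset fields.
COMPLIANCE_LEVELS = {
    1: "in compliance with the law",
    2: "potential partial non-compliance",
    3: "potential total non-compliance",
}


def is_potentially_non_compliant(level: int) -> bool:
    """Return True for levels 2 and 3, which flag potential non-compliance."""
    if level not in COMPLIANCE_LEVELS:
        raise ValueError(f"unknown compliance level: {level}")
    return level >= 2
```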

There are 6341 distinct paragraphs. The corpus has more records (8341 clauses) because a paragraph can belong to more than one guideline category, in which case it is repeated once per category. Potentially non-compliant clauses correspond to 1413 records (21.9%). The table below gives, for each category, the number of clauses and the document frequency, that is, the number of privacy policies in which the category appears.
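The difference between records and distinct paragraphs can be illustrated with a short sketch. The column names `paragraph`, `category` and `level` are assumptions (the actual header of the CSV file may differ), and the three sample rows are made up for illustration only.

```python
import csv
import io

# Hypothetical column names; check the header of corpus_data_privacy.csv.
TEXT_COL, LEVEL_COL = "paragraph", "level"


def summarize(rows):
    """Count records, distinct paragraphs, and the share of records at level 2 or 3."""
    rows = list(rows)
    n_records = len(rows)
    n_paragraphs = len({row[TEXT_COL] for row in rows})
    n_flagged = sum(1 for row in rows if int(row[LEVEL_COL]) >= 2)
    return {
        "records": n_records,
        "distinct_paragraphs": n_paragraphs,
        "flagged_share": n_flagged / n_records if n_records else 0.0,
    }


# Toy data (not taken from the corpus): the first paragraph appears twice
# because it is assigned to two categories.
SAMPLE = """\
paragraph,category,level
We share data with partners.,Third party sharing,2
We share data with partners.,Purpose of sharing,2
We use cookies.,Cookies,1
"""

stats = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(stats)
```

To run the same summary on the real file, the in-memory sample would be replaced by an open file handle, once the column names have been checked.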

Category                              Number of clauses  Document frequency

Block 1: Omission of data required by law
Access to data                                      283                  61
Anonymization, blocking and deletion                204                  46
Automated decision                                   45                  19
Category of processed data                         1427                  63
Controller identification                           107                  47
Data correction                                     154                  52
Duration of treatment                               234                  52
Existence of treatment                              142                  37
Express consent                                     176                  41
ID and contact DPO                                  150                  46
Non-consent                                          91                  33
Personal data source                                471                  55
Portability                                          97                  36
Purpose of sharing                                  119                  16
Purpose of treatment                               1620                  71
Revoke consent                                      154                  50
Right of deletion                                   173                  40
Third party sharing                                 919                  69

Block 2: Data processing
Advertising                                         215                  38
Children data                                       118                  37
Cookies                                             432                  60
Consent by use                                      339                  47
Other consents                                       73                  30
Policy changes                                      210                  61
"Take it or leave it"                                50                  21

Block 3: Unclear language and others
Generic expressions                                 244                  37
Other unclear clauses                                94                  26
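Per-category figures like those in the table could be derived from the records roughly as follows. This is a sketch under stated assumptions: document frequency is taken to mean the number of distinct policies containing at least one clause of the category, and the column names `category` and `policy_id` are hypothetical. The toy rows are invented for illustration.

```python
from collections import defaultdict


def category_stats(rows, category_col="category", policy_col="policy_id"):
    """Map each category to (number of clauses, number of distinct policies).

    Both column names are assumptions about the CSV layout.
    """
    clause_counts = defaultdict(int)
    policies = defaultdict(set)
    for row in rows:
        category = row[category_col]
        clause_counts[category] += 1
        policies[category].add(row[policy_col])
    return {cat: (clause_counts[cat], len(policies[cat])) for cat in clause_counts}


# Toy rows (not from the corpus): policy A has two "Cookies" clauses.
toy_rows = [
    {"policy_id": "A", "category": "Cookies"},
    {"policy_id": "A", "category": "Cookies"},
    {"policy_id": "B", "category": "Cookies"},
    {"policy_id": "B", "category": "Portability"},
]
toy_stats = category_stats(toy_rows)
print(toy_stats)
```

Using a set per category means a policy is counted once for the document frequency, however many of its clauses fall into that category.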

 

Files

corpus_data_privacy.csv (2.7 MB)
md5:cbc6bc61e1fbe9396b98157355935e41
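After downloading, the file can be checked against the published MD5 checksum. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_stream` and `verify_file` are illustrative, and the local path passed to `verify_file` is whatever location the file was saved to.

```python
import hashlib
import io

# Checksum published on the record page for corpus_data_privacy.csv.
EXPECTED_MD5 = "cbc6bc61e1fbe9396b98157355935e41"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected


# Self-check of the helper on an in-memory byte string.
demo_digest = md5_of_stream(io.BytesIO(b"hello world"))
```

A call such as `verify_file("corpus_data_privacy.csv")` would then return `True` when the download is intact.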