Claudinha Data Protection Law (LGPD) Corpus
Creators
- 1. Universidade de São Paulo
- 2. Instituto Lawgorithm
Description
This dataset contains paragraphs from privacy policies written in Portuguese. Each paragraph was annotated by an expert annotator following a guideline (DOI: 10.5281/zenodo.13371432). Two types of annotation were made: the category of the Brazilian Data Protection Law (LGPD) that fits the text, and the level of compliance with the LGPD.
The categories are divided into three blocks: Omission of data required by law (block 1), Data processing (block 2), and Unclear language and others (block 3). There are three levels of compliance: level 1 is compliance with the law, level 2 is potential partial non-compliance, and level 3 is potential total non-compliance.
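The annotation scheme described above can be represented in code as follows. This is a minimal sketch: the names `BLOCKS`, `COMPLIANCE_LEVELS`, and `is_potentially_non_compliant` are illustrative and not part of the dataset, and the CSV file may encode these labels differently.

```python
# Illustrative encoding of the annotation scheme; all names are hypothetical
# and the CSV may store these labels in another form.
BLOCKS = {
    1: "Omission of data required by law",
    2: "Data processing",
    3: "Unclear language and others",
}

COMPLIANCE_LEVELS = {
    1: "in compliance with the law",
    2: "potential partial non-compliance",
    3: "potential total non-compliance",
}


def is_potentially_non_compliant(level: int) -> bool:
    """Return True for levels 2 and 3, which flag potential non-compliance."""
    if level not in COMPLIANCE_LEVELS:
        raise ValueError(f"unknown compliance level: {level}")
    return level >= 2
```

Treating levels 2 and 3 together is one way to obtain a binary "potentially non-compliant" label from the three-level annotation.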
There are 6341 distinct paragraphs. The corpus has more records (8341 clauses) because a paragraph can belong to more than one guideline category and is then duplicated. Potentially non-compliant clauses correspond to 1413 records (21.9%). The table below gives, for each category, the number of clauses and the frequency of the category in privacy policies (document frequency).
Category | Number of clauses | Document frequency |
---|---|---|
Block 1: Omission of data required by law | ||
Access to data | 283 | 61 |
Anonymization, blocking and deletion | 204 | 46 |
Automated decision | 45 | 19 |
Category of processed data | 1427 | 63 |
Controller identification | 107 | 47 |
Data correction | 154 | 52 |
Duration of treatment | 234 | 52 |
Existence of treatment | 142 | 37 |
Express consent | 176 | 41 |
ID and contact DPO | 150 | 46 |
Non-consent | 91 | 33 |
Personal data source | 471 | 55 |
Portability | 97 | 36 |
Purpose of sharing | 119 | 16 |
Purpose of treatment | 1620 | 71 |
Revoke consent | 154 | 50 |
Right of deletion | 173 | 40 |
Third party sharing | 919 | 69 |
Block 2: Data processing | ||
Advertising | 215 | 38 |
Children data | 118 | 37 |
Cookies | 432 | 60 |
Consent by use | 339 | 47 |
Other consents | 73 | 30 |
Policy changes | 210 | 61 |
"Take it or leave it" | 50 | 21 |
Block 3: Unclear language and others | ||
Generic expressions | 244 | 37 |
Other unclear clauses | 94 | 26 |
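As a consistency check, the per-category clause counts in the table can be summed per block. The sketch below copies the counts from the table (the variable names are illustrative); the grand total matches the 8341 records reported in the description.

```python
# Clause counts copied from the table above, grouped by block.
CLAUSES_PER_BLOCK = {
    "Block 1": [283, 204, 45, 1427, 107, 154, 234, 142, 176,
                150, 91, 471, 97, 119, 1620, 154, 173, 919],
    "Block 2": [215, 118, 432, 339, 73, 210, 50],
    "Block 3": [244, 94],
}

block_totals = {block: sum(counts) for block, counts in CLAUSES_PER_BLOCK.items()}
total_records = sum(block_totals.values())

DISTINCT_PARAGRAPHS = 6341  # stated in the description

for block, total in block_totals.items():
    print(f"{block}: {total} clauses")
print(f"Total records: {total_records}")
print(f"Records beyond distinct paragraphs: {total_records - DISTINCT_PARAGRAPHS}")
```

The 2000 records beyond the distinct paragraphs come from paragraphs assigned to more than one category.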
Files
Name | Size | MD5 |
---|---|---|
corpus_data_privacy.csv | 2.7 MB | cbc6bc61e1fbe9396b98157355935e41 |
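After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "cbc6bc61e1fbe9396b98157355935e41"  # checksum listed above


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    csv_path = Path("corpus_data_privacy.csv")  # assumed local download location
    if csv_path.exists():
        print("checksum OK" if verify_download(csv_path) else "checksum mismatch")
    else:
        print(f"{csv_path} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size.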