Revealing Gender Biases in (TJSP) Court Decisions with Natural Language Processing
Contributors
Data curators:
Supervisors:
- 1. University of Campinas
Description
Social science data is often produced as digital text, which makes it a natural source for natural language processing methods. Researchers and practitioners have developed and relied on artificial intelligence techniques to collect, process, and analyze legal documents, especially for tasks such as text summarization and classification. Within artificial intelligence for social good, we identify an underexplored potential for natural language processing to investigate human rights issues. Qualitative and quantitative social science methods have been used to study matters such as institutional gender biasing in legal settings; natural language processing-based approaches can help analyze the issue on a larger scale. The work Revealing Gender Biases in Court Decisions with Natural Language Processing presents a protocol for the automatic detection of institutional gender biasing in Brazilian courts, which comprises: (a) a pipeline for collecting, annotating, and preparing text extracted from decisions issued by the São Paulo state Court of Justice (TJSP) in cases of domestic violence and parental alienation, which resulted in two datasets; (b) an experimental protocol of supervised binary classification over the decisions, performed with BERTimbau-based models; (c) methods for evaluating and validating this protocol.
Here, we present the two datasets associated with this work: Dataset 1, comprising 1,604 decisions issued by the Court between 2012 and 2019 in domestic violence-related criminal cases (DVC), and Dataset 2, comprising 49 decisions issued by the Court in the same timeframe in civil and criminal parental alienation-related cases (PAC). Details on the content of each dataset, as well as on the pipelines for their extraction, annotation, preparation, and use, can be found in the original work, published as a Master's dissertation.
The structure of the datasets is presented as follows:
├── Dataset 1 (domestic violence cases, DVC): lesao.zip
│   ├── files
│   └── content
└── Dataset 2 (parental alienation cases, PAC): ap.zip
    ├── files
    └── content
- files folder: contains input and output files associated with the pipelines of data extraction, annotation, and preparation as documented in the original work;
- content folder: contains TXT and PDF files for each decision.
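As a rough illustration of working with this layout, the sketch below indexes the decisions in one archive by file stem and records which formats (TXT, PDF) are present for each. It is only a sketch under assumptions: the helper name `list_decisions` is hypothetical, and it assumes that the `content` folder appears as a path component inside the archive and that the TXT and PDF versions of a decision share the same file stem. Neither is confirmed by this description, so check the original work before relying on it.

```python
import zipfile
from pathlib import PurePosixPath


def list_decisions(zip_path, folder="content"):
    """Map each decision's file stem to the set of formats found for it.

    Assumption (not confirmed by the dataset description): decisions live
    under a directory named `folder` somewhere inside the archive, and the
    TXT and PDF versions of one decision share the same file stem.
    """
    decisions = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Skip directory entries and members outside the assumed folder.
            if name.endswith("/") or folder not in path.parts[:-1]:
                continue
            suffix = path.suffix.lower()
            if suffix in {".txt", ".pdf"}:
                decisions.setdefault(path.stem, set()).add(suffix)
    return decisions
```

Calling `list_decisions("lesao.zip")` or `list_decisions("ap.zip")` on a downloaded archive and comparing `len(...)` of the result with the decision counts stated above is a quick sanity check that the assumed layout matches the actual one.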
Please note that, to access and use the datasets, one must abide by a deed of undertaking; violating it entails legal liability. Guidelines on legal and ethical compliance regarding this data can be found in the associated publications.
Additional details
Related works
- Is documented by
  - Thesis: https://www.repositorio.unicamp.br/acervo/detalhe/1313341
- Is supplemented by
  - Software: https://github.com/ra-ysa/gender_law_nlp
  - Conference paper: https://aclanthology.org/2022.nllp-1.20/
  - Conference paper: https://dl.acm.org/doi/10.1145/3630106.3658937