Published May 5, 2023 | Version 1.0.0
Dataset Restricted

Revealing Gender Biases in (TJSP) Court Decisions with Natural Language Processing

  • 1. University of Campinas


Data from the social sciences is often produced as digital text, which makes it a natural source for natural language processing methods. Researchers and practitioners have developed and relied on artificial intelligence techniques to collect, process, and analyze documents in the legal field, especially for tasks such as text summarization and classification. In this scenario, we identify an underexplored potential for natural language processing to examine human rights issues in the context of artificial intelligence for social good. Qualitative and quantitative social science methods have been used to study matters such as institutional gender bias in legal settings; however, natural language processing-based approaches can help analyze the issue on a larger scale. The work Revealing Gender Biases in Court Decisions with Natural Language Processing presents a protocol for the automatic detection of institutional gender bias in Brazilian courts, which comprises: (a) a pipeline for the collection, annotation, and preparation of text extracted from court decisions issued by the São Paulo state Court of Justice in cases of domestic violence and parental alienation, which resulted in two datasets; (b) an experimental protocol of supervised binary classification over the decisions, performed with BERTimbau-based models; (c) methods for evaluating and validating this protocol.

Here, we present the two datasets associated with this work: Dataset 1, comprising 1,604 decisions issued by the Court between 2012 and 2019 in domestic violence-related criminal cases (DVC), and Dataset 2, comprising 49 decisions issued by the Court in the same timeframe in civil and criminal parental alienation-related cases (PAC). Details on the content of each dataset, as well as their pipelines of extraction, annotation, preparation, and use, can be found in the original work, published as a Master's dissertation.

The structure of the datasets is presented as follows:

├── Dataset 1 (domestic violence cases, DVC):
│   ├── files
│   └── content
├── Dataset 2 (parental alienation cases, PAC):
│   ├── files
│   └── content

  • files folder: contains input and output files associated with the pipelines of data extraction, annotation, and preparation as documented in the original work;
  • content folder: contains TXT and PDF files for each decision.
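
For readers who have been granted access, the layout above can be explored with a short script. The following is a minimal sketch in Python, not part of the original work. It assumes that each dataset has a directory named content holding one TXT file per decision, that the TXT files are UTF-8 encoded, and that the file stem identifies the decision; none of these details is confirmed by the record. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_decisions(content_dir: Path) -> Iterator[Tuple[str, str]]:
    """Yield (file stem, text) for every TXT file under content_dir.

    Assumption: one TXT file per decision; the stem is used as an identifier.
    """
    for path in sorted(content_dir.rglob("*")):
        # Compare the extension case-insensitively (.txt and .TXT both match).
        if path.is_file() and path.suffix.lower() == ".txt":
            # UTF-8 is an assumption; errors="replace" avoids a crash
            # if a file turns out to use another encoding.
            yield path.stem, path.read_text(encoding="utf-8", errors="replace")


def count_by_extension(content_dir: Path) -> Dict[str, int]:
    """Count files per lowercase extension, e.g. to compare TXT and PDF totals."""
    counts: Dict[str, int] = {}
    for path in content_dir.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            counts[ext] = counts.get(ext, 0) + 1
    return counts
```

Comparing the TXT and PDF counts returned by count_by_extension is a quick sanity check that every decision has both formats before any further processing.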

Please note that, to access and use the datasets, one must abide by a deed of undertaking; violating it entails legal liability for the party in breach. Details on guidelines of legal and ethical compliance regarding this data can be found in the associated publications.



The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

This dataset consists of decisions issued in second instance by the São Paulo state Court of Justice and collected from the court’s official search engine; still, despite their public availability, some cases might be, or have been at some point, under secrecy. The texts can also contain sensitive data on the subjects involved in the cases under analysis. For researchers and practitioners who wish to access and/or use the datasets for non-commercial research and/or educational purposes (here identified as "researcher"), we can provide access to them under the following conditions and terms:

    1. Researcher shall use the datasets only for non-commercial research and educational purposes;  
    2. Researcher accepts full responsibility for his/her use of the datasets and shall defend and indemnify the original authors and the University of Campinas, including their employees, trustees, officers and agents, against any and all claims arising from the researcher's use of the datasets, including but not limited to researcher's use of any copies of secret or sensitive information that he or she may derive from the datasets;
    3. Researcher will not under any circumstances disclose secret information or sensitive personal data found in, or derived from, the datasets;
    4. Researcher may provide research associates and colleagues with access to the datasets, as long as they first agree to be bound by these terms and conditions;
    5. The original authors and the University of Campinas reserve the right to terminate the researcher's access to the datasets at any time.


Access Request:

Please fill in this form with the following information:

    1. Name of the institution you are affiliated with;
    2. Brief description of what the datasets will be used for.

Bear in mind that the request must be made from an institutional email account.


Additional details