Published August 15, 2025 | Version 1.1.0
Dataset | Open

Short Stories Dataset

  • 1. Universidade Estadual de Campinas (UNICAMP)
  • 2. University of Campinas

Description

The corpus was created to investigate how narratives about Black and white women are constructed in short stories generated in Portuguese. Each of the 2100 instances in the dataset is a short story generated with the model meta-llama/Llama-3.2-3B-Instruct from Hugging Face. The data is stored in a CSV file in which each row contains: the prompt used; the short story output by the model; the name used to create the story, or the tag “no name” if no name was used; and the race of the main character as set in the prompt (this tag was used mainly for visualization purposes).
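As a rough illustration of working with the row layout described above, the sketch below reads the header of a CSV file and counts rows per value of one column, using only the Python standard library. The record does not state the actual column names or the file encoding, so the column name "race" in the usage comment and the UTF-8 encoding are assumptions; inspect the real header before relying on them.

```python
import csv
from collections import Counter


def read_header(csv_path, encoding="utf-8"):
    """Return the column names from the first row of a CSV file."""
    # UTF-8 is an assumption; the record does not state the encoding.
    with open(csv_path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))


def count_by_column(csv_path, column, encoding="utf-8"):
    """Count rows per distinct value of `column` in a CSV file with a header row."""
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        return Counter(row[column] for row in reader)


# Hypothetical usage (the column name "race" is a guess, not taken from the record):
#   print(read_header("shortstories_name_noname_pt.csv"))
#   print(count_by_column("shortstories_name_noname_pt.csv", "race"))
```

Calling read_header first shows the real column names, which can then be passed to count_by_column, for example to check how the 2100 stories are split between named and “no name” prompts.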

The datasheet with more information on the corpus, along with the generation and analysis code, is available in this repository: https://github.com/hiaac-nlp/clusteringdiscourses.

Files

Files (5.5 MB)

Name: shortstories_name_noname_pt.csv
Size: 5.5 MB
md5:3eac2acd3479c3dfc2939649d0dcf172
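The md5 checksum listed above can be used to confirm that a downloaded copy of the file is intact. The following is a minimal sketch using Python's standard hashlib module; the local file path in the usage comment is an assumption about where the file was saved, and the helper name md5_of_file is illustrative.

```python
import hashlib

# Checksum as listed on this record for shortstories_name_noname_pt.csv.
EXPECTED_MD5 = "3eac2acd3479c3dfc2939649d0dcf172"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage (assumes the file was saved in the current directory):
#   ok = md5_of_file("shortstories_name_noname_pt.csv") == EXPECTED_MD5
#   print("checksum matches" if ok else "checksum mismatch")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.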

Additional details

Dates

Available
2025-08-15
Dataset made available

Software