Short Stories Dataset
Authors/Creators
Description
The corpus was created to investigate how narratives about Black and white women are constructed in short stories generated in Portuguese. Each of the 2,100 instances in the dataset is a short story generated with the meta-llama/Llama-3.2-3B-Instruct model from Hugging Face. The data are stored in a CSV file in which each row contains the prompt used, the short story output by the model, the name used to create the story (or the tag “no name” if no name was used), and the race of the main character as set in the prompt (this tag was used mainly for visualization purposes).
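Here is a minimal sketch of how one might load the CSV and tally stories by race and by whether a name was used. It relies only on Python's standard library. The column headers `prompt`, `story`, `name` and `race`, the exact spelling of the `no name` tag, and the UTF-8 encoding are all assumptions. The record describes the fields but not their headers, so check the file's header row and adjust the constants first.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO

# Hypothetical column names: the record lists the fields but not their
# exact headers, so adjust these to match the actual CSV header row.
PROMPT_COL = "prompt"
STORY_COL = "story"
NAME_COL = "name"
RACE_COL = "race"
# Assumed value of the tag used when no name appears in the prompt.
NO_NAME_TAG = "no name"


def load_stories(fh: TextIO) -> list[dict[str, str]]:
    """Read every row of the CSV into a list of dicts keyed by header."""
    return list(csv.DictReader(fh))


def summarize(rows: Iterable[dict[str, str]]) -> Counter:
    """Count stories per (race, named/unnamed) combination."""
    counts: Counter = Counter()
    for row in rows:
        # Compare case-insensitively in case the tag's casing varies.
        named = row[NAME_COL].strip().lower() != NO_NAME_TAG
        counts[(row[RACE_COL], "named" if named else "no name")] += 1
    return counts
```

Under those assumptions, `summarize(load_stories(open("shortstories_name_noname_pt.csv", newline="", encoding="utf-8")))` returns a `Counter` keyed by `(race, "named" | "no name")`. Its values should add up to the 2,100 instances described above.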
The datasheet with further information on the corpus, along with the generation and analysis code, is available in this repository: https://github.com/hiaac-nlp/clusteringdiscourses.
Files
| Name | Size | MD5 checksum |
|---|---|---|
| shortstories_name_noname_pt.csv | 5.5 MB | 3eac2acd3479c3dfc2939649d0dcf172 |
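The MD5 checksum listed for the file can confirm that a download is complete and uncorrupted. The following minimal sketch uses Python's standard `hashlib` module. The function names are illustrative and are not part of the dataset's repository.

```python
import hashlib
from pathlib import Path

# Checksum listed on the record for shortstories_name_noname_pt.csv.
EXPECTED_MD5 = "3eac2acd3479c3dfc2939649d0dcf172"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use constant. This matters little for a 5.5 MB file, but the same helper works unchanged for larger downloads.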
Additional details
Dates
- Available: 2025-08-15 (dataset made available)
Software
- Repository URL: https://github.com/hiaac-nlp/clusteringdiscourses
- Programming language: Python