Published November 30, 2022 | Version v1
Dataset Open

Supplementary code and data for the paper `From stage to page: language independent bootstrap measures of distinctiveness in fictional speech`

  • 1. Institute of Polish Language, PAN

Description

The repository provides the full data and the complete processing and analysis pipeline for the paper 'From stage to page: language independent bootstrap measures of distinctiveness in fictional speech'.

Rendered notebooks are also available on GitHub:

1) Preparation, energy distance and exploration (main)

2) Keyword curves & formal modeling

 

- `00_dracor_get_data.R` uses the DraCor API to retrieve the texts spoken by each character

- `01_distinctiveness_energy.ipynb` handles data wrangling, cleaning, and preprocessing, implements energy distance bootstrapping, and runs the exploratory analysis

- `02_logodds_curves.R` calculates keyword curves for characters

- `03_analysis_and_models.R` explores the keyword curves and fits Bayesian models
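
To make the idea behind energy distance bootstrapping more concrete, here is a minimal sketch in Python. It is not the code from `01_distinctiveness_energy.ipynb`: it assumes that each character is represented by a list of numeric feature vectors (for example, word-frequency vectors of text samples), uses Euclidean distance, and resamples each group with replacement. The function names and the parameters `n_boot` and `seed` are hypothetical and chosen for illustration only. The statistic is 2E|X-Y| - E|X-X'| - E|Y-Y'|; conventions differ on whether its square root is reported instead.

```python
import math
import random


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs, y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy statistic between two samples: 2*E|X-Y| - E|X-X'| - E|Y-Y'|.

    Zero when both samples are identical, larger when they are further apart.
    """
    return (
        2 * mean_pairwise_distance(xs, ys)
        - mean_pairwise_distance(xs, xs)
        - mean_pairwise_distance(ys, ys)
    )


def bootstrap_energy(xs, ys, n_boot=200, seed=0):
    """Recompute the statistic on resampled data (assumed scheme, see lead-in).

    Each replicate draws len(xs) and len(ys) vectors with replacement.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_xs = rng.choices(xs, k=len(xs))
        boot_ys = rng.choices(ys, k=len(ys))
        estimates.append(energy_distance(boot_xs, boot_ys))
    return estimates


if __name__ == "__main__":
    # Toy data: two "characters" with clearly separated feature vectors.
    character_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    character_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    replicates = sorted(bootstrap_energy(character_a, character_b))
    print("point estimate:", energy_distance(character_a, character_b))
    print("bootstrap range:", replicates[0], replicates[-1])
```

The spread of the bootstrap replicates gives a rough sense of how stable the distance between two characters is under resampling; the notebook itself should be consulted for the sampling scheme and features actually used in the paper.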

Files

character_distinctiveness.zip (1.8 GB)
md5:0056c22c98f2722e6417f40d7db12371
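
Because the archive is large, it can be worth checking the download against the MD5 checksum listed above. The following is a small sketch using only Python's standard library; the helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the code assumes the archive was saved under its original file name in the current directory.

```python
import hashlib

# Checksum as listed on this record for character_distinctiveness.zip.
EXPECTED_MD5 = "0056c22c98f2722e6417f40d7db12371"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    print(archive_is_intact("character_distinctiveness.zip"))
```

Reading in chunks keeps memory use low, which matters for a file of this size.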