Published June 11, 2021 | Version v1.0
Dataset
Open
Relationship Between Poetic Meter and Meaning in Accentual-Syllabic Verse (data and replication code)
Creators
- 1. Institute of Polish Language, Krakow
- 2. Institute of Czech Literature, Prague
- 3. Leiden University
Description
- main.py: script to train both lda and word2vec models
- main.ipynb: Jupyter Notebook containing all the analyses reported in the paper
- pos.ipynb: clustering based on frequencies of parts-of-speech
- corpora: contains original data for Czech, English, and Dutch poetry in JSON (proprietary German and Russian not included)
{ <= Each item in the following lists corresponds to a particular poem and holds:
'words': [] <= list of lemmata found in the poem
'pos_tags': [] <= their POS tags (Positional Morphological Tags for Czech,
MyStem for Russian, TreeTagger tagsets for other corpora)
'meters': [[]] <= list of meters found in the poem
'years': [] <= year the poem was published (author's year of birth in the case of English)
'n_words': [] <= number of words
'n_lines': [] <= number of lines
'authors': [] <= author of the poem
'titles': [] <= title of the poem
'schemes': [] <= line-ending schemes
}
- dicts: contains Gensim dictionary files for all 5 corpora
- fig: contains all resulting figures
- json > metadata: contains all metadata on poems in particular corpora
{ <= Each item in the following lists corresponds to a particular poem and holds:
'meters': [[]] <= list of meters found in the poem
'years': [] <= year the poem was published (author's year of birth in the case of English)
'n_words': [] <= number of words
'n_lines': [] <= number of lines
'authors': [] <= author of the poem
'titles': [] <= title of the poem
}
- json > topics: contains topic probabilities in particular poems
[ <= each item corresponds to a particular poem and is a dict with 100 entries
{
'topic title': probability of that topic in the poem
}
]
- json > pos: contains relative POS frequencies in particular poems
[ <= each item corresponds to a particular poem
{
'POS': its frequency
}
]
- json > w2v: contains mapping of lemmata and their neighbours in word2vec models
- models: contains pretrained lda and word2vec models (Gensim)
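The layouts described above can be handled with a few lines of Python. The following is a minimal sketch, not the authors' code: it assumes that each corpus file is a dict of parallel lists (index i always refers to the same poem) and that each topic entry is a flat dict mapping topic title to probability. The sample values, the meter labels, the scheme strings, the topic titles, and the helper names `to_records` and `dominant_topic` are all hypothetical. With the real data, the inline sample would be replaced by the result of `json.load` on a file from the archive (file names are not listed in this record).

```python
from collections import Counter

# Hypothetical miniature corpus in the documented layout: a dict of
# parallel lists in which index i always refers to the same poem.
# All values below are invented for illustration only.
corpus = {
    "words": [["rose", "night"], ["sea", "wind", "sail"]],
    "pos_tags": [["NN", "NN"], ["NN", "NN", "NN"]],
    "meters": [["iamb"], ["trochee", "iamb"]],
    "years": [1850, 1872],
    "n_words": [2, 3],
    "n_lines": [1, 2],
    "authors": ["Author A", "Author B"],
    "titles": ["Poem one", "Poem two"],
    "schemes": ["aabb", "abab"],
}


def to_records(columns):
    """Turn a dict of parallel lists into a list with one dict per poem."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("parallel lists differ in length")
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


def dominant_topic(topic_probs):
    """Return the (title, probability) pair with the highest probability."""
    return max(topic_probs.items(), key=lambda item: item[1])


poems = to_records(corpus)

# Count how often each meter label occurs across all poems
# ('meters' holds a list of meters per poem).
meter_counts = Counter(m for poem in poems for m in poem["meters"])

# Hypothetical topic dict for a single poem; the real ones hold 100 entries.
topics = {"topic_a": 0.7, "topic_b": 0.2, "topic_c": 0.1}
best_title, best_prob = dominant_topic(topics)
```

Note that the per-poem records keep the original plural keys (`titles`, `authors`, and so on), so that nothing is renamed relative to the files.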
Files
Name | Size |
---|---|
semanticHalo.zip (md5:61d271198501726944a298b9c12f510f) | 10.0 GB |