There is a newer version of the record available.

Published June 11, 2021 | Version v1.0
Dataset Open

Relationship Between Poetic Meter and Meaning in Accentual-Syllabic Verse (data and replication code)

  • 1. Institute of Polish Language, Krakow
  • 2. Institute of Czech Literature, Prague
  • 3. Leiden University

Description

  • main.py: script that trains both the LDA and word2vec models

  • main.ipynb: Jupyter Notebook containing all the analyses reported in the paper

  • pos.ipynb: clustering based on frequencies of parts-of-speech

  • corpora: contains the original data for Czech, English, and Dutch poetry in JSON (the proprietary German and Russian corpora are not included)
{   <= Each item in the following lists corresponds to a particular poem and holds:
    'words':    []      <= list of lemmata found in the poem
    'pos_tags': []      <= their POS tags (Positional Morphological Tags for Czech,
                           MyStem for Russian, TreeTagger tagsets for other corpora)
    'meters':   [[]]    <= list of meters found in the poem
    'years':    []      <= year the poem was published (author's year of birth for English)
    'n_words':  []      <= number of words
    'n_lines':  []      <= number of lines
    'authors':  []      <= author of the poem
    'titles':   []      <= title of the poem
    'schemes':  []      <= line-ending schemes
}
  • dicts: contains Gensim dictionary files for all five corpora
  • fig: contains all resulting figures
  • json > metadata: contains the metadata for the poems in each corpus
{   <= Each item in the following lists corresponds to a particular poem and holds:
    'meters':  [[]]     <= list of meters found in the poem
    'years':   []       <= year the poem was published (author's year of birth for English)
    'n_words': []       <= number of words
    'n_lines': []       <= number of lines
    'authors': []       <= author of the poem
    'titles':  []       <= title of the poem
}
  • json > topics: contains the topic probabilities for each poem
[   <= each item corresponds to a particular poem and comprises a 100-dimensional dict
    {
        'topic title': its probability in the poem
    }
]
  • json > pos: contains the relative POS frequencies for each poem
[   <= each item corresponds to a particular poem
    {
        'POS': its frequency
    }
]
  • json > w2v: contains a mapping from lemmata to their neighbours in the word2vec models
  • models: contains the pretrained LDA and word2vec models (Gensim)
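
The corpus and metadata files described above store one list per field, with position i in every list referring to the same poem, while the topic and POS files store one dict per poem. The following is a minimal sketch of how such files might be read with the Python standard library. It is not taken from the replication code: the function names are hypothetical, the file path has to be supplied by the reader because the file names inside the archive are not listed here, and the sketch assumes that each entry of 'meters' is a list of strings.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Load one JSON file from the archive (the path is supplied by the caller)."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def iter_poems(columns):
    """Turn a dict of parallel lists into one dict per poem.

    Assumes the layout described above: every key maps to a list and
    position i in each list belongs to the same poem.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"fields differ in length: {sorted(lengths)}")
    keys = list(columns)
    for row in zip(*(columns[key] for key in keys)):
        yield dict(zip(keys, row))


def meter_counts(columns):
    """Count poems per meter; a poem with several meters counts once for each.

    Assumes each entry of 'meters' is a list of hashable labels (e.g. strings).
    """
    counts = Counter()
    for poem in iter_poems(columns):
        counts.update(set(poem["meters"]))
    return counts


def top_items(per_poem_dict, n=3):
    """Return the n highest-valued (label, value) pairs of one per-poem dict,
    e.g. the most probable topics or the most frequent POS tags of a poem."""
    return sorted(per_poem_dict.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With a metadata file loaded through load_json, meter_counts gives a quick overview of how many poems each meter has, and top_items can be applied to any single entry of a topics or pos file.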

Files

semanticHalo.zip (10.0 GB)
md5:61d271198501726944a298b9c12f510f
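
Because the archive is large, it may be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the path to the downloaded file is an assumption to be replaced with the actual location.

```python
import hashlib

# Checksum as listed for semanticHalo.zip in this record.
EXPECTED_MD5 = "61d271198501726944a298b9c12f510f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_archive with the path of the downloaded semanticHalo.zip returns False for an incomplete or corrupted download, in which case the file should be fetched again.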