Published July 1, 2019 | Version v1
Dataset | Open access

Data and models for automatic scansion experiment Dutch Song Database

  • 1. University of Antwerp
  • 2. KNAW Meertens Institute

Description

This release contains the data used in an experiment on automatic scansion of historical Dutch song texts. Aside from the data, it includes two models: model_s, which is required to run the code that accompanies this experiment, and best_model, an example of a trained automatic scansion model.

Item descriptions:

  • meertens-meter-songs.zip → collection of 23,197 historical Dutch songs (XML format). These files and their metadata stem from a collaborative project between the Dutch Song Database and the Digital Library for Dutch Literature. All files contain metadata on the number of beats in each verse line (the met attribute of the <l> element). Snippet:
<lg>
    <l id="s1:l1" met="4" type="-+">
        Een Meysken op een Rivierken <rhyme label="a" type="m">sadt</rhyme>,</l>
    <l id="s1:l2" met="2" type="-+">
        So schoon zy <rhyme label="b" type="m">was</rhyme>,</l>
    <l id="s1:l3" met="4" type="-+">
        Sy sadt en verbeyde haer soete <rhyme label="c" type="m">Lief</rhyme>,</l>
    <l id="s1:l4" met="2" type="-+">
        Int groene <rhyme label="b" type="m" corresp="#s1:l2">gras</rhyme>.</l>
</lg>

  • model_s → model used to syllabify (historical) Dutch words and assign lexical stress to them. This model was developed in a previous project.

  • stress_xml.zip → collection of 23,197 historical Dutch songs (XML format). These are the same songs as in meertens-meter-songs.zip, but their individual words are now syllabified and annotated for lexical stress. The songs in this archive are used as input during training. Snippet:
<l id="s1:l1" met="4" type="-+">
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="meysken">
        <s word-stress="1" line-stress="0">meys</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <w token="op">
        <s word-stress="1" line-stress="0">op</s>
    </w>
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="rivierken">
        <s word-stress="0" line-stress="0">ri</s>
        <s word-stress="1" line-stress="0">vier</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <rhyme label="a" type="m">
        <w token="sadt">
            <s word-stress="1" line-stress="0">sadt</s>
        </w>
    </rhyme>
</l>

  • gold_scan.zip → 198 Dutch song files (XML format), annotated for line stress by an expert.

  • eval_splits.zip → the splits made from gold_scan that were used in the automatic scansion experiment: a development set of 98 songs (used during training) and a test set of 99 songs (used to evaluate the best model after training).

  • best_model.zip → the files of a trained model for automatic scansion of Dutch songs.
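The stress_xml.zip snippet above can be read with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not part of the released code: it assumes the elements carry no XML namespace (if the real files use one, the tag names passed to iter() must be namespace-qualified), and the helper name read_line and the +/- rendering of word stress are illustrative choices. It collects each syllable's word-stress and line-stress values together with the line's met beat count.

```python
import xml.etree.ElementTree as ET

# One verse line copied from the stress_xml.zip snippet above.
# Assumption: the elements have no XML namespace (qualify tag names if they do).
SNIPPET = """
<l id="s1:l1" met="4" type="-+">
  <w token="een"><s word-stress="1" line-stress="0">een</s></w>
  <w token="meysken">
    <s word-stress="1" line-stress="0">meys</s>
    <s word-stress="0" line-stress="0">ken</s>
  </w>
  <w token="op"><s word-stress="1" line-stress="0">op</s></w>
  <w token="een"><s word-stress="1" line-stress="0">een</s></w>
  <w token="rivierken">
    <s word-stress="0" line-stress="0">ri</s>
    <s word-stress="1" line-stress="0">vier</s>
    <s word-stress="0" line-stress="0">ken</s>
  </w>
  <rhyme label="a" type="m">
    <w token="sadt"><s word-stress="1" line-stress="0">sadt</s></w>
  </rhyme>
</l>
"""


def read_line(line_elem):
    """Hypothetical helper: return (id, beats, [(syllable, word_stress, line_stress), ...])."""
    syllables = [
        (s.text, int(s.get("word-stress")), int(s.get("line-stress")))
        # iter() walks all descendants in document order,
        # so syllables nested inside <rhyme> are included too
        for s in line_elem.iter("s")
    ]
    return line_elem.get("id"), int(line_elem.get("met")), syllables


line_id, beats, syllables = read_line(ET.fromstring(SNIPPET.strip()))
# Render lexical (word-level) stress as + (stressed) and - (unstressed)
pattern = "".join("+" if word_stress else "-" for _, word_stress, _ in syllables)
print(line_id, beats, pattern)  # s1:l1 4 ++-++-+-+
```

Assuming the gold_scan files share this structure with the expert's line-stress values filled in, the same helper would yield the reference labels against which predicted line stress can be compared.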

Files (311.1 MB)

  • md5:0d31af73d55b58fa13d44064a6dacade (66.6 MB)
  • md5:ed47c37b706456acc2fa3c294a21f610 (229.8 kB)
  • md5:271de23e53b5e3a8bae465416703f470 (311.4 kB)
  • md5:418dd577e15db5da05c288957721b266 (52.0 MB)
  • md5:8d5334200281ac8727f76baefb24f96f (120.3 MB)
  • md5:d338f8f1d0a6d8673562e447b365c365 (71.7 MB)
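The MD5 checksums listed above can be used to confirm that a download is intact. Because this listing does not pair each checksum with a file name, the minimal sketch below (standard library only; the names md5_of, LISTED_MD5 and is_listed are illustrative, not part of the release) only checks whether a file's digest matches any of the six listed values.

```python
import hashlib
from pathlib import Path

# The six MD5 checksums from the file listing above.
LISTED_MD5 = {
    "0d31af73d55b58fa13d44064a6dacade",
    "ed47c37b706456acc2fa3c294a21f610",
    "271de23e53b5e3a8bae465416703f470",
    "418dd577e15db5da05c288957721b266",
    "8d5334200281ac8727f76baefb24f96f",
    "d338f8f1d0a6d8673562e447b365c365",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of(path) in LISTED_MD5


if __name__ == "__main__":
    # Check every .zip archive in the current directory.
    for archive in sorted(Path(".").glob("*.zip")):
        print(archive.name, "OK" if is_listed(archive) else "MISMATCH")
```

A MISMATCH means the archive is either incomplete or corrupted, or is not one of the files listed here.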

Additional details

Related works