Data and models for automatic scansion experiment Dutch Song Database

doi:10.5281/zenodo.3243662

Published July 1, 2019 | Version v1

Dataset Open

1. University of Antwerp
2. KNAW Meertens Institute

This release contains the data used in an experiment on automatic scansion of historical Dutch song texts. Aside from the data, the release includes two models: model_s, which is required to run the code that is part of this experiment, and best_model, an example of a trained automatic scansion model.

Item descriptions:

meertens-meter-songs.zip → collection of 23,197 historic Dutch songs (xml-format). These files (and the gathered meta-data) stem from a collaboration project between the Dutch Song Database and the Digital Library for Dutch Literature. All files contain meta-data on the number of beats present in each verse line. Snippet:

<lg>
    <l id="s1:l1" met="4" type="-+">
        Een Meysken op een Rivierken <rhyme label="a" type="m">sadt</rhyme>,</l>
    <l id="s1:l2" met="2" type="-+">
        So schoon zy <rhyme label="b" type="m">was</rhyme>,</l>
    <l id="s1:l3" met="4" type="-+">
        Sy sadt en verbeyde haer soete <rhyme label="c" type="m">Lief</rhyme>,</l>
    <l id="s1:l4" met="2" type="-+">
        Int groene <rhyme label="b" type="m" corresp="#s1:l2">gras</rhyme>.</l>
</lg>
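
As a rough illustration of how this meta-data could be read, the sketch below parses the snippet above with Python's standard library. It assumes that the `met` attribute holds the number of beats per line and that the files use no XML namespace; neither is confirmed by this description, and the function name `read_lines` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Snippet copied from the description above.
SNIPPET = """<lg>
    <l id="s1:l1" met="4" type="-+">
        Een Meysken op een Rivierken <rhyme label="a" type="m">sadt</rhyme>,</l>
    <l id="s1:l2" met="2" type="-+">
        So schoon zy <rhyme label="b" type="m">was</rhyme>,</l>
    <l id="s1:l3" met="4" type="-+">
        Sy sadt en verbeyde haer soete <rhyme label="c" type="m">Lief</rhyme>,</l>
    <l id="s1:l4" met="2" type="-+">
        Int groene <rhyme label="b" type="m" corresp="#s1:l2">gras</rhyme>.</l>
</lg>"""


def read_lines(xml_text):
    """Return (line id, met value, rhyme label, rhyme word) for every <l>."""
    root = ET.fromstring(xml_text)
    rows = []
    for line in root.iter("l"):
        rhyme = line.find("rhyme")  # None if a line has no <rhyme> child
        rows.append((
            line.get("id"),
            int(line.get("met")),  # assumption: met = number of beats
            rhyme.get("label") if rhyme is not None else None,
            rhyme.text if rhyme is not None else None,
        ))
    return rows


rows = read_lines(SNIPPET)
for row in rows:
    print(row)
```

For a full song file, the same function could be given the file's contents instead of `SNIPPET`, provided the assumptions above hold for the actual files.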

model_s → model used to syllabify (historic) Dutch words and assign lexical stress to them. This model was developed as part of a previous project.

stress_xml.zip → collection of 23,197 historic Dutch songs (xml-format). These are the same songs as in meertens-meter-songs, but their individual words are now syllabified and annotated for lexical stress. The songs in this folder are used as input during the training process. Snippet:

<l id="s1:l1" met="4" type="-+">
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="meysken">
        <s word-stress="1" line-stress="0">meys</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <w token="op">
        <s word-stress="1" line-stress="0">op</s>
    </w>
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="rivierken">
        <s word-stress="0" line-stress="0">ri</s>
        <s word-stress="1" line-stress="0">vier</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <rhyme label="a" type="m">
        <w token="sadt">
            <s word-stress="1" line-stress="0">sadt</s>
        </w>
    </rhyme>
</l>
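
The syllable-level annotation in this snippet can be flattened into a simple sequence, which may be a convenient starting point for inspecting the data. The sketch below is only an illustration: it assumes namespace-free XML shaped like the snippet above, and the helper name `syllables` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Snippet copied from the description above.
STRESS_SNIPPET = """<l id="s1:l1" met="4" type="-+">
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="meysken">
        <s word-stress="1" line-stress="0">meys</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <w token="op">
        <s word-stress="1" line-stress="0">op</s>
    </w>
    <w token="een">
        <s word-stress="1" line-stress="0">een</s>
    </w>
    <w token="rivierken">
        <s word-stress="0" line-stress="0">ri</s>
        <s word-stress="1" line-stress="0">vier</s>
        <s word-stress="0" line-stress="0">ken</s>
    </w>
    <rhyme label="a" type="m">
        <w token="sadt">
            <s word-stress="1" line-stress="0">sadt</s>
        </w>
    </rhyme>
</l>"""


def syllables(xml_text):
    """Return (token, syllable, word-stress, line-stress) per <s>, in order."""
    root = ET.fromstring(xml_text)
    out = []
    # iter() walks the whole tree, so words nested inside <rhyme> are included.
    for word in root.iter("w"):
        for syl in word.iter("s"):
            out.append((
                word.get("token"),
                syl.text,
                int(syl.get("word-stress")),
                int(syl.get("line-stress")),
            ))
    return out


syls = syllables(STRESS_SNIPPET)
# Lexical stress of the line as a string of 0s and 1s, one digit per syllable.
word_stress_pattern = "".join(str(s[2]) for s in syls)
print(word_stress_pattern)
```

In this snippet every `line-stress` value is 0, so only the `word-stress` pattern carries information here.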

gold_scan.zip → 198 Dutch song files (xml-format). These files have been annotated by an expert for line stress.

eval_splits.zip → contains the splits made from gold_scan. These are the splits used in the automatic scansion experiment: a development set of 98 songs (used during training), and a test set of 99 songs (used for evaluating the best model after training).

best_model.zip → contains the files of a trained model for automatic scansion of Dutch songs.

Files (311.1 MB)

Name	MD5	Size
best_model.zip	0d31af73d55b58fa13d44064a6dacade	66.6 MB
eval_splits.zip	ed47c37b706456acc2fa3c294a21f610	229.8 kB
gold_scan.zip	271de23e53b5e3a8bae465416703f470	311.4 kB
meertens-meter-songs.zip	418dd577e15db5da05c288957721b266	52.0 MB
model_s.zip	8d5334200281ac8727f76baefb24f96f	120.3 MB
stress_xml.zip	d338f8f1d0a6d8673562e447b365c365	71.7 MB
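
The MD5 checksums listed above can be used to check that a download arrived intact. The following is a minimal sketch using Python's `hashlib`; the helper names `md5_of` and `verify` are made up for this example, and the archives are assumed to sit in one local directory under their original file names.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "best_model.zip": "0d31af73d55b58fa13d44064a6dacade",
    "eval_splits.zip": "ed47c37b706456acc2fa3c294a21f610",
    "gold_scan.zip": "271de23e53b5e3a8bae465416703f470",
    "meertens-meter-songs.zip": "418dd577e15db5da05c288957721b266",
    "model_s.zip": "8d5334200281ac8727f76baefb24f96f",
    "stress_xml.zip": "d338f8f1d0a6d8673562e447b365c365",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use low for the larger archives such as model_s.zip.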

Additional details

Is referenced by: https://github.com/WHaverals/scanner

	All versions	This version
Views	365	361
Downloads	53	53
Data volume	4.2 GB	4.2 GB
