Published July 1, 2019
| Version v1
Dataset
Open
Data and models for automatic scansion experiment Dutch Song Database
- 1. University of Antwerp
- 2. KNAW Meertens Institute
Description
This release contains the data used in an experiment on automatic scansion of historical Dutch song texts. Aside from the data, the release includes two models: model_s, which is required to run the code that is part of this experiment, and best_model, an example of a trained automatic scansion model.
Item descriptions:
- meertens-meter-songs.zip → collection of 23,197 historical Dutch songs (XML format). These files (and the gathered metadata) stem from a collaboration between the Dutch Song Database and the Digital Library for Dutch Literature. All files contain metadata on the number of beats present in each verse line. Snippet:
<lg>
  <l id="s1:l1" met="4" type="-+"> Een Meysken op een Rivierken <rhyme label="a" type="m">sadt</rhyme>,</l>
  <l id="s1:l2" met="2" type="-+"> So schoon zy <rhyme label="b" type="m">was</rhyme>,</l>
  <l id="s1:l3" met="4" type="-+"> Sy sadt en verbeyde haer soete <rhyme label="c" type="m">Lief</rhyme>,</l>
  <l id="s1:l4" met="2" type="-+"> Int groene <rhyme label="b" type="m" corresp="#s1:l2">gras</rhyme>.</l>
</lg>
- model_s → model used to syllabify (historical) Dutch words and assign lexical stress to them. This model was developed in a previous project.
- stress_xml.zip → collection of 23,197 historical Dutch songs (XML format). These are the same songs as in meertens-meter-songs, but with their individual words syllabified and annotated for lexical stress. The songs in this folder are used as input during training. Snippet:
<l id="s1:l1" met="4" type="-+">
  <w token="een"> <s word-stress="1" line-stress="0">een</s> </w>
  <w token="meysken"> <s word-stress="1" line-stress="0">meys</s> <s word-stress="0" line-stress="0">ken</s> </w>
  <w token="op"> <s word-stress="1" line-stress="0">op</s> </w>
  <w token="een"> <s word-stress="1" line-stress="0">een</s> </w>
  <w token="rivierken"> <s word-stress="0" line-stress="0">ri</s> <s word-stress="1" line-stress="0">vier</s> <s word-stress="0" line-stress="0">ken</s> </w>
  <rhyme label="a" type="m"> <w token="sadt"> <s word-stress="1" line-stress="0">sadt</s> </w> </rhyme>
</l>
- gold_scan.zip → 198 Dutch song files (xml-format). These files have been annotated by an expert for line stress.
- eval_splits.zip → contains the splits made from gold_scan. These are the splits used in the automatic scansion experiment: a development set of 98 songs (used during training), and a test set of 99 songs (used for evaluating the best model after training).
- best_model.zip → contains the files of a trained model for automatic scansion of Dutch songs.
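The stress_xml format described above can be read with a standard XML parser. The following is a minimal sketch, not the code used in the experiment: it assumes the elements carry no XML namespace (as in the snippet) and uses only Python's standard library. The function name `read_line` and the returned dictionary keys are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# One verse line, copied from the stress_xml snippet in the item descriptions.
SNIPPET = """
<l id="s1:l1" met="4" type="-+">
  <w token="een"><s word-stress="1" line-stress="0">een</s></w>
  <w token="meysken">
    <s word-stress="1" line-stress="0">meys</s>
    <s word-stress="0" line-stress="0">ken</s>
  </w>
  <w token="op"><s word-stress="1" line-stress="0">op</s></w>
  <w token="een"><s word-stress="1" line-stress="0">een</s></w>
  <w token="rivierken">
    <s word-stress="0" line-stress="0">ri</s>
    <s word-stress="1" line-stress="0">vier</s>
    <s word-stress="0" line-stress="0">ken</s>
  </w>
  <rhyme label="a" type="m">
    <w token="sadt"><s word-stress="1" line-stress="0">sadt</s></w>
  </rhyme>
</l>
"""


def read_line(l_elem):
    """Return the beat count and per-syllable lexical stress of one <l> element.

    Assumption: element names are un-namespaced, as in the snippet.
    """
    # iter("s") walks the whole subtree, so syllables nested in <rhyme> are included.
    syllables = [
        ((s.text or "").strip(), int(s.get("word-stress")))
        for s in l_elem.iter("s")
    ]
    return {
        "id": l_elem.get("id"),
        "beats": int(l_elem.get("met")),  # the met attribute: number of beats in the line
        "syllables": syllables,
    }


line = read_line(ET.fromstring(SNIPPET.strip()))
print(line["id"], line["beats"], len(line["syllables"]))
```

For a whole song file, the same function could be applied to every `<l>` element returned by `ET.parse(path).getroot().iter("l")`, again assuming the files are un-namespaced.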
Files
Total size: 311.1 MB
Checksum | Size |
---|---|
md5:0d31af73d55b58fa13d44064a6dacade | 66.6 MB |
md5:ed47c37b706456acc2fa3c294a21f610 | 229.8 kB |
md5:271de23e53b5e3a8bae465416703f470 | 311.4 kB |
md5:418dd577e15db5da05c288957721b266 | 52.0 MB |
md5:8d5334200281ac8727f76baefb24f96f | 120.3 MB |
md5:d338f8f1d0a6d8673562e447b365c365 | 71.7 MB |
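The MD5 checksums in the file listing can be used to check that a downloaded archive arrived intact. The sketch below uses Python's standard `hashlib` module. The helper names `md5_of` and `matches` are hypothetical, and because the listing does not show which checksum belongs to which archive, the caller has to supply both the path and the expected value.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, listed):
    """Compare a file against a listed value such as 'md5:0d31af...'.

    The 'md5:' prefix is optional and the comparison ignores case.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

A call such as `matches(path_to_archive, "md5:...")` returns `True` only when the local file's digest equals the listed checksum; a `False` result suggests a corrupted or incomplete download.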
Additional details
Related works
- Is referenced by
- https://github.com/WHaverals/scanner (URL)