Published November 21, 2018
| Version 1.0
Software
Open
Tokeniser for Picard
Description
This software tokenises Picard texts, i.e. it splits sentences into words and punctuation marks. The tokeniser handles ambiguous separators such as the dash, the apostrophe and the dot.
The software is written in Perl 5.22.1. Installation and usage instructions are given in the script file.
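To illustrate what handling ambiguous separators can involve, here is a minimal sketch in Python. It is not the Perl script distributed with this record, and every rule in it is an assumption made for illustration: the pattern `TOKEN_RE`, the function name `tokenise`, the choice to attach an apostrophe to the word on its left, the choice to keep word-internal dashes, and the treatment of dots inside numbers and dotted initialisms. The actual rules for Picard are those implemented in the script file.

```python
import re

# Hypothetical rule set, for illustration only. Alternatives are tried in
# order, so the more specific patterns come first.
TOKEN_RE = re.compile(
    r"""
    \d+(?:[.,]\d+)*             # numbers, keeping internal dots or commas
  | (?:[^\W\d_]\.){2,}          # dotted initialisms made of single letters
  | [^\W\d_]+'                  # elided word: the apostrophe stays on the left
  | [^\W\d_]+(?:-[^\W\d_]+)*    # words, keeping word-internal dashes
  | [^\w\s]                     # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenise(sentence):
    """Split a sentence into word and punctuation tokens (assumed rules)."""
    return TOKEN_RE.findall(sentence)


# Illustrative input, not taken from the Picard resources.
print(tokenise("J'ai vu l'arc-en-ciel."))
```

With these assumed rules, a sentence-final dot becomes its own token, while a dot inside a number such as `3.14` stays attached. A language-specific tokeniser would typically replace the generic dash and apostrophe rules with lexicon-based decisions.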
Files
resources.zip
Files (31.3 kB)
| MD5 checksum | Size |
|---|---|
| md5:2e09b5df04747efc3ec21838c4278852 | 16.8 kB |
| md5:ecae94d4558c72d3a00cd4a1a28850c3 | 14.5 kB |
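The MD5 checksums listed for the files can be used to check that a download is intact. The following Python sketch shows one way to do this. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the path passed in is whatever local name the downloaded file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum(local_path, "md5:ecae94d4558c72d3a00cd4a1a28850c3")` returns `True` only if the local file is byte-for-byte identical to the published one.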
Additional details
References
- Delphine Bernhard, Amalia Todirascu, Fanny Martin, Pascale Erhart, Lucie Steiblé, Dominique Huck, Christophe Rey (2017). Problèmes de tokénisation pour deux langues régionales de France, l'alsacien et le picard