Published November 21, 2018 | Version 1.0
Software Open

Tokeniser for Picard

Authors/Creators

  • 1. University of Strasbourg

Description

This software tokenises Picard texts, i.e. it splits sentences into words and punctuation signs. The tokeniser handles ambiguous separators such as the dash, the apostrophe and the dot.

The software is written in Perl 5.22.1. Installation and usage instructions are given in the script file.
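
To illustrate why the dash, apostrophe and dot are ambiguous separators, here is a minimal sketch in Python. It is not the released Perl script and does not reproduce its rules: the `ABBREVIATIONS` set, the `TOKEN_RE` pattern, the `tokenise` function and the sample sentence are all hypothetical, chosen only to show one simple policy (keep an elided form with its apostrophe, keep word-internal dashes, split a final dot unless the whole chunk is a known abbreviation).

```python
import re

# Hypothetical abbreviation list (an assumption for this sketch);
# a real tokeniser would load such entries from a lexicon.
ABBREVIATIONS = {"M.", "etc."}

# Alternatives are tried left to right, so the elision rule wins first.
TOKEN_RE = re.compile(
    r"""
    \w+['’]          # elided form kept together with its apostrophe
  | \w+(?:-\w+)*     # word, possibly with word-internal dashes
  | [^\w\s]          # any other single punctuation sign
    """,
    re.VERBOSE,
)


def tokenise(text):
    """Split text into word and punctuation tokens (illustrative policy only)."""
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            # The dot belongs to the abbreviation, not to the sentence.
            tokens.append(chunk)
        else:
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens


if __name__ == "__main__":
    # Made-up sample input, used only to exercise the three separators.
    print(tokenise("l'ami de M. Dupont voit sa grand-mère."))
```

Under this policy the sample is split into `l'`, `ami`, `de`, `M.`, `Dupont`, `voit`, `sa`, `grand-mère` and `.`. A purely pattern-based approach like this one will mis-split words that contain an apostrophe or dash lexically, which is why a lexicon of exceptions is usually needed in practice.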

Files

resources.zip

Files (31.3 kB)

Checksum Size
md5:2e09b5df04747efc3ec21838c4278852 16.8 kB
md5:ecae94d4558c72d3a00cd4a1a28850c3 14.5 kB

Additional details

References

  • Delphine Bernhard, Amalia Todirascu, Fanny Martin, Pascale Erhart, Lucie Steiblé, Dominique Huck, Christophe Rey (2017). Problèmes de tokénisation pour deux langues régionales de France, l'alsacien et le picard