Tokenization for Occitan (Gascon and Lengadocian)
Description
A Perl program to tokenize texts in Occitan.
The program is adapted from the Perl tokenizer for French texts by Tanguy and Hathout (2007), in its extended version, that is, the version that uses a list of exceptions.
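To illustrate what a list of exceptions does in a tokenizer of this kind, here is a minimal Python sketch. It is not the logic of the Perl script itself: the splitting rules, the function names and the sample exception entries are assumptions made for illustration, and the real exceptions_occitan.txt may differ in both content and format.

```python
import re

# Hypothetical exception entries, chosen for illustration only.
# They are not taken from exceptions_occitan.txt.
SAMPLE_EXCEPTIONS = {"pr'amor", "etc."}


def build_tokenizer(exceptions):
    """Compile a regex that tries the exceptions before the generic rules."""
    # Longest exceptions first, so that a short entry cannot shadow a longer one.
    alternatives = [re.escape(e) for e in sorted(exceptions, key=len, reverse=True)]
    # Generic rules (assumed): a word with an optional elision apostrophe,
    # or a single punctuation character.
    alternatives += [r"\w+['’]?", r"[^\w\s]"]
    return re.compile("|".join(alternatives), re.IGNORECASE)


def tokenize(text, exceptions=SAMPLE_EXCEPTIONS):
    """Return the list of tokens found in text."""
    return build_tokenizer(exceptions).findall(text)


sentence = "L'ostal es bèl, pr'amor qu'es grand."
print(tokenize(sentence))
# → ["L'", 'ostal', 'es', 'bèl', ',', "pr'amor", "qu'", 'es', 'grand', '.']
```

Without the exception entry, the generic elision rule would split "pr'amor" into "pr'" and "amor"; the exception list is what keeps such forms whole.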
To run the program, execute the following command:
perl segmenteur_occitan.pl exceptions_occitan.txt <input >output
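The command above reads the text from standard input and writes the result to standard output. When several files need to be processed, the same call can be wrapped in a small script. The Python sketch below is one possible way to do this; the function names, the directory arguments and the `*.txt` pattern are hypothetical, and it assumes that `perl`, segmenteur_occitan.pl and exceptions_occitan.txt are all reachable from the working directory.

```python
import subprocess
from pathlib import Path


def build_command(script="segmenteur_occitan.pl", exceptions="exceptions_occitan.txt"):
    """Return the argument list for the invocation documented above."""
    return ["perl", script, exceptions]


def tokenize_file(src, dst, **kwargs):
    """Feed src to the tokenizer on stdin and write its stdout to dst."""
    # Files are opened in binary mode so that no encoding is assumed here.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        subprocess.run(build_command(**kwargs), stdin=fin, stdout=fout, check=True)


def tokenize_directory(in_dir, out_dir, pattern="*.txt"):
    """Tokenize every file matching pattern in in_dir into out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob(pattern)):
        tokenize_file(src, out_path / src.name)
```

For example, `tokenize_directory("corpus", "tokenized")` would process every `.txt` file in a (hypothetical) `corpus` directory. Because of `check=True`, a non-zero exit status from Perl raises an exception instead of silently leaving an incomplete output file.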
This tool was developed in the context of the RESTAURE project, funded by the French ANR.
Files
(19.9 kB in total)
MD5 | Size |
---|---|
b517ac75dd4a67a3f074fedac2f6985f | 194 Bytes |
eba63bb27ba7131ec639d8d2d9695d98 | 3.0 kB |
16565cd9cba138794692e34289da8946 | 16.8 kB |