There is a newer version of the record available.

Published January 8, 2019 | Version 1
Software Open

Tokenization for Occitan (Gascon and Lengadocian)

  • 1. Université de Poitiers

Description

A Perl programme to tokenize texts in Occitan.

The programme is adapted from the Perl tokenizer for French texts by Tanguy and Hathout (2007), in its extended version, that is, the version that takes a list of exceptions.
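As a rough illustration of what a tokenizer with an exception list does, here is a minimal Python sketch. It is not the logic of segmenteur_occitan.pl: the exception entries ("etc.", "M."), the function names, and the splitting rules (elided forms such as "l'" kept as separate tokens, each punctuation mark isolated) are assumptions made for the example, and the real exceptions_occitan.txt may use a different format and content.

```python
import re

# Hypothetical exception entries: strings that must never be split.
# These are illustrative only and are not taken from exceptions_occitan.txt.
EXCEPTIONS = ["etc.", "M."]


def build_pattern(exceptions):
    """Compile a regex whose alternatives are tried in priority order."""
    alternatives = []
    if exceptions:
        # Longest entries first, so a longer exception wins over its prefix.
        ordered = sorted(exceptions, key=len, reverse=True)
        protected = "|".join(re.escape(e) for e in ordered)
        # The lookarounds stop an exception from matching inside a longer word.
        alternatives.append(r"(?<!\w)(?:" + protected + r")(?!\w)")
    alternatives.append(r"\w+['’]")   # elided form, e.g. "l'" in "l'ostal"
    alternatives.append(r"\w+")       # ordinary word
    alternatives.append(r"[^\w\s]")   # any single punctuation mark
    return re.compile("|".join(alternatives))


def tokenize(text, exceptions=EXCEPTIONS):
    """Return the list of tokens found in text, left to right."""
    return build_pattern(exceptions).findall(text)


if __name__ == "__main__":
    for token in tokenize("L'ostal de M. Bonet es bèl, etc."):
        print(token)
```

In this sketch the exception alternative is tried first, so "M." and "etc." keep their final period, while the comma after "bèl" becomes a token of its own. Calling tokenize with an empty exception list would instead split "etc." into "etc" and ".".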

To run the programme, execute the following command:

perl segmenteur_occitan.pl exceptions_occitan.txt <input >output

This tool was developed as part of the RESTAURE project, funded by the French National Research Agency (ANR).

Files

exceptions_occitan.txt

Files (19.9 kB)

Checksum                                Size
md5:b517ac75dd4a67a3f074fedac2f6985f    194 Bytes
md5:eba63bb27ba7131ec639d8d2d9695d98    3.0 kB
md5:16565cd9cba138794692e34289da8946    16.8 kB