Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

doi:10.5281/zenodo.4084499

Published October 13, 2020 | Version v1

Video/Audio Open

1. Universités de Neuchâtel et de Genève
2. École des Chartes
3. Sorbonne Université
4. École normale supérieure de Lyon

With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations. In the present paper, we describe both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models) the production of a linguistic tagger for (early) modern French (16th-18th c.), taking as much as possible into account already existing standards for contemporary and, especially, medieval French.

Files (36.5 MB)

Name	Size
Gabay-al-1.webm md5:bdb7905bc80f09612a21f6d967723254	36.5 MB

	All versions	This version
Views	90	75
Downloads	834	833
Data volume	38.0 GB	38.0 GB

Resource type

Video/Audio

Publisher

Zenodo

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: October 13, 2020
Modified: October 13, 2020
