Video/Audio Open Access

Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

Gabay, Simon; Clérice, Thibault; Camps, Jean-Baptiste; Tanguy, Jean-Baptiste; Gille-Levenson, Matthias

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">linguistic annotation, pre-orthographic language, lemmatisation, POS-tagging</subfield>
  </datafield>
  <controlfield tag="005">20201013082930.0</controlfield>
  <controlfield tag="001">4084499</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">École des Chartes</subfield>
    <subfield code="a">Clérice, Thibault</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">École des Chartes</subfield>
    <subfield code="a">Camps, Jean-Baptiste</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Sorbonne Université</subfield>
    <subfield code="a">Tanguy, Jean-Baptiste</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">École normale supérieure de Lyon</subfield>
    <subfield code="a">Gille-Levenson, Matthias</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">36479539</subfield>
    <subfield code="z">md5:bdb7905bc80f09612a21f6d967723254</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-10-13</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Universités de Neuchâtel et de Genève</subfield>
    <subfield code="a">Gabay, Simon</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Standardizing linguistic data: method and tools for annotating (pre-orthographic) French</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u"></subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations. In the present paper, we describe both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models) the production of a linguistic tagger for (early) modern French (16-18th c.), taking as much as possible into account already existing standards for contemporary and, especially, medieval French.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4084498</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4084499</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">video</subfield>
  </datafield>
</record>
                  All versions  This version
Views             25            25
Downloads         888           888
Data volume       32.4 GB       32.4 GB
Unique views      25            25
Unique downloads  702           702

