10.5281/zenodo.3701320
https://zenodo.org/records/3701320
oai:zenodo.org:3701320
Camps, Jean-Baptiste (ORCID: 0000-0003-0385-7037), École nationale des chartes
Gabay, Simon (ORCID: 0000-0001-9094-4475), Université de Neuchâtel
Clérice, Thibault (ORCID: 0000-0003-1852-9204), École nationale des chartes
Cafiero, Florian (ORCID: 0000-0002-1951-6942), CNRS
Pie Model for Classical French -- Part-of-Speech and Morphology (CATTEX2009-max)
Zenodo
2020
Natural language processing
Part-of-speech tagging
Classical French
French Language
Deep Learning
2020-03-04
fra
10.5281/zenodo.3243486
10.5281/zenodo.3696675
https://zenodo.org/communities/natural-language-processing
Creative Commons Attribution 4.0 International
A Pie model for Classical French that predicts part-of-speech and morphology tags (CATTEX2009-max).
It was trained on a corpus of Classical French theatre.
More information:
- Corpus: J.-B. Camps and F. Cafiero, Stylometric Analysis of Classical French Theatre [Data set], Zenodo, 2019, http://doi.org/10.5281/zenodo.3353421.
- F. Cafiero and J.-B. Camps, Why Molière most likely did write his plays, Science Advances, 27 Nov 2019, Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489, https://advances.sciencemag.org/content/5/11/eaax5489/.
- J.-B. Camps, S. Gabay, Th. Clérice and F. Cafiero, Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre, to be published.
Current results on test data:
::: Evaluation report for task: pos :::
all:
  accuracy: 0.9701
  precision: 0.92
  recall: 0.8964
  support: 4181
ambiguous-tokens:
  accuracy: 0.9229
  precision: 0.9203
  recall: 0.9175
  support: 934
unknown-tokens:
  accuracy: 0.8165
  precision: 0.4798
  recall: 0.4904
  support: 218
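Each block pairs an accuracy with its support (the number of evaluated tokens), so the approximate number of correctly tagged tokens can be recovered by multiplication. The following is a minimal sketch of that arithmetic; the helper name `correct_tokens` is hypothetical and not part of Pie, and the results are estimates because the reported accuracies are rounded to four decimals.

```python
def correct_tokens(accuracy: float, support: int) -> int:
    """Estimate how many tokens were tagged correctly (hypothetical helper)."""
    return round(accuracy * support)


# (accuracy, support) pairs copied from the pos report.
pos_report = {
    "all": (0.9701, 4181),
    "ambiguous-tokens": (0.9229, 934),
    "unknown-tokens": (0.8165, 218),
}

estimates = {
    name: correct_tokens(acc, n) for name, (acc, n) in pos_report.items()
}

for name, (acc, n) in pos_report.items():
    # The difference between support and the estimate is the error count.
    print(f"{name}: ~{estimates[name]} correct, ~{n - estimates[name]} errors of {n}")
```

Read this way, the overall pos accuracy corresponds to roughly 125 mistagged tokens out of 4181.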
::: Evaluation report for task: MODE :::
all:
  accuracy: 0.9818
  precision: 0.8765
  recall: 0.8517
  support: 4181
ambiguous-tokens:
  accuracy: 0.84
  precision: 0.8483
  recall: 0.7612
  support: 125
unknown-tokens:
  accuracy: 0.8211
  precision: 0.7256
  recall: 0.658
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| MODE=con | 0.81 | 0.94 | 0.87 | 18 |
| MODE=imp | 0.83 | 0.78 | 0.80 | 68 |
| MODE=ind | 0.91 | 0.92 | 0.92 | 341 |
| MODE=sub | 0.84 | 0.62 | 0.71 | 60 |
| MODE=x | 0.99 | 1.00 | 1.00 | 3694 |
| avg / total | 0.88 | 0.85 | 0.86 | 4181 |
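The `avg / total` row appears to be an unweighted (macro) average over the classes rather than a support-weighted one. This is an inference from the figures, not something the record states. The sketch below recomputes both variants from the MODE table; all variable names are illustrative.

```python
from statistics import mean

# (precision, recall, f1, support) per class, copied from the MODE table.
mode_rows = {
    "MODE=con": (0.81, 0.94, 0.87, 18),
    "MODE=imp": (0.83, 0.78, 0.80, 68),
    "MODE=ind": (0.91, 0.92, 0.92, 341),
    "MODE=sub": (0.84, 0.62, 0.71, 60),
    "MODE=x": (0.99, 1.00, 1.00, 3694),
}

precisions, recalls, f1s, supports = zip(*mode_rows.values())

# Macro average: every class counts equally, whatever its frequency.
macro = tuple(round(mean(col), 2) for col in (precisions, recalls, f1s))

# Support-weighted precision, for comparison: dominated by the MODE=x class.
weighted_precision = round(
    sum(p * n for p, n in zip(precisions, supports)) / sum(supports), 2
)

print("macro (precision, recall, f1):", macro)
print("support-weighted precision:", weighted_precision)
```

The macro values agree with the `avg / total` row, whereas the weighted precision comes out close to 0.98. Under that reading, the summary row is sensitive to rare classes such as `MODE=sub`, which the very frequent `MODE=x` class would otherwise mask.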
::: Evaluation report for task: TEMPS :::
all:
  accuracy: 0.9871
  precision: 0.9305
  recall: 0.9259
  support: 4181
ambiguous-tokens:
  accuracy: 0.9135
  precision: 0.623
  recall: 0.6072
  support: 104
unknown-tokens:
  accuracy: 0.8394
  precision: 0.8693
  recall: 0.5399
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| TEMPS=fut | 0.98 | 0.85 | 0.91 | 47 |
| TEMPS=ipf | 0.93 | 0.88 | 0.90 | 16 |
| TEMPS=psp | 0.80 | 1.00 | 0.89 | 4 |
| TEMPS=pst | 0.95 | 0.91 | 0.93 | 334 |
| TEMPS=x | 0.99 | 1.00 | 0.99 | 3780 |
| avg / total | 0.93 | 0.93 | 0.92 | 4181 |
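The f1-score column is the harmonic mean of precision and recall for each class. As a sanity check, the sketch below recomputes it from the two-decimal values in the TEMPS table. The function name `f1_score` is defined here for illustration, and recomputing from rounded inputs could in general differ from the reported figure in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the TEMPS table.
temps_rows = {
    "TEMPS=fut": (0.98, 0.85),
    "TEMPS=ipf": (0.93, 0.88),
    "TEMPS=psp": (0.80, 1.00),
    "TEMPS=pst": (0.95, 0.91),
    "TEMPS=x": (0.99, 1.00),
}

recomputed = {
    label: round(f1_score(p, r), 2) for label, (p, r) in temps_rows.items()
}

for label, value in recomputed.items():
    print(f"{label}: f1 ~ {value:.2f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a class such as `TEMPS=fut`, with high precision but lower recall, ends up with an f1-score nearer to its recall.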
::: Evaluation report for task: PERS :::
all:
  accuracy: 0.9859
  precision: 0.9821
  recall: 0.9668
  support: 4181
ambiguous-tokens:
  accuracy: 0.942
  precision: 0.9178
  recall: 0.9188
  support: 362
unknown-tokens:
  accuracy: 0.8394
  precision: 0.9426
  recall: 0.6344
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| PERS.=1 | 0.98 | 0.96 | 0.97 | 429 |
| PERS.=2 | 0.97 | 0.97 | 0.97 | 258 |
| PERS.=3 | 0.99 | 0.94 | 0.96 | 410 |
| PERS.=x | 0.99 | 1.00 | 0.99 | 3084 |
| avg / total | 0.98 | 0.97 | 0.97 | 4181 |
::: Evaluation report for task: NOMB :::
all:
  accuracy: 0.9797
  precision: 0.9809
  recall: 0.9733
  support: 4181
ambiguous-tokens:
  accuracy: 0.7865
  precision: 0.7511
  recall: 0.6884
  support: 192
unknown-tokens:
  accuracy: 0.8349
  precision: 0.7918
  recall: 0.7729
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| NOMB.=p | 0.98 | 0.95 | 0.97 | 545 |
| NOMB.=s | 0.98 | 0.98 | 0.98 | 1831 |
| NOMB.=x | 0.98 | 0.99 | 0.98 | 1805 |
| avg / total | 0.98 | 0.97 | 0.98 | 4181 |
::: Evaluation report for task: GENRE :::
all:
  accuracy: 0.9749
  precision: 0.969
  recall: 0.9685
  support: 4181
ambiguous-tokens:
  accuracy: 0.9118
  precision: 0.9063
  recall: 0.9208
  support: 465
unknown-tokens:
  accuracy: 0.7385
  precision: 0.7097
  recall: 0.6977
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| GENRE=f | 0.92 | 0.94 | 0.93 | 387 |
| GENRE=m | 0.97 | 0.94 | 0.96 | 940 |
| GENRE=n | 1.00 | 1.00 | 1.00 | 45 |
| GENRE=x | 0.98 | 0.99 | 0.99 | 2809 |
| avg / total | 0.97 | 0.97 | 0.97 | 4181 |
::: Evaluation report for task: CAS :::
all:
  accuracy: 0.9983
  precision: 0.9957
  recall: 0.9901
  support: 4181
ambiguous-tokens:
  accuracy: 0.9648
  precision: 0.9796
  recall: 0.9692
  support: 199
unknown-tokens:
  accuracy: 1.0
  precision: 1.0
  recall: 1.0
  support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| CAS=i | 1.00 | 1.00 | 1.00 | 46 |
| CAS=n | 1.00 | 1.00 | 1.00 | 190 |
| CAS=r | 0.98 | 0.96 | 0.97 | 128 |
| CAS=x | 1.00 | 1.00 | 1.00 | 3817 |
| avg / total | 1.00 | 0.99 | 0.99 | 4181 |