Pie Model for Classical French -- Part-of-Speech and Morphology (CATTEX2009-max)
- 1. École nationale des chartes
- 2. Université de Neuchâtel
- 3. CNRS
Description
Pie model for Classical French that predicts part-of-speech and morphology tags (CATTEX2009-max tagset).
It was trained on a corpus of Classical French theatre.
More information:
- corpus: Camps, Jean-Baptiste, & Cafiero, Florian. (2019). Stylometric Analysis of Classical French Theatre [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3353421.
- F. Cafiero and J.B. Camps, Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489, https://advances.sciencemag.org/content/5/11/eaax5489/.
- J.B. Camps, S. Gabay, Th. Clérice and F. Cafiero, Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre, to be published.
Current results on test data:
::: Evaluation report for task: pos :::
all:
accuracy: 0.9701
precision: 0.92
recall: 0.8964
support: 4181
ambiguous-tokens:
accuracy: 0.9229
precision: 0.9203
recall: 0.9175
support: 934
unknown-tokens:
accuracy: 0.8165
precision: 0.4798
recall: 0.4904
support: 218
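Since each report pairs an accuracy with a support (the number of tokens evaluated), the figures can be turned back into token counts. The following is a minimal sketch using only the pos values above; the helper name `correct_and_errors` is illustrative and is not part of Pie.

```python
def correct_and_errors(accuracy, support):
    """Convert an accuracy and a token count into (correct, wrong) counts."""
    correct = round(accuracy * support)
    return correct, support - correct


# (accuracy, support) pairs copied from the pos report above.
pos_report = {
    "all": (0.9701, 4181),
    "ambiguous-tokens": (0.9229, 934),
    "unknown-tokens": (0.8165, 218),
}

pos_counts = {
    name: correct_and_errors(accuracy, support)
    for name, (accuracy, support) in pos_report.items()
}

for name, (correct, wrong) in pos_counts.items():
    print(f"{name}: {correct} correct, {wrong} wrong out of {correct + wrong}")
```

The same conversion applies to the morphology reports, which follow the same layout.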
::: Evaluation report for task: MODE :::
all:
accuracy: 0.9818
precision: 0.8765
recall: 0.8517
support: 4181
ambiguous-tokens:
accuracy: 0.84
precision: 0.8483
recall: 0.7612
support: 125
unknown-tokens:
accuracy: 0.8211
precision: 0.7256
recall: 0.658
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| MODE=con | 0.81 | 0.94 | 0.87 | 18 |
| MODE=imp | 0.83 | 0.78 | 0.80 | 68 |
| MODE=ind | 0.91 | 0.92 | 0.92 | 341 |
| MODE=sub | 0.84 | 0.62 | 0.71 | 60 |
| MODE=x | 0.99 | 1.00 | 1.00 | 3694 |
| avg / total | 0.88 | 0.85 | 0.86 | 4181 |
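The `avg / total` row appears to be an unweighted (macro) average over classes rather than a support-weighted one, which is why it sits well below the scores of the dominant `MODE=x` class. The sketch below checks this against the rows of the table above; it assumes the usual F1 definition (harmonic mean of precision and recall), and because it starts from two-decimal values, a recomputed per-class F1 can differ from the table in the last digit.

```python
from statistics import mean

# (precision, recall, f1) per class, copied from the MODE classification report.
mode_rows = {
    "MODE=con": (0.81, 0.94, 0.87),
    "MODE=imp": (0.83, 0.78, 0.80),
    "MODE=ind": (0.91, 0.92, 0.92),
    "MODE=sub": (0.84, 0.62, 0.71),
    "MODE=x": (0.99, 1.00, 1.00),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Macro average: every class counts equally, whatever its support.
macro_precision = mean(p for p, _, _ in mode_rows.values())
macro_recall = mean(r for _, r, _ in mode_rows.values())
macro_f1 = mean(f for _, _, f in mode_rows.values())

print(f"macro precision: {macro_precision:.2f}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"macro f1:        {macro_f1:.2f}")

# Per-class F1 recomputed from the rounded precision and recall.
for label, (p, r, reported) in mode_rows.items():
    print(f"{label}: recomputed f1 {f1_score(p, r):.3f}, reported {reported:.2f}")
```

Under this reading, the low recall of `MODE=sub` (0.62 on 60 tokens) pulls the average down as much as any other class would, even though it covers a small share of the 4181 tokens.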
::: Evaluation report for task: TEMPS :::
all:
accuracy: 0.9871
precision: 0.9305
recall: 0.9259
support: 4181
ambiguous-tokens:
accuracy: 0.9135
precision: 0.623
recall: 0.6072
support: 104
unknown-tokens:
accuracy: 0.8394
precision: 0.8693
recall: 0.5399
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| TEMPS=fut | 0.98 | 0.85 | 0.91 | 47 |
| TEMPS=ipf | 0.93 | 0.88 | 0.90 | 16 |
| TEMPS=psp | 0.80 | 1.00 | 0.89 | 4 |
| TEMPS=pst | 0.95 | 0.91 | 0.93 | 334 |
| TEMPS=x | 0.99 | 1.00 | 0.99 | 3780 |
| avg / total | 0.93 | 0.93 | 0.92 | 4181 |
::: Evaluation report for task: PERS :::
all:
accuracy: 0.9859
precision: 0.9821
recall: 0.9668
support: 4181
ambiguous-tokens:
accuracy: 0.942
precision: 0.9178
recall: 0.9188
support: 362
unknown-tokens:
accuracy: 0.8394
precision: 0.9426
recall: 0.6344
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| PERS.=1 | 0.98 | 0.96 | 0.97 | 429 |
| PERS.=2 | 0.97 | 0.97 | 0.97 | 258 |
| PERS.=3 | 0.99 | 0.94 | 0.96 | 410 |
| PERS.=x | 0.99 | 1.00 | 0.99 | 3084 |
| avg / total | 0.98 | 0.97 | 0.97 | 4181 |
::: Evaluation report for task: NOMB :::
all:
accuracy: 0.9797
precision: 0.9809
recall: 0.9733
support: 4181
ambiguous-tokens:
accuracy: 0.7865
precision: 0.7511
recall: 0.6884
support: 192
unknown-tokens:
accuracy: 0.8349
precision: 0.7918
recall: 0.7729
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| NOMB.=p | 0.98 | 0.95 | 0.97 | 545 |
| NOMB.=s | 0.98 | 0.98 | 0.98 | 1831 |
| NOMB.=x | 0.98 | 0.99 | 0.98 | 1805 |
| avg / total | 0.98 | 0.97 | 0.98 | 4181 |
::: Evaluation report for task: GENRE :::
all:
accuracy: 0.9749
precision: 0.969
recall: 0.9685
support: 4181
ambiguous-tokens:
accuracy: 0.9118
precision: 0.9063
recall: 0.9208
support: 465
unknown-tokens:
accuracy: 0.7385
precision: 0.7097
recall: 0.6977
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| GENRE=f | 0.92 | 0.94 | 0.93 | 387 |
| GENRE=m | 0.97 | 0.94 | 0.96 | 940 |
| GENRE=n | 1.00 | 1.00 | 1.00 | 45 |
| GENRE=x | 0.98 | 0.99 | 0.99 | 2809 |
| avg / total | 0.97 | 0.97 | 0.97 | 4181 |
::: Evaluation report for task: CAS :::
all:
accuracy: 0.9983
precision: 0.9957
recall: 0.9901
support: 4181
ambiguous-tokens:
accuracy: 0.9648
precision: 0.9796
recall: 0.9692
support: 199
unknown-tokens:
accuracy: 1.0
precision: 1.0
recall: 1.0
support: 218
::: Classification report :::
| target | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| CAS=i | 1.00 | 1.00 | 1.00 | 46 |
| CAS=n | 1.00 | 1.00 | 1.00 | 190 |
| CAS=r | 0.98 | 0.96 | 0.97 | 128 |
| CAS=x | 1.00 | 1.00 | 1.00 | 3817 |
| avg / total | 1.00 | 0.99 | 0.99 | 4181 |
Files
(166.2 MB)
| Checksum | Size |
|---|---|
| md5:9a3e046d4228d208fb8b9f222366fcba | 23.7 MB |
| md5:9f4f4db3a23b41ec623374bca6524ce6 | 23.7 MB |
| md5:87e4c3086fd1d7705fa3b820612fcdcf | 23.7 MB |
| md5:908028e214b41a4e693210971e1c3782 | 23.7 MB |
| md5:c7fa27798620ac4123cee662c9f3e705 | 23.7 MB |
| md5:4bf5cb151c13409e6c8ddf896adca78d | 23.7 MB |
| md5:602207d2c3b2580276e56314c2a706bf | 23.7 MB |
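The MD5 checksums listed above can be used to confirm that a downloaded file is intact. A minimal sketch using only the Python standard library follows; the function names are illustrative, and since the file names are not shown in the listing, any path passed in is a placeholder to be replaced with the actual download.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloaded-model.tar", "md5:9a3e046d4228d208fb8b9f222366fcba")` would return `True` only if the file matches the first entry, where `downloaded-model.tar` is a hypothetical local file name.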
Additional details
Related works
- Is supplement to: Dataset, DOI 10.5281/zenodo.3243486
References
- Cafiero and Camps (2019). Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489.
- Camps, Gabay, Clérice and Cafiero (to be published). Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre.