Software Open Access

Pie Model for Classical French -- Part-of-Speech and Morphology (CATTEX2009-max)

Camps, Jean-Baptiste; Gabay, Simon; Clérice, Thibault; Cafiero, Florian

Pie Model for Classical French, for Part-of-Speech and Morphology tags (CATTEX2009-max).

Trained on a corpus of Classical French Theatre.

More information:

- corpus: Camps, Jean-Baptiste, & Cafiero, Florian. (2019). Stylometric Analysis of Classical French Theatre [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3353421.

- F. Cafiero and J.B. Camps, Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489, https://advances.sciencemag.org/content/5/11/eaax5489/.

- J.B. Camps, S. Gabay, Th. Clérice and F. Cafiero, Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre, to be published.

Current results on test data:

::: Evaluation report for task: pos :::

all:
  accuracy: 0.9701
  precision: 0.92
  recall: 0.8964
  support: 4181
ambiguous-tokens:
  accuracy: 0.9229
  precision: 0.9203
  recall: 0.9175
  support: 934
unknown-tokens:
  accuracy: 0.8165
  precision: 0.4798
  recall: 0.4904
  support: 218
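
To make these rates easier to read, the sketch below converts each reported accuracy back into an approximate count of mis-tagged tokens. The figures are copied from the report above; the helper name `implied_errors` is hypothetical, and because the accuracies are rounded to four decimals the counts are estimates rather than values published by the authors.

```python
# Figures for the "pos" task, copied from the evaluation report above.
POS_REPORT = {
    "all": {"accuracy": 0.9701, "support": 4181},
    "ambiguous-tokens": {"accuracy": 0.9229, "support": 934},
    "unknown-tokens": {"accuracy": 0.8165, "support": 218},
}


def implied_errors(accuracy: float, support: int) -> int:
    """Estimate the number of mis-tagged tokens from a rounded accuracy.

    Hypothetical helper: the nearest whole number of correct tokens is
    recovered first, then subtracted from the token count.
    """
    correct = round(accuracy * support)
    return support - correct


errors = {name: implied_errors(**row) for name, row in POS_REPORT.items()}

for name, count in errors.items():
    print(f"{name}: {count} errors out of {POS_REPORT[name]['support']} tokens")
```

Under these assumptions the report corresponds to roughly 125 mis-tagged tokens overall, 72 among ambiguous tokens and 40 among unknown tokens.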

::: Evaluation report for task: MODE :::

all:
  accuracy: 0.9818
  precision: 0.8765
  recall: 0.8517
  support: 4181
ambiguous-tokens:
  accuracy: 0.84
  precision: 0.8483
  recall: 0.7612
  support: 125
unknown-tokens:
  accuracy: 0.8211
  precision: 0.7256
  recall: 0.658
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| MODE=con    | 0.81      | 0.94   | 0.87     | 18      |
| MODE=imp    | 0.83      | 0.78   | 0.80     | 68      |
| MODE=ind    | 0.91      | 0.92   | 0.92     | 341     |
| MODE=sub    | 0.84      | 0.62   | 0.71     | 60      |
| MODE=x      | 0.99      | 1.00   | 1.00     | 3694    |
| avg / total | 0.88      | 0.85   | 0.86     | 4181    |
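
The "avg / total" row appears to be an unweighted (macro) mean over the classes rather than a support-weighted one. The sketch below checks this against the MODE table above, using only the rounded values printed there; the variable and function names are illustrative, not part of any evaluation tool.

```python
# Per-class rows copied from the MODE classification report above:
# (label, precision, recall, f1-score, support)
MODE_ROWS = [
    ("con", 0.81, 0.94, 0.87, 18),
    ("imp", 0.83, 0.78, 0.80, 68),
    ("ind", 0.91, 0.92, 0.92, 341),
    ("sub", 0.84, 0.62, 0.71, 60),
    ("x", 0.99, 1.00, 1.00, 3694),
]


def macro_mean(values):
    """Unweighted mean: every class counts equally, whatever its support."""
    values = list(values)
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Support-weighted mean: frequent classes dominate the result."""
    values, weights = list(values), list(weights)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


supports = [row[4] for row in MODE_ROWS]
macro_precision = round(macro_mean(row[1] for row in MODE_ROWS), 2)
macro_recall = round(macro_mean(row[2] for row in MODE_ROWS), 2)
macro_f1 = round(macro_mean(row[3] for row in MODE_ROWS), 2)
weighted_precision = round(
    weighted_mean((row[1] for row in MODE_ROWS), supports), 2
)

print(macro_precision, macro_recall, macro_f1)  # 0.88 0.85 0.86
print(weighted_precision)                       # 0.98
print(round(f1(0.84, 0.62), 2))                 # 0.71, the MODE=sub f1-score
```

The macro means reproduce the 0.88 / 0.85 / 0.86 shown in the table, whereas a support-weighted precision would be about 0.98 because the MODE=x class accounts for 3694 of the 4181 tokens. This is why the averaged precision and recall sit well below the overall accuracy: rare classes such as MODE=sub weigh as much as MODE=x in the average.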


::: Evaluation report for task: TEMPS :::

all:
  accuracy: 0.9871
  precision: 0.9305
  recall: 0.9259
  support: 4181
ambiguous-tokens:
  accuracy: 0.9135
  precision: 0.623
  recall: 0.6072
  support: 104
unknown-tokens:
  accuracy: 0.8394
  precision: 0.8693
  recall: 0.5399
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| TEMPS=fut   | 0.98      | 0.85   | 0.91     | 47      |
| TEMPS=ipf   | 0.93      | 0.88   | 0.90     | 16      |
| TEMPS=psp   | 0.80      | 1.00   | 0.89     | 4       |
| TEMPS=pst   | 0.95      | 0.91   | 0.93     | 334     |
| TEMPS=x     | 0.99      | 1.00   | 0.99     | 3780    |
| avg / total | 0.93      | 0.93   | 0.92     | 4181    |


::: Evaluation report for task: PERS :::

all:
  accuracy: 0.9859
  precision: 0.9821
  recall: 0.9668
  support: 4181
ambiguous-tokens:
  accuracy: 0.942
  precision: 0.9178
  recall: 0.9188
  support: 362
unknown-tokens:
  accuracy: 0.8394
  precision: 0.9426
  recall: 0.6344
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| PERS.=1     | 0.98      | 0.96   | 0.97     | 429     |
| PERS.=2     | 0.97      | 0.97   | 0.97     | 258     |
| PERS.=3     | 0.99      | 0.94   | 0.96     | 410     |
| PERS.=x     | 0.99      | 1.00   | 0.99     | 3084    |
| avg / total | 0.98      | 0.97   | 0.97     | 4181    |


::: Evaluation report for task: NOMB :::

all:
  accuracy: 0.9797
  precision: 0.9809
  recall: 0.9733
  support: 4181
ambiguous-tokens:
  accuracy: 0.7865
  precision: 0.7511
  recall: 0.6884
  support: 192
unknown-tokens:
  accuracy: 0.8349
  precision: 0.7918
  recall: 0.7729
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| NOMB.=p     | 0.98      | 0.95   | 0.97     | 545     |
| NOMB.=s     | 0.98      | 0.98   | 0.98     | 1831    |
| NOMB.=x     | 0.98      | 0.99   | 0.98     | 1805    |
| avg / total | 0.98      | 0.97   | 0.98     | 4181    |

::: Evaluation report for task: GENRE :::

all:
  accuracy: 0.9749
  precision: 0.969
  recall: 0.9685
  support: 4181
ambiguous-tokens:
  accuracy: 0.9118
  precision: 0.9063
  recall: 0.9208
  support: 465
unknown-tokens:
  accuracy: 0.7385
  precision: 0.7097
  recall: 0.6977
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| GENRE=f     | 0.92      | 0.94   | 0.93     | 387     |
| GENRE=m     | 0.97      | 0.94   | 0.96     | 940     |
| GENRE=n     | 1.00      | 1.00   | 1.00     | 45      |
| GENRE=x     | 0.98      | 0.99   | 0.99     | 2809    |
| avg / total | 0.97      | 0.97   | 0.97     | 4181    |


::: Evaluation report for task: CAS :::

all:
  accuracy: 0.9983
  precision: 0.9957
  recall: 0.9901
  support: 4181
ambiguous-tokens:
  accuracy: 0.9648
  precision: 0.9796
  recall: 0.9692
  support: 199
unknown-tokens:
  accuracy: 1.0
  precision: 1.0
  recall: 1.0
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| CAS=i       | 1.00      | 1.00   | 1.00     | 46      |
| CAS=n       | 1.00      | 1.00   | 1.00     | 190     |
| CAS=r       | 0.98      | 0.96   | 0.97     | 128     |
| CAS=x       | 1.00      | 1.00   | 1.00     | 3817    |
| avg / total | 1.00      | 0.99   | 0.99     | 4181    |

 

Files (166.2 MB)

| Name | md5 | Size |
|------|-----|------|
| fr-class-morph-wembs_aux-CAS-2020_03_08-16_23_57.tar | 9a3e046d4228d208fb8b9f222366fcba | 23.7 MB |
| fr-class-morph-wembs_aux-GENRE-2020_03_08-16_54_44.tar | 9f4f4db3a23b41ec623374bca6524ce6 | 23.7 MB |
| fr-class-morph-wembs_aux-MODE-2020_03_08-17_46_05.tar | 87e4c3086fd1d7705fa3b820612fcdcf | 23.7 MB |
| fr-class-morph-wembs_aux-NOMB-2020_03_08-17_55_50.tar | 908028e214b41a4e693210971e1c3782 | 23.7 MB |
| fr-class-morph-wembs_aux-PERS-2020_03_08-18_24_29.tar | c7fa27798620ac4123cee662c9f3e705 | 23.7 MB |
| fr-class-morph-wembs_aux-TEMPS-2020_03_08-19_03_26.tar | 4bf5cb151c13409e6c8ddf896adca78d | 23.7 MB |
| fr-class-pos-wembs_aux-pos-2020_03_03-18_52_52.tar | 602207d2c3b2580276e56314c2a706bf | 23.7 MB |
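
The md5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of` and `verify` are hypothetical, and it assumes the `.tar` files were saved under their original names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list above.
EXPECTED_MD5 = {
    "fr-class-morph-wembs_aux-CAS-2020_03_08-16_23_57.tar": "9a3e046d4228d208fb8b9f222366fcba",
    "fr-class-morph-wembs_aux-GENRE-2020_03_08-16_54_44.tar": "9f4f4db3a23b41ec623374bca6524ce6",
    "fr-class-morph-wembs_aux-MODE-2020_03_08-17_46_05.tar": "87e4c3086fd1d7705fa3b820612fcdcf",
    "fr-class-morph-wembs_aux-NOMB-2020_03_08-17_55_50.tar": "908028e214b41a4e693210971e1c3782",
    "fr-class-morph-wembs_aux-PERS-2020_03_08-18_24_29.tar": "c7fa27798620ac4123cee662c9f3e705",
    "fr-class-morph-wembs_aux-TEMPS-2020_03_08-19_03_26.tar": "4bf5cb151c13409e6c8ddf896adca78d",
    "fr-class-pos-wembs_aux-pos-2020_03_03-18_52_52.tar": "602207d2c3b2580276e56314c2a706bf",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Check a downloaded archive against the checksum listed for its name."""
    expected = EXPECTED_MD5.get(Path(path).name)
    if expected is None:
        raise KeyError(f"no checksum listed for {Path(path).name}")
    return md5_of(path) == expected


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

Reading in chunks keeps memory use flat regardless of archive size; md5 is used here only because it is the digest the record publishes, not as a security guarantee.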
