Published March 4, 2020 | Version v2
Software Open

Pie Model for Classical French -- Part-of-Speech and Morphology (CATTEX2009-max)

  • 1. École nationale des chartes
  • 2. Université de Neuchâtel
  • 3. CNRS

Description

This Pie model assigns part-of-speech and morphology tags (CATTEX2009-max) to Classical French text.

It was trained on a corpus of Classical French theatre.

More information:

- corpus: Camps, Jean-Baptiste, & Cafiero, Florian. (2019). Stylometric Analysis of Classical French Theatre [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3353421.

- F. Cafiero and J.B. Camps, Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489, https://advances.sciencemag.org/content/5/11/eaax5489/.

- J.B. Camps, S. Gabay, Th. Clérice and F. Cafiero, Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre, to be published.

Current results on test data:

::: Evaluation report for task: pos :::

all:
  accuracy: 0.9701
  precision: 0.92
  recall: 0.8964
  support: 4181
ambiguous-tokens:
  accuracy: 0.9229
  precision: 0.9203
  recall: 0.9175
  support: 934
unknown-tokens:
  accuracy: 0.8165
  precision: 0.4798
  recall: 0.4904
  support: 218
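The per-subset figures above can be turned back into token counts, which makes it easier to see where the errors come from. The sketch below is only an illustration: it assumes that "unknown-tokens" means tokens not seen in training and that the rounded accuracies map back to whole token counts. The "known-token" accuracy it derives is not part of the original report.

```python
def correct_count(accuracy: float, support: int) -> int:
    """Recover the number of correctly tagged tokens from a rounded accuracy."""
    return round(accuracy * support)


# Figures copied from the "pos" evaluation report above.
all_accuracy, all_support = 0.9701, 4181
unknown_accuracy, unknown_support = 0.8165, 218

all_correct = correct_count(all_accuracy, all_support)
unknown_correct = correct_count(unknown_accuracy, unknown_support)

# Assumption: every test token is either known or unknown, so the
# known-token subset is the remainder of the test set.
known_support = all_support - unknown_support
known_correct = all_correct - unknown_correct
known_accuracy = known_correct / known_support

print(f"errors overall:        {all_support - all_correct}")
print(f"errors on unknown:     {unknown_support - unknown_correct}")
print(f"accuracy on known:     {known_accuracy:.4f}")
```

Under these assumptions, about a third of the POS errors (40 of 125) fall on the 218 unknown tokens, although those tokens make up only around 5% of the test set.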

::: Evaluation report for task: MODE :::

all:
  accuracy: 0.9818
  precision: 0.8765
  recall: 0.8517
  support: 4181
ambiguous-tokens:
  accuracy: 0.84
  precision: 0.8483
  recall: 0.7612
  support: 125
unknown-tokens:
  accuracy: 0.8211
  precision: 0.7256
  recall: 0.658
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| MODE=con    | 0.81      | 0.94   | 0.87     | 18      |
| MODE=imp    | 0.83      | 0.78   | 0.80     | 68      |
| MODE=ind    | 0.91      | 0.92   | 0.92     | 341     |
| MODE=sub    | 0.84      | 0.62   | 0.71     | 60      |
| MODE=x      | 0.99      | 1.00   | 1.00     | 3694    |
| avg / total | 0.88      | 0.85   | 0.86     | 4181    |
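The "avg / total" row can be checked against the per-class rows. The sketch below uses only the rounded values printed in the MODE table and assumes the average is an unweighted (macro) mean over classes, which is what the numbers suggest. It is not taken from Pie's source code.

```python
from statistics import mean

# (precision, recall, f1) per class, copied from the MODE classification report.
mode_report = {
    "con": (0.81, 0.94, 0.87),
    "imp": (0.83, 0.78, 0.80),
    "ind": (0.91, 0.92, 0.92),
    "sub": (0.84, 0.62, 0.71),
    "x": (0.99, 1.00, 1.00),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unweighted mean over the five classes, ignoring their support.
macro_precision = mean(p for p, _, _ in mode_report.values())
macro_recall = mean(r for _, r, _ in mode_report.values())
macro_f1 = mean(f for _, _, f in mode_report.values())

print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))

# F1 recomputed from the rounded precision and recall of each class.
for label, (p, r, reported) in mode_report.items():
    print(f"MODE={label}: recomputed f1 = {f1(p, r):.3f}, reported = {reported:.2f}")
```

The three means round to 0.88, 0.85 and 0.86, matching the "avg / total" row. Because every class counts equally, the small classes (for example MODE=sub, with a support of 60) pull the average well below the overall accuracy of 0.9818. The recomputed F1 values agree with the table to within 0.01; the remaining gap comes from starting with already rounded inputs.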


::: Evaluation report for task: TEMPS :::

all:
  accuracy: 0.9871
  precision: 0.9305
  recall: 0.9259
  support: 4181
ambiguous-tokens:
  accuracy: 0.9135
  precision: 0.623
  recall: 0.6072
  support: 104
unknown-tokens:
  accuracy: 0.8394
  precision: 0.8693
  recall: 0.5399
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| TEMPS=fut   | 0.98      | 0.85   | 0.91     | 47      |
| TEMPS=ipf   | 0.93      | 0.88   | 0.90     | 16      |
| TEMPS=psp   | 0.80      | 1.00   | 0.89     | 4       |
| TEMPS=pst   | 0.95      | 0.91   | 0.93     | 334     |
| TEMPS=x     | 0.99      | 1.00   | 0.99     | 3780    |
| avg / total | 0.93      | 0.93   | 0.92     | 4181    |


::: Evaluation report for task: PERS :::

all:
  accuracy: 0.9859
  precision: 0.9821
  recall: 0.9668
  support: 4181
ambiguous-tokens:
  accuracy: 0.942
  precision: 0.9178
  recall: 0.9188
  support: 362
unknown-tokens:
  accuracy: 0.8394
  precision: 0.9426
  recall: 0.6344
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| PERS.=1     | 0.98      | 0.96   | 0.97     | 429     |
| PERS.=2     | 0.97      | 0.97   | 0.97     | 258     |
| PERS.=3     | 0.99      | 0.94   | 0.96     | 410     |
| PERS.=x     | 0.99      | 1.00   | 0.99     | 3084    |
| avg / total | 0.98      | 0.97   | 0.97     | 4181    |


::: Evaluation report for task: NOMB :::

all:
  accuracy: 0.9797
  precision: 0.9809
  recall: 0.9733
  support: 4181
ambiguous-tokens:
  accuracy: 0.7865
  precision: 0.7511
  recall: 0.6884
  support: 192
unknown-tokens:
  accuracy: 0.8349
  precision: 0.7918
  recall: 0.7729
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| NOMB.=p     | 0.98      | 0.95   | 0.97     | 545     |
| NOMB.=s     | 0.98      | 0.98   | 0.98     | 1831    |
| NOMB.=x     | 0.98      | 0.99   | 0.98     | 1805    |
| avg / total | 0.98      | 0.97   | 0.98     | 4181    |

::: Evaluation report for task: GENRE :::

all:
  accuracy: 0.9749
  precision: 0.969
  recall: 0.9685
  support: 4181
ambiguous-tokens:
  accuracy: 0.9118
  precision: 0.9063
  recall: 0.9208
  support: 465
unknown-tokens:
  accuracy: 0.7385
  precision: 0.7097
  recall: 0.6977
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| GENRE=f     | 0.92      | 0.94   | 0.93     | 387     |
| GENRE=m     | 0.97      | 0.94   | 0.96     | 940     |
| GENRE=n     | 1.00      | 1.00   | 1.00     | 45      |
| GENRE=x     | 0.98      | 0.99   | 0.99     | 2809    |
| avg / total | 0.97      | 0.97   | 0.97     | 4181    |


::: Evaluation report for task: CAS :::

all:
  accuracy: 0.9983
  precision: 0.9957
  recall: 0.9901
  support: 4181
ambiguous-tokens:
  accuracy: 0.9648
  precision: 0.9796
  recall: 0.9692
  support: 199
unknown-tokens:
  accuracy: 1.0
  precision: 1.0
  recall: 1.0
  support: 218


::: Classification report :::

| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| CAS=i       | 1.00      | 1.00   | 1.00     | 46      |
| CAS=n       | 1.00      | 1.00   | 1.00     | 190     |
| CAS=r       | 0.98      | 0.96   | 0.97     | 128     |
| CAS=x       | 1.00      | 1.00   | 1.00     | 3817    |
| avg / total | 1.00      | 0.99   | 0.99     | 4181    |

 

Files

Files (166.2 MB)

| md5 checksum                     | size    |
|----------------------------------|---------|
| 9a3e046d4228d208fb8b9f222366fcba | 23.7 MB |
| 9f4f4db3a23b41ec623374bca6524ce6 | 23.7 MB |
| 87e4c3086fd1d7705fa3b820612fcdcf | 23.7 MB |
| 908028e214b41a4e693210971e1c3782 | 23.7 MB |
| c7fa27798620ac4123cee662c9f3e705 | 23.7 MB |
| 4bf5cb151c13409e6c8ddf896adca78d | 23.7 MB |
| 602207d2c3b2580276e56314c2a706bf | 23.7 MB |
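The md5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib`. The function names `md5_of` and `check_download` are illustrative, and since the file names are not shown in this listing, the check only tests whether a file's digest matches any of the seven published checksums.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "9a3e046d4228d208fb8b9f222366fcba",
    "9f4f4db3a23b41ec623374bca6524ce6",
    "87e4c3086fd1d7705fa3b820612fcdcf",
    "908028e214b41a4e693210971e1c3782",
    "c7fa27798620ac4123cee662c9f3e705",
    "4bf5cb151c13409e6c8ddf896adca78d",
    "602207d2c3b2580276e56314c2a706bf",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path):
    """True if the file's md5 digest is one of the published checksums."""
    return md5_of(path) in EXPECTED_MD5
```

A call such as `check_download("model.tar")`, where `model.tar` is a placeholder for whatever name the downloaded file has, returns `False` if the file is truncated or corrupted.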

Additional details

Related works

Is supplement to
Dataset: 10.5281/zenodo.3243486 (DOI)

References

  • Cafiero and Camps (2019). Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489.
  • Camps, Gabay, Clérice and Cafiero (to be published). Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre.