Published October 14, 2020 | Version v1
Video/Audio
Open
CORPUS17: a philological French corpus for 17th century
Creators
- 1. Universités de Neuchâtel et de Genève, Neuchâtel and Genève, Switzerland
- 2. École des Chartes, Paris, France
- 3. Université de Rennes, Rennes, France
Description
We investigate the creation of a 17th c. French literary corpus. We present the main options regarding available standards, the training data we created, and the efficiency of the models produced for OCR, spelling normalization, and lemmatization, relying throughout on open-source solutions. We also present our encoding choices and the overall logic of a corpus designed as a virtuous circle: the corpus automatically improves the tools that are used to build it.
Files
Name | Size | Checksum |
---|---|---|
corpus-17.webmsd.webm | 30.8 MB | md5:6e1886d5ea2f2f330aea37faeb07f1a9 |
Additional details
Related works
- Continues
- Conference paper: 10.1145/3423603.3424002 (DOI)