Published October 14, 2020 | Version v1
Video/Audio Open

CORPUS17: a philological French corpus for the 17th century

  • 1. Universités de Neuchâtel et de Genève, Neuchâtel and Genève, Switzerland
  • 2. École des Chartes, Paris, France
  • 3. Université de Rennes, Rennes, France

Description

We investigate the creation of a 17th c. French literary corpus. We present the main options regarding available standards, the training data we created, and the efficiency of the models produced for OCR, spelling normalization, and lemmatization, in every case using open-source solutions. We also present our encoding choices and the overall logic of a corpus designed as a virtuous circle, automatically enhancing the tools used to build it.

Files

corpus-17.webmsd.webm

Files (30.8 MB)

Name Size
md5:6e1886d5ea2f2f330aea37faeb07f1a9
30.8 MB

Additional details

Related works

Continues
Conference paper: 10.1145/3423603.3424002 (DOI)