Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries
- 1. ENC
- 2. UniNE
- 3. Paris VII/INRIA
- 4. INRIA
Description
Manuscript Sales Catalogues (MSC) are highly important for authenticating documents and studying the reception of authors. Their regular publication throughout Europe since the beginning of the 19th century has raised interest in scaling up the means of automatically structuring their contents.
Following successful first encoding tests with GROBID-Dictionaries on a single MSC collection, this paper presents the results of more advanced tests of the system's capacity to handle a larger corpus of MSC from different dealers, and therefore with multiple layouts. Four different types of catalogues, published between the middle of the 19th century and the beginning of the 20th century, were tested.
Files
Name | Size |
---|---|
Scaling_up_trainingData_TEI2019.zip (md5:52d28fec5905266e42fdc027c7fb74e7) | 198.1 MB |
Additional details
Related works
- Is documented by
- https://hal.inria.fr/hal-02272962 (URL)
References
- Mohamed Khemakhem, Laurent Romary, Simon Gabay, Hervé Bohbot, Francesca Frontini, et al. Automatically Encoding Encyclopedic-like Resources in TEI. The annual TEI Conference and Members Meeting, Sep 2018, Tokyo, Japan.
- Mohamed Khemakhem, Luca Foppiano, Laurent Romary. Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields. Electronic lexicography, eLex 2017, Sep 2017, Leiden, Netherlands.
- Mohamed Khemakhem, Axel Herold, Laurent Romary. Enhancing Usability for Automatically Structuring Digitised Dictionaries. GLOBALEX workshop at LREC 2018, May 2018, Miyazaki, Japan.