Manually validated PageXML files for images in monograph "Mémoire sur St Domingue par H ? M. Michel"
Authors/Creators
Description
Transcription of the monograph "Mémoire sur St Domingue par H ? M. Michel", dating from 1797 and dealing with slavery in Haiti (103 pages in total). The transcription contains 61 pages in PageXML format, useful for training a handwritten text recognition model. The PageXML files were created by applying a Transkribus model (French Model 1, see https://readcoop.eu/model/french-general-model/, or the non-public The Text Titan I) to the images at https://europeana.transcribathon.eu/documents/story/?story=12733. The PageXML output was automatically corrected using the flat-text manual transcription available with these images, and the resulting PageXML files were then manually validated. The software for automatically correcting OCR output using flat-text manual transcriptions (which also adds a link between image and text that the flat-text files lack) was developed in the AI4Culture project (https://pro.europeana.eu/project/ai4culture-an-ai-platform-for-the-cultural-heritage-data-space). Note: transcriptions for pages 21, 22, 34 and 58 are not yet available.
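For readers who want to use the files as training data, the following is a minimal sketch of how line polygons and line text could be read from one PageXML document with the Python standard library. It assumes the usual PAGE layout (`TextLine` elements carrying a `Coords` element with a `points` attribute and a `TextEquiv/Unicode` child); the function name `read_page_lines` and the `SAMPLE` document are hypothetical illustrations and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration only; it is NOT a file from this dataset.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="r1l1">
        <Coords points="5,5 95,5 95,20 5,20"/>
        <TextEquiv><Unicode>placeholder line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_page_lines(xml_text):
    """Return a list of (line_id, polygon, text) tuples for one PageXML document."""
    root = ET.fromstring(xml_text)
    # The PAGE namespace differs between schema versions, so derive it
    # from the root tag instead of hard-coding one version.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    lines = []
    for line in root.iter(f"{ns}TextLine"):
        coords = line.find(f"{ns}Coords")
        points = coords.get("points", "") if coords is not None else ""
        # "x1,y1 x2,y2 ..." becomes [(x1, y1), (x2, y2), ...]
        polygon = [tuple(int(v) for v in p.split(",")) for p in points.split()]
        # Only the TextEquiv that is a direct child of the line is read;
        # word-level TextEquiv elements, if any, are ignored.
        unicode_el = line.find(f"{ns}TextEquiv/{ns}Unicode")
        text = (unicode_el.text or "") if unicode_el is not None else ""
        lines.append((line.get("id"), polygon, text))
    return lines


if __name__ == "__main__":
    for line_id, polygon, text in read_page_lines(SAMPLE):
        print(line_id, polygon, text)
```

Each tuple pairs a region of the page image with its transcription, which is the image-to-text link that a flat-text transcription alone does not provide.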
Files
| Name | Size | Checksum |
|---|---|---|
| page_xml_corrected.zip | 274.7 kB | md5:32cd1c98f3edabeab45924ffddceb4c3 |
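The MD5 value listed for the archive can be used to check a download for corruption. The sketch below shows one way to do this with Python's `hashlib`; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed for page_xml_corrected.zip on this record.
EXPECTED_MD5 = "32cd1c98f3edabeab45924ffddceb4c3"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

A call such as `matches_expected("page_xml_corrected.zip")` returns `True` only if the local copy is byte-identical to the published file. MD5 is adequate for detecting accidental corruption, though it is not a safeguard against deliberate tampering.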