NewsEye / READ OCR training dataset from French Newspapers (18th, 19th, early 20th C.)

Muehlberger, Guenter; Hackl, Guenter

doi:10.5281/zenodo.4293602

Published November 27, 2020 | Version v1

Dataset Open

The dataset comprises French newspaper pages from the 18th, 19th and early 20th centuries with carefully corrected text. The page images were provided by the French National Library and comprise 127 pages (training set) and 8 pages (validation set). The data are formatted according to the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with the support of the NewsEye and READ projects.
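Since the transcriptions are delivered as PAGE XML, a short script can pull the corrected line-level text out of a page file. The following is a minimal sketch, not part of the dataset itself: it assumes each page file follows the usual PAGE layout (TextLine elements carrying a TextEquiv/Unicode child) and matches elements by local name so that it does not depend on a particular PAGE namespace version. The function name read_text_lines and the file name in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(source):
    """Return a list of (line_id, text) pairs, one per TextLine element.

    `source` is a file path or a binary file object containing PAGE XML.
    Only the TextEquiv that is a direct child of the TextLine is read, so
    word-level transcriptions nested inside the line are ignored.
    """
    root = ET.parse(source).getroot()
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        text = ""
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode":
                    text = grandchild.text or ""
        lines.append((element.get("id"), text))
    return lines
```

After extracting one of the archives, calling read_text_lines("some_page.xml") on a page file (the path here is a placeholder; the internal layout of the archives is not described on this page) would return the line identifiers with their corrected transcriptions.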

Files

Files (1.7 GB)

Name	MD5	Size
ATR_TrainingSet_BnF_Newseye_M2+.zip	448b3e1185f95cb5ca629adabb525a24	1.6 GB
ATR_ValidationSet_BnF_Newseye_M2+.zip	263c46cba35a437c7d1373a4561600a0	122.1 MB
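
The MD5 checksums in the file listing can be used to confirm that a download completed without corruption. The sketch below assumes both archives were saved under their original names in a single directory; the helper names md5_of_file and verify_downloads are illustrative only. Files are hashed in chunks so the 1.6 GB archive does not have to fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing on this page.
EXPECTED_MD5 = {
    "ATR_TrainingSet_BnF_Newseye_M2+.zip": "448b3e1185f95cb5ca629adabb525a24",
    "ATR_ValidationSet_BnF_Newseye_M2+.zip": "263c46cba35a437c7d1373a4561600a0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A False result means the file is either absent from the directory or differs from the published archive, in which case downloading it again is the simplest remedy.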

Additional details

European Commission: NewsEye: A Digital Investigator for Historical Newspapers (grant 770299)
European Commission: READ - Recognition and Enrichment of Archival Documents (grant 674943)

Views and downloads

	All versions	This version
Views	1,017	1,012
Downloads	325	324
Data volume	368.7 GB	367.0 GB

Resource type: Dataset
Publisher: Zenodo
Languages: French

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 27, 2020
Modified: March 12, 2021
