
Dataset Open Access

NewsEye / READ AS training dataset from French Newspapers (19th, early 20th C.)

Muehlberger, Guenter; Hackl, Guenter

The dataset comprises French newspaper pages from the 19th and early 20th centuries with annotated text. The 183 page images (training set) were provided by the French National Library. The data are formatted according to the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with support from the NewsEye and READ projects. The guidelines used to create the AS GT are uploaded here as well.
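As a rough illustration of working with PAGE-formatted files, the sketch below lists the text regions of one page together with their region-level transcription. It assumes the usual PAGE layout of `TextRegion` elements carrying a `TextEquiv/Unicode` child, and it matches element names without their namespace so that it does not depend on a particular schema version. The function names are hypothetical, and how the article annotations themselves are encoded in this dataset is not stated here, so this covers plain region text only.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def region_texts(root):
    """Return (region id, text) pairs for every TextRegion under root.

    Only the TextEquiv that is a direct child of the region is read,
    so line-level and word-level transcriptions are ignored.
    """
    results = []
    for element in root.iter():
        if local_name(element.tag) != "TextRegion":
            continue
        text = ""
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for sub in child:
                if local_name(sub.tag) == "Unicode":
                    text = sub.text or ""
        results.append((element.get("id"), text))
    return results
```

For a file on disk, something like `region_texts(ET.parse("page.xml").getroot())` would apply, where `page.xml` is a placeholder file name.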

Files (2.3 GB)
Name                                      Size      Checksum
Article GT guidelines for Newseye.pdf     75.1 kB   md5:fe01d7a05da416972ac784307079caaa
AS_TrainingSet_BnF_NewsEye_v2.zip         2.3 GB    md5:58e0a167389e0a15897d728055e48e57
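The md5 values listed with each file can be used to check that a download completed intact. The following is a minimal sketch, assuming the files were saved under their listed names; the helper names `md5_of_file` and `verify_download` are illustrative and not part of any Zenodo tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "Article GT guidelines for Newseye.pdf": "fe01d7a05da416972ac784307079caaa",
    "AS_TrainingSet_BnF_NewsEye_v2.zip": "58e0a167389e0a15897d728055e48e57",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the 2.3 GB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the value listed for its name.

    Raises KeyError if the file name is not one of the listed files.
    """
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

A call such as `verify_download("AS_TrainingSet_BnF_NewsEye_v2.zip")` returning `False` would suggest a truncated or corrupted download.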
                   All versions   This version
Views              339            187
Downloads          119            81
Data volume        181.5 GB       94.0 GB
Unique views       270            164
Unique downloads   84             58
