Dataset Open Access

NewsEye / READ OCR training dataset from Finnish Newspapers (18th, 19th, early 20th C.)

Muehlberger, Guenter; Hackl, Guenter

The dataset comprises Finnish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the National Library of Finland (NLF) and comprise 526 pages (training set) and 8 pages (validation set). The data are formatted according to the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with the support of the NewsEye and READ projects.
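Because the ground truth is stored as PAGE XML, the corrected text can be read back with a standard XML parser. The following is a minimal sketch, not part of the dataset: it assumes each page file follows the usual PAGE layout in which a TextLine element holds its transcription in a direct TextEquiv/Unicode child, and the function name page_text_lines is hypothetical. The namespace is read from the root element because the PAGE schema date differs between versions.

```python
import xml.etree.ElementTree as ET


def page_text_lines(xml_source):
    """Return the line-level transcriptions found in one PAGE XML document.

    xml_source may be a file path or a binary file-like object.
    Assumption: line text sits in TextLine/TextEquiv/Unicode.
    """
    root = ET.parse(xml_source).getroot()

    # Root tag looks like "{namespace-uri}PcGts"; extract the URI if present.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name):
        # Qualify an element name with the document's own namespace.
        return f"{{{ns}}}{name}" if ns else name

    lines = []
    for line in root.iter(q("TextLine")):
        # Only the direct TextEquiv child is read, so word-level
        # transcriptions nested deeper are not returned twice.
        unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines
```

Calling page_text_lines on a path to one extracted page file would return the transcribed lines in document order; the directory layout inside the archives is not described here, so locating the XML files is left to the reader.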

Files (5.5 GB)

ATR_TrainingSet_NLF_Newseye_GT_FI_M2+.zip (5.4 GB)
md5:34094df0d9695e239e7d72b2268fcdb3

ATR_ValidationSet_NLF_Newseye_GT_FI_M2+.zip (77.2 MB)
md5:ef9d3823c1a175b2cb501e07f2e943c2
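
The md5 values listed for each archive can be used to check a download for corruption. The sketch below is one way to do that with Python's standard hashlib module; the helper names md5_of_file and matches_listed_md5 are hypothetical, and the file is read in chunks because the training archive is several gigabytes.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    listed = listed.strip().lower()
    if listed.startswith("md5:"):
        listed = listed[len("md5:"):]
    return md5_of_file(path) == listed
```

For example, matches_listed_md5("ATR_ValidationSet_NLF_Newseye_GT_FI_M2+.zip", "md5:ef9d3823c1a175b2cb501e07f2e943c2") should return True for an intact copy of the validation archive saved under that name in the working directory.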
                  All versions   This version
Views             113            113
Downloads         39             39
Data volume       109.2 GB       109.2 GB
Unique views      92             92
Unique downloads  26             26
