Dataset Open Access

NewsEye / READ OCR training dataset from Swedish Newspapers (18th, 19th, early 20th C.)

Muehlberger, Guenter; Hackl, Guenter

The dataset comprises Swedish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the National Library of Finland (NLF) and comprise 255 pages (training set) and 6 pages (validation set). The data follow the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with support from the NewsEye and READ projects.
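Since the ground truth is stored as PAGE XML, the corrected text of each line can be read with a standard XML parser. The following is a minimal sketch, not part of the dataset itself: it assumes the usual PAGE layout in which each TextLine element carries its transcription in a direct TextEquiv/Unicode child, and it reads the namespace from the root element because the schema version used in these files is not stated here. The function name page_text_lines is hypothetical.

```python
import xml.etree.ElementTree as ET


def page_text_lines(xml_source):
    """Return the transcribed text of every TextLine in a PAGE XML document.

    xml_source is the XML content as str or bytes (not a file path).
    """
    root = ET.fromstring(xml_source)

    # The root tag looks like "{namespace}PcGts"; reuse whatever namespace
    # the file declares instead of hard-coding one schema version.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def qualified(name):
        return f"{{{namespace}}}{name}" if namespace else name

    lines = []
    for text_line in root.iter(qualified("TextLine")):
        # Only the line-level TextEquiv is read; word-level TextEquiv
        # elements are nested inside Word and are not direct children.
        unicode_element = text_line.find(
            f"{qualified('TextEquiv')}/{qualified('Unicode')}"
        )
        if unicode_element is not None and unicode_element.text:
            lines.append(unicode_element.text)
    return lines
```

For a file extracted from one of the archives, the function could be called as page_text_lines(open(path, "rb").read()), where path points to a single PAGE XML file; the directory layout inside the archives is not described on this page.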

Files (1.5 GB)
Name (checksum), Size
ATR_TrainingSet_NLF_Newseye_GT_SV_M2+.zip (md5:add442c83817e2ce955b711699fe4c7f), 1.5 GB
ATR_ValidationSet_NLF_Newseye_GT_SV_M2+.zip (md5:604472b9f8bc77e57f47f2371d5c464e), 42.9 MB
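The MD5 checksums listed with the files can be used to confirm that a download completed without corruption. The sketch below is one possible way to do this with the Python standard library; the helper names md5_of_file and verify_download are hypothetical, and the code assumes the archives were saved under their original file names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "ATR_TrainingSet_NLF_Newseye_GT_SV_M2+.zip": "add442c83817e2ce955b711699fe4c7f",
    "ATR_ValidationSet_NLF_Newseye_GT_SV_M2+.zip": "604472b9f8bc77e57f47f2371d5c464e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use low for the 1.5 GB archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of_file(path) == expected
```

Calling verify_download("ATR_ValidationSet_NLF_Newseye_GT_SV_M2+.zip") on a complete download should return True; a False result suggests the file should be downloaded again.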