Published March 11, 2021 | Version 1
Dataset Open

NewsEye / READ OCR training dataset from Swedish Newspapers (18th, 19th, early 20th C.)

  • 1. University of Innsbruck
  • 2. READ-COOP

Description

The dataset comprises Swedish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the National Library of Finland (NLF) and comprise 255 pages (training set) and 6 pages (validation set). The data are formatted according to the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with the support of the NewsEye and READ projects.
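As the ground truth is stored as PAGE XML, the corrected text of each line can be read with a standard XML parser. The following is a minimal sketch, not part of the dataset: it assumes the usual PAGE layout in which each TextLine element carries its transcription in a direct TextEquiv/Unicode child, and it matches tags by local name because the PAGE namespace version used in the archive is not stated here. The SAMPLE string is a made-up, simplified fragment (coordinates omitted), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_line_texts(page_xml):
    """Return the line-level transcriptions found in a PAGE XML string.

    Only TextEquiv elements that are direct children of a TextLine are
    read, so word-level transcriptions nested in Word elements are skipped.
    """
    root = ET.fromstring(page_xml)
    texts = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            unicode_elems = [s for s in child if local_name(s.tag) == "Unicode"]
            if unicode_elems:
                texts.append(unicode_elems[0].text or "")
                break  # use the first TextEquiv of this line only
    return texts


# Invented, simplified fragment for illustration (not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Word id="w1"><TextEquiv><Unicode>Första</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Första raden</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <TextEquiv><Unicode>Andra raden</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line in extract_line_texts(SAMPLE):
        print(line)
```

For a file from the archive, the same function could be called on the file's contents read as bytes, for example extract_line_texts(open(path, "rb").read()), where path is whichever PAGE XML file is being inspected.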

Files

ATR_TrainingSet_NLF_Newseye_GT_SV_M2+.zip

Files (1.5 GB)

Checksum Size
md5:add442c83817e2ce955b711699fe4c7f 1.5 GB
md5:604472b9f8bc77e57f47f2371d5c464e 42.9 MB
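
The MD5 checksums listed above can be used to check that a download is complete. The sketch below shows one way to do this with Python's standard hashlib module, reading the file in chunks so that a 1.5 GB archive does not have to fit in memory. The function names and the local path are hypothetical, and which checksum belongs to which file must be taken from the record itself.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=1 << 20):
    """Yield the contents of a file in chunks of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def verify_md5(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_chunks(file_chunks(path)) == expected_hex


# Example call (the path is an assumption about where the file was saved):
# verify_md5("downloaded_archive.zip", "md5:add442c83817e2ce955b711699fe4c7f")
```

A result of False suggests a truncated or corrupted download, or that the checksum was compared against the wrong file.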

Additional details

Funding

European Commission
NewsEye - NewsEye: A Digital Investigator for Historical Newspapers 770299
European Commission
READ - Recognition and Enrichment of Archival Documents 674943