NewsEye / READ OCR training dataset from Swedish Newspapers (18th, 19th, early 20th C.)

Muehlberger, Guenter; Hackl, Guenter

doi:10.5281/zenodo.4599624

Published March 11, 2021 | Version 1

Dataset Open

1. University of Innsbruck
2. READ-COOP

The dataset comprises Swedish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the National Library of Finland (NLF) and comprise 255 pages (training set) and 6 pages (validation set). The data are encoded in the PAGE format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML/) and were produced with the Transkribus platform with support from the NewsEye and READ projects.
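Because the ground truth is stored as PAGE XML, the corrected text can be read per line from the TextLine / TextEquiv / Unicode elements. The following is a minimal sketch using only the Python standard library. It matches elements by local name so that it does not depend on a particular PAGE schema version. The function names and the sample fragment (including its text and identifiers) are invented for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def child_named(parent, name):
    """Return the first direct child with the given local name, or None."""
    for child in parent:
        if local_name(child.tag) == name:
            return child
    return None


def text_lines(page_xml):
    """Return (line id, line text) pairs for every TextLine in a PAGE XML document.

    Only the TextEquiv that is a direct child of the TextLine is read, so
    word-level transcriptions nested inside Word elements are ignored.
    """
    root = ET.fromstring(page_xml)
    result = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        text_equiv = child_named(element, "TextEquiv")
        unicode_node = child_named(text_equiv, "Unicode") if text_equiv is not None else None
        text = unicode_node.text if unicode_node is not None and unicode_node.text else ""
        result.append((element.get("id"), text))
    return result


# Hypothetical, heavily shortened PAGE fragment used only to exercise the function.
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="region_1">
      <TextLine id="line_1">
        <Word id="word_1"><TextEquiv><Unicode>Stock</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Stockholm</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="line_2">
        <TextEquiv><Unicode/></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

if __name__ == "__main__":
    for line_id, text in text_lines(SAMPLE_PAGE):
        print(line_id, repr(text))
```

For a real file from the archive, the same function can be given the file contents read in binary mode, for example `text_lines(open(path, "rb").read())`, where `path` points to one of the extracted XML files.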

Files (1.5 GB)

Name	MD5	Size
ATR_TrainingSet_NLF_Newseye_GT_SV_M2+.zip	add442c83817e2ce955b711699fe4c7f	1.5 GB
ATR_ValidationSet_NLF_Newseye_GT_SV_M2+.zip	604472b9f8bc77e57f47f2371d5c464e	42.9 MB
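The MD5 checksums listed with the files can be used to confirm that a download completed without corruption. Below is a minimal sketch using Python's standard hashlib module. The function names and the assumption that both archives sit in the current working directory are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# File names and MD5 values as listed on this record.
EXPECTED_MD5 = {
    "ATR_TrainingSet_NLF_Newseye_GT_SV_M2+.zip": "add442c83817e2ce955b711699fe4c7f",
    "ATR_ValidationSet_NLF_Newseye_GT_SV_M2+.zip": "604472b9f8bc77e57f47f2371d5c464e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that the 1.5 GB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    download_dir = Path(".")  # assumption: archives were saved to the current directory
    for name, expected in EXPECTED_MD5.items():
        archive = download_dir / name
        if not archive.exists():
            print(f"{name}: not found")
        elif matches_expected(archive, expected):
            print(f"{name}: checksum OK")
        else:
            print(f"{name}: checksum MISMATCH")
```

MD5 is adequate here for detecting transfer errors; it is not meant as protection against deliberate tampering.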

Additional details

European Commission: NewsEye (NewsEye: A Digital Investigator for Historical Newspapers), grant 770299
European Commission: READ (Recognition and Enrichment of Archival Documents), grant 674943

	All versions	This version
Views	815	813
Downloads	239	238
Data volume	205.5 GB	204.0 GB

Resource type: Dataset
Publisher: Zenodo
Languages: Swedish

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 11, 2021
Modified: March 15, 2021
