Published April 14, 2016 | Version v1

Making Europe's Historical Newspapers Searchable

  • 1. Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
  • 2. University of Salford, Greater Manchester

Description

This poster paper provides a rare glimpse into the overall approach for the refinement of historical newspapers with text and layout recognition in the Europeana Newspapers project. Within three years, the project processed more than 10 million pages of historical newspapers from 12 national and major libraries to produce the largest open access and fully searchable text collection of digital historical newspapers in Europe. Along the way, a wide variety of legal, logistical, technical and other challenges were encountered. After introducing the background issues in newspaper digitization in Europe, the paper discusses the technical refinement workflow in greater detail. It explains which decisions were taken in the design of the large-scale processing workflow to address these challenges, what results were produced, and what has been identified as best practice.

Notes

The full paper is available here: http://www.primaresearch.org/publications/DAS2016_Neudecker_HistoricalNewspapers

Files

DAS2016_119_v0.3.pdf (1.7 MB)
md5:5bd6344452430c1665a005e2d414d061