Published June 30, 2023 | Version v1
Conference paper | Open Access

Developing a Pipeline for Automatic Linguistic Analysis of Historical Manuscripts and Early Printings: The Pre-Modern Slavic Case

  • 1. University of Freiburg, Germany
  • 2. Bavarian Academy of Sciences and Humanities, Germany
  • 3. University of Kragujevac, Serbia
  • 1. University of Graz
  • 2. Belgrade Center for Digital Humanities
  • 3. Le Mans Université
  • 4. Digital Humanities im deutschsprachigen Raum

Description

We report on experiments with Handwritten Text Recognition (HTR) models for automatically creating large pre-modern Slavic text corpora. We then use these corpora without manual post-correction, that is, as raw data and with uncorrected part-of-speech (POS) tags, for quantitative linguistic analysis (inferential statistics, stylometry), and we evaluate the actual amount of noise in the data.
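The record does not say how the noise was measured. A common measure for HTR output is the character error rate (CER): the edit distance between a manually corrected ground-truth transcription and the raw HTR output, divided by the length of the ground truth. The following is a minimal sketch under that assumption. The function names `levenshtein` and `character_error_rate` are hypothetical, and this is not the authors' pipeline.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed noise measure: edit distance divided by the ground-truth length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the HTR output drops one of six characters (the jer "ъ").
print(character_error_rate("кънѧзь", "кнѧзь"))
```

Here one missing character out of six gives a CER of 1/6, roughly 0.167. When comparing pre-modern Cyrillic, both strings should first be brought to the same Unicode normalization form, because combining diacritics otherwise inflate the distance.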

Files

RABUS_Achim_Developing_a_Pipeline_for_Automatic_Linguistic_A.pdf

Additional details

Related works

Is part of
Book: 10.5281/zenodo.7961822 (DOI)