Published September 12, 2016 | Version v1
Presentation | Open access

Utilising ANNIS for search and analysis of historical data

  • 1. Humboldt-Universität Berlin

Description

Tools for the analysis of historical data, especially from non-Indo-European languages, must meet specific challenges, e.g., the synchronised representation of original script and transliteration, deep search in non-Latin scripts, and data models that allow for customised tokenisations. Implementing new software for one specific research question and one specific dataset is a plausible approach in this context, but it is not sustainable. We present ANNIS, a browser-based, re-usable search and analysis tool for multi-layer linguistic corpora. ANNIS can be, and has been, used for search and analysis across a number of historical corpora as well as corpora in non-Latin scripts. It is built on a graph-based data model that can accommodate a potentially unlimited number of annotation types, and can therefore represent data from a wide range of sources and formats. Conversion from several formats via the compatible conversion framework Pepper makes ANNIS highly re-usable across many research contexts. ANNIS also offers different pluggable visualisations, so that each corpus stratum can be presented in the most suitable form. As an example, we present a use case for search in the Coptic SCRIPTORIUM, a multi-layer corpus of Coptic.
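To make the idea of a graph-based, multi-layer data model more concrete, here is a minimal Python sketch. It is not the data model actually used by ANNIS or Pepper. The class names, the layer names (`orig`, `translit`, `gloss`, `word`) and the toy Coptic data are hypothetical and chosen only for illustration. The sketch shows tokens that carry an open-ended set of annotation layers, a span node that groups several tokens into an alternative segmentation, and a search on one layer (transliteration) whose hit is displayed in another (original script).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node; annotations are open-ended key/value pairs (one key per layer)."""
    node_id: str
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Hypothetical toy multi-layer corpus graph: tokens, plus spans that cover tokens."""
    nodes: dict[str, Node] = field(default_factory=dict)
    covers: dict[str, list[str]] = field(default_factory=dict)  # span id -> covered token ids

    def add_token(self, node_id: str, **annotations: str) -> None:
        self.nodes[node_id] = Node(node_id, dict(annotations))

    def add_span(self, node_id: str, token_ids: list[str], **annotations: str) -> None:
        self.nodes[node_id] = Node(node_id, dict(annotations))
        self.covers[node_id] = list(token_ids)

    def find(self, layer: str, value: str) -> list[str]:
        """Ids of all nodes (tokens or spans) annotated with layer=value."""
        return [n.node_id for n in self.nodes.values() if n.annotations.get(layer) == value]

    def covered_tokens(self, node_id: str) -> list[str]:
        """Tokens under a span; a token simply covers itself."""
        return self.covers.get(node_id, [node_id])

    def spans_covering(self, token_id: str) -> list[str]:
        """Ids of all spans that include the given token."""
        return [span for span, toks in self.covers.items() if token_id in toks]

    def layer_text(self, node_id: str, layer: str, sep: str = "") -> str:
        """Render the tokens under a node in one layer ('?' where the layer is missing)."""
        return sep.join(
            self.nodes[t].annotations.get(layer, "?") for t in self.covered_tokens(node_id)
        )


# Illustrative toy data, not taken from any real corpus: one Coptic word
# split into three morph tokens, each with original-script, transliteration
# and gloss layers, plus a word-level span over all three.
g = AnnotationGraph()
g.add_token("t1", orig="ⲁ", translit="a", gloss="PST")
g.add_token("t2", orig="ϥ", translit="f", gloss="3SG.M")
g.add_token("t3", orig="ⲥⲱⲧⲙ", translit="sōtm", gloss="hear")
g.add_span("w1", ["t1", "t2", "t3"], word="ⲁϥⲥⲱⲧⲙ")

# Search in the transliteration layer, then display the enclosing word
# in original script and as a segmented transliteration.
for hit in g.find("translit", "sōtm"):
    for span in g.spans_covering(hit):
        print(g.layer_text(span, "orig"), g.layer_text(span, "translit", sep="-"))
```

The design point the sketch tries to capture is that annotations are a plain key/value mapping, so adding a new layer (for example a lemma or a different tokenisation) requires no schema change.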

Files

annishist_druskat-krause-odebrecht.pdf

md5:e636172cc2b8b9fc1d54148de9a55430
421.7 kB