Presentation Open Access

Utilising ANNIS for search and analysis of historical data

Druskat, Stephan; Krause, Thomas; Odebrecht, Carolin

Tools for the analysis of historical data, especially data from non-Indo-European languages, have to solve specific challenges, e.g. the synchronised representation of original script and transliterations, deep search over non-Latin script, and data models that allow for customised tokenisations. Implementing new software for a specific research question and specific data is a plausible solution in this context, but it is not sustainable. We present ANNIS, a browser-based, re-usable search and analysis tool for multi-layer linguistic corpora. ANNIS can be, and has been, used for search and analysis over a number of historical corpora as well as corpora in non-Latin scripts. It is driven by a graph-based data model that can accommodate a potentially unlimited number of annotation types, and it can therefore represent data coming from many different sources and formats. Conversion from several different formats via the compatible conversion framework Pepper makes ANNIS highly re-usable in a wide variety of research contexts. ANNIS also offers different pluggable visualisations, so that each corpus stratum can be presented in an optimal form. As an example, we present a use case for search in Coptic SCRIPTORIUM, a multi-layer corpus of Coptic.
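
To illustrate what a graph-based, multi-layer data model buys for historical data, here is a minimal sketch in Python. It is not ANNIS's actual data model or API: the classes `AnnotationGraph` and `Node`, the edge types `"coverage"` and `"order"`, and the annotation keys and tags (`orig`, `translit`, `norm`, `pos`, `AUX`, `PRON`, `VERB`) are all hypothetical. Annotations are stored as an open key-value map on each node, so new annotation types need no schema change. A word unit in original script (here the Sahidic Coptic form ⲁϥⲥⲱⲧⲙ, "he heard") covers a separate, finer tokenisation into morphs, so a search on one layer can return matches on another.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with an open-ended set of annotations (key -> value)."""
    node_id: str
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Hypothetical multi-layer annotation graph: nodes plus typed, directed edges."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, type)

    def add_node(self, node_id: str, annotations: dict[str, str]) -> Node:
        self.nodes[node_id] = Node(node_id, dict(annotations))
        return self.nodes[node_id]

    def add_edge(self, source: str, target: str, edge_type: str) -> None:
        self.edges.append((source, target, edge_type))

    def find(self, key: str, value: str) -> list[str]:
        """Return ids of all nodes whose annotation `key` equals `value`."""
        return [n.node_id for n in self.nodes.values() if n.annotations.get(key) == value]

    def covering(self, target: str, edge_type: str) -> list[str]:
        """Return ids of nodes that point to `target` via an edge of `edge_type`."""
        return [s for s, t, e in self.edges if t == target and e == edge_type]


# Illustrative data: one word unit in original script with a transliteration,
# covering three morph tokens from a separate, finer tokenisation.
g = AnnotationGraph()
g.add_node("word1", {"orig": "ⲁϥⲥⲱⲧⲙ", "translit": "afsōtm"})
g.add_node("m1", {"norm": "ⲁ", "pos": "AUX"})
g.add_node("m2", {"norm": "ϥ", "pos": "PRON"})
g.add_node("m3", {"norm": "ⲥⲱⲧⲙ", "pos": "VERB"})
for morph in ("m1", "m2", "m3"):
    g.add_edge("word1", morph, "coverage")  # word unit spans each morph
g.add_edge("m1", "m2", "order")  # linear order of the morph tokenisation
g.add_edge("m2", "m3", "order")

# Search on the morph layer, then report the covering word in original script.
for verb in g.find("pos", "VERB"):
    for word in g.covering(verb, "coverage"):
        print(g.nodes[word].annotations["orig"], g.nodes[word].annotations["translit"])
```

Running the script prints `ⲁϥⲥⲱⲧⲙ afsōtm`. The design choice worth noting is that layers and tokenisations are just nodes and typed edges, so adding, say, a lemma or translation layer is one more annotation key rather than a change to the model.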

Files: annishist_druskat-krause-odebrecht.pdf (421.7 kB, md5:e636172cc2b8b9fc1d54148de9a55430)