Utilising ANNIS for search and analysis of historical data
Description
Tools for the analysis of historical data, especially data from non-Indo-European languages, have to solve specific challenges. These include the synchronised representation of original script and transliteration, deep search over non-Latin scripts, and data models that allow for customised tokenisation. In this context, implementing new software for a specific research question and a specific dataset is a plausible solution, but it is not sustainable. We present ANNIS, a browser-based, re-usable search and analysis tool for multi-layer linguistic corpora. ANNIS can be, and has been, used for search and analysis over a number of historical corpora, as well as over corpora in non-Latin scripts. It is built on a graph-based data model that can accommodate a potentially unlimited number of annotation types. It can therefore represent data from a variety of sources and formats. Conversion from several formats via the compatible conversion framework Pepper makes ANNIS highly re-usable in a wide range of research contexts. ANNIS also provides pluggable visualisations, so that each corpus stratum can be presented in the form best suited to it. As an example, we present a use case for search in Coptic SCRIPTORIUM, a multi-layer corpus of Coptic.
Files
annishist_druskat-krause-odebrecht.pdf (421.7 kB, md5:e636172cc2b8b9fc1d54148de9a55430)