Utilising ANNIS for search and analysis of historical data

doi:10.5281/zenodo.160368

Published September 12, 2016 | Version v1

Presentation Open

1. Humboldt-Universität Berlin

Tools for the analysis of historical data, especially from non-Indo-European languages, have to meet specific challenges, e.g., the synchronised representation of original script and transliterations, deep search over non-Latin script, and data models that allow for customised tokenisations. Implementing new software for a specific research question and specific data is a plausible solution in this context, but it is unsustainable. We present ANNIS, a browser-based, re-usable search and analysis tool for multi-layer linguistic corpora. ANNIS can be, and has been, used to run searches and analyses over a number of historical corpora as well as corpora in non-Latin script. It is driven by a graph-based data model that can accommodate potentially unlimited types of annotation, and can therefore represent data coming from a variety of sources and formats. Conversion from several formats via the compatible conversion framework Pepper makes ANNIS highly re-usable in a wide variety of research contexts. ANNIS also features different pluggable visualisation options so that the different corpus strata can be presented in optimal form. As an example, we present a use case for search in the Coptic SCRIPTORIUM, a multi-layer corpus of Coptic.
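To make the idea of a graph-based, multi-layer data model more concrete, the following is a minimal sketch in Python. It is not the ANNIS or Pepper implementation and does not use their APIs; the class names (`Node`, `AnnotationGraph`), the annotation names (`orig`, `translit`, `line`), the edge type `covers`, and the two sample tokens are all assumptions chosen for illustration and are not taken from the Coptic SCRIPTORIUM corpus. The sketch only shows how tokens can carry original script and transliteration side by side, and how further annotation layers can be added as extra nodes and edges without changing the model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node; `annotations` maps an annotation name to its value."""
    node_id: str
    annotations: dict = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Hypothetical multi-layer annotation graph (illustration only)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, target_id, edge_type)

    def add_node(self, node_id, **annotations):
        # Any number of annotation names can be attached to a node.
        self.nodes[node_id] = Node(node_id, dict(annotations))

    def add_edge(self, source_id, target_id, edge_type):
        self.edges.append((source_id, target_id, edge_type))

    def find(self, name, value):
        # Return the ids of all nodes whose annotation `name` equals `value`.
        return [n.node_id for n in self.nodes.values()
                if n.annotations.get(name) == value]

    def children(self, source_id, edge_type):
        # Return the targets of all edges of `edge_type` leaving `source_id`.
        return [t for s, t, e in self.edges
                if s == source_id and e == edge_type]


graph = AnnotationGraph()

# Token layer: original script and transliteration kept in sync on one node.
graph.add_node("tok1", orig="ⲛⲟⲩⲧⲉ", translit="noute")
graph.add_node("tok2", orig="ⲣⲱⲙⲉ", translit="rome")

# A further layer: a span node (here an assumed manuscript line) covering tokens.
graph.add_node("line1", line="1")
graph.add_edge("line1", "tok1", "covers")
graph.add_edge("line1", "tok2", "covers")

# Search by transliteration, then read the original script from the same node.
hits = graph.find("translit", "noute")
print(hits, [graph.nodes[h].annotations["orig"] for h in hits])
print(graph.children("line1", "covers"))
```

Because every layer is expressed as nodes, edges, and name-value annotations, a new annotation type in this sketch requires only new data, not a new schema.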

Files (421.7 kB)

Name	Size
annishist_druskat-krause-odebrecht.pdf (md5:e636172cc2b8b9fc1d54148de9a55430)	421.7 kB

	All versions	This version
Views	115	113
Downloads	46	46
Data volume	19.8 MB	19.8 MB
