Published April 7, 2015 | Version v1
Poster Open

Interactive analysis of multi-layer linguistic corpora with ANNIS

  • 1. Humboldt-Universität zu Berlin
  • 2. Universität Potsdam

Description

In this poster, we present new features of the ANNIS search and visualization system for
multi-layered corpora. ANNIS was developed as a platform to explore a wide range of corpora. It is
not limited to a specific type of annotation or a single corpus. Instead, ANNIS abstracts the
linguistic data and interprets them as a graph (with nodes representing tokens or structural elements
like phrases and edges symbolizing the relationships between them). This abstraction allows
ANNIS to use the same query language for vastly different corpora. Furthermore, ANNIS comes
with a set of different visualizations to display corpus-specific annotation layers like syntactic trees,
coreference chains, rhetorical structure trees and many more. To give the user a familiar feeling
when searching through the data, these visualizations are very close to the ones used in the original
annotation tools. The particular power of ANNIS is the combined search and visualization of
several annotation layers at a time. This empowers linguists to search comprehensively for
phenomena on several layers, which becomes important with the increasing number of
multi-layered corpora like TüBa-D/Z, PCC or the Falko corpus.
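
The graph abstraction described above can be illustrated with a small sketch. This is not ANNIS code: the class and method names (AnnotationGraph, find_nodes, children) and the edge type label "dominance" are hypothetical, chosen only to show how tokens and phrases can share one node type so that the same lookups work regardless of the annotation layer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A token or a structural element; both are plain annotated nodes."""
    node_id: int
    annotations: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed relationship between two nodes (type names are illustrative)."""
    source: int
    target: int
    kind: str


class AnnotationGraph:
    """Hypothetical minimal container, not the ANNIS data model itself."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **annotations):
        self.nodes[node_id] = Node(node_id, dict(annotations))

    def add_edge(self, source, target, kind):
        self.edges.append(Edge(source, target, kind))

    def find_nodes(self, key, value=None):
        """Return ids of nodes carrying annotation `key` (optionally with `value`)."""
        return [
            n.node_id
            for n in self.nodes.values()
            if key in n.annotations and (value is None or n.annotations[key] == value)
        ]

    def children(self, node_id, kind):
        """Return targets of outgoing edges of the given type."""
        return [e.target for e in self.edges if e.source == node_id and e.kind == kind]


# Two tokens and one phrase node, all stored the same way.
graph = AnnotationGraph()
graph.add_node(1, tok="the", pos="DT")
graph.add_node(2, tok="corpus", pos="NN")
graph.add_node(3, cat="NP")
graph.add_edge(3, 1, "dominance")
graph.add_edge(3, 2, "dominance")

noun_phrases = graph.find_nodes("cat", "NP")
np_children = graph.children(3, "dominance")
```

Because a part-of-speech tag and a phrase category are both just key-value annotations on nodes, the same two lookups serve a syntax layer and a token layer alike, which is the property the abstract attributes to the graph view.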
In addition to performance improvements, the latest ANNIS version features a new frequency
analysis module and improvements to the ANNIS query language AQL. The new module makes it possible
to calculate frequencies of annotations without resorting to external tools. AQL now supports
queries with logical alternatives and the possibility to search for multiple annotations with the same
or different values. Furthermore, we simplified the syntax of AQL to make complex queries easier
to read and write. The frequency module, in combination with these AQL improvements, simplifies
the interactive analysis of linguistic corpora and allows a wider range of linguistic analyses directly
in ANNIS. We will show these improvements along with the revised multi-layered PCC 2.0 corpus
at the live demonstration.
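
As a rough illustration of what a frequency analysis over query matches involves, the following sketch counts annotation values across a list of matches. It is an assumption-laden toy, not the ANNIS frequency module: the match records, the annotation names (tok, pos, lemma) and the helper functions are all invented for the example, and the alternative filter only mimics the idea of a query that accepts one of several values.

```python
from collections import Counter

# Invented sample matches; in practice these would come from a corpus query.
matches = [
    {"tok": "Haus", "pos": "NN", "lemma": "Haus"},
    {"tok": "Häuser", "pos": "NN", "lemma": "Haus"},
    {"tok": "geht", "pos": "VVFIN", "lemma": "gehen"},
    {"tok": "Berlin", "pos": "NE", "lemma": "Berlin"},
]


def filter_alternatives(records, key, allowed):
    """Keep records whose annotation `key` has any of the allowed values."""
    return [r for r in records if r.get(key) in allowed]


def frequency_table(records, keys):
    """Count how often each combination of the given annotation values occurs."""
    return Counter(tuple(r.get(k) for k in keys) for r in records)


# Nouns or proper nouns, counted by part of speech and lemma.
nominal = filter_alternatives(matches, "pos", {"NN", "NE"})
table = frequency_table(nominal, ["pos", "lemma"])

for (pos, lemma), count in table.most_common():
    print(pos, lemma, count)
```

Counting over tuples of values rather than single values is what lets one table cover several annotations at once, for example part of speech together with lemma.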

Files

DGFS2015_ANNIS_ZipserKrauseNeumann.pdf

773.8 kB
md5:896ae1e37cd57375081db5c79b8191d8