Published March 5, 2009 | Version v1
Poster | Open Access

Search and Visualization of Richly Annotated Corpora with ANNIS2

Description

This poster presents the latest version of ANNIS2, a web browser-based search and
visualization environment designed to access richly annotated corpora with heterogeneous
annotation schemes. Developed within Collaborative Research Centre 632 (SFB 632:
“Information Structure: The Linguistic Means for Structuring Utterances, Sentences and
Texts”), ANNIS (ANNotation of Information Structure) must meet the requirements imposed
by diverse data from partner projects within the Research Centre and beyond.
Since information structure interacts with linguistic phenomena on many levels, the system
must be able to query and visualize, concurrently, data annotated for syntax, semantics,
morphology, prosody, phonetics, referentiality and lexis, including multimodal data.
For this reason, ANNIS2 supports annotations of tokens, token spans, and trees or other DAGs
(directed acyclic graphs), and uses a query language capable of searching these
structures. Both the query language and the visualizations are fully Unicode-compatible,
which ensures support for a wide variety of non-European languages.
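To make the three kinds of annotation concrete, here is a minimal Python sketch of a toy multilayer graph. Tokens carry their own annotations, spans cover sets of tokens, and tree nodes dominate tokens, spans or other nodes, so the dominance edges form a DAG. The class, its methods, the id scheme and the annotation names (`pos`, `cat`, `infostat`) are all hypothetical illustrations; this is not ANNIS2's internal data model or query language.

```python
from collections import defaultdict


class AnnotationGraph:
    """Toy multilayer annotation graph (hypothetical; not ANNIS2's data model).

    Tokens are addressed as "tok0", "tok1", ...; spans map to sets of token
    indices; tree nodes point to children (tokens, spans or other nodes),
    so the dominance edges form a DAG.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.annotations = defaultdict(dict)  # element id -> {name: value}
        self.spans = {}                        # span id -> frozenset of token indices
        self.children = {}                     # node id -> list of child ids

    def annotate_token(self, index, name, value):
        self.annotations[f"tok{index}"][name] = value

    def add_span(self, span_id, token_indices, annotations):
        self.spans[span_id] = frozenset(token_indices)
        self.annotations[span_id].update(annotations)

    def add_node(self, node_id, child_ids, annotations):
        self.children[node_id] = list(child_ids)
        self.annotations[node_id].update(annotations)

    def covered_tokens(self, node_id):
        """Return the token indices a token, span or tree node ultimately covers."""
        if node_id.startswith("tok"):
            return {int(node_id[3:])}
        if node_id in self.spans:
            return set(self.spans[node_id])
        covered = set()
        for child in self.children.get(node_id, []):
            covered |= self.covered_tokens(child)
        return covered

    def find(self, name, value):
        """Return the ids of all elements carrying the annotation name=value."""
        return sorted(i for i, ann in self.annotations.items() if ann.get(name) == value)


# Illustrative data: one sentence with POS tags, a span layer and a small tree.
g = AnnotationGraph(["Der", "Hund", "bellt"])
for i, pos in enumerate(["ART", "NN", "VVFIN"]):
    g.annotate_token(i, "pos", pos)
g.add_span("span1", [0, 1], {"infostat": "topic"})
g.add_node("np1", ["tok0", "tok1"], {"cat": "NP"})
g.add_node("s1", ["np1", "tok2"], {"cat": "S"})

# Cross-layer query: topic spans that cover exactly the same tokens as some NP.
hits = [(t, n) for t in g.find("infostat", "topic")
        for n in g.find("cat", "NP")
        if g.covered_tokens(t) == g.covered_tokens(n)]
print(hits)  # [('span1', 'np1')]
```

Resolving every element to the tokens it covers is what lets annotations from independent layers (here a span layer and a syntax tree) be compared in a single query.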
The system's underlying data is annotated using both automatic taggers/parsers and a
small set of manual annotation tools: EXMARaLDA (Schmidt 2004), annotate (Brants &
Plaehn 2000) / Synpathy (www.lat-mpi.eu/tools/synpathy/), MMAX2 (Müller & Strube
2006), RSTTool (O’Donnell 2000) and PALinkA (Orasan 2003). The output of these tools is then
mapped onto the SFB's encoding standard, PAULA (Potsdamer AUstauschformat für Linguistische
Annotation / Potsdam Interchange Format for Linguistic Annotation), a stand-off multilevel
XML format that serves as the basis for further processing. The XML data is compiled into
a relational database schema, which makes the system’s backend particularly scalable.
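The stand-off principle, where annotation layers point to token ids rather than wrapping the text inline, lends itself to a relational encoding. Below is a minimal Python sketch of that idea using the standard-library `xml.etree.ElementTree` and `sqlite3` modules. The XML element names, attributes, the `infostat` layer and the three-table schema are hypothetical simplifications. They do not reproduce the actual PAULA format or the ANNIS2 database schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-off files (NOT the real PAULA schema):
# the token layer holds the text, the markable layer only refers to token ids.
TOKENS_XML = """<tokens>
  <tok id="t1">Der</tok>
  <tok id="t2">Hund</tok>
  <tok id="t3">bellt</tok>
</tokens>"""

MARKABLES_XML = """<markables layer="infostat">
  <mark id="m1" tokens="t1 t2" value="given"/>
  <mark id="m2" tokens="t3" value="new"/>
</markables>"""


def load(conn, tokens_xml, markables_xml):
    """Compile the stand-off layers into three relational tables (illustrative schema)."""
    conn.executescript("""
        CREATE TABLE token (id TEXT PRIMARY KEY, position INTEGER, text TEXT);
        CREATE TABLE markable (id TEXT PRIMARY KEY, layer TEXT, value TEXT);
        CREATE TABLE covers (markable_id TEXT, token_id TEXT);
    """)
    for position, tok in enumerate(ET.fromstring(tokens_xml).iter("tok")):
        conn.execute("INSERT INTO token VALUES (?, ?, ?)",
                     (tok.get("id"), position, tok.text))
    root = ET.fromstring(markables_xml)
    layer = root.get("layer")
    for mark in root.iter("mark"):
        conn.execute("INSERT INTO markable VALUES (?, ?, ?)",
                     (mark.get("id"), layer, mark.get("value")))
        # Each stand-off pointer becomes one row in the covers table.
        for token_id in mark.get("tokens").split():
            conn.execute("INSERT INTO covers VALUES (?, ?)", (mark.get("id"), token_id))
    conn.commit()


def tokens_with(conn, layer, value):
    """Return, in text order, the tokens covered by markables with layer/value."""
    rows = conn.execute("""
        SELECT t.text FROM token t
        JOIN covers c ON c.token_id = t.id
        JOIN markable m ON m.id = c.markable_id
        WHERE m.layer = ? AND m.value = ?
        ORDER BY t.position""", (layer, value))
    return [text for (text,) in rows]


conn = sqlite3.connect(":memory:")
load(conn, TOKENS_XML, MARKABLES_XML)
print(tokens_with(conn, "infostat", "given"))  # ['Der', 'Hund']
```

Once the layers are tables, a query over annotations becomes an ordinary SQL join that the database engine can index and optimize, which is one plausible reading of why a relational backend scales well.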

Files

DGfS2009_annis_poster.pdf (573.9 kB)
md5:812204301da06d6a15c22ef81bbb5249