ANNIS, SaltNPepper & PAULA: A multilayer corpus infrastructure

doi:10.5281/zenodo.20713

Published May 24, 2015 | Version v1

Poster Open

1. Humboldt-Universität zu Berlin
2. Universität Potsdam
3. Georgetown University

Information structure, like many other linguistic phenomena, influences different linguistic levels at the same time (stress, word order, definiteness, etc.). Corpus-based research on information structure therefore needs access to different types of annotation (Lüdeling et al., to appear). There are now many multi-layer corpora with annotations of linguistic phenomena on several levels (see, e.g., Tüba-D/Z (Telljohann et al. 2009), Falko (Reznicek et al. 2012) or PCC (Stede & Neumann 2014)). Unfortunately, most tools use different formats that may not be interoperable, which means that data can hardly be exchanged between tools. Furthermore, there is no possibility of analysis across multiple layers. PAULA is a human- and machine-readable XML format for storing linguistic data that are annotated on multiple layers.

Goals:
1. Merging different types of annotations of the same primary text into a single corpus → Pepper
2. Storage of different types of annotations in a single format → PAULA
3. Search across different corpora and for different phenomena in a single system → ANNIS
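
The idea behind the three goals, several annotation layers anchored to one primary text and queried together, can be illustrated with a small sketch. This is not the Pepper, PAULA, or ANNIS API: the classes `Span` and `Document`, the method names, and the layer names `pos` and `infstruct` are all hypothetical. The sketch only assumes that each annotation points at a character range of the same primary text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """One annotation: a character range of the primary text plus a label."""
    start: int
    end: int
    layer: str
    value: str


@dataclass
class Document:
    """Primary text with any number of annotation layers (hypothetical model)."""
    text: str
    spans: list = field(default_factory=list)

    def add_layer(self, layer, annotations):
        """Merge a new layer given as (start, end, value) triples."""
        for start, end, value in annotations:
            if not (0 <= start < end <= len(self.text)):
                raise ValueError(f"span {start}-{end} lies outside the primary text")
            self.spans.append(Span(start, end, layer, value))

    def within(self, outer_layer, outer_value, inner_layer):
        """Return inner_layer spans contained in an outer_layer span with outer_value."""
        outer = [s for s in self.spans
                 if s.layer == outer_layer and s.value == outer_value]
        return [s for s in self.spans
                if s.layer == inner_layer
                and any(o.start <= s.start and s.end <= o.end for o in outer)]


# Two layers from (imagined) different tools, merged over the same text.
doc = Document("The dog barked")
doc.add_layer("pos", [(0, 3, "DET"), (4, 7, "NOUN"), (8, 14, "VERB")])
doc.add_layer("infstruct", [(0, 7, "topic"), (8, 14, "focus")])

# Cross-layer query: which part-of-speech tags occur inside the topic?
hits = doc.within("infstruct", "topic", "pos")
result = [(doc.text[s.start:s.end], s.value) for s in hits]
print(result)  # [('The', 'DET'), ('dog', 'NOUN')]
```

Because every layer refers to the primary text rather than to another tool's output, layers can be added independently and still be combined in one query.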

Files (770.6 kB)

Name	Size	Checksum
sfbConference2015_ZipserKrauseLudelingNeumannStedeZeldes.pdf	770.6 kB	md5:4ce8fcf09c5135fb4b4fe3d0d2eaf29d

Additional details

Funding: European Commission, CLARIN – Common Language Resources and Technology Infrastructure (grant 212230)

