With the rising importance of empirical data in many fields of linguistic research, we see an
increase not only in the number of electronically available corpora, but also in the number of
tools used to make this data accessible, processable and searchable. Most of these tools have
been developed in the course of specific linguistic projects and can therefore handle only
certain kinds of linguistic information, such as syntactic structures (e.g. TIGERSearch, Lezius
2002) or dialogue structures (e.g. EXMARaLDA, Schmidt 2004). At the same time, each
tool uses its own, proprietary format for representing the text and its annotations. Such
formats are optimized for a specific kind of analysis and the performance of a specific
processing tool. Consequently, they cannot easily be mapped onto each other. This impedes
linguistic research questions that presuppose a global view of the data, i.e., those that
require correlating, querying and analyzing several kinds of linguistic annotations at
once.

We present Pepper, a modularized converter framework addressing the problem that a
linguistic researcher may be limited to a small set of questions due to the tool(s) he or she
uses. Pepper is based on the meta-model Salt (Zipser & Romary 2010) and offers the
possibility of converting data from n formats into m formats with a minimal number of
necessary mappings. The pluggable architecture of Pepper allows the injection of new formats
into the framework. Pepper places no restrictions on the underlying techniques used to
represent these formats (e.g. XML, tabular formats, bracketing formats or mixtures of these).


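The claim that n formats can be converted into m formats with a minimal number of mappings can be illustrated with a small sketch. It assumes a hub design in which every importer maps its format onto one common model and every exporter maps that model onto its format, so that n + m mappings replace the n * m direct converters. All names below (Document, Registry, import_tabular, export_bracketed, mapping_counts) are hypothetical and chosen for illustration only; they are not Pepper's or Salt's actual API, and the toy Document is far simpler than a real meta-model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for a common model: tokens plus named annotation layers."""
    tokens: List[str] = field(default_factory=list)
    layers: Dict[str, List[str]] = field(default_factory=dict)


Importer = Callable[[str], Document]
Exporter = Callable[[Document], str]


class Registry:
    """Pluggable registry: new formats are added by registering one mapping each."""

    def __init__(self) -> None:
        self.importers: Dict[str, Importer] = {}
        self.exporters: Dict[str, Exporter] = {}

    def register_importer(self, name: str, fn: Importer) -> None:
        self.importers[name] = fn

    def register_exporter(self, name: str, fn: Exporter) -> None:
        self.exporters[name] = fn

    def convert(self, text: str, source: str, target: str) -> str:
        # Every conversion passes through the common model.
        return self.exporters[target](self.importers[source](text))


def import_tabular(text: str) -> Document:
    """Read an assumed toy format: one 'token<TAB>tag' pair per line."""
    doc = Document()
    tags: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, tag = line.split("\t")
        doc.tokens.append(token)
        tags.append(tag)
    doc.layers["pos"] = tags
    return doc


def export_bracketed(doc: Document) -> str:
    """Write an assumed toy bracketing format: '(TAG token)' per token."""
    tags = doc.layers.get("pos", ["?"] * len(doc.tokens))
    return " ".join(f"({tag} {tok})" for tok, tag in zip(doc.tokens, tags))


def mapping_counts(n: int, m: int) -> Tuple[int, int]:
    """Return (direct pairwise converters, mappings via a common model)."""
    return n * m, n + m


registry = Registry()
registry.register_importer("tabular", import_tabular)
registry.register_exporter("bracketed", export_bracketed)

converted = registry.convert("the\tDET\ndog\tNOUN", "tabular", "bracketed")
print(converted)
print(mapping_counts(5, 4))
```

In this sketch, supporting a further output format means registering one more exporter, after which every already registered input format can be converted into it without writing any additional pairwise converter.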