Poster Open Access

Atomic: An open-source software platform for multi-level corpus annotation

Druskat, Stephan; Bierkandt, Lennart; Gast, Volker; Rzymski, Christoph; Zipser, Florian

Poster presented at 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014). Hildesheim, 9 October 2014.

This paper presents Atomic, an open-source, platform-independent desktop application for multi-level corpus annotation. Atomic aims to provide the linguistic community with a user-friendly annotation tool and a sustainable platform through its focus on extensibility, a generic data model, and compatibility with existing linguistic formats. It is implemented on top of the Eclipse Rich Client Platform, a pluggable Java-based framework for creating client applications. As a set of plugins for this framework, Atomic integrates with the platform and allows other researchers to develop and integrate further extensions to the software as needed. The generic graph-based meta model Salt serves as Atomic’s domain model and allows for unlimited annotation levels and types. Salt is also used as an intermediate model in the Pepper framework for the conversion of linguistic data, which is fully integrated into Atomic, making the latter compatible with a wide range of linguistic formats. Atomic provides tools for both less experienced and expert annotators: graphical, mouse-driven editors and a command-line data manipulation language for rapid annotation.
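
To illustrate why a graph-based model places no fixed limit on annotation levels, the following is a minimal sketch in Python. It is not Salt's actual API: the class names (Node, Edge, AnnotationGraph), the method names, and the edge types "dominance" and "pointing" are assumptions chosen for illustration only. The idea shown is that any new annotation level is represented as additional nodes, edges, and key/value labels, so no change to the underlying structure is required.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node, e.g. a token or a span over several tokens (hypothetical)."""
    node_id: str
    # Arbitrary key/value annotations, e.g. {"pos": "DET"}.
    annotations: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed relation between two nodes (hypothetical)."""
    source: str
    target: str
    edge_type: str
    annotations: dict = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Illustrative container; not the Salt document graph."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **annotations):
        node = Node(node_id, dict(annotations))
        self.nodes[node_id] = node
        return node

    def add_edge(self, source, target, edge_type, **annotations):
        # Reject edges whose endpoints are not yet part of the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(source, target, edge_type, dict(annotations))
        self.edges.append(edge)
        return edge

    def children(self, node_id, edge_type=None):
        # Targets of outgoing edges, optionally filtered by edge type.
        return [
            e.target
            for e in self.edges
            if e.source == node_id
            and (edge_type is None or e.edge_type == edge_type)
        ]


graph = AnnotationGraph()
# Token level with part-of-speech labels.
graph.add_node("tok1", text="The", pos="DET")
graph.add_node("tok2", text="corpus", pos="NOUN")
# A further level (syntax) is just another node plus edges.
graph.add_node("np1", cat="NP")
graph.add_edge("np1", "tok1", "dominance")
graph.add_edge("np1", "tok2", "dominance")
# Yet another level (dependency-style relation) reuses the same mechanism.
graph.add_edge("tok2", "tok1", "pointing", func="det")
```

In this sketch, adding a level such as information structure or coreference would only mean adding more nodes and typed edges to the same graph, which is the property the abstract attributes to a generic graph-based model.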

  • McAffer, Jeff; Lemieux, Jean-Michel and Aniszczyk, Chris. 2010. Eclipse Rich Client Platform. 2nd edn. Addison-Wesley, Boston.
  • Zipser, Florian and Romary, Laurent. 2010. A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010, Malta.
  • Zipser, Florian; Zeldes, Amir; Ritz, Julia; Romary, Laurent; Leser, Ulf. 2011. Pepper: Handling a multiverse of formats. In: 33. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Göttingen, Feb 2011.

