Automatic Detection of Language and Annotation Model Information in CoNLL Corpora

Abromeit, Frank; Chiarcos, Christian

doi:10.4230/OASIcs.LDK.2019.23

Published May 20, 2019 | Version v1

Conference paper | Open Access

Affiliation: Goethe University Frankfurt

We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface through which domain experts can curate and evaluate this metadata, and to publish it as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de-facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from Universal Dependencies v2.3.
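The abstract does not spell out how tagsets are detected. As a rough illustration only, and not the AnnoHub implementation, the following Python sketch shows one simple way tagset detection over CoNLL-style TSV could work: count the tags found in one column (by default the CoNLL-U UPOS column) and pick the candidate tagset that covers the largest share of tag tokens. The function names `tag_inventory` and `best_tagset`, the coverage heuristic, and the truncated Penn Treebank tag list are hypothetical choices made for this example.

```python
from collections import Counter

# CoNLL-U columns (Universal Dependencies):
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
UPOS_COL = 3  # 0-based index of the UPOS column


def tag_inventory(lines, col=UPOS_COL):
    """Count the tags found in one TSV column of a CoNLL-style file."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line = sentence boundary; '#' = comment line
        fields = line.split("\t")
        if len(fields) <= col:
            continue  # malformed or too-short row
        if "-" in fields[0] or "." in fields[0]:
            continue  # skip CoNLL-U multiword-token ranges and empty nodes
        counts[fields[col]] += 1
    return counts


def best_tagset(observed, tagsets):
    """Return (name, coverage) for the candidate covering most tag tokens.

    Hypothetical heuristic: coverage = share of observed tag tokens whose
    tag belongs to the candidate tagset.
    """
    total = sum(observed.values())
    scores = {
        name: (sum(n for tag, n in observed.items() if tag in tags) / total
               if total else 0.0)
        for name, tags in tagsets.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Candidates: the 17 UD UPOS tags and an illustrative, truncated Penn Treebank subset.
TAGSETS = {
    "UD-UPOS": {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
                "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"},
    "PTB (subset)": {"DT", "NN", "NNS", "JJ", "IN", "PRP", "VB", "VBD", ".", ","},
}

# A one-sentence CoNLL-U sample, built with explicit tabs.
SAMPLE = [
    "# text = The cat sleeps .",
    "\t".join(["1", "The", "the", "DET", "DT", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "cat", "cat", "NOUN", "NN", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "sleeps", "sleep", "VERB", "VBZ", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"]),
    "",
]

print(best_tagset(tag_inventory(SAMPLE), TAGSETS))         # ('UD-UPOS', 1.0)
print(best_tagset(tag_inventory(SAMPLE, col=4), TAGSETS))  # ('PTB (subset)', 0.75)
```

A realistic detector would also have to cope with tags shared between tagsets (punctuation symbols, for instance) and with column positions that differ between CoNLL dialects; the simple coverage score above ignores both.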

Notes

The research described in this paper was conducted in the context of the Specialized Information Service Linguistics, funded by the German Research Foundation (DFG/LIS, 2017-2019). The second author's contributions received additional support from the Horizon 2020 Research and Innovation Action "Pret-a-LLOD. Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors" (H2020-ICT-2018-2, 2019-2021).

Files (489.8 kB)

Name	Size
OASIcs-LDK-2019-23.pdf (md5:3e58201998d64c734c27e817bd782f18)	489.8 kB

Additional details

URN: urn:nbn:de:0030-drops-103873
URL: http://drops.dagstuhl.de/opus/volltexte/2019/10387/

Funding: European Commission
Pret-a-LLOD - Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors (825182)

	All versions	This version
Views	116	116
Downloads	94	94
Data volume	47.5 MB	47.5 MB
