Published October 1, 2019 | Version v1
Conference paper (open access)

SkELL corpora as a part of the language portal Sõnaveeb: problems and perspectives

  • 1. Institute of the Estonian Language
  • 2. St. Petersburg State University
  • 3. Lexical Computing Ltd.

Description

In this paper we analyse the quality and presentation of authentic corpus sentences from the Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel 2014), using as a case study Sõnaveeb (Wordweb), a new language portal being developed at the Institute of the Estonian Language. Sõnaveeb currently contains 150,000 Estonian headwords, about 70,000 of which have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only usage examples available on the portal.

This paper describes the parameters of the Good Dictionary Examples (GDEX) configurations (Kilgarriff et al. 2008) for Estonian (GDEX 1.4) and Russian (GDEX 1.2) that were used to compile the etSkELL 2018 and ruSkELL 1.6 corpora. It also gives an overview of an evaluation of the Estonian GDEX 1.4 configuration, and outlines the requirements for presenting SkELL corpora in a user-friendly way as part of the language portal.

Files

eLex_2019_SkELL corpora as a part of the language portal Sõnaveeb.pdf (723.0 kB)

Additional details

Funding

ELEXIS – European Lexicographic Infrastructure (731015), European Commission