SkELL corpora as a part of the language portal Sõnaveeb: problems and perspectives

Koppel, Kristina; Kallas, Jelena; Khokhlova, Maria; Suchomel, Vít; Baisa, Vít; Michelfeit, Jan

doi:10.5281/zenodo.3612933

Published October 1, 2019 | Version v1

Conference paper | Open access

1. Institute of the Estonian Language
2. St. Petersburg State University
3. Lexical Computing Ltd.

In this paper we analyse the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel, 2014), using Sõnaveeb (Wordweb), a new language portal being developed at the Institute of the Estonian Language, as an example. Sõnaveeb currently contains 150,000 Estonian headwords, about 70,000 of which have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only usage examples available on the portal.

This paper describes the parameters of the Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian (GDEX 1.4) and Russian (GDEX 1.2) used to compile the etSkELL 2018 and ruSkELL 1.6 corpora, gives an overview of an evaluation of the GDEX 1.4 configuration for Estonian, and outlines the requirements for a user-friendly presentation of SkELL corpora as part of the language portal.

Files (723.0 kB)

Name	Size
eLex_2019_SkELL corpora as a part of the language portal Sõnaveeb.pdf md5:7b77be2eb226ba3270001e2ac3844da7	723.0 kB

Additional details

ELEXIS – European Lexicographic Infrastructure 731015: European Commission
