Published December 1, 2020 | Version 1.0.0
Conference paper Open

WordNets for South African Languages

  • 1. Council for Scientific and Industrial Research
  • 2. University of Limpopo
  • 3. University of Pretoria

Description

Data statement for the WordNets for South African languages

 

Data set name: WordNets for South African languages

Citation: Sefara, T.J., Mokgonyane, T.B. and Marivate, V., 2021. Practical Approach on Implementation of WordNets for South African Languages. In Proceedings of the Eleventh Global Wordnet Conference.

 

Data set developer(s): Sefara, T.J. (https://speechtech.co.za), Mokgonyane, T.B. (https://sites.google.com/view/tumisho-mokgonyane) and Marivate, V. (https://vima.co.za)

Data statement authors: Sefara, T.J. (https://speechtech.co.za), Mokgonyane, T.B. (https://sites.google.com/view/tumisho-mokgonyane) and Marivate, V. (https://vima.co.za)

Link to the dataset: zenodo link here

Dataset license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

 

A. CURATION RATIONALE

The dataset of the WordNets for South African languages has been modified to be compatible with the Open Multilingual Wordnet (OMW) in NLTK. It contains wordnets for Setswana, Sepedi, Tshivenda, isiZulu and isiXhosa. The dataset was originally created for WordNet 2.0; it has now been converted to WordNet 3.0 using the sensemap files from Princeton WordNet.
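The conversion described above can be pictured as a lookup from WordNet 2.0 synset identifiers to their WordNet 3.0 counterparts. The sketch below is illustrative only and is not the authors' conversion script. It assumes the sensemap information has already been reduced to a plain dictionary from old `offset-pos` identifiers to new ones, and that entries are stored as OMW-style tab-separated lines (synset identifier, `lang:lemma` label, word). The function name `remap_tab_lines`, the offsets and the placeholder words are all hypothetical.

```python
def remap_tab_lines(lines, offset_map):
    """Rewrite the synset ID at the start of each tab-separated line.

    Blank lines and comment lines (starting with '#') are kept unchanged.
    Lines whose synset ID has no entry in offset_map are not converted;
    they are returned separately so they can be reviewed by hand.
    """
    converted, unmapped = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            converted.append(line)
            continue
        # Split off the first column only; the rest of the line is preserved.
        synset_id, _, rest = line.partition("\t")
        new_id = offset_map.get(synset_id)
        if new_id is None:
            unmapped.append(line)
        else:
            converted.append(f"{new_id}\t{rest}")
    return converted, unmapped


# Hypothetical mapping and entries, for illustration only.
offset_map = {"00000001-n": "00000002-n"}
lines = [
    "# example wordnet\ttsn\thttp://example.org\tCC BY-NC-SA 4.0",
    "00000001-n\ttsn:lemma\tword_a",
    "00000009-n\ttsn:lemma\tword_b",
]
converted, unmapped = remap_tab_lines(lines, offset_map)
```

Returning the unmapped lines rather than silently discarding them is a design choice for this sketch: synsets that were split, merged or removed between WordNet versions usually need manual inspection.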

B. LANGUAGE VARIETY/VARIETIES

The languages of the dataset are identified by their ISO 639-2 codes:

  • Sepedi (nso)

  • Setswana (tsn)

  • isiXhosa (xho)

  • isiZulu (zul)

  • Tshivenda (ven)

 

C. SPEAKER DEMOGRAPHIC

N/A

 

D. ANNOTATOR DEMOGRAPHIC

N/A

 

E. SPEECH SITUATION

N/A

 

F. TEXT CHARACTERISTICS

N/A

 

G. RECORDING QUALITY

N/A

 

H. OTHER

We provide a link to the library that utilises this dataset: https://github.com/JosephSefara/AfricanWordNet

 

I. PROVENANCE APPENDIX

N/A

 

About this document

A data statement is a characterisation of a dataset that provides context to allow developers and users to better understand how experimental results might generalise, how software might be appropriately deployed, and what biases might be reflected in systems built on the software.

Notes

Initially published by: Sonja E Bosch and Marissa Griesel. 2017. Strategies for building WordNets for under-resourced languages: The case of African languages. Literator (Potchefstroom. Online), 38(1):1–12.

Files

africanwordnet.zip

Size: 473.3 kB
md5:d84345dcebfb7be12fb4be78bb25b03f
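
The MD5 checksum listed for africanwordnet.zip can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption.

```python
import hashlib

# Checksum listed on this record for africanwordnet.zip.
EXPECTED_MD5 = "d84345dcebfb7be12fb4be78bb25b03f"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path.

    The file is read in chunks so large archives need not fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive has been saved to the current directory, `matches_expected("africanwordnet.zip")` returns `True` only when the file's digest matches the listed checksum.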