Amharic WSD Dataset: Advancing Word Sense Disambiguation in Amharic

Yigzaw, Robbel Habtamu; Assefa, Beakal Gizachew; Belay, Elefelious Getachew

doi:10.5281/zenodo.13992003

Published October 25, 2024 | Version v1

Dataset Restricted

1. Addis Ababa Institute of Technology (AAiT)

This dataset is designed for the Word Sense Disambiguation (WSD) task in Amharic and consists of 50,415 annotated sentences. Each sentence is labeled with the correct sense of one of 200 ambiguous words. These words were chosen on the basis of homonymy, where a single word form has multiple meanings depending on its context.

The ambiguous words were selected to capture the nuances of Amharic vocabulary and were drawn from diverse textual sources such as news articles, literature, and social media. This gives the dataset a broad and representative range of usage across contexts, which makes it particularly valuable for Amharic NLP research. Potential applications include improvements in machine translation, sentiment analysis, and other semantic processing tasks in Amharic.

The dataset is organized in a structured format. Each entry contains five fields (sentence, ambiguous word, sense, gloss, and sense label), which makes it straightforward to use with machine learning models.
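The entry structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the files are restricted, so the actual file format, the exact field names (`sentence`, `ambiguous_word`, `sense`, `gloss`, `sense_label`), and the field types are hypothetical, and the rows are placeholders rather than real data from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WSDEntry:
    """One annotated sentence. Field names and types are assumptions
    based on the five fields listed in the description."""
    sentence: str
    ambiguous_word: str
    sense: str
    gloss: str
    sense_label: str  # kept as a string; the real label type is unknown


def parse_entry(row: dict) -> WSDEntry:
    """Build an entry from a dict-like row (e.g. one CSV or JSON record)."""
    return WSDEntry(
        sentence=row["sentence"],
        ambiguous_word=row["ambiguous_word"],
        sense=row["sense"],
        gloss=row["gloss"],
        sense_label=str(row["sense_label"]),
    )


def group_by_word(entries):
    """Group entries by their ambiguous word, so that each target word
    can be treated as its own classification problem."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.ambiguous_word].append(entry)
    return dict(grouped)


# Placeholder rows only; these are NOT taken from the dataset.
EXAMPLE_ROWS = [
    {"sentence": "<sentence 1>", "ambiguous_word": "<word A>",
     "sense": "<sense A1>", "gloss": "<gloss A1>", "sense_label": "0"},
    {"sentence": "<sentence 2>", "ambiguous_word": "<word A>",
     "sense": "<sense A2>", "gloss": "<gloss A2>", "sense_label": "1"},
    {"sentence": "<sentence 3>", "ambiguous_word": "<word B>",
     "sense": "<sense B1>", "gloss": "<gloss B1>", "sense_label": "0"},
]

entries = [parse_entry(row) for row in EXAMPLE_ROWS]
by_word = group_by_word(entries)
```

Grouping by ambiguous word reflects one common way to set up WSD, where each of the 200 target words gets its own set of candidate senses. Whether the dataset is intended to be used this way is not stated in the record.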

Files

Restricted

The record is publicly accessible, but files are restricted.

