COMPRISE_Data13_Y-BBCTopics_V1.0

Adelani, David; Hedderich, Michael

doi:10.5281/zenodo.5998515

Published September 30, 2020 | Version V1.0

Dataset Open

1. Universität des Saarlandes

The dataset contains the Yoruba data used to evaluate weakly supervised learning techniques for text classification in low-resource languages. The text was collected from the BBC Yoruba news website and annotated by two native speakers. The data is divided into seven categories based on the news headlines: Sports, Entertainment, Nigeria, Africa, World, Health, and Politics. It contains 1,908 sentences.

Files

Name	MD5	Size
COMPRISE_Data13_Y-BBCTopics_V1.0.zip	8396e18bac885891019ff8c8207b884d	340.0 kB
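
The MD5 value listed for the archive can be used to check that a download is intact. The following is a minimal sketch, assuming the archive has already been saved under its original file name in the current working directory; the helper names md5_of_file and matches_expected are illustrative and not part of the dataset or of Zenodo.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed on the record page for the archive.
EXPECTED_MD5 = "8396e18bac885891019ff8c8207b884d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 with `expected`, ignoring letter case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved next to this script.
    archive = Path("COMPRISE_Data13_Y-BBCTopics_V1.0.zip")
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, so fetching the file again is the first thing to try.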

Additional details

Funding
European Commission: COMPRISE - Cost-effective, Multilingual, Privacy-driven voice-enabled Services (grant 825081)

	All versions	This version
Views	203	203
Downloads	20	20
Data volume	7.8 MB	7.8 MB

Resource type: Dataset
Publisher: Zenodo
Languages: Yoruba

License: Creative Commons Attribution Non Commercial 2.0 Generic

Technical metadata

Created: February 8, 2022
Modified: February 11, 2022
