Published May 11, 2022 | Version 1.0
Dataset Open

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

  • 1. Department of Software Engineering, Faculty of Computing, Bayero University Kano, 700241 Kano, Nigeria
  • 2. Department of Information Technology, Faculty of Computing, Bayero University Kano, 700241 Kano, Nigeria
  • 3. Department of Computer Science, Faculty of Computing, Bayero University Kano, 700241 Kano, Nigeria
  • 4. Department of Computer Science, Ahmadu Bello University Zaria, Kaduna, Nigeria

Description

We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria: Hausa, Igbo, Nigerian-Pidgin, and Yorùbá. The corpus contains around 30,000 annotated tweets per language, except for Nigerian-Pidgin, and includes a significant fraction of code-mixed tweets.

Notes

This work was carried out with support from Lacuna Fund, an initiative co-founded by The Rockefeller Foundation, Google.org, and Canada's International Development Research Centre. The views expressed herein do not necessarily represent those of Lacuna Fund, its Steering Committee, its funders, or Meridian Institute. We thank Tal Perry for providing the LightTag annotation tool.

Files

Files (16.3 MB)

Name: isahmadbbr/NaijaSenti-1.0.zip
Size: 16.3 MB
md5: 58580671facf2172fbfdccf1fcfe8b4d
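
The md5 value listed for the archive can be used to confirm that a download completed without corruption. The following is a minimal sketch, not part of the dataset release: it assumes the archive was saved locally as NaijaSenti-1.0.zip (an assumed filename), and the helper names md5_of_file and verify_archive are hypothetical.

```python
import hashlib
from pathlib import Path

# md5 checksum as listed on the record page for the archive.
EXPECTED_MD5 = "58580671facf2172fbfdccf1fcfe8b4d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local filename; adjust to wherever the archive was saved.
    archive = Path("NaijaSenti-1.0.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or altered download, in which case fetching the archive again is the simplest remedy.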

Additional details

Related works