Web-Scraped Nigerian Pidgin English Text Dataset from Digital News Platforms

Nwogo, Everistus Ugochukwu; Ihianle, Isibor Kennedy; Machado, Pedro; Bird, Jordan J.; Lotfi, Ahmad; Shuaib, Ahmad Abdulnasir; Akinwumi, Isaac; Jonathan, Oluranti

doi:10.5281/zenodo.18467116

There is a newer version of the record available.

Published February 3, 2026 | Version v1

Dataset Open

1. Nottingham Trent University
2. Covenant University

This dataset consists of Nigerian Pidgin English text collected through web scraping of multiple Nigerian Pidgin news and media websites, capturing a wide range of contemporary topics, linguistic styles, and sociocultural expressions. The corpus was cleaned, normalised, and curated to ensure linguistic consistency and usability for downstream natural language processing tasks. It was used to fine-tune and evaluate quantised Large Language Models, enabling analysis of performance–efficiency trade-offs in low-resource deployment scenarios. The dataset is designed to support research and development of robust Nigerian Pidgin English language models for multilingual NLP, low-resource language modelling, and culturally grounded AI applications.

Files (59.3 MB)

Name	Size	MD5
An AI-Enhanced Adaptive Learning Platform for Multilingual and Low-Resource Educational Contexts: A Nigerian Case Study.csv	59.3 MB	6fc346a99d40a6ccf4477d9df8081bab
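
The record publishes an MD5 checksum for the CSV file, so a download can be checked for corruption before use. The following is a minimal sketch of such a check using only the Python standard library. The local file name `pidgin_corpus.csv` and the helper names `md5_of_file` and `verify_download` are hypothetical, chosen for illustration; only the checksum value comes from the record.

```python
import hashlib
from pathlib import Path

# Checksum published on the record page for the CSV file.
EXPECTED_MD5 = "6fc346a99d40a6ccf4477d9df8081bab"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path: wherever the CSV was saved after download.
    local_csv = Path("pidgin_corpus.csv")
    if local_csv.exists():
        print("checksum OK" if verify_download(local_csv) else "checksum MISMATCH")
    else:
        print(f"{local_csv} not found; download the file first")
```

Reading in chunks keeps memory use flat for a file of this size. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.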

Additional details

Funding: British Council, STEM 4.0: Advancing Technology Education Through AI-Driven And Adaptive Learning (TNE2024-057)

Repository URL: https://huggingface.co/datasets/Guavacoderepo/gclm-pidgin-text-corpus
Development Status: Active

	All versions	This version
Views	57	51
Downloads	41	39
Data volume	3.9 GB	3.8 GB

Resource type: Dataset
Publisher: Zenodo
Published in: An AI-Enhanced Adaptive Learning Platform for Multilingual and Low-Resource Educational Contexts: A Nigerian Case Study, 2026.

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: February 3, 2026
Modified: February 4, 2026
