There is a newer version of the record available.

Published February 3, 2026 | Version v1
Dataset Open

Web-Scraped Nigerian Pidgin English Text Dataset from Digital News Platforms

Description

This dataset consists of Nigerian Pidgin English text collected through web scraping of multiple Nigerian Pidgin news and media websites, capturing a wide range of contemporary topics, linguistic styles, and sociocultural expressions. The corpus was cleaned, normalised, and curated to ensure linguistic consistency and usability for downstream natural language processing tasks. It was used to fine-tune and evaluate quantised Large Language Models, enabling analysis of performance–efficiency trade-offs in low-resource deployment scenarios. The dataset is designed to support research and development of robust Nigerian Pidgin English language models for multilingual NLP, low-resource language modelling, and culturally grounded AI applications.

Files

An AI-Enhanced Adaptive Learning Platform for Multilingual and Low-Resource Educational Contexts: A Nigerian Case Study.csv
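
As a starting point for working with the file, the sketch below reports the CSV's column names and record count using only the Python standard library. The column layout is not documented on this record, so the code deliberately assumes no schema. The helper name `summarise_csv`, the UTF-8 encoding, and the local download path are assumptions for illustration, not part of the dataset.

```python
import csv
from pathlib import Path


def summarise_csv(path, encoding="utf-8"):
    """Return (column_names, record_count) for a CSV file.

    Hypothetical helper: it makes no assumption about which columns exist.
    The first record is treated as the header row.
    """
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no records
        # csv.reader parses quoted multi-line fields as a single record,
        # so this counts records rather than physical lines.
        record_count = sum(1 for _ in reader)
    return header, record_count


if __name__ == "__main__":
    # File name as listed on this record; the location is an assumption.
    data_file = Path(
        "An AI-Enhanced Adaptive Learning Platform for Multilingual and "
        "Low-Resource Educational Contexts: A Nigerian Case Study.csv"
    )
    if data_file.exists():
        columns, records = summarise_csv(data_file)
        print("Columns:", columns)
        print("Records:", records)
    else:
        print("Download the CSV from this record and adjust data_file.")
```

Inspecting the header first makes it possible to confirm which column holds the Pidgin text before any tokenisation or fine-tuning code is written against it.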

Additional details

Funding

British Council
STEM 4.0: Advancing Technology Education Through AI-Driven And Adaptive Learning TNE2024-057