Web-Scraped Nigerian Pidgin English Text Dataset from Digital News Platforms
Authors/Creators
Description
This dataset consists of Nigerian Pidgin English text collected through web scraping of multiple Nigerian Pidgin news and media websites, capturing a wide range of contemporary topics, linguistic styles, and sociocultural expressions. The corpus was cleaned, normalised, and curated to ensure linguistic consistency and usability for downstream natural language processing tasks. It was used to fine-tune and evaluate quantised Large Language Models, enabling analysis of performance–efficiency trade-offs in low-resource deployment scenarios. The dataset is designed to support research and development of robust Nigerian Pidgin English language models for multilingual NLP, low-resource language modelling, and culturally grounded AI applications.
Files
| Name | Size | Checksum |
|---|---|---|
| An AI-Enhanced Adaptive Learning Platform for Multilingual and Low-Resource Educational Contexts: A Nigerian Case Study.csv | 59.3 MB | md5:6fc346a99d40a6ccf4477d9df8081bab |
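Because the record publishes an MD5 checksum for the 59.3 MB CSV, a download can be checked for integrity before use. The sketch below is illustrative and not part of the record. It streams the file through Python's standard `hashlib` in chunks so the whole file never has to be held in memory. It then reads the header and the first few rows with the standard `csv` module. The function names (`md5_of_file`, `verify_download`, `peek_csv`) are hypothetical. The UTF-8 encoding is an assumption, since the record does not document the file's encoding or column layout. For that reason the sketch assumes nothing about which columns exist.

```python
import csv
import hashlib
from pathlib import Path

# Checksum published on the record page for the CSV file.
EXPECTED_MD5 = "6fc346a99d40a6ccf4477d9df8081bab"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size
    chunks (1 MiB by default) instead of loading it all at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the local file's MD5 matches the expected checksum
    (compared case-insensitively)."""
    return md5_of_file(path) == expected.lower()


def peek_csv(path: Path, n: int = 3) -> tuple[list[str], list[list[str]]]:
    """Return the header row and up to n data rows.

    Assumption: the file is UTF-8 encoded. No column names are assumed,
    because the record does not document the CSV schema.
    """
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        # zip() stops as soon as range(n) is exhausted, so at most n rows are read.
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows
```

For example, once the file has been downloaded locally, `verify_download(Path("<downloaded file>.csv"))` should return `True` for an intact copy, and `peek_csv` shows the actual column names before any further processing.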
Additional details
Funding
- British Council
  - STEM 4.0: Advancing Technology Education Through AI-Driven And Adaptive Learning (TNE2024-057)
Software
- Repository URL: https://huggingface.co/datasets/Guavacoderepo/gclm-pidgin-text-corpus
- Development Status: Active