Published November 11, 2022 | Version 01

Corpus of English and Nigerian Pidgin Code-switching (CENCOS)

  • 1. Heinrich-Heine University, Düsseldorf

Description

This dataset was compiled from fieldwork conducted in Nigeria in 2019. It features naturally occurring spoken conversations from educated speakers of English and Nigerian Pidgin in Nigeria, with very few conversations involving uneducated speakers. Nigeria is a multilingual nation with over 500 languages that are not mutually intelligible. English and Nigerian Pidgin serve as lingua francas that bridge the gaps between speakers of these languages. English is used in both formal and informal settings, whereas Nigerian Pidgin is used only in informal settings. Nigerian Pidgin was formerly regarded as the language of the uneducated in Nigeria. Over time, it has developed into a language spoken by the educated as well. This corpus was compiled to help understand how educated speakers who know both languages use them in interaction. The corpus consists of both sound and text files, but the sound files are not included here for data protection reasons. The sound files were manually transcribed into text, amounting to over 100,000 word tokens. The transcripts contain no annotation other than speaker identification. An Excel sheet with basic information on the speakers (gender, age, ethnic group and education status) is included. With these social factors, the corpus can support a range of investigations into the use of English and Nigerian Pidgin in Nigeria.

Files

CENCOS corpus.zip

Files (190.7 kB)

Name Size
md5:ba26dcbd17d17909439de83808c13bd1 175.0 kB
md5:de181b40b579e21337433e8e7ada1568 15.8 kB
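
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard library. The function names `md5_of_file` and `matches_listed_md5` are illustrative and not part of the dataset, and the local file path in the usage note is hypothetical. The listing does not state which checksum belongs to which file name, so any pairing is an assumption to confirm against the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:ab12...'.

    An optional 'md5:' prefix, surrounding whitespace and letter case are ignored.
    """
    expected = listed.strip().split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, assuming the archive was saved locally as `CENCOS corpus.zip` and that the first listed checksum belongs to it, `matches_listed_md5("CENCOS corpus.zip", "md5:ba26dcbd17d17909439de83808c13bd1")` should return `True` for an intact download. A result of `False` suggests either a corrupted download or a wrong file-to-checksum pairing.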