Published January 11, 2021 | Version 1.0.0
Dataset | Open Access

Fon French Daily Dialogues Parallel Data

  • 1. Jacobs University Bremen
  • 2. IamYourClounon
  • 3. Afro.Num
  • 4. Technical University of Munich

Description

We aim to collect, clean, and store corpora of Fon and French sentences for Natural Language Processing research (including Neural Machine Translation and Named Entity Recognition) on Fon, a very low-resourced and endangered African native language.

Fon (also called Fongbe) is an indigenous African language spoken by about 2 million people, mostly in Benin, Togo, and Nigeria.

As training data is crucial to the performance of a machine learning model, this project aims to compile the largest set of training corpora for research on, and design of, translation and other NLP models involving Fon.

Through crowdsourcing with Google Forms surveys, we gathered and cleaned 25,377 parallel Fon-French sentence pairs, all based on daily conversations.
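The exact cleaning procedure is not documented on this page. As one plausible illustration only, not the authors' pipeline, the Python sketch below applies Unicode NFC normalization, collapses whitespace, and drops pairs with an empty side or exact duplicates. Normalization matters because accented characters (French accents, Fon tone marks) can be stored either precomposed or as a base letter plus a combining mark, which would otherwise make identical-looking sentences compare as different. The function names `normalize` and `clean_pairs` and the sample strings are hypothetical placeholders, not taken from the dataset.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    # NFC composes a base letter and a following combining mark into a single
    # precomposed code point when one exists, so visually identical strings
    # compare equal afterwards.
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def clean_pairs(pairs):
    """Normalize (fon, french) pairs and drop empty or duplicate pairs.

    Hypothetical cleaning step for illustration; keeps the first occurrence
    of each pair and preserves the original order.
    """
    seen = set()
    cleaned = []
    for fon, french in pairs:
        fon, french = normalize(fon), normalize(french)
        if not fon or not french:
            continue  # skip pairs where one side is missing
        key = (fon, french)
        if key in seen:
            continue  # skip exact duplicates (after normalization)
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Placeholder strings: the second pair differs from the first only by a
# trailing space and by encoding "é" as "e" + combining acute accent.
raw = [
    ("fon_a", "caf\u00e9"),
    ("fon_a ", "cafe\u0301"),
    ("", "bonjour"),
]
print(clean_pairs(raw))
```

Here the decomposed and precomposed spellings collapse to one pair, and the pair with an empty Fon side is discarded.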

The following people contributed to the crowdsourcing, creation, and cleaning of this version:

1) Name: Bonaventure DOSSOU
Affiliation: MSc Student in Data Engineering, Jacobs University
Contact: femipancrace.dossou@gmail.com

2) Name: Ricardo AHOUNVLAME
Affiliation: Student in Linguistics
Contact: tontonjars@gmail.com

3) Name: Fabroni YOCLOUNON
Affiliation: Creator of the Label IamYourClounon
Contact: iamyourclounon@gmail.com

4) Name: BeninLangues
Affiliation: BeninLangues
Contact: https://beninlangues.com/

5) Name: Chris Emezue
Affiliation: MSc Student in Mathematics in Data Science, Technical University of Munich
Contact: chris.emezue@gmail.com

_______________________________________________________

To join as a contributor, please contact us at:
  1) https://twitter.com/bonadossou
  2) https://twitter.com/ChrisEmezue
  3) https://twitter.com/edAIOfficial
Or email Bonaventure Dossou (femipancrace.dossou@gmail.com) or Chris Emezue (chris.emezue@gmail.com).
_______________________________________________________

Clavier Fongbé (Fongbe keyboard), web version: https://bonaventuredossou.github.io/clavierfongbe/ (made by Bonaventure Dossou)
Clavier Fongbé (Fongbe keyboard), Android app: https://play.google.com/store/apps/details?id=com.fulbertodev.clavierfongbe&hl=en&gl=US (Fabroni Yoclounon, Bonaventure Dossou, et al.)

Notes

Please cite the following papers:

@article{2103.08052,
  author       = {Bonaventure F. P. Dossou and Chris C. Emezue},
  title        = {Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language},
  year         = {2021},
  eprint       = {arXiv:2103.08052},
  howpublished = {African NLP, EACL 2021}
}

@inproceedings{emezue-dossou-2020-ffr,
  title     = {{FFR} v1.1: {F}on-{F}rench Neural Machine Translation},
  author    = {Dossou, Femi Pancrace Bonaventure and Emezue, Chris Chinenye},
  booktitle = {Proceedings of the Fourth Widening Natural Language Processing Workshop},
  month     = jul,
  year      = {2020},
  address   = {Seattle, USA},
  publisher = {Association for Computational Linguistics},
  url       = {https://www.aclweb.org/anthology/2020.winlp-1.21},
  doi       = {10.18653/v1/2020.winlp-1.21},
  pages     = {83--87}
}

Files

Fon_French_Parallel_Data_25377.csv
Size: 1.4 MB
md5: d54881872d97929c2349503398a05a6d
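A minimal Python sketch for checking a downloaded copy against the MD5 checksum listed above and then reading the sentence pairs. The CSV's column layout is not described on this page, so the assumptions that it has a header row, that Fon is in the first column and French in the second, and that it is UTF-8 encoded are all hypothetical; inspect the returned header before relying on them. The helper names `md5_of`, `load_pairs`, and `verify_and_load` are illustrative, not part of any published tooling.

```python
import csv
import hashlib

# File name and checksum as listed on this record; assumes the file was
# downloaded into the current working directory.
DATA_FILE = "Fon_French_Parallel_Data_25377.csv"
EXPECTED_MD5 = "d54881872d97929c2349503398a05a6d"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_pairs(path, fon_col=0, french_col=1):
    """Read (fon, french) pairs from a CSV file that starts with a header row.

    ASSUMPTION: fon_col and french_col are hypothetical defaults because the
    column layout is not documented; rows too short to contain both columns
    are skipped. "utf-8-sig" also strips a byte-order mark if one is present.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        needed = max(fon_col, french_col)
        pairs = [(row[fon_col], row[french_col]) for row in reader if len(row) > needed]
    return header, pairs


def verify_and_load(path=DATA_FILE, expected_md5=EXPECTED_MD5):
    """Check the file's MD5 digest, then load its pairs."""
    actual = md5_of(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: expected {expected_md5}, got {actual}")
    return load_pairs(path)
```

After downloading the file, `header, pairs = verify_and_load()` raises `ValueError` if the checksum does not match (for example, after a truncated download); otherwise `len(pairs)` can be compared with the 25,377 pairs stated in the description. Reading in chunks keeps memory use flat regardless of file size, although at 1.4 MB reading the whole file at once would also work.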

Additional details

Related works

Is source of
10.5281/zenodo.4432712 (DOI)