Published January 11, 2021 | Version 1.0.0
Dataset Open

Fon French Daily Dialogues Parallel Data

  • 1. Jacobs University Bremen
  • 2. IamYourClounon
  • 3. Afro.Num
  • 4. Technical University of Munich


We aim to collect, clean, and store corpora of Fon and French sentences for Natural Language Processing research, including Neural Machine Translation and Named Entity Recognition, for Fon, a very low-resourced and endangered native African language.

Fon (also called Fongbe) is an indigenous African language spoken by about 2 million people, mostly in Benin, Togo, and Nigeria.

As training data is crucial to the high performance of a machine learning model, the aim of this project is to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon.

Through crowdsourcing with Google Forms surveys, we gathered and cleaned 25,377 parallel Fon-French sentences, all based on daily conversations.

The following people contributed to the crowdsourcing, creation, and cleaning of this version:

1) Name: Bonaventure DOSSOU
Affiliation: MSc Student in Data Engineering, Jacobs University

2) Name: Ricardo AHOUNVLAME
Affiliation: Student in Linguistics

3) Name: Fabroni YOCLOUNON
Affiliation: Creator of the Label IamYourClounon

4) Name: BeninLangues
Affiliation: BeninLangues

5) Name: Chris Emezue
Affiliation: MSc Student in Mathematics in Data Science, Technical University of Munich


To join as a contributor, please contact Bonaventure Dossou or Chris Emezue.

Clavier Fongbé (WebView): made by Bonaventure Dossou
Clavier Fongbé (Mobile Android Version): made by Fabroni Yoclounon, Bonaventure Dossou et al.


Please cite the following papers:

@article{2103.08052,
  Author = {Bonaventure F. P. Dossou and Chris C. Emezue},
  Title = {Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language},
  Year = {2021},
  Eprint = {arXiv:2103.08052},
  Howpublished = {African NLP, EACL 2021}
}

@inproceedings{emezue-dossou-2020-ffr,
  title = "{FFR} v1.1: {F}on-{F}rench Neural Machine Translation",
  author = "Dossou, Femi Pancrace Bonaventure and Emezue, Chris Chinenye",
  booktitle = "Proceedings of the The Fourth Widening Natural Language Processing Workshop",
  month = jul,
  year = "2020",
  address = "Seattle, USA",
  publisher = "Association for Computational Linguistics",
  url = "",
  doi = "10.18653/v1/2020.winlp-1.21",
  pages = "83--87"
}



Files (1.4 MB)


Additional details

Related works

Is source of
10.5281/zenodo.4432712 (DOI)