Published 2024 | Version v2
Journal Open

Normalization of Arabic Dialects into Modern Standard Arabic using BERT and GPT-2

  • 1. Rootroo Ltd
  • 2. Helsinki Metropolia University of Applied Sciences

Description

We present an encoder-decoder model for normalizing Arabic dialects into Modern Standard Arabic (MSA) using BERT- and GPT-2-based models. Arabic is a language of many dialects that differ from MSA not only in pronunciation but also in morphology, grammar, and lexical choice. This diversity can be troublesome even for a native Arabic speaker, let alone a computer. Several NLP tools work well for MSA and some of the main dialects but fail to cover the Arabic language as a whole. Based on our manual evaluation, our model normalizes sentences entirely correctly 46% of the time and almost correctly 26% of the time.
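To show how figures like the reported 46% and 26% are read, here is a minimal sketch of tallying manual ratings into percentages. It assumes each evaluated sentence gets exactly one label from three hypothetical categories (`correct`, `almost_correct`, `incorrect`). The category names, the function `rating_percentages`, and the 100 example ratings are illustrative inventions, not the paper's evaluation scheme or data. Under that assumption, the two reported shares sum to 72% of sentences normalized entirely or almost correctly, which leaves 28% in the remaining category.

```python
from collections import Counter

# Hypothetical rating categories for a manual evaluation; these labels are
# illustrative assumptions, not taken from the paper.
CATEGORIES = ("correct", "almost_correct", "incorrect")


def rating_percentages(labels):
    """Return the share (in percent) of rated sentences in each category."""
    if not labels:
        raise ValueError("need at least one rated sentence")
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    # Counter returns 0 for categories that never occur.
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# Illustrative input: 100 invented ratings whose split mirrors the reported
# 46% / 26% figures. These are NOT the paper's evaluation data.
example = ["correct"] * 46 + ["almost_correct"] * 26 + ["incorrect"] * 28
shares = rating_percentages(example)
print(shares)  # {'correct': 46.0, 'almost_correct': 26.0, 'incorrect': 28.0}
print(shares["correct"] + shares["almost_correct"])  # 72.0
```

Rejecting unknown labels keeps a typo in the rating sheet from silently shrinking every category's share.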

Files

Arabic_normalization_1_.pdf (213.3 kB)
md5:63424c53345dd95915e76f2c861b063c

Additional details

Identifiers

ISSN
2416-5999

Dates

Accepted
2024