Published 2024 | Version v2
Journal
Open
Normalization of Arabic Dialects into Modern Standard Arabic using BERT and GPT-2
Creators
Description
We present an encoder-decoder model for normalizing Arabic dialects into Modern Standard Arabic (MSA), using both BERT- and GPT-2-based models. Arabic has many dialects that differ from MSA not only in pronunciation but also in morphology, grammar, and lexical choice. This diversity can be troublesome even for a native Arabic speaker, let alone a computer. Several NLP tools work well for MSA and for some of the main dialects but fail to cover the Arabic language as a whole. Based on our manual evaluation, our model normalizes sentences entirely correctly 46% of the time and almost correctly 26% of the time.
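The two manual-evaluation figures can be combined with simple arithmetic. The sketch below is illustrative only and assumes the "entirely correct" and "almost correct" categories are disjoint, as the wording suggests. The function name `summarize` and the derived "remaining" bucket are hypothetical and are not reported in the description.

```python
# Shares reported in the description (manual evaluation), in percent.
ENTIRELY_CORRECT = 46
ALMOST_CORRECT = 26


def summarize(entirely: int, almost: int) -> dict:
    """Combine two disjoint percentage shares.

    The 'remaining' value is derived here (100 minus the two shares);
    it is an assumption of this sketch, not a reported result.
    """
    if entirely < 0 or almost < 0 or entirely + almost > 100:
        raise ValueError("shares must be non-negative and sum to at most 100")
    at_least_almost = entirely + almost
    return {
        "entirely_correct": entirely,
        "almost_correct": almost,
        "at_least_almost_correct": at_least_almost,
        "remaining": 100 - at_least_almost,
    }


summary = summarize(ENTIRELY_CORRECT, ALMOST_CORRECT)
print(summary)
```

Under that assumption, 72% of the evaluated sentences are at least almost correct, and the remaining 28% fall outside both categories.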
Files
Name | Size |
---|---|
Arabic_normalization_1_.pdf (md5:63424c53345dd95915e76f2c861b063c) | 213.3 kB |
Additional details
Identifiers
- ISSN: 2416-5999
Dates
- Accepted: 2024