Tashaphyne: A Python package for Arabic Light Stemming
Description
Tashaphyne: Arabic Light Stemmer (تاشفين: التجذيع الخفيف للنصوص العربية, "Tashaphyne: light stemming for Arabic texts")
Tashaphyne is an Arabic light stemmer and segmenter. Its main function is light stemming, that is, removing prefixes and suffixes, and it returns every possible segmentation of a word. It generates these segmentations with a modified finite-state automaton.
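To make "all possible segmentations" concrete, here is a minimal sketch in plain Python. It is not Tashaphyne's implementation and does not use its API or its automaton: the affix lists, the function names `segmentations` and `light_stem`, and the `min_stem` threshold are all assumptions made for illustration. It simply tries every prefix and suffix that matches and keeps each split that leaves a long enough stem.

```python
# Hypothetical, tiny affix lists for illustration only;
# these are NOT Tashaphyne's default lists.
PREFIXES = ["", "ال", "و", "وال", "ب", "بال"]
SUFFIXES = ["", "ة", "ات", "ون", "ين", "ها"]


def segmentations(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=2):
    """Return every (prefix, stem, suffix) split allowed by the affix lists."""
    results = []
    for prefix in prefixes:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in suffixes:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            # Reject splits that would leave an implausibly short stem.
            if len(stem) >= min_stem:
                results.append((prefix, stem, suffix))
    return results


def light_stem(word, **kwargs):
    """Pick the stem of the segmentation that removes the most affix material."""
    candidates = segmentations(word, **kwargs)
    # Fall back to the unchanged word if no split is acceptable.
    best = min(candidates, key=lambda seg: len(seg[1]), default=("", word, ""))
    return best[1]


if __name__ == "__main__":
    word = "والمعلمون"  # "and the teachers"
    for prefix, stem, suffix in segmentations(word):
        print(repr(prefix), repr(stem), repr(suffix))
    print(light_stem(word))  # معلم
```

Choosing the shortest stem is only one possible selection rule; a real stemmer may rank candidate segmentations differently.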
It offers stemming and root extraction at the same time, unlike the Khoja stemmer, ISRI stemmer, Assem stemmer, and Farasa stemmer.
Tashaphyne ships with default prefix and suffix lists and also accepts custom lists, so users can cover additional cases and build customized stemmers without changing the code.
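The idea of building different stemmers from different affix lists, with no change to the stemming code itself, can be sketched as follows. This is an illustrative sketch only: `make_light_stemmer`, the two example lists, and the longest-match rule are assumptions, not Tashaphyne's API or its default data.

```python
def make_light_stemmer(prefixes, suffixes, min_stem=2):
    """Build a stemmer that strips the longest matching prefix and suffix."""
    # Try longer affixes first so that "وال" wins over "ال".
    prefixes = sorted(prefixes, key=len, reverse=True)
    suffixes = sorted(suffixes, key=len, reverse=True)

    def stem(word):
        for prefix in prefixes:
            if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
                word = word[len(prefix):]
                break
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[:len(word) - len(suffix)]
                break
        return word

    return stem


# Two stemmers built from hypothetical, deliberately small affix lists.
noun_stemmer = make_light_stemmer(["ال", "وال", "بال"], ["ات", "ون", "ين"])
verb_stemmer = make_light_stemmer(["سي", "ي", "ت"], ["ون", "وا"])

if __name__ == "__main__":
    print(noun_stemmer("والمعلمون"))  # معلم
    print(verb_stemmer("سيكتبون"))    # كتب
```

The same word can stem differently under each configuration, which is the point of keeping the affix lists as data rather than hard-coding them.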
Tashaphyne is a Python library. A demo is available on Mishkal (choose Tools/Analysis), and the source code is on GitHub.
Files
Name | Size | Checksum |
---|---|---|
tashaphyne-0.3.7.zip | 630.4 kB | md5:5be432f18c60872762d0f1c0ef7944df |
Additional details
Related works
- Documents
- Peer review: 10.1007/s10791-023-09429-y (DOI)
- Is published in
- Peer review: 10.32604/cmc.2021.016155 (DOI)