Published January 26, 2024 | Version v0.3.7
Software | Open Access

Tashaphyne: A Python package for Arabic Light Stemming

  • 1. University of Bouira

Description

Tashaphyne: Arabic Light Stemmer (Arabic title, translated: "Tashaphyne: Light Stemming of Arabic Texts")

Tashaphyne is an Arabic light stemmer and segmenter. Its main function is light stemming, that is, removing prefixes and suffixes from a word, and it can return every possible segmentation of that word. It does this with a modified finite state automaton.
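The idea of light stemming with multiple segmentations can be illustrated with a minimal Python sketch. It does not reproduce Tashaphyne's modified finite state automaton; it simply enumerates every split of a word into prefix + stem + suffix, where prefix and suffix come from small, hypothetical affix lists chosen for illustration (not the package's default lists). The function names `segmentations` and `light_stem` are likewise hypothetical.

```python
# Hypothetical, tiny affix lists for illustration only;
# Tashaphyne's default lists are much larger.
PREFIXES = {"", "و", "ال", "وال", "ب", "بال"}
SUFFIXES = {"", "ة", "ات", "ون", "ين", "ها"}

MIN_STEM_LEN = 2  # assumption: ignore splits that leave a very short stem


def segmentations(word: str) -> list:
    """Return every (prefix, stem, suffix) split allowed by the affix lists."""
    results = []
    n = len(word)
    for i in range(n + 1):  # i marks the end of the prefix
        if word[:i] not in PREFIXES:
            continue
        for j in range(i + MIN_STEM_LEN, n + 1):  # j marks the end of the stem
            if word[j:] in SUFFIXES:
                results.append((word[:i], word[i:j], word[j:]))
    return results


def light_stem(word: str) -> str:
    """Pick the shortest stem among all segmentations (one simple policy)."""
    segs = segmentations(word)
    if not segs:
        return word
    return min(segs, key=lambda s: len(s[1]))[1]
```

For example, `segmentations("والمعلمون")` yields six candidate splits, including `("وال", "معلم", "ون")`, and `light_stem` picks `"معلم"`. The shortest-stem policy is only one heuristic; a practical stemmer would rank candidates, for example by checking stems against a lexicon.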

It performs stemming and root extraction at the same time, unlike the Khoja, ISRI, Assem, and Farasa stemmers.
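To illustrate producing a stem and a root in one pass, the sketch below takes candidate stems (for example, the stems of a word's segmentations) and tries to reduce each one to a trilateral root from a small, hypothetical root set. The reduction drops the weak letters ا, و, ي and optionally a leading م, on the assumption that they are pattern letters. The heuristic, the root set, and the function name `stem_and_root` are assumptions for illustration, not the method Tashaphyne uses.

```python
from typing import Optional, Tuple

# Hypothetical root dictionary for illustration only.
KNOWN_ROOTS = {"كتب", "علم", "درس"}

# Letters this sketch assumes to be pattern (non-root) letters.
WEAK_LETTERS = set("اوي")


def stem_and_root(stems: list) -> Optional[Tuple[str, str]]:
    """Return (stem, root) for the first candidate stem that reduces to a known root."""
    for stem in stems:
        # Remove assumed pattern letters anywhere in the stem.
        reduced = "".join(ch for ch in stem if ch not in WEAK_LETTERS)
        options = [reduced]
        # Also try dropping a leading mim, a common derivational letter (assumption).
        if reduced.startswith("م"):
            options.append(reduced[1:])
        for candidate in options:
            if candidate in KNOWN_ROOTS:
                return stem, candidate
    return None  # no candidate stem matched the root set
```

With these assumptions, `["والمعلم", "المعلم", "معلم"]` gives `("معلم", "علم")` and `["مكتوب"]` gives `("مكتوب", "كتب")`. Blindly removing weak letters breaks roots that contain them (such as قول), which is one reason real root extractors use pattern matching instead.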

Tashaphyne ships with default prefix and suffix lists and also accepts customized lists, so it can be adapted to cover additional affixes and turned into customized stemmers without changing its code.
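One way to picture "customized stemmers without changing code" is to keep the affix lists as data. In this minimal sketch, the class name `ConfigurableLightStemmer`, its constructor, and the JSON layout are hypothetical and are not Tashaphyne's API. It strips the longest matching prefix and then the longest matching suffix, as long as a minimum stem length remains.

```python
import json


class ConfigurableLightStemmer:
    """Toy light stemmer whose behaviour is driven entirely by affix lists (hypothetical API)."""

    def __init__(self, prefixes, suffixes, min_stem_len=2):
        # Sort longest-first (ties broken alphabetically) so the longest affix is tried first.
        self.prefixes = sorted(set(prefixes), key=lambda a: (-len(a), a))
        self.suffixes = sorted(set(suffixes), key=lambda a: (-len(a), a))
        self.min_stem_len = min_stem_len

    @classmethod
    def from_json(cls, text):
        """Build a stemmer from a JSON document with "prefixes" and "suffixes" keys."""
        cfg = json.loads(text)
        return cls(cfg["prefixes"], cfg["suffixes"], cfg.get("min_stem_len", 2))

    def stem(self, word):
        # Strip the longest prefix that still leaves a long-enough stem.
        for p in self.prefixes:
            if p and word.startswith(p) and len(word) - len(p) >= self.min_stem_len:
                word = word[len(p):]
                break
        # Then strip the longest suffix under the same constraint.
        for s in self.suffixes:
            if s and word.endswith(s) and len(word) - len(s) >= self.min_stem_len:
                word = word[: -len(s)]
                break
        return word
```

Swapping the configuration changes the behaviour without touching the class: `ConfigurableLightStemmer(["ال", "وال"], ["ون", "ات"]).stem("والمعلمون")` gives `"معلم"`, while a JSON configuration with only the prefix `"وال"` and no suffixes gives `"معلمون"`.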

Tashaphyne is a Python library. A demo is available on Mishkal (choose Tools/Analysis), and the source code is available on GitHub.

Files (630.4 kB)

  • tashaphyne-0.3.7.zip, 630.4 kB, md5:5be432f18c60872762d0f1c0ef7944df
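To check a downloaded copy of tashaphyne-0.3.7.zip against its listed MD5 checksum (5be432f18c60872762d0f1c0ef7944df), here is a minimal sketch using Python's standard `hashlib`. The local file location and the helper names `md5_of` and `verify` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum listed on this record for tashaphyne-0.3.7.zip.
EXPECTED_MD5 = "5be432f18c60872762d0f1c0ef7944df"


def md5_of(path, chunk_size=1 << 16):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    archive = Path("tashaphyne-0.3.7.zip")
    if archive.exists():
        print("OK" if verify(archive) else "checksum mismatch")
```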

Additional details

Related works

  • Documents: 10.1007/s10791-023-09429-y (DOI, peer review)
  • Is published in: 10.32604/cmc.2021.016155 (DOI, peer review)