Published January 20, 2023 | Version v0.9.1
Software | Open access
Simplemma
Description
What's Changed
- smaller language data footprint with the smallest possible impact on performance, achieved through a combination of rules, an upper limit on word length, and better data cleaning (#31)
- unsupervised approach to affixes now activated by default for some languages
- reviewed rules for English and German (less greedy)
- added rules for Dutch, Finnish, Polish and Russian
- improved Russian and Ukrainian language data (#3)
- improved tokenizer
Full Changelog: https://github.com/adbar/simplemma/compare/v0.9.0...v0.9.1
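To illustrate the idea behind the first entry, here is a minimal, hypothetical Python sketch of how suffix rules and an upper limit on word length can shrink a lemmatization dictionary. Word forms that the rules already resolve, or that exceed the length cap, are not stored, and lookups fall back to the rules. The rule list, `MAX_WORD_LENGTH`, `min_stem`, and all function names are illustrative assumptions, not Simplemma's actual data, thresholds, or API.

```python
"""Hypothetical sketch: rules plus a word-length cap to shrink lemma data.

Not Simplemma's implementation; rules, cap, and names are illustrative.
"""
from typing import Optional

# Hypothetical cap: word forms longer than this are never stored.
MAX_WORD_LENGTH = 16

# Toy English suffix rules (suffix, replacement), checked in order,
# so the longer "sses" must come before the bare "s".
SUFFIX_RULES = [
    ("ies", "y"),
    ("sses", "ss"),
    ("s", ""),
]


def apply_rules(token: str, min_stem: int = 3) -> Optional[str]:
    """Return a rule-based lemma, or None if no rule applies.

    min_stem keeps rules from firing on very short words (less greedy).
    """
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: len(token) - len(suffix)] + replacement
    return None


def build_dictionary(pairs, max_len: int = MAX_WORD_LENGTH) -> dict:
    """Store only (form, lemma) pairs the rules cannot derive and
    whose form does not exceed max_len characters."""
    data = {}
    for form, lemma in pairs:
        if len(form) > max_len:
            continue  # dropped by the length cap
        if apply_rules(form) == lemma:
            continue  # rules already produce this lemma
        data[form] = lemma
    return data


def lemmatize(token: str, data: dict) -> str:
    """Dictionary lookup first, then rules, else return the token unchanged."""
    lower = token.lower()
    if lower in data:
        return data[lower]
    ruled = apply_rules(lower)
    return ruled if ruled is not None else token


# Toy example data (hypothetical, for illustration only).
EXAMPLE_PAIRS = [
    ("cats", "cat"),
    ("studies", "study"),
    ("mice", "mouse"),
    ("went", "go"),
    ("antidisestablishmentarianisms", "antidisestablishmentarianism"),
]

data = build_dictionary(EXAMPLE_PAIRS)
print(data)                         # {'mice': 'mouse', 'went': 'go'}
print(lemmatize("Mice", data))      # mouse
print(lemmatize("classes", data))   # class
print(lemmatize("was", data))       # was
```

Only the two irregular forms end up stored; the regular plurals and the over-long form are resolved by rules at lookup time. The trade-off in this toy version is that an irregular form longer than the cap would be lost, which is why a real cap would need to be chosen so that performance suffers as little as possible.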
Files (75.5 MB)
Name | Size | MD5 checksum |
---|---|---|
adbar/simplemma-v0.9.1.zip | 75.5 MB | a6ecb5ee5badd1bd79dc296863dcb05b |
Additional details
Related works
- Is supplement to: https://github.com/adbar/simplemma/tree/v0.9.1 (URL)