Published January 20, 2023 | Version v0.9.1
Software | Open

Simplemma

Description

What's Changed

  • smaller language data footprint with the smallest possible impact on performance, achieved through a combination of rules, an upper limit on word length, and better data cleaning (#31)
  • unsupervised approach to affixes now activated by default for some languages
  • reviewed rules for English and German (now less greedy)
  • added rules for Dutch, Finnish, Polish and Russian
  • improved Russian and Ukrainian language data (#3)
  • improved tokenizer
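
To make the first item more concrete, here is a toy sketch of how rules and a word-length limit can shrink a lemmatization dictionary without changing the results for the pruned entries. This is not Simplemma's actual code: the suffix rules, the `MAX_LENGTH` value, the sample entries, and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy rules and data, not Simplemma's implementation.
from typing import Dict, Optional

MAX_LENGTH = 16  # hypothetical upper limit on the length of stored word forms


def apply_rules(token: str) -> Optional[str]:
    """Return a lemma derived from a simple suffix rule, or None if no rule applies."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"   # parties -> party
    if len(token) > 4 and token.endswith("sses"):
        return token[:-2]         # classes -> class
    return None


def build_dictionary(pairs: Dict[str, str]) -> Dict[str, str]:
    """Keep only entries that are short enough and not already covered by a rule."""
    kept = {}
    for form, lemma in pairs.items():
        if len(form) > MAX_LENGTH:
            continue  # over the length limit: not stored
        if apply_rules(form) == lemma:
            continue  # a rule already yields this lemma: storing it is redundant
        kept[form] = lemma
    return kept


def lemmatize(token: str, dictionary: Dict[str, str]) -> str:
    """Look the token up first, then try the rules, then return it unchanged."""
    if token in dictionary:
        return dictionary[token]
    candidate = apply_rules(token)
    return candidate if candidate is not None else token


raw_entries = {
    "mice": "mouse",
    "parties": "party",
    "classes": "class",
    "antidisestablishmentarianisms": "antidisestablishmentarianism",
}
compact = build_dictionary(raw_entries)
```

In this sketch only the irregular form is stored, while the regular forms are still resolved by the rules at lookup time. The trade-off is that overly long forms not covered by a rule are returned unchanged.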

Full Changelog: https://github.com/adbar/simplemma/compare/v0.9.0...v0.9.1

Notes

If you use this software, please cite it using these metadata.

Files

adbar/simplemma-v0.9.1.zip (75.5 MB)
md5:a6ecb5ee5badd1bd79dc296863dcb05b
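
The MD5 checksum listed for the archive can be used to check that a download is intact. A minimal sketch using only the Python standard library follows; the helper names and the local file name are assumptions, and the expected value is the checksum given above.

```python
import hashlib

# Checksum listed for adbar/simplemma-v0.9.1.zip on this page.
EXPECTED_MD5 = "a6ecb5ee5badd1bd79dc296863dcb05b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("simplemma-v0.9.1.zip")` would return `True` for an intact copy, assuming the archive was saved under that name in the current directory. MD5 is suitable for detecting corrupted downloads, not for guarding against deliberate tampering.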
