Published August 1, 2021 | Version v2.4.0-dev0
Software
Open
PyThaiNLP/pythainlp: PyThaiNLP v2.4.0-dev0
Creators
- Wannaphong Phatthiyaphaibun (1)
- Arthit Suriyawongkul (2)
- Pattarawat Chormai
- Charin (3)
- Lalita Lowphansirikul
- Pakin Siwatammarat
- smeeklai
- Pete Peeradej Tanruangporn
- Peradon Charoenchainetr (4)
- Can Udomcharoenchaikit (5)
- Supaseth
- "Plane" Abhabongse Janthong (6)
- Korakot Chaovavanich (5)
- Korkeat W.
- Nonthakon Jitchiranant
- Nutchanon Ninyawee (7)
- Vee Satayamas (8)
- boomsquared
- Yann Dubois (9)
- nyamakawa
- Chanchana Sornsoontorn
- Cody (10)
- Krissda Prakalphakul
- Preeti Yuankrathok (11)
- Codacy Badger (12)
- fossabot (13)
- hopedataannotations
- Natthapong S.
- pontakornth
- 1. @PyThaiNLP
- 2. Thai Netizen Network
- 3. Somboonkij Building Supply 1992
- 4. KBTG
- 5. VISTEC
- 6. Krungthai Bank PCL
- 7. Codustry Pte. Ltd.
- 8. Kasetsart University
- 9. Vector Institute
- 10. QuantifiedCode
- 11. @C0D1UM
- 12. Codacy
- 13. @fossas
Description
PyThaiNLP v2.4.0-dev0 is the first development release of PyThaiNLP 2.4 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.4 change log #545
News
Starting with PyThaiNLP 2.4, support for Python 3.6 ends. Python 3.6 users can continue to use PyThaiNLP 2.3.1.
The dictionary and rules for newmm have been updated. If your model uses newmm for word tokenization, we recommend retraining it.
Deprecation and other API changes
- #550 Deprecated syllable_tokenize. syllable_tokenize is deprecated; use subword_tokenize instead.
- https://github.com/PyThaiNLP/pythainlp/commit/701fb3a7842b3abd0b2318ba9074f1902c2f32e9 pythainlp.tag.named_entity.ThaiNameTagger has changed to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.
- #580 Add Thai Text Augmentation
- #557 Fix lots of misspellings in dictionary (words_th.txt)
- #576 Add get_corpus_default_db and thainer 1.5 model. You can now add a corpus to default_db.json, and you no longer need to download the latest thainer model from the Internet.
- #599 Add tltk (pos_tag and ner) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
- #600 Add NER class - NER class for named-entity recognition tasks.
- #589 Add pythainlp.translate.Translate class
- #588 Add Chinese-Thai Machine Translation
- #585 Fix token_max_len bug that makes it always zero
- #562 Tokenize repeating dots and commas from numbers (fix #461)
- #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- https://github.com/PyThaiNLP/pythainlp/commit/314411086707b60ba8790724301224916f4670b8 Add SEFR CUT to pythainlp
- #599 Add tltk (sentence_tokenize and word_tokenize) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
- #566 Refactor Royin Transliterate: avoid nested if blocks and simplify consonant replacement operations
- #585 Manually merge update-royin branch with dev branch to add O-ANG rule
- #599 Add tltk (g2p and ipa) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
- #573 Fix token_max_len bug that makes it always zero
- #583 Add pythainlp.word_vector.WordVector
- #591 Add more spelling engines
- #599 Add tltk (spell) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
- #579 Add pythainlp.generate
- #599 Add tltk - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
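The upgrade notes in this release (the Python 3.6 cutoff, the syllable_tokenize deprecation, and the ThaiNameTagger move) can be sketched as a small compatibility helper. This is only an illustration, not part of PyThaiNLP: supported_release, first_available, and tokenize_subwords are hypothetical names, the "2.4" result for Python 3.7 and later is an assumption drawn from the 3.6 cutoff, and the tokenizers are assumed to live in pythainlp.tokenize.

```python
import importlib
import sys


def supported_release(version_info=None):
    """Return the PyThaiNLP release line these notes name for a Python version.

    Hypothetical helper. Returns None for versions the notes do not mention.
    """
    source = sys.version_info if version_info is None else version_info
    major_minor = tuple(source)[:2]
    if major_minor >= (3, 7):
        return "2.4"    # assumption: 2.4 targets Python newer than 3.6
    if major_minor == (3, 6):
        return "2.3.1"  # stated in the notes for Python 3.6 users
    return None


def first_available(candidates):
    """Return the first importable attribute from (module, name) pairs, else None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            continue
    return None


# Prefer the new locations and fall back to the old ones on older installs.
# Both names resolve to None when PyThaiNLP is not installed.
ThaiNameTagger = first_available([
    ("pythainlp.tag.thainer", "ThaiNameTagger"),
    ("pythainlp.tag.named_entity", "ThaiNameTagger"),
])
tokenize_subwords = first_available([
    ("pythainlp.tokenize", "subword_tokenize"),
    ("pythainlp.tokenize", "syllable_tokenize"),
])
```

The two tokenizers may not segment text identically, so output that feeds a trained model should be rechecked after switching.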
Files (11.7 MB)
Name | Size |
---|---|
PyThaiNLP/pythainlp-v2.4.0-dev0.zip (md5:f6c29b84295b36ab358f1d61360b4393) | 11.7 MB |
Additional details
Related works
- Is supplement to
- https://github.com/PyThaiNLP/pythainlp/tree/v2.4.0-dev0 (URL)