Published April 4, 2021 | Version v2.3.1 | Software | Open
PyThaiNLP/pythainlp: PyThaiNLP v2.3.1 Release!
Creators
- Wannaphong Phatthiyaphaibun (@PyThaiNLP)
- Arthit Suriyawongkul (Thai Netizen Network)
- Pattarawat Chormai
- Charin (Somboonkij Building Supply 1992)
- Lalita Lowphansirikul
- Pakin Siwatammarat
- smeeklai
- Pete Peeradej Tanruangporn
- Peradon Charoenchainetr (KBTG)
- Can Udomcharoenchaikit (Chulalongkorn University)
- Supaseth
- "Plane" Abhabongse Janthong (Krungthai Bank PCL)
- Korakot Chaovavanich (VISTEC)
- Korkeat W.
- Nonthakon Jitchiranant
- Nutchanon Ninyawee (Codustry Pte. Ltd.)
- boomsquared
- Yann Dubois (Facebook AI Research)
- nyamakawa
- Chanchana Sornsoontorn
- Cody (QuantifiedCode)
- Krissda Prakalphakul
- Preeti Yuankrathok (@C0D1UM)
- Codacy Badger (Codacy)
- fossabot (@fossas)
- hopedataannotations
- Natthapong S.
- pontakornth
Description
PyThaiNLP v2.3.1 is a bug-fix release of PyThaiNLP 2.3.
Bug Fixes
- Fix a gensim-related issue (#546)
Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bugs: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade with `pip install -U pythainlp`.
PyThaiNLP 2.3 change log (#445):
Deprecation and other API changes
- NER: the default ThaiNER model changed from ThaiNER 1.4 to ThaiNER 1.5. If you need the ThaiNER 1.4 model, pass the `version` argument to the `ThaiNameTagger` class:
`pythainlp.tag.named_entity.ThaiNameTagger(version="1.4")`
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise a tokenization dictionary
- #503 Add: `NERCut` tokenization engine
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0).
- #449 Fix: remove instances with `[` or `]` from `etcc.txt`
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not-found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID-to-UD table -- thanks @c4n for the fix
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
- #526 Update: ThaiNER 1.4 to ThaiNER 1.5
- #538 Add: `version` parameter for `ThaiNameTagger`, with ThaiNER 1.4 support
- #485 Fix: romanization failing on some examples
- #511 Add: Thai W2P (Thai word-to-phoneme converter)
- #523 Add: mT5 text summarization to `pythainlp.summarize`
- #524 Add: `pythainlp.tag.chunk`
- #481 Fix: `remove_repeat_vowels()` bug that removed spaces between different vowels
- #483 Add: `remove()` method to remove a word from a trie -- thanks @korakot
- #490 Fix: `thai_strftime()` now normalizes output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: `emoji_to_thai()` to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: `thai_keyboard_dist()` to calculate the Euclidean distance between two characters according to their locations on a Thai keyboard layout -- thanks @ppirch for the development
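To illustrate what removing a word from a trie involves (#483), here is a minimal sketch of a character trie with `add()`, `remove()`, and a `prefixes()` lookup of the kind a dictionary-based tokenizer relies on. It is not PyThaiNLP's own trie class: the class layout and the `prefixes()` name are assumptions made for this example. Removal clears the end-of-word flag, then prunes any branch that no longer leads to a word.

```python
class _Node:
    __slots__ = ("children", "end")

    def __init__(self):
        self.children = {}  # maps a character to the next _Node
        self.end = False    # True if a word ends at this node


class Trie:
    """Illustrative character trie (hypothetical; not PyThaiNLP's implementation)."""

    def __init__(self, words=()):
        self.root = _Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, _Node())
        node.end = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word):
        # Walk down, remembering (parent, char) pairs so empty branches can be pruned.
        path = []
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return  # word not present: nothing to do
            path.append((node, ch))
            node = child
        if not node.end:
            return
        node.end = False
        # Prune bottom-up every node that no longer ends a word or has children.
        for parent, ch in reversed(path):
            child = parent.children[ch]
            if child.end or child.children:
                break
            del parent.children[ch]

    def prefixes(self, text):
        """Return every stored word that is a prefix of text, shortest first."""
        found = []
        node = self.root
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.end:
                found.append(text[: i + 1])
        return found


trie = Trie(["กา", "กาแฟ", "แฟน"])
print(trie.prefixes("กาแฟร้อน"))  # ['กา', 'กาแฟ']
trie.remove("กาแฟ")
print(trie.prefixes("กาแฟร้อน"))  # ['กา']
```

Pruning keeps the trie from accumulating dead branches after many removals; a simpler alternative is to clear only the end-of-word flag and leave the nodes in place.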
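The #481 fix concerns spaces being dropped between different vowels. The following is a minimal sketch of the intended behavior, assuming "repeated vowel" means a run of the same vowel character. A regex backreference collapses each run to a single copy and leaves every other character, including spaces, untouched. The function name `collapse_repeated_vowels()` and the vowel subset are hypothetical, and PyThaiNLP's `remove_repeat_vowels()` may normalize differently.

```python
import re

# Hypothetical subset of Thai vowel characters, for illustration only.
THAI_VOWELS = "ะาำิีึืุูเแโใไ"

# One vowel captured in group 1, followed by one or more copies of that same vowel.
_REPEATED_VOWEL = re.compile("([" + THAI_VOWELS + r"])\1+")


def collapse_repeated_vowels(text: str) -> str:
    """Replace each run of an identical vowel with one copy; spaces are never matched."""
    return _REPEATED_VOWEL.sub(r"\1", text)


print(collapse_repeated_vowels("มาาาก"))  # มาก
print(collapse_repeated_vowels("ะ า"))    # space between different vowels is kept
```

Because the pattern only matches a vowel followed by copies of itself, two different vowels separated by a space can never be part of one match.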
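`thai_keyboard_dist()` (#513) measures how far apart two characters are on a Thai keyboard. The sketch below shows the idea using two letter rows of the Kedmanee layout placed on a plain (row, column) grid. The unstaggered grid, the coordinates, and the `keyboard_distance()` name are assumptions, and the library's own layout data may differ.

```python
import math

# Assumed, simplified layout: two Kedmanee letter rows on an unstaggered grid.
# Real keyboards offset each row horizontally; this sketch ignores that.
_ROWS = [
    # Top letter row (Q through ] on a US keyboard).
    ["ๆ", "ไ", "ำ", "พ", "ะ", "ั", "ี", "ร", "น", "ย", "บ", "ล"],
    # Home row (A through ' on a US keyboard).
    ["ฟ", "ห", "ก", "ด", "เ", "้", "่", "า", "ส", "ว", "ง"],
]

# Map each character to its (row, column) position on the grid.
KEY_POSITIONS = {
    ch: (row, col)
    for row, keys in enumerate(_ROWS)
    for col, ch in enumerate(keys)
}


def keyboard_distance(a: str, b: str) -> float:
    """Euclidean distance between the keys for characters a and b (KeyError if unmapped)."""
    (r1, c1), (r2, c2) = KEY_POSITIONS[a], KEY_POSITIONS[b]
    return math.hypot(r1 - r2, c1 - c2)


print(keyboard_distance("ก", "า"))  # 5.0 (same row, five keys apart)
```

A distance like this could, for example, weight substitution costs in a typo-tolerant spell checker, since neighboring keys are more easily confused.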
Thanks to all the contributors! (https://github.com/PyThaiNLP/pythainlp/graphs/contributors)
We build Thai NLP.
PyThaiNLP
Files
Name | Size |
---|---|
PyThaiNLP/pythainlp-v2.3.1.zip (md5:2ede237fcb2b9980c09f2042d6dd1a21) | 11.2 MB |
Additional details
Related works
- Is supplement to: https://github.com/PyThaiNLP/pythainlp/tree/v2.3.1 (URL)