Software Open Access

PyThaiNLP/pythainlp: PyThaiNLP v2.3.1 Release!

Wannaphong Phatthiyaphaibun; Arthit Suriyawongkul; Pattarawat Chormai; Charin; Lalita Lowphansirikul; Pakin Siwatammarat; smeeklai; Pete Peeradej Tanruangporn; Peradon Charoenchainetr; Can Udomcharoenchaikit; Supaseth; "Plane" Abhabongse Janthong; Korakot Chaovavanich; Korkeat W.; Nonthakon Jitchiranant; Nutchanon Ninyawee; boomsquared; Yann Dubois; nyamakawa; Chanchana Sornsoontorn; Cody; Krissda Prakalphakul; Preeti Yuankrathok; Codacy Badger; fossabot; hopedataannotations; Natthapong S.; pontakornth

PyThaiNLP v2.3.1 is a bug-fix release of PyThaiNLP 2.3.

Bug Fixes

  • Fix gensim #546

Documentation: Report bug:

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer
  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine
  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
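Entry #502 concerns revising the word set that dictionary-based tokenization relies on. The sketch below shows why the dictionary matters. It uses plain greedy longest matching against a caller-supplied set of words. This is not PyThaiNLP's algorithm, because its dictionary engines such as newmm are more elaborate. The function name `longest_match_tokenize` is hypothetical.

```python
from typing import Iterable, List


def longest_match_tokenize(text: str, wordset: Iterable[str]) -> List[str]:
    """Greedy left-to-right longest matching against a custom word set.

    Hypothetical helper for illustration only. Characters not covered by
    any dictionary word are emitted one at a time.
    """
    words = set(wordset)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in words:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(longest_match_tokenize("ไปกินข้าว", {"ไป", "กิน", "ข้าว", "กินข้าว"}))
print(longest_match_tokenize("ไปกินข้าว", {"ไป", "กิน", "ข้าว"}))
```

Adding or removing a single entry such as "กินข้าว" changes the segmentation of the same input, which is the effect a word-set revision is meant to control.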
POS Tagger
  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation
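Entry #468 ports NLTK's PerceptronTagger. The sketch below shows the idea behind that kind of tagger: a minimal greedy perceptron tagger trained on two toy sentences with ORCHID-style tags. It is a simplified illustration and not the ported code. NLTK's tagger uses an averaged perceptron with a richer feature set. This version skips averaging and uses four hypothetical features. The class name `TinyPerceptronTagger` is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


class TinyPerceptronTagger:
    """Greedy left-to-right tagger with a (non-averaged) multiclass perceptron."""

    def __init__(self) -> None:
        # weights[feature][tag] -> score contribution
        self.weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        self.tags: Set[str] = set()

    @staticmethod
    def features(words: Sequence[str], i: int, prev_tag: str) -> List[str]:
        prev_word = words[i - 1] if i > 0 else "<s>"
        return ["bias", f"w={words[i]}", f"prev_tag={prev_tag}", f"prev_w={prev_word}"]

    def predict(self, feats: List[str]) -> str:
        scores: Dict[str, float] = defaultdict(float)
        for f in feats:
            for tag, w in self.weights[f].items():
                scores[tag] += w
        # Ties are broken deterministically by alphabetical tag order.
        return max(sorted(self.tags), key=lambda t: scores[t])

    def train(self, sentences: List[List[Tuple[str, str]]], epochs: int = 5) -> None:
        for sent in sentences:
            self.tags.update(tag for _, tag in sent)
        for _ in range(epochs):
            for sent in sentences:
                words = [w for w, _ in sent]
                prev = "<s>"
                for i, (_, gold) in enumerate(sent):
                    feats = self.features(words, i, prev)
                    guess = self.predict(feats)
                    if guess != gold:
                        # Perceptron update: reward the gold tag, penalize the guess.
                        for f in feats:
                            self.weights[f][gold] += 1.0
                            self.weights[f][guess] -= 1.0
                    prev = gold  # condition on the gold history while training

    def tag(self, words: Sequence[str]) -> List[Tuple[str, str]]:
        prev = "<s>"
        out: List[Tuple[str, str]] = []
        for i, w in enumerate(words):
            t = self.predict(self.features(words, i, prev))
            out.append((w, t))
            prev = t  # at inference time, condition on the model's own prediction
        return out


train_data = [
    [("ฉัน", "PPRS"), ("กิน", "VACT"), ("ข้าว", "NCMN")],
    [("เขา", "PPRS"), ("กิน", "VACT"), ("ปลา", "NCMN")],
]
tagger = TinyPerceptronTagger()
tagger.train(train_data)
print(tagger.tag(["เขา", "กิน", "ข้าว"]))
```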
Named Entity Tagging
  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
  • #485 Fix: romanization failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)
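Named-entity taggers such as ThaiNER commonly emit one IOB-style tag per token. The helper below assumes tagger output shaped as (word, tag) pairs, with tags such as `B-PERSON`, `I-PERSON` and `O`. That format is an assumption, so check the actual return format of the tagger you use. Under it, the hypothetical `iob_to_spans` helper groups tokens into entity spans. Words are joined without spaces because Thai is written without word delimiters.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (word, IOB tag) pairs into (entity_text, entity_type) spans.

    Hypothetical helper; assumes tags of the form B-TYPE, I-TYPE or O.
    """
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    current_type: Optional[str] = None
    for word, tag in tagged:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts here (a stray I- tag is treated as a start too).
            if current_type is not None:
                spans.append(("".join(words), current_type))
            words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            words.append(word)
        else:
            # "O" (or any other tag) closes the open entity, if any.
            if current_type is not None:
                spans.append(("".join(words), current_type))
            words, current_type = [], None
    if current_type is not None:
        spans.append(("".join(words), current_type))
    return spans


example = [
    ("ผม", "O"), ("ชื่อ", "O"), ("สมชาย", "B-PERSON"), ("ใจดี", "I-PERSON"),
    ("อยู่", "O"), ("กรุงเทพ", "B-LOCATION"),
]
print(iob_to_spans(example))
```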
Text Summarization
  • #523 Add: mT5 text summarization to pythainlp.summarize
Chunk parser
  • #524 Add pythainlp.tag.chunk
  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directives (running on glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
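Entry #483 adds a remove() method to PyThaiNLP's trie. The sketch below shows one way deletion can work in a character trie. It unmarks the word's end node, then prunes the nodes that no longer lead to any word. It is written from scratch for illustration. `TrieNode`, `Trie` and `prefixes()` here are hypothetical names, not PyThaiNLP's implementation.

```python
from typing import Dict, Iterable, List


class TrieNode:
    __slots__ = ("children", "end")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.end = False  # True if a word ends at this node


class Trie:
    """Hypothetical character trie supporting add, membership, remove and prefixes."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.end = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word: str) -> None:
        """Unmark `word` and prune nodes that no longer lead to any word."""
        path = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return  # word not present: nothing to do
            path.append(nxt)
        if not path[-1].end:
            return
        path[-1].end = False
        # Walk back toward the root, deleting childless nodes that end no word.
        for i in range(len(word), 0, -1):
            node = path[i]
            if node.children or node.end:
                break
            del path[i - 1].children[word[i - 1]]

    def prefixes(self, text: str) -> List[str]:
        """All stored words that are prefixes of `text`."""
        out: List[str] = []
        node = self.root
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.end:
                out.append(text[: i + 1])
        return out


t = Trie(["กิน", "กินข้าว", "ข้าว"])
t.remove("กินข้าว")
print("กินข้าว" in t, "กิน" in t, t.prefixes("กินข้าว"))
```

Pruning keeps the trie from accumulating dead branches after many removals. Shorter words that share the removed word's prefix, such as "กิน" here, stay intact because the walk stops at the first node that still ends a word.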

Thanks to all the contributors.

We build Thai NLP.

