There is a newer version of the record available.

Published April 4, 2021 | Version v2.3.1
Software Open

PyThaiNLP/pythainlp: PyThaiNLP v2.3.1 Release!

Description

PyThaiNLP v2.3.1 is a bug fix release of PyThaiNLP 2.3.

Bug Fixes

  • Fix gensim #546

Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp
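
After installing or upgrading, it can help to confirm which version is active in the current environment. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical and is not part of PyThaiNLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pythainlp")
    if found is None:
        print("pythainlp is not installed; run: pip install -U pythainlp")
    else:
        print(f"pythainlp {found} is installed")
```

Reading the package metadata instead of importing the package avoids loading any corpus or model files just to check a version string.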

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer
  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine
Corpus
  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation
Named Entity Tagging
  • #526 Update: ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add: ThaiNameTagger version and ThaiNER 1.4 support
Transliterate
  • #485 Fix: romanization failing in some examples
  • #511 Add: Thai W2P (Thai Word-to-Phoneme converter)
Text Summarization
  • #523 Add: mT5 text summarization to pythainlp.summarize
Chunk parser
  • #524 Add: pythainlp.tag.chunk
Util
  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
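
To illustrate the idea behind a keyboard-distance measure such as thai_keyboard_dist() (#513), here is a minimal sketch. It is not the library's implementation. It assumes a small, simplified grid of ten keys loosely modeled on part of a Thai keyboard layout (treat the exact key positions as an assumption), it ignores the physical stagger between keyboard rows, and the names `KEY_ROWS`, `KEY_POSITIONS`, and `key_distance` are hypothetical.

```python
import math

# Simplified, illustrative grid: the first five keys of two adjacent rows.
# The positions are an assumption made for this sketch only.
KEY_ROWS = [
    ["ๆ", "ไ", "ำ", "พ", "ะ"],
    ["ฟ", "ห", "ก", "ด", "เ"],
]

# Map each character to its (row, column) position on the grid.
KEY_POSITIONS = {
    char: (row_index, col_index)
    for row_index, row in enumerate(KEY_ROWS)
    for col_index, char in enumerate(row)
}


def key_distance(first: str, second: str) -> float:
    """Euclidean distance between two keys on the simplified grid.

    Raises KeyError if either character is not on the grid.
    """
    row_a, col_a = KEY_POSITIONS[first]
    row_b, col_b = KEY_POSITIONS[second]
    return math.hypot(row_a - row_b, col_a - col_b)


if __name__ == "__main__":
    print(key_distance("ฟ", "ก"))  # same row, two columns apart
    print(key_distance("ๆ", "ด"))  # one row and three columns apart
```

A distance of this kind can serve as a rough signal for likely typing errors, since keys that sit close together are easier to hit by mistake.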

Thanks to all the contributors.

We build Thai NLP.

PyThaiNLP

Files

PyThaiNLP/pythainlp-v2.3.1.zip (11.2 MB)
md5:2ede237fcb2b9980c09f2042d6dd1a21
