There is a newer version of the record available.

Published September 17, 2022 | Version 2.0
Dataset Open

Thai NER 2.0

  • @PyThaiNLP

Description

Thai Named Entity Recognition Corpus

This version is released on the Hugging Face Hub. The accompanying model was trained on top of the WangchanBERTa base model.

Dataset

Size

  • Train: 3,938 docs
  • Validation: 1,313 docs
  • Test: 1,313 docs
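
The split sizes above work out to roughly a 60/20/20 division. A minimal sketch of that arithmetic, using only the document counts listed here (the variable names are illustrative):

```python
# Document counts per split, as listed in this record.
splits = {"train": 3938, "validation": 1313, "test": 1313}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} docs ({fractions[name]:.1%} of {total})")
```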

Some of the data were crowdsourced between December 2018 and November 2019: https://github.com/wannaphong/thai-ner

Domain

  • News (IT, politics, economy, society)
  • PR (KKU news)
  • General

Source

And more (the list of sources has been lost).

Tag

  • DATE - date
  • TIME - time
  • EMAIL - email
  • LEN - length
  • LOCATION - location
  • ORGANIZATION - company / organization
  • PERSON - Person name
  • PHONE - phone number
  • TEMPERATURE - temperature
  • URL - URL
  • ZIP - Zip code
  • MONEY - amount of money
  • LAW - legislation
  • PERCENT - percentage
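
To illustrate how tags like these are typically consumed, here is a minimal sketch that groups token-level labels into entity spans. It assumes a BIO-style encoding (`B-PERSON`, `I-PERSON`, `O`), which this record does not state; the function name `bio_to_spans` and the example sentence are hypothetical and not taken from the corpus.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type = None
    start = 0
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == current_type
        if continues:
            continue  # still inside the currently open entity
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # A B- tag (or a stray I- tag) opens a new entity.
        if tag != "O":
            current_type = tag[2:]
            start = i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hypothetical tokens and labels, for illustration only.
tokens = ["Somchai", "Jaidee", "visited", "Bangkok", "with", "PyThaiNLP", "team"]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION", "O", "B-ORGANIZATION", "I-ORGANIZATION"]

for entity_type, begin, end in bio_to_spans(tags):
    print(entity_type, tokens[begin:end])
```

The sketch is lenient: an `I-` tag that does not continue the open entity simply starts a new one, which is one common way to handle inconsistent labels.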

Download: HuggingFace Hub

Model

The model was trained on top of the WangchanBERTa base model.

Results on the validation set

  • Precision: 0.830336794125095
  • Recall: 0.873701039168665
  • F1: 0.8514671513892494
  • Accuracy: 0.9736483416628805

Results on the test set

  • Precision: 0.8199168093956447
  • Recall: 0.8781446540880503
  • F1: 0.8480323927622422
  • Accuracy: 0.9724346779516247
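
As a sanity check, the reported F1 values are consistent with the harmonic mean of the reported precision and recall. A minimal sketch that recomputes them from the figures above (the function name `f1_score` is illustrative and not part of any released code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from this record.
reported = {
    "validation": (0.830336794125095, 0.873701039168665, 0.8514671513892494),
    "test": (0.8199168093956447, 0.8781446540880503, 0.8480323927622422),
}

for split, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{split}: recomputed F1 = {recomputed:.6f}, reported F1 = {f1:.6f}")
```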

Download: HuggingFace Hub

Files

build.ipynb

Files (7.8 MB)

Checksum (size):

  • md5:508de040eeb0951d8e12f32a2c0d2bab (15.8 kB)
  • md5:003e5c643811c1ea44b40a9753cf91fa (859.3 kB)
  • md5:0871655282a26de7332419c0aee3eefb (3.5 MB)
  • md5:98986fb9d5b3be93d1c989b3fa2bfd6d (2.6 MB)
  • md5:62d92fedeecad18f19143e576aa0a013 (28.6 kB)
  • md5:aae571b961c365551aea9a1eee87f7ed (838.0 kB)
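
The MD5 values listed above can be used to verify a download. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listing` are illustrative, and the file name in the usage comment is hypothetical because the file names are not preserved in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local file name):
# matches_listing("downloaded_file", "md5:508de040eeb0951d8e12f32a2c0d2bab")
```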

Additional details

Related works