Published September 17, 2022 | Version 2.0
Dataset
Open
Thai NER 2.0
Description
Thai Named Entity Recognition Corpus
This version is released on the Hugging Face Hub, and the accompanying model was trained from the WangchanBERTa base model.
Dataset
Size
- Train: 3,938 docs
- Validation: 1,313 docs
- Test: 1,313 docs
Some of the data come from crowdsourcing carried out between December 2018 and November 2019: https://github.com/wannaphong/thai-ner
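The split sizes listed above work out to roughly a 60/20/20 division of the corpus. The short sketch below only redoes that arithmetic from the listed document counts; the variable names are illustrative and not part of the dataset.

```python
# Document counts as listed for each split of the corpus.
split_sizes = {"train": 3938, "validation": 1313, "test": 1313}

total_docs = sum(split_sizes.values())

# Share of each split, as a fraction of all documents.
split_shares = {name: count / total_docs for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} docs ({split_shares[name]:.1%})")
print(f"total: {total_docs} docs")
```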
Domain
- News (IT, politics, economy, social)
- PR (KKU news)
- General
Source
- I use some data from Nutcha’s thesis (http://pioneer.chula.ac.th/~awirote/Data-Nutcha.zip) and improved it by rechecking the annotations and adding more tags.
- Blognone.com - IT news
- thaigov.go.th
- kku.ac.th
And more (the full list of sources has been lost).
Tag
- DATE - date
- TIME - time
- EMAIL - email
- LEN - length
- LOCATION - Location
- ORGANIZATION - Company / Organization
- PERSON - Person name
- PHONE - phone number
- TEMPERATURE - temperature
- URL - URL
- ZIP - Zip code
- MONEY - amount of money
- LAW - legislation
- PERCENT - percentage
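For token classification, entity tags like these are commonly expanded into per-token labels. The sketch below assumes a BIO scheme (`O`, `B-TAG`, `I-TAG`); this record does not state which encoding the corpus uses, so treat the scheme, the helper name `bio_labels`, and the three-tag subset as assumptions for illustration.

```python
def bio_labels(tags):
    """Return the BIO label list for the given entity tags: O, then B-/I- per tag."""
    labels = ["O"]  # "O" marks tokens outside any entity (assumed scheme)
    for tag in tags:
        labels.append(f"B-{tag}")  # first token of an entity
        labels.append(f"I-{tag}")  # continuation token of the same entity
    return labels


# A subset of the tags listed above, chosen only for illustration.
example_tags = ["PERSON", "ORGANIZATION", "LOCATION"]

labels = bio_labels(example_tags)

# Integer ids, as typically needed by a token-classification model.
label2id = {label: index for index, label in enumerate(labels)}

print(labels)
print(label2id)
```

Passing the full tag list instead of the subset would give one `O` label plus two labels per tag.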
Download: HuggingFace Hub
Model
The model was trained from the WangchanBERTa base model.
Results on the validation set
- Precision: 0.830336794125095
- Recall: 0.873701039168665
- F1: 0.8514671513892494
- Accuracy: 0.9736483416628805
Results on the test set
- Precision: 0.8199168093956447
- Recall: 0.8781446540880503
- F1: 0.8480323927622422
- Accuracy: 0.9724346779516247
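The reported F1 values are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes F1 from the precision and recall figures given above and prints it next to the reported value. It assumes that standard definition; the record does not say how the metrics were computed (for example, at token or entity level), and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the results listed above.
reported = {
    "validation": {
        "precision": 0.830336794125095,
        "recall": 0.873701039168665,
        "f1": 0.8514671513892494,
    },
    "test": {
        "precision": 0.8199168093956447,
        "recall": 0.8781446540880503,
        "f1": 0.8480323927622422,
    },
}

for split, metrics in reported.items():
    computed = f1_score(metrics["precision"], metrics["recall"])
    print(f"{split}: computed F1 = {computed:.6f}, reported F1 = {metrics['f1']:.6f}")
```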
Download: HuggingFace Hub
Files
build.ipynb
Total size: 7.8 MB
Checksum | Size |
---|---|
md5:508de040eeb0951d8e12f32a2c0d2bab | 15.8 kB |
md5:003e5c643811c1ea44b40a9753cf91fa | 859.3 kB |
md5:0871655282a26de7332419c0aee3eefb | 3.5 MB |
md5:98986fb9d5b3be93d1c989b3fa2bfd6d | 2.6 MB |
md5:62d92fedeecad18f19143e576aa0a013 | 28.6 kB |
md5:aae571b961c365551aea9a1eee87f7ed | 838.0 kB |
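A downloaded file can be checked against its listed MD5 checksum. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the file path in the usage comment are hypothetical, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:508de0...'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()


# Usage (hypothetical local path; substitute the file you downloaded):
# ok = matches_checksum("downloaded_file", "md5:508de040eeb0951d8e12f32a2c0d2bab")
```

MD5 is used here only because it is the checksum the listing provides; it detects corrupted downloads but is not suitable as a security guarantee.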
Additional details
Related works
- Is supplement to: https://github.com/wannaphong/thai-ner/tree/1.5.2 (URL)