Published March 2024 | Version 2.1
Dataset
Open
Thai NER 2.1
Description
This version fixes an incorrect tag (DATA -> DATE) from Thai NER 2.0.
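For anyone still holding annotations made against Thai NER 2.0, the DATA -> DATE correction can be applied as a plain label rewrite. The sketch below assumes labels are stored as strings with an optional `B-`/`I-` prefix (for example `B-DATA`); that storage format and the helper name `fix_tag` are assumptions made for illustration and are not specified by this record.

```python
def fix_tag(label: str, wrong: str = "DATA", right: str = "DATE") -> str:
    """Rewrite a mistyped entity type, keeping any B-/I- prefix intact."""
    # rpartition splits on the last "-", so "B-DATA" -> ("B", "-", "DATA")
    # and a bare "DATA" or "O" -> ("", "", "DATA") / ("", "", "O").
    prefix, sep, entity = label.rpartition("-")
    if entity == wrong:
        return f"{prefix}{sep}{right}"
    return label


# Hypothetical label sequence in the assumed BIO format.
labels = ["O", "B-DATA", "I-DATA", "B-PERSON"]
fixed = [fix_tag(label) for label in labels]
print(fixed)
```

Labels that do not carry the mistyped entity type pass through unchanged, so the rewrite is safe to run over a whole file.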
Dataset
Size
- Train: 3,938 docs
- Validation: 1,313 docs
- Test: 1,313 docs
Some of the data comes from crowdsourcing between Dec 2018 and Nov 2019: https://github.com/wannaphong/thai-ner
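As a quick sanity check on the split sizes listed above, the following sketch computes the total document count and each split's share. Only the three counts come from this record; the variable names are illustrative.

```python
# Document counts as listed under "Size".
splits = {"train": 3938, "validation": 1313, "test": 1313}

total = sum(splits.values())
# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)
print(shares)
```

The counts work out to roughly a 60/20/20 train/validation/test split.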
Domain
- News (IT, politics, economy, social)
- PR (KKU news)
- General
Source
- I use some data from Nutcha’s theses (http://pioneer.chula.ac.th/~awirote/Data-Nutcha.zip) and improved it by rechecking the annotations and adding more tags.
- Blognone.com - IT news
- thaigov.go.th
- kku.ac.th
And more (the full list of sources has been lost).
Tag
- DATE - date
- TIME - time
- EMAIL - email
- LEN - length
- LOCATION - Location
- ORGANIZATION - Company / Organization
- PERSON - Person name
- PHONE - phone number
- TEMPERATURE - temperature
- URL - URL
- ZIP - Zip code
- MONEY - amount of money
- LAW - legislation
- PERCENT - percentage
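When training a token classifier on these tags, the entity types usually need to be expanded into a label vocabulary. The sketch below builds one under the assumption of a BIO scheme (`B-` for the first token of an entity, `I-` for continuation tokens, `O` for everything else). The BIO scheme and the names `label2id` and `id2label` are assumptions for illustration; check the data files for the scheme actually used.

```python
# The 14 entity types listed under "Tag".
ENTITY_TYPES = [
    "DATE", "TIME", "EMAIL", "LEN", "LOCATION", "ORGANIZATION", "PERSON",
    "PHONE", "TEMPERATURE", "URL", "ZIP", "MONEY", "LAW", "PERCENT",
]

# Assumed BIO scheme: "O" plus a B- and I- label for every entity type.
labels = ["O"] + [f"{prefix}-{tag}" for tag in ENTITY_TYPES for prefix in ("B", "I")]

# Integer ids in both directions, as most training code expects.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(labels))
print(labels[:5])
```

With 14 entity types this gives 29 labels in total, which is the size the output layer of a classifier would need under this scheme.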
Files
(4.3 MB)
File (MD5 checksum) | Size |
---|---|
md5:003e5c643811c1ea44b40a9753cf91fa | 859.3 kB |
md5:98986fb9d5b3be93d1c989b3fa2bfd6d | 2.6 MB |
md5:92d4fb1155edd40e1d0cc4d839a61f5e | 838.0 kB |
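The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`. The checksums are copied from this record, while the helper names `md5_of` and `is_listed` and any local file path you pass in are hypothetical, since the file names are not shown here.

```python
import hashlib
from pathlib import Path

# Checksums as published for the three files in this record.
EXPECTED_MD5 = {
    "003e5c643811c1ea44b40a9753cf91fa",
    "98986fb9d5b3be93d1c989b3fa2bfd6d",
    "92d4fb1155edd40e1d0cc4d839a61f5e",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(Path(path), "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path) -> bool:
    """True if the file's digest matches one of the published checksums."""
    return md5_of(path) in EXPECTED_MD5
```

Calling `is_listed` on each downloaded file and getting `True` indicates that the file matches one of the published digests. A `False` result suggests a corrupted or different file.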
Additional details
Related works
- Is supplement to
- https://github.com/wannaphong/thai-ner/tree/1.5.2 (URL)