Published January 24, 2024 | Version v1.0
Dataset
Open
Han Instruct Dataset
Creators
Description
Dataset Summary
🪿 Han (ห่าน or goose) Instruct Dataset is a Thai instruction dataset by PyThaiNLP. It collects Thai instruction-following data from many sources.
Many of the questions were collected from the Reference desk at Thai Wikipedia.
Data sources:
- Reference desk at Thai Wikipedia
- Law from justicechannel.org
- pythainlp/final_training_set_v1_enth: Human checked and edited.
- Self-instruct from WangChanGLM
- Wannaphong.com
- Human annotators
Supported Tasks and Leaderboards
- ChatBot
- Instruction Following
Languages
Thai
Dataset Structure
Data Fields
- inputs: Question
- targets: Answer
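The two fields above can be read with Python's standard `csv` module. This is a minimal sketch, assuming the CSV has a header row naming the `inputs` and `targets` columns and is UTF-8 encoded; neither detail is stated on this page. The helper name `read_pairs` and the sample row are hypothetical.

```python
import csv
import io


def read_pairs(handle):
    """Yield (question, answer) tuples from a CSV file object.

    Assumes a header row with 'inputs' and 'targets' columns (an assumption,
    not confirmed by the dataset page).
    """
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["inputs"], row["targets"]


# Tiny in-memory stand-in for the real file; the contents are made up.
sample = io.StringIO('inputs,targets\n"What is 1+1?","2"\n')
pairs = list(read_pairs(sample))
```

For the real file, the same function could be called on `open("han-instruct-dataset-v1.0.csv", newline="", encoding="utf-8")`.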
Considerations for Using the Data
The dataset may reflect the biases of its human annotators. Review it and select or remove instructions as needed before training a model. Use it at your own risk.
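One way to select or remove instructions before training is a simple filter pass. The sketch below is illustrative only: the function name `select_rows`, the `blocked_terms` and `min_answer_chars` parameters, and the sample rows are all hypothetical, and the actual review criteria are left to the user.

```python
def select_rows(rows, blocked_terms=(), min_answer_chars=1):
    """Keep rows with a non-empty question, a long-enough answer,
    and no blocked term in either field.

    The criteria are placeholders, not a recommendation from the dataset authors.
    """
    kept = []
    for row in rows:
        question = (row.get("inputs") or "").strip()
        answer = (row.get("targets") or "").strip()
        # Drop rows with a missing question or a too-short answer.
        if not question or len(answer) < min_answer_chars:
            continue
        # Drop rows that mention any blocked term.
        text = question + " " + answer
        if any(term in text for term in blocked_terms):
            continue
        kept.append({"inputs": question, "targets": answer})
    return kept


# Made-up rows for illustration.
rows = [
    {"inputs": "Q1", "targets": "A1"},
    {"inputs": "Q2", "targets": ""},
    {"inputs": "Q3 spam", "targets": "A3"},
]
clean = select_rows(rows, blocked_terms=("spam",))
```

Automated filters like this only catch surface patterns, so manual review of a sample is still advisable.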
Licensing Information
CC-BY-SA 4.0
Files
| Name | Size | Checksum |
|---|---|---|
| han-instruct-dataset-v1.0.csv | 1.5 MB | md5:4d27c03f5b114692c9a01c2b522b4c53 |
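The published MD5 checksum can be used to verify a downloaded copy. This is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib

# Checksum as listed for the file on this page.
EXPECTED_MD5 = "4d27c03f5b114692c9a01c2b522b4c53"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file's MD5 equals the checksum listed on this page."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_published_checksum("han-instruct-dataset-v1.0.csv")` should return `True` for an intact download.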