Published May 28, 2021 | Version 1.0.0
Dataset
Open
Hachidaishu part of speech dataset
Description
Hachidaishu part-of-speech dataset
This dataset contains part-of-speech information for the Hachidaishu, the first eight imperial anthologies of Japanese poetry.
Data format
Example: #1 Kokinshu
10001 年/名/とし の/格助/の 内/名/うち に/格助/に 春/名/はる は/係助/は き/カ変-用:来:く/き に/完-用:ぬ:ぬ/に けり/過-終:けり:けり/けり 一とせ/名/ひととせ を/*助/を こそ/名/こぞ と/格助/と や/係助/や いは/ハ四-未:言ふ:いふ/いは ん/推-終体:む:む/む ことし/名/ことし と/格助/と や/係助/や いは/ハ四-未:言ふ:いふ/いは ん/推-終体:む:む/む
Each line is one poem: tokens are separated by spaces, and each token consists of fields separated by slashes.
- The first column (here "10001") combines two elements: the first digit is the anthology ID and the remaining digits are the poem ID. Anthology IDs: 1 = Kokinshu, 2 = Gosenshu, 3 = Shuishu, 4 = Goshuishu, 5 = Kin'yoshu, 6 = Shikashu, 7 = Senzaishu, 8 = Shinkokinshu.
- The poem ID is the same as in the database "Nijuichidaishu."
- The second and following columns give the information for each token.
- For tokens without conjugation, such as nouns and particles: text/POS/reading.
- For tokens with conjugation, such as verbs and adjectives: text/POS:lemma-kanji:lemma-reading/reading.
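The format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: the names `Token`, `parse_token`, `parse_line`, and `ANTHOLOGIES` are hypothetical, and it assumes that the surface text and reading never contain a slash and that the POS field contains colons only when a lemma is present.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Anthology IDs as listed in the description (first digit of the first column).
ANTHOLOGIES = {
    1: "Kokinshu", 2: "Gosenshu", 3: "Shuishu", 4: "Goshuishu",
    5: "Kin'yoshu", 6: "Shikashu", 7: "Senzaishu", 8: "Shinkokinshu",
}


@dataclass
class Token:
    text: str
    pos: str
    reading: str
    lemma_kanji: Optional[str] = None    # only set for conjugating tokens
    lemma_reading: Optional[str] = None  # only set for conjugating tokens


def parse_token(raw: str) -> Token:
    """Parse text/POS/reading or text/POS:lemma-kanji:lemma-reading/reading."""
    # Assumption: exactly three slash-separated fields; otherwise ValueError.
    text, pos_field, reading = raw.split("/")
    parts = pos_field.split(":")
    if len(parts) == 3:
        pos, lemma_kanji, lemma_reading = parts
        return Token(text, pos, reading, lemma_kanji, lemma_reading)
    return Token(text, pos_field, reading)


def parse_line(line: str) -> Tuple[int, str, List[Token]]:
    """Split one poem line into (anthology ID, poem ID, tokens)."""
    fields = line.split()
    code = fields[0]
    anthology_id = int(code[0])  # first digit
    poem_id = code[1:]           # remaining digits, kept as a string
    return anthology_id, poem_id, [parse_token(f) for f in fields[1:]]


if __name__ == "__main__":
    # Shortened excerpt of the Kokinshu example line.
    sample = "10001 年/名/とし の/格助/の き/カ変-用:来:く/き"
    anthology_id, poem_id, tokens = parse_line(sample)
    print(ANTHOLOGIES[anthology_id], poem_id)
    for token in tokens:
        print(token)
```

The poem ID is kept as a string so that any leading zeros survive; convert it with `int()` if a numeric ID is more convenient.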
Files
Name | Size |
---|---|
hachidaishu-pos.txt (md5:585dc98f23348b331f68f58bf63440b2) | 4.2 MB |
Additional details
References
- Hilofumi Yamamoto. POS tagger for Classical Japanese Poems, The Study of Japanese Linguistics, The Society of Japanese Linguistics, Vol. 3, No. 3, pp. 33-39, July 2007.
- Hilofumi Yamamoto. Thesaurus for the Hachidaishu (ca. 905-1205) with the classification codes based on semantic principles, The Study of Japanese Linguistics, The Society of Japanese Linguistics, Vol. 5, No. 1, pp. 46-52, Jan. 2009.