Hachidaishu part of speech dataset

doi:10.5281/zenodo.4835806

Published May 28, 2021 | Version 1.0.0

Dataset Open

1. Tokyo Institute of Technology
2. Osaka University

Hachidaishu part-of-speech dataset

This dataset contains part-of-speech information for the Hachidaishu, the first eight imperial anthologies of Japanese poetry.

Data format

Example: #1 Kokinshu

10001 年/名/とし の/格助/の 内/名/うち に/格助/に 春/名/はる は/係助/は き/カ変-用:来:く/き に/完-用:ぬ:ぬ/に けり/過-終:けり:けり/けり 一とせ/名/ひととせ を/*助/を こそ/名/こぞ と/格助/と や/係助/や いは/ハ四-未:言ふ:いふ/いは ん/推-終体:む:む/む ことし/名/ことし と/格助/と や/係助/や いは/ハ四-未:言ふ:いふ/いは ん/推-終体:む:む/む

Each line is one poem: tokens are separated by spaces, and each token consists of elements separated by slashes.

The first column ("10001" in the example) combines two elements: the first digit is the anthology ID and the remaining digits are the poem ID. The anthology IDs are: 1..Kokinshu, 2..Gosenshu, 3..Shuishu, 4..Goshuishu, 5..Kin'yoshu, 6..Shikashu, 7..Senzaishu, and 8..Shinkokinshu.
The poem ID is the same as in the database "Nijuichidaishu."
The second and subsequent columns each give the information for one token.
For tokens without conjugation, such as nouns and particles: text/POS/reading.
For tokens with conjugation, such as verbs and adjectives: text/POS:lemma-kanji:lemma-reading/reading.
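The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the names `Token`, `parse_token`, `parse_line`, and `ANTHOLOGIES` are hypothetical, and it assumes that columns are separated by whitespace, that every token has exactly three slash-separated fields, and that no field itself contains a slash. The sample line is a shortened excerpt built from tokens of the Kokinshu example, not a full line of the file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Anthology IDs as listed in the dataset description.
ANTHOLOGIES = {
    1: "Kokinshu", 2: "Gosenshu", 3: "Shuishu", 4: "Goshuishu",
    5: "Kin'yoshu", 6: "Shikashu", 7: "Senzaishu", 8: "Shinkokinshu",
}


@dataclass
class Token:
    text: str
    pos: str
    reading: str
    lemma: Optional[str] = None          # lemma-kanji, conjugating tokens only
    lemma_reading: Optional[str] = None  # lemma-reading, conjugating tokens only


def parse_token(field: str) -> Token:
    """Parse 'text/POS/reading' or 'text/POS:lemma-kanji:lemma-reading/reading'."""
    # Assumption: exactly three slash-separated fields per token.
    text, pos_field, reading = field.split("/")
    parts = pos_field.split(":")
    if len(parts) == 3:
        pos, lemma, lemma_reading = parts
        return Token(text, pos, reading, lemma, lemma_reading)
    return Token(text, pos_field, reading)


def parse_line(line: str) -> Tuple[int, str, List[Token]]:
    """Return (anthology ID, poem ID, tokens) for one poem line."""
    columns = line.split()
    code = columns[0]
    anthology_id = int(code[0])  # first digit
    poem_id = code[1:]           # remaining digits, kept as a string
    return anthology_id, poem_id, [parse_token(c) for c in columns[1:]]


# Shortened excerpt of the Kokinshu example, for illustration only.
sample = "10001 年/名/とし の/格助/の き/カ変-用:来:く/き"
anthology_id, poem_id, tokens = parse_line(sample)
print(ANTHOLOGIES[anthology_id], poem_id)
for token in tokens:
    print(token)
```

The poem ID is kept as a string so that leading zeros survive; convert it with `int()` if a numeric ID is more convenient.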

Files

Name	Size
hachidaishu-pos.txt md5:585dc98f23348b331f68f58bf63440b2	4.2 MB
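
The MD5 checksum listed for the file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_published_md5` are hypothetical, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum published on the record for hachidaishu-pos.txt.
EXPECTED_MD5 = "585dc98f23348b331f68f58bf63440b2"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str) -> bool:
    """Compare a local file's MD5 digest with the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_published_md5("hachidaishu-pos.txt")` should return `True` for an unmodified copy saved in the current directory.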

Additional details

Hilofumi Yamamoto. POS tagger for Classical Japanese Poems, The Study of Japanese Linguistics, The Society of Japanese Linguistics, Vol. 3, No. 3, pp. 33-39, July 2007.
Hilofumi Yamamoto. Thesaurus for the Hachidaishu (ca. 905-1205) with the classification codes based on semantic principles, The Study of Japanese Linguistics, The Society of Japanese Linguistics, Vol. 5, No. 1, pp. 46-52, Jan. 2009.
