Classical Tibetan corpus annotated for verb-argument dependency relations

doi:10.5281/zenodo.4727108

Published April 29, 2021 | Version v1.0

Dataset | Open access

This is a small hand-annotated partial treebank of Tibetan, primarily in CoNLL-U format. It builds upon the following corpus:

Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878

This corpus differs from the above in three ways:

The tagset has been converted from the SOAS tag system to the Universal Dependencies part-of-speech tagset.
We have added dependency relations between verbs and their arguments.
For some of the texts, English translations were available in digital form. These translations were manually aligned to the Tibetan texts and included in the CoNLL-U files.
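As an illustration of how the verb-argument annotation could be read, the following is a minimal sketch that parses CoNLL-U text using only the standard ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and lists each token whose head is a verb. The sample sentence, its placeholder word forms, and the relation labels `nsubj` and `obj` are invented for the example and are not taken from the corpus. The sketch also assumes that tokens without an annotated head carry `_` in the HEAD column, which may not match how this partial treebank marks them.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

Token = Dict[str, str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of token dicts per sentence (blank line = boundary)."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue  # sentence-level comment lines are skipped in this sketch
        else:
            cols = line.split("\t")
            if len(cols) == len(CONLLU_FIELDS):
                sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        yield sentence


def verb_arguments(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (verb form, relation, dependent form) for tokens headed by a verb."""
    by_id = {tok["id"]: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        # HEAD values "0" (root) and "_" (assumed: unannotated) match no token.
        head = by_id.get(tok["head"])
        if head is not None and head["upos"] == "VERB":
            pairs.append((head["form"], tok["deprel"], tok["form"]))
    return pairs


# Hypothetical sentence with placeholder forms, NOT corpus data.
ROWS = [
    ["1", "argA", "argA", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["2", "argB", "argB", "NOUN", "_", "_", "3", "obj", "_", "_"],
    ["3", "verbC", "verbC", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "partD", "partD", "PART", "_", "_", "_", "_", "_", "_"],
]
SAMPLE_LINES = ["# sent_id = demo"] + ["\t".join(r) for r in ROWS] + [""]

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE_LINES):
        for verb, relation, dependent in verb_arguments(sent):
            print(f"{verb} --{relation}--> {dependent}")
```

To apply the same functions to a file from the archive, one would pass an open text file (UTF-8) to `read_sentences` in place of `SAMPLE_LINES`. The file names inside the archive are not listed here, so any path used is the reader's own.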

It was created as part of the AHRC-funded project Lexicography in Motion (PI Ulrich Pagel, 2017-2021).

Notes

Funded by the UK's Arts and Humanities Research Council (grant code: AH/P004644/1)

Files

Files (20.7 MB)

Name	MD5	Size
tibetan-nlp/classical-tibetan-corpus-v1.0.zip	b75a4417e629c3e53540b776604c5ba8	20.7 MB
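
The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The function name `md5_of_file` and the chunk size are choices made for this example, and the local file path is whatever the reader saved the archive as.

```python
import hashlib

# Checksum as listed on this page for classical-tibetan-corpus-v1.0.zip.
EXPECTED_MD5 = "b75a4417e629c3e53540b776604c5ba8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a file saved locally, comparing `md5_of_file(local_path) == EXPECTED_MD5` should evaluate to `True` if the download matches the published archive, where `local_path` is a hypothetical variable holding the reader's own download location.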

Additional details

Is supplement to: https://github.com/tibetan-nlp/classical-tibetan-corpus/tree/v1.0

	All versions	This version
Views	338	338
Downloads	23	23
Data volume	516.7 MB	516.7 MB
