Published April 29, 2021 | Version v1.0
Dataset Open

Classical Tibetan corpus annotated for verb-argument dependency relations

Description

This is a small hand-annotated partial treebank of Tibetan, primarily in CoNLL-U format. It builds upon the following corpus:

Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878

This corpus differs from the above in three ways:

  1. The tagset has been converted from the SOAS tag system to the Universal Dependencies part-of-speech tagset.
  2. We have added dependency relations between verbs and their arguments.
  3. For some of the texts, English translations were available in digital form. These translations were manually aligned to the Tibetan texts and included in the CoNLL-U files.
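As a rough illustration of how files in this format might be read, the sketch below parses CoNLL-U text with plain Python and lists the tokens attached to a verb. It assumes the standard ten tab-separated CoNLL-U columns. The sample sentence, its placeholder tokens, and the `text_en` comment key are hypothetical and are not taken from the corpus; the way translations are actually stored in the files should be checked against the data itself.

```python
# Minimal CoNLL-U reader (a sketch, not the project's own tooling).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Return a list of sentences, each with 'comments' (dict) and 'tokens' (list of dicts)."""
    sentences = []
    comments, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens or comments:
                sentences.append({"comments": comments, "tokens": tokens})
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Sentence-level comment of the form "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                tokens.append(dict(zip(FIELDS, cols)))
    if tokens or comments:
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


def verb_arguments(sentence):
    """Return (verb_form, deprel, argument_form) for every token whose head is a VERB."""
    by_id = {tok["id"]: tok for tok in sentence["tokens"]}
    pairs = []
    for tok in sentence["tokens"]:
        # Unannotated tokens in a partial treebank may carry "_" as head;
        # those, and the root (head "0"), simply find no head token here.
        head = by_id.get(tok["head"])
        if head is not None and head["upos"] == "VERB":
            pairs.append((head["form"], tok["deprel"], tok["form"]))
    return pairs


# Hypothetical sample with placeholder tokens (NOT real corpus data).
SAMPLE = (
    "# sent_id = example-1\n"
    "# text_en = placeholder translation\n"
    "1\tARG1\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tARG2\t_\tNOUN\t_\t_\t3\tobj\t_\t_\n"
    "3\tVERB1\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
for sent in sentences:
    print(sent["comments"])
    for verb, relation, argument in verb_arguments(sent):
        print(verb, relation, argument)
```

To read a file from the archive, the same function could be applied to the file's contents, for example `parse_conllu(open(path, encoding="utf-8").read())`, where `path` is whichever CoNLL-U file is being inspected.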

It was created as part of the AHRC-funded project Lexicography in Motion (PI Ulrich Pagel, 2017-2021).

Notes

Funded by the UK's Arts and Humanities Research Council (grant code: AH/P004644/1)

Files

tibetan-nlp/classical-tibetan-corpus-v1.0.zip

Size: 20.7 MB
md5:b75a4417e629c3e53540b776604c5ba8
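
The MD5 checksum listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the local path in `ARCHIVE_PATH` is an assumption about where the archive was saved, and the expected digest is copied from this record.

```python
import hashlib
from pathlib import Path

# Digest as listed on this record.
EXPECTED_MD5 = "b75a4417e629c3e53540b776604c5ba8"
# Assumed local download location (hypothetical; adjust as needed).
ARCHIVE_PATH = Path("classical-tibetan-corpus-v1.0.zip")


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if ARCHIVE_PATH.exists():
    actual = md5_of_file(ARCHIVE_PATH)
    print("checksum matches" if actual == EXPECTED_MD5 else "checksum MISMATCH")
else:
    print(f"{ARCHIVE_PATH} not found; download the archive first")
```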

Additional details