Published April 29, 2021 | Version v1.0
Dataset
Open
Classical Tibetan corpus annotated for verb-argument dependency relations
Description
This is a small hand-annotated partial treebank of Tibetan, primarily in CoNLL-U format. It builds upon the following corpus:
Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878
This corpus differs from the above in three ways:
- The tagset has been converted from the SOAS tag system to the Universal Dependencies part-of-speech tagset.
- We have added dependency relations between verbs and their arguments.
- For some of the texts, English translations were available in digital form. These translations were manually aligned to the Tibetan texts and included in the CoNLL-U files.
This corpus was created as part of the AHRC-funded project Lexicography in Motion (PI Ulrich Pagel, 2017-2021).
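As an illustration of how verb-argument relations might be read from a CoNLL-U file, here is a minimal Python sketch using only the standard library. It assumes the files follow the standard ten-column, tab-separated CoNLL-U layout, that verbs carry the `VERB` tag, and that tokens without an annotated head have `_` in the HEAD column (plausible for a partial treebank, but not confirmed here). The function names, the sample tokens `w1` to `w3`, and the `text_en` comment key are hypothetical and are not taken from the corpus.

```python
from typing import Dict, Iterator, List, Tuple

# Standard CoNLL-U column names, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_sentences(text: str) -> Iterator[Dict]:
    """Yield one dict per sentence: comment metadata plus a list of token dicts."""
    comments: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines of the form "# key = value".
            key, _, value = line[1:].partition("=")
            comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                tokens.append(dict(zip(FIELDS, cols)))
    if tokens:
        yield {"comments": comments, "tokens": tokens}


def verb_arguments(sentence: Dict) -> List[Tuple[str, str, str]]:
    """Return (verb form, relation, dependent form) for tokens whose head is a VERB."""
    by_id = {tok["id"]: tok for tok in sentence["tokens"]}
    triples = []
    for tok in sentence["tokens"]:
        head = by_id.get(tok["head"])  # None for "_" or the root ("0")
        if head is not None and head["upos"] == "VERB":
            triples.append((head["form"], tok["deprel"], tok["form"]))
    return triples


# Made-up sample; the comment key "text_en" is an assumption, not the corpus's own.
SAMPLE = (
    "# text_en = hypothetical translation\n"
    "1\tw1\t_\tNOUN\t_\t_\t3\tobj\t_\t_\n"
    "2\tw2\t_\tADP\t_\t_\t_\t_\t_\t_\n"
    "3\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

for sent in read_sentences(SAMPLE):
    print(sent["comments"], verb_arguments(sent))
```

Tokens with no annotated head are skipped rather than treated as errors, which suits a treebank where only some relations are annotated.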
Files
Name | Size | Checksum |
---|---|---|
tibetan-nlp/classical-tibetan-corpus-v1.0.zip | 20.7 MB | md5:b75a4417e629c3e53540b776604c5ba8 |
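The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a small sketch using Python's standard `hashlib` module. The helper names and the local filename in the usage note are assumptions; only the expected digest comes from this record.

```python
import hashlib

# Digest as listed for the archive in this record.
EXPECTED_MD5 = "b75a4417e629c3e53540b776604c5ba8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected
```

For example, `matches_record("classical-tibetan-corpus-v1.0.zip")` would check a copy saved under that (assumed) local name.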
Additional details
Related works
- Is supplement to
- https://github.com/tibetan-nlp/classical-tibetan-corpus/tree/v1.0 (URL)