Published July 6, 2017 | Version v1
Dataset | Open access

The Annotated Corpus of Classical Tibetan (ACTib), Part I - Segmented version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL.

  • 1. Cambridge University
  • 2. SOAS, University of London

Description

This corpus is a part-of-speech tagged version of

Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218

using the training data of

Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878

and the Memory-Based Tagger (MBT) available at

https://languagemachines.github.io/mbt/

Please note that the files have not been post-processed or manually corrected. A small number of files in the KarmaDelek directory were annotated even though their original XML input was already corrupted.

Files

DharmaDownloadPostSegmented.zip

Files (593.1 MB)

MD5 checksum                        Size
9966f43d315443597e359b8333de93d6     39.5 MB
97b2210e9dda3fbe4fdf6a922fb774a9     18.3 MB
4272fe9d1570d47cbb3689b63005ba2d     33.6 MB
f45ddba7d8e6cc7ac35f945a2cb56925     96.9 MB
b188c5ed673965d856f7be811666a6be     37.7 MB
8a777dc7cb47dcaa620bf512a2f17e13     20.7 MB
ca2808c7a6cb4a3bd64dd4340619af97      7.9 MB
6769cd34ebfba04fcf3a7144e3f49a8f     29.4 MB
8f8fe53f4d2c158b90d5b9a5efd3cf4c      6.6 MB
439dea96bd7b6fc1b997e0298e128b4c    289.0 MB
af54d4f0f436951d8aabaa68def40938      5.6 MB
1037feca6ac4cb104bc84223aa66d95a      7.8 MB
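Each file in this record is listed with an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal sketch using only Python's standard `hashlib` module; the function names `md5_of_file` and `verify_md5` are hypothetical helpers, and the file path you pass in is whatever name your download was saved under. The file is read in 1 MiB chunks so that even the largest archive (289.0 MB) does not have to be held in memory at once.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 digest with an expected hex string, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical command-line usage: python verify_md5.py <file> <expected-md5>
    if len(sys.argv) == 3:
        ok = verify_md5(sys.argv[1], sys.argv[2])
        print("OK" if ok else "MISMATCH")
        sys.exit(0 if ok else 1)
```

For example, `verify_md5("downloaded_file.zip", "9966f43d315443597e359b8333de93d6")` (with a hypothetical local file name) should return `True` only if the file matches the 39.5 MB entry listed above.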
