The Annotated Corpus of Classical Tibetan (ACTib), Part I - Segmented version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL.
- 1. Cambridge University
- 2. SOAS, University of London
Description
This corpus is a part-of-speech tagged version of
Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218
The tagger was trained on the data of
Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878
and the tagging was done with the Memory-Based Tagger (MBT), available at
https://languagemachines.github.io/mbt/
Please note that the files have not been post-processed or manually corrected. A small number of files in the KarmaDelek directory were annotated even though their original XML input was already corrupted.
Files
DharmaDownloadPostSegmented.zip
Total size: 593.1 MB
MD5 checksum | Size |
---|---|
9966f43d315443597e359b8333de93d6 | 39.5 MB |
97b2210e9dda3fbe4fdf6a922fb774a9 | 18.3 MB |
4272fe9d1570d47cbb3689b63005ba2d | 33.6 MB |
f45ddba7d8e6cc7ac35f945a2cb56925 | 96.9 MB |
b188c5ed673965d856f7be811666a6be | 37.7 MB |
8a777dc7cb47dcaa620bf512a2f17e13 | 20.7 MB |
ca2808c7a6cb4a3bd64dd4340619af97 | 7.9 MB |
6769cd34ebfba04fcf3a7144e3f49a8f | 29.4 MB |
8f8fe53f4d2c158b90d5b9a5efd3cf4c | 6.6 MB |
439dea96bd7b6fc1b997e0298e128b4c | 289.0 MB |
af54d4f0f436951d8aabaa68def40938 | 5.6 MB |
1037feca6ac4cb104bc84223aa66d95a | 7.8 MB |
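The MD5 checksums listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch in Python, not part of the original record. Because the listing does not pair each checksum with a file name, the sketch only tests whether a file's digest matches any of the twelve listed values. The names `LISTED_MD5`, `md5_of_file` and `is_listed` are hypothetical helpers introduced for illustration.

```python
import hashlib
import sys

# The twelve MD5 checksums from the file listing of this record.
LISTED_MD5 = {
    "9966f43d315443597e359b8333de93d6",
    "97b2210e9dda3fbe4fdf6a922fb774a9",
    "4272fe9d1570d47cbb3689b63005ba2d",
    "f45ddba7d8e6cc7ac35f945a2cb56925",
    "b188c5ed673965d856f7be811666a6be",
    "8a777dc7cb47dcaa620bf512a2f17e13",
    "ca2808c7a6cb4a3bd64dd4340619af97",
    "6769cd34ebfba04fcf3a7144e3f49a8f",
    "8f8fe53f4d2c158b90d5b9a5efd3cf4c",
    "439dea96bd7b6fc1b997e0298e128b4c",
    "af54d4f0f436951d8aabaa68def40938",
    "1037feca6ac4cb104bc84223aa66d95a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5


if __name__ == "__main__":
    # Usage (hypothetical): python check_md5.py archive1.zip archive2.zip
    for archive in sys.argv[1:]:
        status = "OK" if is_listed(archive) else "NO MATCH"
        print(f"{status}  {md5_of_file(archive)}  {archive}")
```

A "NO MATCH" result suggests an incomplete or altered download, in which case the file should be fetched again before unpacking.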
Additional details
Related works
- Cites
- 10.5281/zenodo.821218 (DOI)
- 10.5281/zenodo.574878 (DOI)