The Annotated Corpus of Classical Tibetan (ACTib), Part II - POS-tagged version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL
- 1. Cambridge
- 2. SOAS, University of London
Description
This corpus is a part-of-speech tagged version of
Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218
using the training data of
Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878
using the memory-based tagger of
https://languagemachines.github.io/mbt/
Please note that the files have not been post-processed or manually corrected. A small number of files in the KarmaDelek directory were annotated even though their original XML input was already corrupted.
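This record does not describe the layout of the tagged files, so the following is only a minimal sketch under an explicit assumption: that each line holds whitespace-separated tokens written as `word/TAG` (a doubled slash, `word//TAG`, is tolerated as well). The function names and the sample tokens and tags are hypothetical placeholders, not taken from the corpus; inspect a real file and adjust the separator before relying on it.

```python
from collections import Counter


def parse_tagged_line(line):
    """Split one line into (word, tag) pairs.

    ASSUMPTION: tokens look like word/TAG or word//TAG. This format is
    not confirmed by the dataset record.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash at all: keep the token and mark the tag as missing.
            pairs.append((token, None))
            continue
        # rstrip removes the extra slash left over from a "//" separator.
        pairs.append((word.rstrip("/"), tag))
    return pairs


def count_tags(lines):
    """Count how often each tag occurs across an iterable of lines."""
    counts = Counter()
    for line in lines:
        for _word, tag in parse_tagged_line(line):
            if tag is not None:
                counts[tag] += 1
    return counts


# Placeholder input; real files contain Tibetan script and the corpus tagset.
sample = ["w1/TAG_A w2//TAG_B w3/TAG_A", "w4"]
print(count_tags(sample))
```

Because the files were not manually corrected, untagged or malformed tokens are kept with a tag of `None` rather than dropped, so they can be inspected afterwards.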
Files
DharmaDownloadtagged.zip
Total size: 783.0 MB

MD5 checksum | Size |
---|---|
58a258e4a26bf117516d5108daeb4960 | 52.0 MB |
b2b9ae591a4079023e94d527feae2d49 | 24.1 MB |
b5b3a88d16cf913a0339123e0b268e71 | 44.5 MB |
698849f34dc129b09bfb1457f71710a4 | 127.7 MB |
c871e6999c032981800cffdc8f78ec7d | 49.9 MB |
a6176dfd396d2f252d7e7f4cb3ed34fc | 27.2 MB |
f2ed8ad329776390634effecc74b9ed7 | 10.4 MB |
0b008246e23176484544dddf2f5c5816 | 38.7 MB |
12d9e8aa916c7ec599241465f2e5ebdf | 8.6 MB |
6354fb1e24527b877869c503d8608a68 | 382.2 MB |
41d8c6e4be99e02fc7b70da8b891e4f4 | 7.4 MB |
dbc04c7cb1282ed982291a585a300d4a | 10.2 MB |
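The MD5 checksums listed for the files can be used to confirm that a download arrived intact. A minimal sketch using Python's standard `hashlib` module follows; the function names are hypothetical, and the file path in the usage comment is a placeholder, since this listing does not show which checksum belongs to which archive.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's digest with an expected value.

    Accepts the value with or without the leading "md5:" label.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder path; use the checksum shown for that file on the record page):
# matches_checksum("path/to/downloaded_archive.zip", "md5:<checksum from the record>")
```

A mismatch usually means the download was truncated or corrupted, and the file should be fetched again before unpacking.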
Additional details
Related works
- Cites
- 10.5281/zenodo.821218 (DOI)
- 10.5281/zenodo.574878 (DOI)