The Annotated Corpus of Classical Tibetan (ACTib) - Version 2.0 (Segmented & POS-tagged)
Description
This corpus of more than 185 million tokens is a segmented and part-of-speech (POS) tagged version of
Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218
using the training data of
Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878
The code for segmenting and POS tagging any Tibetan file can be found on GitHub.
ACTib Version 2 is based on the same XML files as ACTib Version 1 (http://doi.org/10.5281/zenodo.823707), but it contains both segmented and POS-tagged files and improves on Version 1 in a number of ways. Post-processing was still fully automatic, and no manual correction was involved. For details of the improved annotation method, see:
Meelen, Marieke, Roux, Élie & Hill, Nathan (forthcoming). 'Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based & deep-learning methods' in TALLIP.
Files
results-Dharmadownload.zip
Total size: 838.8 MB
Checksum (md5) | Size |
---|---|
md5:19e80e13def1224b76769eeabc594ce0 | 90.4 MB |
md5:1a93124f0e57d2765df1bbca4bc766a9 | 42.1 MB |
md5:6dc2eaa0d904f983dcbe8a0f079768a3 | 78.6 MB |
md5:207dd6d020efebeb2df727dbaceb615b | 192.6 MB |
md5:a33a58ec60b9fbc4f4e420a8a314efc5 | 222.2 MB |
md5:d8ac07e85670b6354205afdd2cc599a6 | 84.2 MB |
md5:4419bf8785c1553dbda608033de8c1e3 | 17.5 MB |
md5:a0f94d337feefe478f744ad5eb176e55 | 65.6 MB |
md5:12ee3b32e598362aa11606b96520a441 | 15.2 MB |
md5:50ac6dae81ed84c99e5365f2f045719b | 12.8 MB |
md5:702e386dbc72f1b9eca6f7d4344e8185 | 17.5 MB |
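The md5 values listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_listed_md5` are illustrative, and the script makes no assumption about which archive belongs to which checksum, since that pairing is not shown here; pass the path and the listed value yourself.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's md5 with a listed value such as 'md5:19e8...'.
    The optional 'md5:' prefix is stripped before comparing."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Usage (both arguments are supplied by the user):
    #   python check_md5.py <downloaded file> <listed md5 value>
    if len(sys.argv) == 3:
        ok = matches_listed_md5(sys.argv[1], sys.argv[2])
        print("checksum OK" if ok else "checksum MISMATCH")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.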