Published May 10, 2026 | Version 2.0
Annotation collection Open

Classical Tibetan Annotation Manual Part II - Segmentation & POS tagging - Version 2

  • 1. University of Cambridge
  • 2. Trinity College Dublin
  • 3. SOAS, University of London
  • 4. École Pratique des Hautes Études

Description

This is the second version of the annotation manual prepared by members of the AHRC-funded 'Emergence of Egophoricity' project. It is essentially a working document meant to support annotators in correcting automatic word and sentence segmentation as well as morphosyntactic (Part-of-Speech) information. It is not intended to be read from cover to cover. Rather, it is designed to make it easy to look up examples for cases where annotators are unsure which decision to take. It uses the original POS tag set from Garrett et al. (2014), but is significantly changed and updated in several places. It gives detailed examples of all conventions for segmentation and POS tagging, as well as brief explanations of why certain decisions were made. As such, it can also be used as a source of examples for grammatical constructions.

Since additional authors helped to create this second version, we created a separate deposit. Apart from a general thorough check of all tags and extension, this second version includes:

- added numbered* examples for all tags and additional ones in ambiguous cases
- retired obsolete tags and replaced them with alternatives (e.g. n.rel)
- added more special verb tags
- updated and extended sentence segmentation rules
- added selected references

 

The old version can still be found here:

Faggionato, C., Meelen, M., & Hill, N. (2023). Classical Tibetan Annotation Manual Part II - Segmentation & POS tagging (1.1). Zenodo. https://doi.org/10.5281/zenodo.7880130

Further explanation can be found in the following articles:

Marieke Meelen, Élie Roux, and Nathan Hill (2022) Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based, and deep-learning methods in ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20, pp. 1–11. https://dl.acm.org/doi/abs/10.1145/3409488
 
Marieke Meelen and Nathan Hill (2017) Segmenting and POS tagging Classical Tibetan using a memory-based tagger in Himalayan Linguistics, 16, pp. 64–89. https://doi.org/10.5070/H916234501

Files

Faggionatoetal2026_TibAnnotationManual-PartII-SegPOS_May2026.pdf

Files (1.0 MB)

Additional details

Funding

Arts and Humanities Research Council
Emergence of Egophoricity AH/V011235/1