Linguatec Tolosa Treebank for Occitan

This archive contains the first dependency treebank for Occitan, developed as part of the EFA 227/16 LINGUATEC Project, financed by the POCTEFA Interreg European funds. The current version of the treebank contains 13K tokens annotated for PoS tags, lemmas and syntactic dependencies. Linguistic annotation follows the Universal Dependencies guidelines (https://universaldependencies.org/#language-u). A detailed corpus description is provided in the file Linguatec_Tolosa_Treebank_Description.ods.

A subset of the texts was annotated independently by two annotators, and the two annotations were adjudicated to produce the final annotation. These texts are therefore the best suited for use as test files in NLP experiments.

The corpus files are stored in the CoNLL-U format. Each sentence is preceded by a sentence ID and the original, non-tokenized text of the sentence. The annotation is provided in a column-based format defined as follows:

1. ID: Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens.
2. FORM: Word form or punctuation symbol.
3. LEMMA: Lemma or stem of word form.
4. UPOS: Universal part-of-speech tag.
5. XPOS: Language-specific part-of-speech tag; underscore if not available.
6. FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
7. HEAD: Head of the current word, which is either a value of ID or zero (0).
8. DEPREL: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
9. DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.
10. MISC: Any other annotation.

We do not use morphological features or enhanced dependency graphs; columns 6 and 9 therefore always contain an underscore. The MISC column may contain a gloss in French, named entity annotation or multiword expression annotation, but not all files include these.

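
As an illustration of the ten-column layout described above, here is a minimal sketch of how such a file could be read in Python using only the standard library. The names read_conllu, FIELDS and SAMPLE are assumptions made up for this example, and the sample sentence is invented for illustration; none of them is part of the treebank distribution. The sketch also assumes that comment lines start with "#" and that sentences are separated by blank lines, as in the standard CoNLL-U format.

```python
from typing import Dict, Iterable, Iterator, List

# Names of the ten columns, in file order (hypothetical key names).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[Dict[str, list]]:
    """Yield one sentence at a time as {"comments": [...], "tokens": [...]}."""
    comments: List[str] = []
    tokens: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
            comments, tokens = [], []
        elif line.startswith("#"):
            # Comment lines carry the sentence ID and the raw text.
            comments.append(line[1:].strip())
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            # All values stay strings: ID may be a range such as "1-2".
            tokens.append(dict(zip(FIELDS, cols)))
    if tokens:  # the file may not end with a blank line
        yield {"comments": comments, "tokens": tokens}


# Invented sample sentence, NOT taken from the treebank.
_ROWS = [
    ["1", "Lo", "lo", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "gat", "gat", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "dormís", "dormir", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = "\n".join(
    ["# sent_id = example-1", "# text = Lo gat dormís."]
    + ["\t".join(row) for row in _ROWS]
) + "\n"

sentences = list(read_conllu(SAMPLE.splitlines()))
# The root is the token whose HEAD column is "0".
root = next(t for t in sentences[0]["tokens"] if t["head"] == "0")
print(sentences[0]["comments"][0], "| root:", root["form"], root["deprel"])
```

To read an actual corpus file, the same function can be given an open file handle, for example read_conllu(open(path, encoding="utf-8")), where path is whichever file from the archive you want to load.
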
The texts are distributed under the Creative Commons BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).