Published October 26, 2021 | Version v1
Conference paper | Open access

More Data and New Tools. Advances in Parsing the Index Thomisticus Treebank

  • 1. IUSS, Pavia
  • 2. Università Cattolica del Sacro Cuore, Milan, Italy

Description

This paper investigates recent advances in parsing the Index Thomisticus Treebank, which comprises Medieval Latin texts by Thomas Aquinas. The research focuses on two variables. On the one hand, it examines the impact of a larger dataset on parsing results; on the other hand, it compares the performance of new parsers with that of older tools. The baseline for measuring these advances is the set of results on the Index Thomisticus Treebank reported in a previous work. First, the best-performing parser among those considered in that study is tested on a larger dataset than the one originally used. Then, some parser combinations developed in the same study are evaluated as well, showing that more training data yield more accurate results. Finally, to examine the impact of newly available tools on parsing results, we train, test, and evaluate two neural parsers chosen from among the best performers in the CoNLL 2018 Shared Task. Our experiments reach the highest accuracy rates achieved so far in automatic syntactic parsing of the Index Thomisticus Treebank and of Latin overall.

Files

2021_Gamba-Passarotti-Ruffolo_CHR_ParsingITTB.pdf

Size: 350.5 kB
md5:537f315186e56e4cd8cad5bf55d3b8fd

Additional details

Funding

European Commission: LiLa – Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin (grant agreement 769994)