Published May 14, 2021 | Version 2.0.0
Dataset Open

AnCora Catalan 2.0.0

  • 1. Universitat de Barcelona


AnCora Catalan 2.0.0 consists of 500,000 words. The corpus is annotated at different levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations

AnCora Catalan 2.0.0 is mainly based on journalistic texts. For more information, see the AnCora-corpus website.

The annotators of AnCora Catalan 2.0.0 are:

Oriol Borrega, Isabel Briz, Núria Bufí, Montserrat Civit, María Jesús Díaz, Silvia Garcia, Raquel Hernández, Marina Lloberes, Raquel Marcos, Difda Monterde, Montserrat Nofre, Aina Peris, Lourdes Puiggròs, Marta Recasens, Bàrbara Soriano, Rita Zaragoza.


Files (11.7 MB)
