Published June 28, 2021 | Version minor corrections on lemmas and tokenization
Dataset Open

Enriched CONLLU Ancora for ML training

Description

This is an enriched version of the CoNLL-U adaptation of the AnCora corpus, intended for machine learning.

This version of the corpus was developed by BSC TeMU as part of the AINA project and has been used for multi-task training of the spaCy 3.4 models for Catalan.
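As a rough illustration of how a CoNLL-U file such as this one can be read, the sketch below parses the ten standard tab-separated CoNLL-U columns using only the Python standard library. The function name `read_conllu` and the sample sentence are hypothetical: the sentence is invented for the example and is not taken from the corpus, and the sketch makes no assumption about which extra annotations the enriched version carries.

```python
from typing import Dict, Iterator, List

# The ten columns defined by the CoNLL-U format, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper, not part of the dataset or of spaCy.
    """
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Invented sample sentence, for illustration only (not from the corpus).
SAMPLE = (
    "# text = El gat dorm.\n"
    "1\tEl\tel\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tgat\tgat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tdorm\tdormir\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"])
```

For real use, the same function could be applied to the contents of a file extracted from the archive, read with `encoding="utf-8"`.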

Enriched version of the adaptation of the AnCora corpus to the CoNLL-U format, oriented towards machine learning.

This version of the corpus was developed by BSC TeMU as part of the Aina project and has been used for multi-task training of the spaCy 3.0 models for Catalan.

Notes

Supersedes the previous version.

Files

ANCORA_ca_2022.zip

Files (11.6 MB)

Name Size
md5:ea78258d289faf0f6a940c06291b3b50 5.7 MB
md5:70d9820fff00313f853d5e4d5fbf87f7 5.9 MB
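The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical. Because the listing does not show which checksum belongs to which file name, the sketch only checks whether a file's digest matches either listed value.

```python
import hashlib

# Checksums copied from the file listing, without the "md5:" prefix.
LISTED_MD5 = {
    "ea78258d289faf0f6a940c06291b3b50",
    "70d9820fff00313f853d5e4d5fbf87f7",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str) -> bool:
    """True if the file's MD5 digest equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

A call such as `matches_listing("path/to/downloaded_file")` (placeholder path) returning `False` would suggest a corrupted or different file.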