There is a newer version of the record available.

Published February 10, 2021 | Version v1
Dataset Open

NER4conllu (Named Entities from Ancora Corpus)

Description

Named Entities from Ancora Corpus

Multiword expressions (including named entities) in the original Ancora corpus are joined into a single lexical item with underscores (e.g. "Ajuntament_de_Barcelona"). We split them to align with the word-per-line .conllu format and added conventional Inside-Outside-Beginning (IOB) tags to mark and classify the named entities.
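
The splitting and tagging step described above can be illustrated with a minimal Python sketch. This is not the script used to build the dataset: the function name split_multiword, the "ORG" label, and the B-/I- prefix scheme are assumptions made for illustration, and the actual tag set and column layout of the released files may differ.

```python
from typing import List, Optional, Tuple


def split_multiword(token: str, entity_type: Optional[str]) -> List[Tuple[str, str]]:
    """Split an underscore-joined token into one (word, tag) pair per word.

    entity_type is a hypothetical label such as "ORG"; pass None for a
    multiword expression that is not a named entity.
    """
    # Drop empty pieces that would result from doubled or trailing underscores.
    words = [w for w in token.split("_") if w]
    if entity_type is None:
        # Not an entity: every word is tagged as Outside.
        return [(w, "O") for w in words]
    # First word opens the entity (B-), the remaining words continue it (I-).
    return [
        (w, ("B-" if i == 0 else "I-") + entity_type)
        for i, w in enumerate(words)
    ]


if __name__ == "__main__":
    # Print one word per line, mirroring the word-per-line layout.
    for word, tag in split_multiword("Ajuntament_de_Barcelona", "ORG"):
        print(f"{word}\t{tag}")
```

Under these assumptions, "Ajuntament_de_Barcelona" becomes three lines, with the first word carrying the B- tag and the other two carrying I- tags.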

Files

ner4conllu.zip (92.6 kB)
md5:cce8ce884c544c67eba68c4efb506813