Published March 1, 2018 | Version v1
Dataset Open

Datasets of "An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition"

  • 1. University of Tirana

Description

This is an automatically generated (silver-standard) annotated corpus for Albanian named entity recognition, built from Wikipedia and Wikidata. It is provided in the Apache OpenNLP annotation format.

The generation approach is described in the accompanying published paper: https://doi.org/10.2478/cait-2018-0009

Also attached are the files used to generate the Albanian named-entity gazetteer, along with the gazetteer itself in JSON format.
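
For readers unfamiliar with the Apache OpenNLP name-finder format mentioned in the description, the following is a minimal sketch of how one line of such a file could be read in Python. In that format a sentence is a line of whitespace-separated tokens, and each entity is wrapped in `<START:type> ... <END>` markers. The function name `parse_opennlp_line`, the sample sentence, and the entity type labels `person` and `location` are illustrative assumptions; they are not taken from the corpus files, whose actual labels should be checked against the data.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, entity type)


def parse_opennlp_line(line: str) -> Tuple[List[str], List[Span]]:
    """Split one OpenNLP name-finder line into plain tokens and entity spans.

    Hypothetical helper: assumes markers are separated from tokens by whitespace.
    """
    tokens: List[str] = []
    spans: List[Span] = []
    start = None
    entity_type = "default"
    for tok in line.split():
        if tok.startswith("<START") and tok.endswith(">"):
            # "<START:person>" carries a type; a bare "<START>" does not.
            start = len(tokens)
            entity_type = tok[7:-1] if tok.startswith("<START:") else "default"
        elif tok == "<END>":
            if start is not None:
                spans.append((start, len(tokens), entity_type))
            start = None
        else:
            tokens.append(tok)
    return tokens, spans


# Invented sample sentence (not from the corpus), used only to show the markup.
sample = "<START:person> Ismail Kadare <END> lindi në <START:location> Gjirokastër <END> ."
tokens, spans = parse_opennlp_line(sample)
for begin, end, label in spans:
    print(label, " ".join(tokens[begin:end]))
```

The spans use token offsets rather than character offsets, which makes them easy to convert to other token-level schemes such as BIO tags.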

Files

albanian-ne-gazetteer.zip

Files (19.0 MB)

Checksum Size
md5:a02e364d6f7be6a71f9b365887b27226 370.1 kB
md5:d2df897f047f8227246bee5fa862faad 18.6 MB
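
The MD5 values listed above can be used to confirm that a download is intact. The sketch below shows one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local file path must be supplied by the user, since the listing does not pair each checksum with a file name.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:a02e...'.

    Hypothetical helper: accepts the value with or without the 'md5:' prefix.
    """
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:d2df897f047f8227246bee5fa862faad")` would return `True` only if the file at `local_path` (a path chosen by the user) has that checksum.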

Additional details

Related works

Is part of
Journal article: 10.2478/cait-2018-0009 (DOI)