Published June 1, 2020 | Version v1
Conference paper | Open access

The Optimization of Portuguese Named-Entity Recognition and Classification by Combining Local Grammars and Conditional Random Fields Trained with a Parsed Corpus

  • 1. Faculty of Humanities and Social Sciences, University of Zagreb

Description

This article presents the results of a study on named-entity recognition and classification for Portuguese, focusing on temporal expressions. We used the Conditional Random Fields (CRF) probabilistic method with features derived from an automatically annotated parsed corpus and from local grammars. We found that Part-of-Speech (PoS) tags are the most relevant information from the parsed corpus to use as a feature for this task; combining these tags with other linguistic information from the parsed corpus yields no positive synergy. A NooJ local grammar, created to recognize entities of the “Time” category (without detailing types and subtypes), provides information that outperforms PoS tags as a feature for CRF training in terms of both precision and recall. Combining PoS and NooJ annotations does not bring any advantage.
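To make the feature setups compared in the description more concrete, the sketch below shows one way per-token features for a CRF tagger might be assembled from PoS tags and from a flag marking tokens matched by a local grammar. It is only an illustration under stated assumptions: the description does not name the CRF toolkit or the exact feature templates, so the function names, the feature keys, the PoS labels, and the example sentence are all hypothetical. The output is a list of feature dictionaries per sentence, a shape that dictionary-based CRF toolkits such as sklearn-crfsuite accept.

```python
from typing import Dict, List, Tuple

# One token = (word, PoS tag, matched by the "Time" local grammar?).
# The tag names and the boolean flag are illustrative assumptions,
# not the annotation scheme used in the paper.
Token = Tuple[str, str, bool]


def token_features(sent: List[Token], i: int,
                   use_pos: bool = True,
                   use_nooj: bool = True) -> Dict[str, object]:
    """Build a feature dict for token i with a one-token context window."""
    word, pos, nooj_time = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
    }
    if use_pos:
        feats["pos"] = pos
    if use_nooj:
        feats["nooj_time"] = nooj_time

    # Left neighbour, or a beginning-of-sentence marker.
    if i > 0:
        prev_word, prev_pos, prev_nooj = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        if use_pos:
            feats["-1:pos"] = prev_pos
        if use_nooj:
            feats["-1:nooj_time"] = prev_nooj
    else:
        feats["BOS"] = True

    # Right neighbour, or an end-of-sentence marker.
    if i < len(sent) - 1:
        next_word, next_pos, next_nooj = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        if use_pos:
            feats["+1:pos"] = next_pos
        if use_nooj:
            feats["+1:nooj_time"] = next_nooj
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sent: List[Token],
                      use_pos: bool = True,
                      use_nooj: bool = True) -> List[Dict[str, object]]:
    """Feature dicts for every token of a sentence."""
    return [token_features(sent, i, use_pos, use_nooj) for i in range(len(sent))]


# Hypothetical example: "Chegou em 5 de maio ." with "5 de maio" flagged as Time.
sentence: List[Token] = [
    ("Chegou", "V", False), ("em", "PRP", False), ("5", "NUM", True),
    ("de", "PRP", True), ("maio", "N", True), (".", "PU", False),
]

pos_only = sentence_features(sentence, use_pos=True, use_nooj=False)
nooj_only = sentence_features(sentence, use_pos=False, use_nooj=True)
combined = sentence_features(sentence)
```

Switching `use_pos` and `use_nooj` on and off gives the three configurations the description compares (PoS only, NooJ only, and both), so the same training and evaluation loop could be run over each feature set.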

Files

NooJ_article_Alves_Bekavac_Tadic.pdf (161.0 kB)
md5:c943850de309b9cc4432659d54b2c49e

Additional details

Funding

European Commission: Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy (grant 812997)