There is a newer version of the record available.

Published May 19, 2021 | Version v3
Dataset Open

Romanian Named Entity Recognition in the Legal domain (LegalNERo)

  • 1. Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy

Description

LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. 
It provides gold annotations for organizations, locations, persons, time and legal resources mentioned in legal documents.
Additionally, it offers GEONAMES codes for the named entities annotated as locations (where a link could be established). 

The LegalNERo corpus is available in different formats: span-based, token-based and RDF. 
The Linguistic Linked Open Data (LLOD) version is provided in RDF-Turtle format.

CONLLUP files conform to the CoNLL-U Plus format https://universaldependencies.org/ext-format.html .
Part-of-speech tagging was performed using UDPipe. 
Named entity annotations are placed in the column "RELATE:NE" (the 11th column) as defined in the "global.columns" metadata field.
Similarly, GEONAMES references are in the column "RELATE:GEONAMES" (the 12th and last column).
Automatic processing was performed through the RELATE platform (https://relate.racai.ro).
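
As an illustration of the column layout described above, the following is a minimal Python sketch that reads a CoNLL-U Plus stream and looks up "RELATE:NE" by name through the "global.columns" header. The function name read_conllup, the sample sentence, the B-ORG/I-ORG tag values and the "_" placeholder are assumptions made for this example; check the actual .conllup files for the tag inventory they use. The first ten column names are assumed to be the standard CoNLL-U ones.

```python
import io

def read_conllup(stream):
    """Yield sentences as lists of {column_name: value} dicts."""
    columns = None
    sentence = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns"):
            # Header form: "# global.columns = ID FORM ... RELATE:NE RELATE:GEONAMES"
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment / metadata lines
        elif not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            if columns is None:
                raise ValueError("missing global.columns header")
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence

# Hypothetical two-token sample; tag values are invented for illustration.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC "
    "RELATE:NE RELATE:GEONAMES\n"
    "1\tMinisterul\tminister\tNOUN\t_\t_\t_\t_\t_\t_\tB-ORG\t_\n"
    "2\tJustiției\tjustiție\tNOUN\t_\t_\t_\t_\t_\t_\tI-ORG\t_\n"
    "\n"
)

sentences = list(read_conllup(io.StringIO(SAMPLE)))
ne_tags = [(tok["FORM"], tok["RELATE:NE"]) for tok in sentences[0]]
header = SAMPLE.splitlines()[0].split("=", 1)[1].split()
```

Looking columns up by name rather than by fixed position keeps the reader working even if a file declares a different column order in its header.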

ANN files conform to BRAT format (https://brat.nlplab.org/).
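
For readers unfamiliar with BRAT standoff files, here is a minimal Python sketch that parses the text-bound ("T") lines of an .ann file into entity records. The names Entity and parse_ann, the sample text and the ORG/TIME labels are assumptions for this example, not taken from the corpus; the label names in the actual files should be checked before relying on them.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    id: str
    label: str
    start: int
    end: int
    text: str

def parse_ann(content):
    """Parse text-bound annotations ("T" lines) from BRAT standoff content."""
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes, etc.
        ann_id, type_and_span, text = line.split("\t", 2)
        label, offsets = type_and_span.split(" ", 1)
        # Discontinuous annotations list several "start end" pairs joined by ";".
        fragments = [tuple(map(int, frag.split())) for frag in offsets.split(";")]
        # Keep the overall extent: first fragment start to last fragment end.
        entities.append(Entity(ann_id, label, fragments[0][0], fragments[-1][1], text))
    return entities

# Hypothetical raw text and matching annotations, invented for illustration.
RAW_TEXT = "Ministerul Justiției din 2020"
SAMPLE_ANN = "T1\tORG 0 20\tMinisterul Justiției\nT2\tTIME 25 29\t2020\n"

entities = parse_ann(SAMPLE_ANN)
```

Offsets are character positions in the corresponding raw text file, so RAW_TEXT[e.start:e.end] should reproduce e.text for every contiguous entity.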
 
The archive contains: 

- ann_LEGAL_PER_LOC_ORG_TIME_overlap 
    Folder in which all files are in .ann format and contain annotations of: legal resources mentioned, persons, locations, organizations and time. 
    Overlapping annotations of organizations and time entities inside legal references were allowed. 

- ann_LEGAL_PER_LOC_ORG_TIME 
    Folder in which all files are in .ann format and contain annotations of: legal resources mentioned, persons, locations, organizations and time. 
    Overlapping annotations were not allowed and only the longest named entities were annotated. 

- ann_PER_LOC_ORG_TIME 
    Folder in which all files are in .ann format and contain annotations of: persons, locations, organizations and time. 
    There are no overlapping annotations. 

- conllup_LEGAL_PER_LOC_ORG_TIME 
    Folder in which all files are in .conllup format and contain annotations of: legal resources mentioned, persons, locations, organizations and time. 
    Overlapping annotations were not allowed and only the longest named entities were annotated. 
    The annotation of these files was enhanced with GEONAMES codes (where linking was possible).  

- conllup_PER_LOC_ORG_TIME 
    Folder in which all files are in .conllup format and contain annotations of: persons, locations, organizations and time. 
    Overlapping annotations were not allowed and only the longest named entities were annotated. 
    The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

- rdf 
    Folder containing the corpus in RDF-Turtle format.
    All the annotations are available here in both span and token format.

- text 
    Folder containing the raw texts.
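
The difference between the overlapping and non-overlapping folders listed above can be illustrated with a small Python sketch that keeps only the longest span whenever annotations overlap. This is not the procedure used to build the corpus, which was annotated manually; the function keep_longest and the sample offsets are hypothetical and serve only to show what "only the longest named entities" means in practice.

```python
def keep_longest(spans):
    """Return non-overlapping (start, end, label) spans, preferring longer ones."""
    kept = []
    # Visit spans from longest to shortest; ties are broken by start offset.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end, _ = span
        # Keep the span only if it does not overlap anything already kept.
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)

# Hypothetical spans: an ORG and a TIME nested inside a legal reference.
spans = [(0, 40, "LEGAL"), (10, 20, "ORG"), (30, 34, "TIME"), (50, 60, "PER")]
filtered = keep_longest(spans)
```

In this example the nested ORG and TIME spans are dropped in favour of the enclosing LEGAL span, while the separate PER span is kept.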

 

NER System

An NER model generated using the LegalNERo corpus can be used online in the RELATE platform: https://relate.racai.ro/index.php?path=ner/demo

This system was described in: Păiș, Vasile and Mitrofan, Maria and Gasan, Carol Luca and Coneschi, Vlad and Ianov, Alexandru. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 9-18, November 2021.


LICENSING

This work is provided under the license CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International).
The license can be viewed online here: https://creativecommons.org/licenses/by-nc-nd/4.0/ 
and the full text here: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode . 


CONTACT

Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy
Web: http://www.racai.ro 
Contact emails: vasile@racai.ro , maria@racai.ro

Files

legalnero.zip (21.5 MB)
md5:e1d23833576dc63306740c5f98dea924
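
The MD5 checksum listed above can be used to check that the archive was downloaded intact. The following is a minimal Python sketch using the standard hashlib module; the function names md5_of_file and verify_archive, and the assumption that the file is saved locally as legalnero.zip, are choices made for this example.

```python
import hashlib

# Checksum as published on the record page.
EXPECTED_MD5 = "e1d23833576dc63306740c5f98dea924"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

A call such as verify_archive("legalnero.zip") would then return True only when the local copy matches the published checksum.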