Project deliverable Open Access
Ajausks, Ēriks; Mireles-Chaves, Víctor; Sageder, Christian; Lagzdiņš, Andis; Montiel-Ponsoda, Elena
This deliverable summarizes the intermediate work on acquired corpora (as part of WP2) within the Lynx project. The aim of this task is to describe the corpora collection methods and the corpora that Lynx partners have collected for the different use cases. Corpora are being collected for three business cases: Compliance Assurance Services for Contracts, Compliance Assurance Services in Oil & Gas and Energy, and Compliance Assurance Services in Labor Law. This document serves as reference material for the corpora collected to cover the needs of the three business cases, and for the first steps in the method followed to index those corpora. Furthermore, the document describes the corpora preparation workflow to be used in the training of Neural MT engines for specific languages and domains. Finally, this document reports on the term extraction process performed so far on the compiled corpora and briefly outlines its further use in the Lynx MT systems.
D2.3 Intermediate report on Lynx acquired corpora (final).pdf