Published March 12, 2020 | Version v1
Working paper Open

Evaluating the Impact of Bilingual Lexical Resources on Cross-lingual Sentiment Projection in the Pharmaceutical Domain

  • 1. Semalytix GmbH, Bielefeld, Germany

Description

Rolling out text analytics applications, or individual components thereof, to multiple input languages requires scalable workflows and architectures that do not rely on manual annotation or language-specific re-engineering for each target language. These scalability challenges become even more acute when specialized technical domains are targeted in multiple languages. Recent work has shown that cross-lingual projection of sentiment models in deep learning frameworks based on bilingual sentiment embeddings (BLSE) is feasible without any annotated data in the target language, relying only on monolingual embeddings and a bilingual translation dictionary (Barnes et al., 2018). We apply their framework to multilingual text analytics problems in the pharmaceutical domain in order to (i) investigate under which conditions the BLSE approach also scales to technical domains, and (ii) assess the impact of different configurations of the underlying lexical resources. For the language pair English/Spanish, our findings corroborate the strength of cross-lingual projection approaches such as BLSE in technical scenarios, provided that bilingual resources are available that offer broad lexical coverage on the one hand and complementary domain- and task-specific knowledge on the other.
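
As a rough illustration of how a bilingual dictionary can connect two monolingual embedding spaces, the sketch below fits a plain linear map from English to Spanish vectors using dictionary pairs only, then checks where a word outside the dictionary lands. This is a deliberately simplified stand-in, not the BLSE training procedure of Barnes et al. (2018), and it involves no sentiment classifier. All vectors, word pairs, and function names (`fit_projection`, `nearest`) are hypothetical toy values chosen for illustration.

```python
import math

# Toy 2-d "monolingual embeddings" (hypothetical values, not real data).
en_vecs = {"good": [1.0, 0.0], "bad": [0.0, 1.0],
           "effective": [1.0, 1.0], "painful": [-1.0, 2.0]}
es_vecs = {"bueno": [0.0, 1.0], "malo": [-1.0, 0.0],
           "eficaz": [-1.0, 1.0], "doloroso": [-2.0, -1.0]}

# Bilingual translation dictionary; "painful" is held out on purpose.
dictionary = [("good", "bueno"), ("bad", "malo"), ("effective", "eficaz")]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def fit_projection(pairs, dim, lr=0.1, steps=500):
    """Fit W minimizing sum ||W x - y||^2 over (x, y) pairs by gradient descent."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for x, y in pairs:
            pred = matvec(W, x)
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    grad[i][j] += 2.0 * err * x[j]
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j]
    return W


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def nearest(vec, candidates):
    """Return the candidate word whose vector has the highest cosine similarity."""
    return max(candidates, key=lambda word: cosine(vec, candidates[word]))


pairs = [(en_vecs[en], es_vecs[es]) for en, es in dictionary]
W = fit_projection(pairs, dim=2)

# Project a word that was not in the dictionary and look up its neighbour.
projected = matvec(W, en_vecs["painful"])
print(nearest(projected, es_vecs))
```

In this toy setup the Spanish space is an exact rotation of the English one, so three dictionary pairs suffice to recover the map. With real embeddings the fit is only approximate, and the coverage of the dictionary over domain vocabulary becomes a limiting factor, which is the kind of effect the description above sets out to assess.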

Files

draft.pdf (397.9 kB)
md5:70b1866b2409207288fc8ac994e9614f

Additional details

Funding

European Commission
Pret-a-LLOD – Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors (grant no. 825182)