Published November 18, 2019 | Version v1
Journal article Open

Replication, analysis and adaptation of a term alignment approach

  • 1. Jožef Stefan Institute
  • 2. Jožef Stefan Institute, Usher Institute of Population Health Sciences and Informatics, Edinburgh Medical School, Edinburgh, UK

Description

In this paper, we examine reproducibility and replicability in bilingual terminology alignment (BTA). We propose a set of best practices for the reproducibility and replicability of NLP papers and analyze several influential BTA papers from this perspective. Next, we present our attempts at replication and reproduction, focusing on the bilingual terminology alignment approach described by Aker et al. (2013; "Extracting bilingual terminologies from comparable corpora", Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 402–411), who treat bilingual term alignment as a binary classification problem and train an SVM classifier on various dictionary-based and cognate-based features. Although we followed the original paper closely, deviating only in minor ways where the original description is not clear enough, we obtained significantly worse results than the original authors. We then analyze the reasons for the discrepancy and describe how we adapted the approach to improve the results. Only after several adaptations do we achieve results close to those published in the original paper. Finally, we perform experiments to verify the replicability and reproducibility of our own code. We publish our code and datasets online to ensure that the results of our experiments are reproducible, and we implement the selected BTA models in an online platform so that they can be easily reused even by less technically skilled researchers.
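To make the classification setup mentioned above more concrete, the following is a minimal sketch of how cognate-based features for a candidate term pair might be computed. It is not the code published with the paper: the function names and the choice of two features (a longest common subsequence ratio and a normalized Levenshtein similarity) are assumptions made for illustration, and the abstract does not list the exact feature set. In a full pipeline, feature vectors like these would be combined with dictionary-based features and passed to a binary classifier such as an SVM.

```python
# Illustrative sketch only; function names and feature choice are hypothetical.

def lcs_subsequence_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cognate_features(src: str, tgt: str) -> dict:
    """Character-level similarity features for one candidate term pair."""
    src, tgt = src.lower(), tgt.lower()
    longest = max(len(src), len(tgt))
    if longest == 0:
        # Two empty strings carry no evidence of cognateness.
        return {"lcs_ratio": 0.0, "levenshtein_sim": 0.0}
    return {
        "lcs_ratio": lcs_subsequence_len(src, tgt) / longest,
        "levenshtein_sim": 1.0 - levenshtein(src, tgt) / longest,
    }


if __name__ == "__main__":
    # A hypothetical English-Slovene candidate pair.
    print(cognate_features("terminology", "terminologija"))
```

Both features are normalized by the length of the longer term so that they fall between 0 and 1 regardless of term length, which keeps them comparable across short and long candidates.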

Files

Repar2019_Article_ReproductionReplicationAnalysi.pdf (499.7 kB)

Additional details

Funding

European Commission: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant no. 825153)