Reproducibility of the Experimental Result of BERT for Evidence Retrieval and Claim Verification

doi:10.5281/zenodo.5920815

Published January 30, 2022 | Version v1

Report Open

1. Technische Universität Wien

TU Wien Experiment Design For Data Science Assignment 2

Group 43

András Bonifác Kónya (ID:01502933)

Branimir Raguž (ID:12123474)

Thummanoon Kunanuntakij (ID:12122522)

Abstract

We attempt to reproduce the results of BERT for Evidence Retrieval and Claim Verification [1]. The original paper uses BERT for evidence-based claim verification on the FEVER dataset, consisting of 50K Wikipedia pages [2]. It reports a new state-of-the-art recall of 87.1 for retrieving evidence sentences from the dataset and ranks second on the leaderboard with a FEVER score of 69.7. We discuss the experiment design and the metrics used, and attempt to reproduce the reported results. After reviewing the process described in the original paper, we conclude that its experiment design is questionable and that the results might not generalize well. Although we were not able to confirm the reported numbers, owing to difficulties in recreating the dataset and to time constraints, we document the problems we encountered and our efforts to resolve them.

Files

Reproducibility of the Experimental Result of BERT for Evidence Retrieval.pdf (267.9 kB, md5:9b50a46e1bbca5a407e4adb6b8eb2f6d)

Additional details

References

[1] Soleimani, A., Monz, C., Worring, M.: BERT for Evidence Retrieval and Claim Verification. In: Jose, J., et al. (eds.) Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol. 12036. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45442-5_45
[2] Thorne, J., Vlachos, A., Christodoulopoulos, C., Mittal, A.: FEVER: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355 (2018)
[3] Hanselowski, A., et al.: UKP-Athene: multi-sentence textual entailment for claim verification. In: Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pp. 103–108. Association for Computational Linguistics, Brussels, November 2018. https://doi.org/10.18653/v1/W18-5516
[4] Soleimani, A.: ASoleimaniB/BERT_FEVER. GitHub, https://github.com/ASoleimaniB/BERT_FEVER/tree/d630e7150554c72319b37729f0522b462b63603c (2020)
