Published January 30, 2022 | Version v1
Report Open

Reproducibility of the Experimental Result of BERT for Evidence Retrieval and Claim Verification

  • 1. Technische Universität Wien



TU Wien Experiment Design For Data Science Assignment 2

Group 43

András Bonifác Kónya (ID:01502933)

Branimir Raguž (ID:12123474)

Thummanoon Kunanuntakij (ID:12122522)


We attempt to reproduce the results of BERT for Evidence Retrieval and Claim Verification [1]. The original paper uses BERT for evidence-based claim verification on the FEVER dataset (50K Wikipedia pages) [2]. It reports a new state-of-the-art recall of 87.1 for retrieving evidence sentences from the dataset and ranks second on the leaderboard with a FEVER score of 69.7. We discuss its experiment design and the metrics used, and attempt to reproduce its results. After reviewing the process described in the original paper, we conclude that its experiment design is questionable and that the results might not generalize well. Although we were unable to confirm the reported numbers because of difficulties in recreating the dataset and time constraints, we document the problems we encountered and our efforts to resolve them.


Reproducibility of the Experimental Result of BERT for Evidence Retrieval.pdf

