Published January 30, 2022 | Version v1
Report Open

Reproducibility of the Experimental Result of BERT for Evidence Retrieval and Claim Verification

Technische Universität Wien

Description


TU Wien, Experiment Design for Data Science, Assignment 2

Group 43

András Bonifác Kónya (ID:01502933)

Branimir Raguž (ID:12123474)

Thummanoon Kunanuntakij (ID:12122522)

Abstract

We attempt to reproduce the results of BERT for Evidence Retrieval and Claim Verification [1]. The original paper uses BERT for evidence-based claim verification on the FEVER dataset of 50K Wikipedia pages [2]. It achieves a new state-of-the-art recall of 87.1 for retrieving evidence sentences from the dataset and ranks second on the leaderboard with a FEVER score of 69.7. We discuss the authors' experiment design and the metrics they use, and attempt to reproduce their results. Having reviewed the process described in the original paper, we conclude that the experiment design is questionable and that the results may not generalize well. Although we were not able to confirm the reported numbers, owing to various difficulties in recreating the dataset and the limited time frame, we document the problems we encountered and the efforts we made to resolve them.
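The two headline numbers above are evidence recall and the FEVER score. As a minimal sketch, assuming the scoring rules commonly used for FEVER (a claim counts toward the FEVER score only if its predicted label is correct and, for SUPPORTS/REFUTES claims, the top five predicted sentences contain at least one complete gold evidence set), both metrics can be computed as below. The instance dictionary layout, the `(page, sentence index)` representation, and the function names `evidence_covered` and `fever_metrics` are hypothetical; this is neither the official FEVER scorer's API nor the original authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an evidence sentence is (Wikipedia page title, sentence index).
Sentence = Tuple[str, int]
NEI = "NOT ENOUGH INFO"


def evidence_covered(predicted: List[Sentence],
                     gold_sets: List[List[Sentence]],
                     k: int = 5) -> bool:
    """True if some complete gold evidence set is contained in the top-k predicted sentences."""
    top_k = set(predicted[:k])
    # Empty gold sets are skipped so they never count as trivially covered.
    return any(len(gold) > 0 and all(s in top_k for s in gold) for gold in gold_sets)


def fever_metrics(instances: List[Dict], k: int = 5) -> Dict[str, float]:
    """Label accuracy, evidence recall and FEVER score (simplified sketch)."""
    label_hits = fever_hits = recall_hits = verifiable = 0
    for inst in instances:
        label_ok = inst["predicted_label"] == inst["gold_label"]
        if inst["gold_label"] == NEI:
            # NEI claims have no gold evidence, so only the label is checked.
            evidence_ok = True
        else:
            # Evidence recall is measured over verifiable (SUPPORTS/REFUTES) claims only.
            verifiable += 1
            evidence_ok = evidence_covered(inst["predicted_evidence"],
                                           inst["gold_evidence"], k)
            recall_hits += int(evidence_ok)
        label_hits += int(label_ok)
        # A claim counts toward the FEVER score only if label AND evidence are correct.
        fever_hits += int(label_ok and evidence_ok)
    n = len(instances)
    return {
        "label_accuracy": label_hits / n if n else 0.0,
        "evidence_recall": recall_hits / verifiable if verifiable else 0.0,
        "fever_score": fever_hits / n if n else 0.0,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from FEVER.
    example = [
        {"gold_label": "SUPPORTS", "predicted_label": "SUPPORTS",
         "gold_evidence": [[("Page_A", 0)]],
         "predicted_evidence": [("Page_A", 0), ("Page_B", 3)]},
        {"gold_label": "REFUTES", "predicted_label": "REFUTES",
         "gold_evidence": [[("Page_C", 1), ("Page_D", 2)]],
         "predicted_evidence": [("Page_C", 1)]},
        {"gold_label": NEI, "predicted_label": NEI,
         "gold_evidence": [], "predicted_evidence": []},
    ]
    print(fever_metrics(example))
    # {'label_accuracy': 1.0, 'evidence_recall': 0.5, 'fever_score': 0.6666666666666666}
```

The second toy claim shows why the metric is strict: retrieving only one of the two sentences in a multi-sentence gold set earns no evidence credit, so the claim is excluded from the FEVER score even though its label is correct.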

Files

Reproducibility of the Experimental Result of BERT for Evidence Retrieval.pdf


References