Published May 24, 2021 | Version v2
Software Open

MLPerf Inference quantized BERT PyTorch model on SQuAD v1.1 dataset

Authors/Creators

  • 1. NVIDIA

Description

This model was fine-tuned and quantized starting from a pretrained Hugging Face BERT model.

The quantization is per-tensor and symmetric, with zero_point=0. It was performed with NVIDIA's quantization toolkit on top of PyTorch. The achieved accuracy is f1_score=90.633%.
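As an illustration of what "per-tensor, symmetric, zero_point=0" means, the following is a minimal sketch in plain Python. It is not the code from NVIDIA's toolkit or from this upload. The 8-bit width, the clamping range [-127, 127], the choice of the largest absolute value as the calibration statistic, and the function names are all assumptions made for the example.

```python
def quantize_per_tensor_symmetric(values, num_bits=8):
    """Quantize a flat list of floats with one scale for the whole tensor.

    Symmetric quantization keeps zero_point fixed at 0, so real 0.0 always
    maps to integer 0. num_bits=8 is an assumption for this sketch.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    amax = max(abs(v) for v in values)       # one statistic per tensor
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to floats; no zero_point term since it is 0."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale = quantize_per_tensor_symmetric(weights)
restored = dequantize(q, scale)
```

Because a single scale covers the entire tensor, the reconstruction error of each element is bounded by about half of that scale.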

The quantization steps are described in README.md. All code needed to reproduce the model is included in the upload: Dockerfile, run_squad.py, quant_trainer.py, and modeling_bert.patch. The PyTorch model itself is pytorch_model.bin.

Files

Files (1.3 GB)

Checksum                                Size
md5:acd8a4f652d1d4653de33ee130761d0c    1.2 kB
md5:af17111f17c622268f864f28544ae99a    8.0 kB
md5:0734c580cb53b4b56a3f400771ffcb7c    1.3 GB
md5:cb534ea4fdd4c186d9a1d0983179ddc0    9.4 kB
md5:53a1fd283ff0e3871bbb524eaf85d3ba    3.7 kB
md5:d71ca747e7f0fc077bbb0c295b446b66    38.6 kB
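The MD5 values listed for the files can be used to check a download for corruption. The sketch below shows one way to do that with Python's standard hashlib module. The helper names are hypothetical, and the file-to-checksum pairing in the commented usage line is an assumption: the listing does not state which checksum belongs to which file, and the 1.3 GB entry is only presumed to be pytorch_model.bin.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte model file does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:0123...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage; the pairing of file and checksum is an assumption:
# matches_listed_md5("pytorch_model.bin", "md5:0734c580cb53b4b56a3f400771ffcb7c")
```

A mismatch usually means the download was truncated or altered, in which case the file should be fetched again before use.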