Published April 3, 2020 | Version v2
Other Open

MLPerf Inference quantized BERT ONNX Model on SQuAD v1.1 dataset

Description

This model was fine-tuned from the MLPerf Inference BERT PyTorch model on the SQuAD v1.1 dataset and converted to ONNX using the script in the MLPerf inference repository: https://github.com/mlperf/inference

Quantization is per-tensor and symmetric, with zero_point=0. It is implemented with the ONNX QuantizeLinear and DequantizeLinear operators. The achieved accuracy is f1_score=90.482%.
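To make the quantization scheme concrete, here is a minimal NumPy sketch of per-tensor symmetric quantization with zero_point=0, following the arithmetic that ONNX QuantizeLinear and DequantizeLinear define (round to nearest even, add the zero point, saturate; then subtract the zero point and multiply by the scale). The signed 8-bit range and the max-absolute-value scale calibration are assumptions made for illustration; the record does not state how the scales in this model were chosen, and the function names are hypothetical.

```python
import numpy as np

def symmetric_scale(x, qmax=127):
    # Assumption: calibrate one scale for the whole tensor from its largest magnitude.
    return float(np.max(np.abs(x))) / qmax

def quantize_linear(x, scale, zero_point=0, qmin=-128, qmax=127):
    # Same arithmetic as ONNX QuantizeLinear with a single (per-tensor) scale:
    # divide by the scale, round half to even, add the zero point, saturate to int8.
    q = np.rint(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize_linear(q, scale, zero_point=0):
    # Same arithmetic as ONNX DequantizeLinear: (q - zero_point) * scale.
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative values only, not taken from the model.
x = np.array([-1.0, -0.4, 0.0, 0.2, 1.0], dtype=np.float32)
scale = symmetric_scale(x)           # one scale for the whole tensor
q = quantize_linear(x, scale)        # int8 codes, zero maps exactly to 0
x_hat = dequantize_linear(q, scale)  # approximate reconstruction of x
max_error = float(np.max(np.abs(x - x_hat)))
```

With zero_point=0, a real value of 0.0 always maps to the integer 0, and for values inside the calibrated range the reconstruction error is at most half of one scale step.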

The fine-tuning step is described in "MLPerf INT8 BERT Finetuning.pdf".

Files

MLPerf INT8 BERT Finetuning.pdf

Files (1.3 GB)

Size        Checksum
1.3 GB      md5:45f88ffb2915362242703c85c38ec2d4
49.3 kB     md5:b07694dbfc82dc268536bb35e79244a1
231.5 kB    md5:64800d5d8528ce344256daf115d4965e
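The md5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the local file path is whatever name the downloaded file was saved under, since the listing here does not show file names.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    # Hash the file in chunks so a 1.3 GB model does not need to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed_checksum):
    # Accepts the value as listed on this page, with or without the "md5:" prefix.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, calling `matches_listing` with the saved path of the 1.3 GB file and the string `"md5:45f88ffb2915362242703c85c38ec2d4"` should return `True` if the download completed without corruption.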