Published April 3, 2020 | Version v2
Other
Open
MLPerf Inference quantized BERT ONNX Model on SQuAD v1.1 dataset
Description
This model was fine-tuned from the MLPerf Inference BERT PyTorch model on the SQuAD v1.1 dataset and converted to ONNX using the script in the MLPerf inference repository: https://github.com/mlperf/inference
Quantization is per-tensor and symmetric, with zero_point=0. It is implemented with the ONNX QuantizeLinear and DequantizeLinear operators. The achieved accuracy is f1_score=90.482%.
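As a rough illustration of what per-tensor symmetric quantization with zero_point=0 means, the sketch below mimics the arithmetic of QuantizeLinear and DequantizeLinear in plain Python. It is not the code used to produce this model. The int8 target range, the max-abs rule for choosing the scale, the helper names, and the sample values are all assumptions made for the example.

```python
# Minimal sketch of per-tensor symmetric int8 quantization (zero_point = 0).
# Assumptions: int8 target type and a max-abs scale; the record states neither.

def symmetric_scale(values, qmax=127):
    """One scale for the whole tensor, chosen so max |value| maps to qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize_linear(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Mirror of QuantizeLinear: round(x / scale) + zero_point, then saturate.
    Python's round() rounds halves to even, as the ONNX operator does."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]

def dequantize_linear(q_values, scale, zero_point=0):
    """Mirror of DequantizeLinear: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]

# Hypothetical tensor values, for illustration only.
weights = [-1.27, -0.5, 0.0, 0.25, 1.0]
scale = symmetric_scale(weights)
q = quantize_linear(weights, scale)
restored = dequantize_linear(q, scale)
```

Because the zero point is fixed at 0, the real value 0.0 maps exactly to the integer 0, and each restored value differs from the original by at most half a quantization step (scale / 2) as long as no saturation occurs.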
The fine-tuning step is described in "MLPerf INT8 BERT Finetuning.pdf".