MLPerf Inference quantized BERT ONNX Model on SQuAD v1.1 dataset

Huang, Po-Han; Forster, Christopher; Sequeira, Dilip; Wu, Hao; Judd, Patrick

doi:10.5281/zenodo.3750364

Published April 3, 2020 | Version v2

Resource type: Other | Access: Open

1. NVIDIA

This model was fine-tuned from the MLPerf Inference BERT PyTorch model on the SQuAD v1.1 dataset and converted to ONNX using the script in the MLPerf Inference repository: https://github.com/mlperf/inference

The quantization is per-tensor and symmetric, with zero_point=0. The model uses the ONNX QuantizeLinear and DequantizeLinear operators to implement it. The achieved accuracy is f1_score=90.482%.
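To make the quantization scheme concrete, the following is a minimal pure-Python sketch of per-tensor, symmetric fake quantization with zero_point=0, following the arithmetic that the ONNX QuantizeLinear and DequantizeLinear operators define for int8. The function names, the sample tensor, and the rule for choosing the scale (largest absolute value divided by 127) are assumptions made for illustration. This record does not state how the scales in the published model were calibrated.

```python
def quantize_linear(values, scale, zero_point=0, qmin=-128, qmax=127):
    """int8 QuantizeLinear arithmetic: divide by scale, round half to even,
    add the zero point, then saturate to [qmin, qmax]."""
    # Python's built-in round() rounds halves to the nearest even integer.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize_linear(q_values, scale, zero_point=0):
    """DequantizeLinear arithmetic: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


def per_tensor_scale(values, qmax=127):
    """One scale for the whole tensor (per-tensor), symmetric around zero.

    Assumption: the scale is max(|x|) / 127. The record does not describe
    the calibration actually used for the published model.
    """
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0


def fake_quantize(values):
    """Quantize then immediately dequantize, as a Q/DQ node pair does."""
    scale = per_tensor_scale(values)
    q = quantize_linear(values, scale)  # zero_point stays 0 (symmetric)
    return dequantize_linear(q, scale), q, scale


# Hypothetical sample tensor, not taken from the model.
tensor = [-1.27, -0.5, 0.0, 0.004, 0.63, 1.27]
approx, q, scale = fake_quantize(tensor)
print("int8 codes:", q)
print("dequantized:", approx)
```

Because the zero point is 0, a real value of 0.0 maps to the integer code 0 and back to exactly 0.0, and every in-range value is reproduced to within half a quantization step.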

The fine-tuning step is described in "MLPerf INT8 BERT Finetuning.pdf".

Files (1.3 GB)

Name	MD5	Size
bert_large_v1_1_fake_quant.onnx	45f88ffb2915362242703c85c38ec2d4	1.3 GB
MLPerf INT8 BERT Finetuning.pdf	b07694dbfc82dc268536bb35e79244a1	49.3 kB
vocab.txt	64800d5d8528ce344256daf115d4965e	231.5 kB
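The MD5 checksums listed for the files can be used to check a download for corruption. The sketch below uses only the Python standard library. The helper names and the assumption that the files sit in one local directory under their original names are illustrative, and the checksums are copied from the file list in this record.

```python
import hashlib
from pathlib import Path

# Checksums as listed for this record's files.
EXPECTED_MD5 = {
    "bert_large_v1_1_fake_quant.onnx": "45f88ffb2915362242703c85c38ec2d4",
    "MLPerf INT8 BERT Finetuning.pdf": "b07694dbfc82dc268536bb35e79244a1",
    "vocab.txt": "64800d5d8528ce344256daf115d4965e",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the 1.3 GB model is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True/False}; missing or mismatching files are False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, ok in verify_downloads().items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

MD5 is adequate here for detecting a truncated or corrupted download. It is not a defense against deliberate tampering.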

Views and downloads

	All versions	This version
Views	3,131	2,787
Downloads	17,376	17,178
Data volume	23.1 TB	22.9 TB


Resource type: Other
Publisher: Zenodo

License: Apache License 2.0

A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Technical metadata

Created: April 13, 2020
Modified: July 22, 2024
