Published May 24, 2021 | Version v2 | Software | Open
MLPerf Inference quantized BERT PyTorch model on SQuAD v1.1 dataset
Description
This model is fine-tuned and quantized from a pretrained Hugging Face BERT model.
The quantization scheme is per-tensor and symmetric, with zero_point=0. Quantization was performed with NVIDIA's quantization toolkit on top of PyTorch. The achieved accuracy is an F1 score of 90.633%.
The quantization steps are described in README.md. All code needed to reproduce the model is included in the upload: Dockerfile, run_squad.py, quant_trainer.py, and modeling_bert.patch. The PyTorch model itself is pytorch_model.bin.
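The per-tensor, symmetric, zero_point=0 scheme described above can be illustrated in a few lines. This is a minimal NumPy sketch of the arithmetic only, not the toolkit's actual implementation; the function names and the max-absolute-value (amax) calibration choice are assumptions for illustration:

```python
import numpy as np

def quantize_per_tensor_symmetric(x, num_bits=8):
    """Per-tensor symmetric quantization: one scale for the whole
    tensor, zero_point fixed at 0 (illustrative sketch)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = float(np.abs(x).max()) / qmax   # amax calibration (assumed)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values: x_hat = q * scale."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_per_tensor_symmetric(x)
x_hat = dequantize(q, scale)   # x_hat ≈ x, up to the quantization step
```

Because the zero point is fixed at 0, the integer value 0 maps exactly to the float value 0.0, which simplifies integer-only inference kernels.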
Files
(1.3 GB)
| Name | Size | Download |
|---|---|---|
| md5:acd8a4f652d1d4653de33ee130761d0c | 1.2 kB | Download |
| md5:af17111f17c622268f864f28544ae99a | 8.0 kB | Download |
| md5:0734c580cb53b4b56a3f400771ffcb7c | 1.3 GB | Download |
| md5:cb534ea4fdd4c186d9a1d0983179ddc0 | 9.4 kB | Download |
| md5:53a1fd283ff0e3871bbb524eaf85d3ba | 3.7 kB | Preview Download |
| md5:d71ca747e7f0fc077bbb0c295b446b66 | 38.6 kB | Download |