Published May 9, 2025 | Version v0.12.0
Software
Open
Xilinx/brevitas: Release v0.12.0
Authors/Creators
- Alessandro Pappalardo (1)
- Giuseppe Franco (2)
- nickfraser
- Ian Colbert
- Fabian Grob
- Timothy Costigan
- Oscar Savolainen (3)
- Andrei Stoian (4)
- jinchen2
- alexredd99
- Anton Gerdelan (5)
- 无聊的小黑
- vfdev (6)
- derpda
- Yaman Umuroglu (7)
- Tim Paine (8)
- Kenneth Witham (9)
- Saad Khan (10)
- Pablo Monteagudo Lago
- Omar Peracha (11)
- MichalMachura
- Luis Montero (12)
- Javier Duarte (13)
- Felix Jentzsch (14)
- 1. @Apple
- 2. AMD
- 3. DeepRender
- 4. Zama.ai
- 5. @AMD
- 6. @Quansight
- 7. @AMD Research Labs
- 8. @Point72
- 9. KRI @ Northeastern University
- 10. National University of Science and Technology
- 11. FlowZed
- 12. @zama-ai
- 13. UC San Diego
- 14. Paderborn University
Description
Breaking Changes
- TruncIntQuant, TruncAvgPool, Trunc QONNX Op changes #1042
Highlights
- New PTQ algorithms:
  - AWQ #1213
  - MagR #1214
  - QuaRot #1061
  - SpinQuant #1155
  - AutoRound #1064
  - SVDQuant #1210
- New datatype support
  - Hierarchical scales #1038
- Initial `torch.compile` support #1206
  - User guide here
- YAML-based experiments #1116
- Benchmarking scripts for LLM example #1166
- New operator support
  - Better SDPA quantization support #1090
What's Changed
- Feat (examples/generative): block-based optimization for GPTQ by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1046
- Fix (learned_round): disable return QuantTensor during float inference by @pablomlago in https://github.com/Xilinx/brevitas/pull/1059
- Bump onnx from 1.15 to 1.17.0 in /requirements by @dependabot in https://github.com/Xilinx/brevitas/pull/1069
- Fix (minifloat): correct minifloat computation and tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1067
- Feat (ptq): adding accumulator-aware extensions to GPxQ by @i-colbert in https://github.com/Xilinx/brevitas/pull/1060
- Feat: add contributing guidelines by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1075
- Feat (float): adding new attributes to proxy and quant tensor by @i-colbert in https://github.com/Xilinx/brevitas/pull/1072
- Feat (accelerate): improved accelerate compatibility by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1065
- Fix Transformers tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1081
- Fix (data): updating wikitext2 data utility by @i-colbert in https://github.com/Xilinx/brevitas/pull/1080
- Fix (groupwise): correct log, groupdim, and scale computation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1071
- Test (mx): add reference impl for MXFloat by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1068
- Fix (examples/generative): Fixed argument order for `quantize_model` by @nickfraser in https://github.com/Xilinx/brevitas/pull/1084
- Feat (export): qonnx minifloat export by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1070
- Feat (core): use runtime parameter for scale by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1037
- Fix (per_group): fixing the per_group sym quantizer by @i-colbert in https://github.com/Xilinx/brevitas/pull/1089
- Rotation based equalization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1061
- Fix (examples/llm): fix for main and README by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1092
- Fix: correct output scale compute by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1077
- Fix (ptq/rotation): fix for rotation implementation (#1095) by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1095
- Fix (scaling)!: clamp to avoid inf/nan in forward/backward by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1097
- Setup: bump python & torch version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1098
- Feat: Per-Row po2 float ocp by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1102
- Fix LLM tests by @pablomlago in https://github.com/Xilinx/brevitas/pull/1088
- Feat (brevitas_examples/llm): remove dependencies from optimum-amd by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1094
- Feat auto round by @pablomlago in https://github.com/Xilinx/brevitas/pull/1064
- Fix (hadamard): remove hadamard loading warning by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1108
- Hierarchical scales by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1038
- Improvements to learned round by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1107
- Feat (brevitas_examples/llm): update README by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1109
- Fix (gpxq): tensor unpacking and Cholesky stabilization by @i-colbert in https://github.com/Xilinx/brevitas/pull/1111
- Feat (llm): adding more quantizers by @i-colbert in https://github.com/Xilinx/brevitas/pull/1113
- Feat (llm/learned_round): fast block update by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1110
- Fix SignSGD docstring by @pablomlago in https://github.com/Xilinx/brevitas/pull/1115
- Feat (nn/sdpa): quantization of scaled dot-product attention by @nickfraser in https://github.com/Xilinx/brevitas/pull/1090
- Fix (brevitas_examples/llm): scaling_min_val for fp32 by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1117
- Feat (scaling): no tracked_parameter_list with individual quantizer by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1112
- Feat (brevitas_examples/llm): select act_eq alpha by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1121
- Fix llm tests transformers by @pablomlago in https://github.com/Xilinx/brevitas/pull/1118
- Fix (float/clamp): Bugfix when unsigned by @nickfraser in https://github.com/Xilinx/brevitas/pull/1132
- Feat (brevitas_examples/llm): inference_mode support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1129
- Feat (brevitas_examples/llm): correct scale init with CPU offloading by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1124
- Feat (brevitas_examples/sdxl): inference_mode + compile by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1133
- Feat (proxy): flag to enable/disable QT return by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1083
- Feat (examples/llm): Specify experiments via YAML files by @nickfraser in https://github.com/Xilinx/brevitas/pull/1116
- test (core/float): Enhanced testing of minifloat formats by @nickfraser in https://github.com/Xilinx/brevitas/pull/1136
- Eval harness by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1131
- Fix: pytree warning by @i-colbert in https://github.com/Xilinx/brevitas/pull/1144
- Fix LLM entry point by @i-colbert in https://github.com/Xilinx/brevitas/pull/1145
- Fix (scaling/standalone): better switch from runtime stats to param by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1099
- Fix (proxy): fix groupwise scale/zp caching by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1137
- Fix (export/inference_mode): correct rounding function by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1146
- Setup: pin transformers version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1150
- Feat (mx): unpadding during dequantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1134
- Feat (brevitas_examples/llm): load from checkpoint by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1151
- Feat (rotation): equalize across SDPA by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1149
- Feat (quantization): torch_function based quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1147
- Setup: bump torch version for LLM tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1154
- Feat (equalize): enable parametrized rotations by @pablomlago in https://github.com/Xilinx/brevitas/pull/1148
- Feat (optim): add Cailey SGD optimizer by @pablomlago in https://github.com/Xilinx/brevitas/pull/1153
- Setup: update pre-commit python version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1158
- Fix (brevitas_examples/llm): remove unnecessary checkpointing by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1161
- Feat (zero_point): dynamic groupwise zero point by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1160
- New rotation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1159
- Fix (brevitas_examples/llm): equalized module + fx compatibility by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1164
- Fix (runtime_act): fix negative group_dim handling by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1157
- Fix (a2q): missing restrict_pre_scaling_impl definition by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1167
- Feat (equalize): enable rotation matrix optimization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1155
- Add FP16 support to ptq_evaluate.py and update README argument list by @hkayann in https://github.com/Xilinx/brevitas/pull/1174
- Feat (brevitas_examples/llm): separate KV Cache quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1165
- Feat (hadamard): support region expansion by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1178
- Feat (llm): benchmark for llm entrypoint by @pablomlago in https://github.com/Xilinx/brevitas/pull/1166
- fix (docs/faq): remove reference to gitter, switch affine quantization to be an example by @nickfraser in https://github.com/Xilinx/brevitas/pull/1183
- Fix (brevitas_examples/sdxl): correct import for inference_mode by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1185
- Feat (gpfq): optimizing with lower diagonal matrix formulation by @i-colbert in https://github.com/Xilinx/brevitas/pull/1172
- Feat (brevitas_examples/llm): better dtype selection by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1186
- Fix (brevitas_examples/sdxl): faster sdxl inference by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1188
- fix (examples/benchmark): Fix when `run_results.yaml` does not exist. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1189
- Feat (example/common): Added groupwise, float scaled OCP option by @nickfraser in https://github.com/Xilinx/brevitas/pull/1190
- Fix (examples/llm): default dtype from None to float16 by @pablomlago in https://github.com/Xilinx/brevitas/pull/1191
- Fix (utils/torch_utils): ensure gradient propagation through pad_to_dim by @pablomlago in https://github.com/Xilinx/brevitas/pull/1194
- Fix (examples/llm): prevent layernorm_to_rmsnorm option when fused_no_fx by @pablomlago in https://github.com/Xilinx/brevitas/pull/1192
- Feat (brevitas_examples/sdxl): update mlperf by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1195
- Feat (brevitas_examples/llm): support for lighteval by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1162
- Fix (optim/cailey_sgd): fix cailey sgd in float16/bfloat16 by @pablomlago in https://github.com/Xilinx/brevitas/pull/1193
- Feat (brevitas_examples/stable_diffusion): VAE quantization support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1197
- Fix (quant_tensors): remove duplication by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1204
- Fix (brevitas_examples/llm): support MSE with offloaded models by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1196
- Fix (quant): improvements to quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1207
- Fix (export/inference_mode): correct handler for dynamic float quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1208
- Feat: Initial SVDQuant support by @nickfraser in https://github.com/Xilinx/brevitas/pull/1210
- Feat (equalize): enable parametrized scales by @pablomlago in https://github.com/Xilinx/brevitas/pull/1175
- Fix (llm/equalize): remove call to _update_weights by @pablomlago in https://github.com/Xilinx/brevitas/pull/1216
- Local compile support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1206
- Setup: pin onnxruntime by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1218
- Fix (quant): clean-up to quantization code by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1219
- Fix (equalize): dtype fix in activation equalization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1217
- Feat (example/benchmark): Added script to convert YAML cfgs to "benchmark" configs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1184
- Support for transformer-based diffusion network by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1211
- Fix (brevitas_examples/llm): remove deprecated flag by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1225
- Fix (ex/llm): add missing copyright header by @nickfraser in https://github.com/Xilinx/brevitas/pull/1227
- Feat (compile): limit activation recompiles by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1222
- Fix (ex/llm): Added defaults for several arguments. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1238
- Feat (compile): limit memory utilization with groupwise quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1232
- Feat (brevitas_examples/diffusion): flux attention quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1221
- Feat (brevitas_examples/llm): BOS preprocessing for calibration data by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1240
- Fix (test/ste_ops): fix mock tests by @nickfraser in https://github.com/Xilinx/brevitas/pull/1242
- Fix (calibrate): correct zero_point init by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1243
- Feat (examples/generative): add fnuz quantizers by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1244
- Docs (readme): Update citation by @nickfraser in https://github.com/Xilinx/brevitas/pull/1247
- Feat (ex/benchmark): Add optional start/end indices by @nickfraser in https://github.com/Xilinx/brevitas/pull/1248
- Fix (ex/llm): Regenerate template configs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1249
- Fix (gptq): Fix several edge cases by @nickfraser in https://github.com/Xilinx/brevitas/pull/1252
- Fix (brevitas_examples/diffusion): workaround for svdquant with SDXL by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1256
- Setup: fix pre_commit CI by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1264
- Feat (magr): initial implementation of MagR by @i-colbert in https://github.com/Xilinx/brevitas/pull/1214
- Fix/Feat (trunc avg pool): Update truncation and average pool behaviour by @nickfraser in https://github.com/Xilinx/brevitas/pull/1042
- Fix (ex/llm): Fix per-row quant_sdpa broadcastable shape by @nickfraser in https://github.com/Xilinx/brevitas/pull/1254
- feat (ex/benchmark): Added option to shuffle order of benchmark processes by @nickfraser in https://github.com/Xilinx/brevitas/pull/1268
- Fix (examples/llm): Fix PPLs by @pablomlago in https://github.com/Xilinx/brevitas/pull/1271
- Fix (data): bos_processing in pile dataset by @i-colbert in https://github.com/Xilinx/brevitas/pull/1259
- Feat (llm/eval): remove BOS token by @pablomlago in https://github.com/Xilinx/brevitas/pull/1258
- Fix (graph/hadamard): `.view` can fail with functional QuantSDPA by @nickfraser in https://github.com/Xilinx/brevitas/pull/1270
- Fix (scaling/float): correct dtype for threshold by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1265
- Fix (runtime_quant): correct priority for act quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1255
- Fix (quant_sdpa): remove print by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1273
- Feat (graph/calibrate): refactor DisableEnableQuantization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1257
- Fix (quant/float): input_view_impl for float_no_scale by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1260
- Fix (ci): Don't update PyTorch version by @nickfraser in https://github.com/Xilinx/brevitas/pull/1275
- Feat (brevitas_examples/sdxl): better GPTQ by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1250
- Feat (ex/llm): bos preprocessing by @pablomlago in https://github.com/Xilinx/brevitas/pull/1277
- test (ex/llm): Minor fixes to tests. Add rotation tests. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1253
- Fix (graph/equalize): fix value-output region in SDPA by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1278
- Feat (graph/calibrate): change quant_status_manager defaults to no-op by @pablomlago in https://github.com/Xilinx/brevitas/pull/1274
- Fix (core/function): Fix learned round when padding is applied to weights by @nickfraser in https://github.com/Xilinx/brevitas/pull/1235
- Fix (export/onnx): Improved ONNX export performance by @nickfraser in https://github.com/Xilinx/brevitas/pull/1279
- Feat (llm/awq): activation-aware weight scaling by @pablomlago in https://github.com/Xilinx/brevitas/pull/1213
- Docs: update / generate docs for 0.12.0 release by @nickfraser in https://github.com/Xilinx/brevitas/pull/1284
- Docs: regen notebooks and docs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1285
New Contributors
- @dependabot made their first contribution in https://github.com/Xilinx/brevitas/pull/1069
- @hkayann made their first contribution in https://github.com/Xilinx/brevitas/pull/1174
Full Changelog: https://github.com/Xilinx/brevitas/compare/v0.11.0...v0.12.0
Files
| Name | Size | Checksum |
|---|---|---|
| Xilinx/brevitas-v0.12.0.zip | 3.6 MB | md5:36c6c023ca02736f01e32653472b39af |
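The md5 checksum listed for the archive can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the local filename `brevitas-v0.12.0.zip` and the helper names `md5_of_file` and `verify_archive` are assumptions made for illustration, not part of the release.

```python
import hashlib
from pathlib import Path

# Checksum copied from the Files listing of this record.
EXPECTED_MD5 = "36c6c023ca02736f01e32653472b39af"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local filename; adjust to wherever the archive was saved.
    archive = Path("brevitas-v0.12.0.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

md5 is adequate for detecting accidental corruption during download, but it is not a defence against deliberate tampering.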
Additional details
Related works
- Is supplement to
- Software: https://github.com/Xilinx/brevitas/tree/v0.12.0 (URL)
Software
- Repository URL
- https://github.com/Xilinx/brevitas