Published May 9, 2025 | Version v0.12.0

Xilinx/brevitas: Release v0.12.0

  • 1. @Apple
  • 2. AMD
  • 3. DeepRender
  • 4. Zama.ai
  • 5. @AMD
  • 6. @Quansight
  • 7. @AMD Research Labs
  • 8. @Point72
  • 9. KRI @ Northeastern University
  • 10. National University of Science and Technology
  • 11. FlowZed
  • 12. @zama-ai
  • 13. UC San Diego
  • 14. Paderborn University

Description

Breaking Changes

  • TruncIntQuant, TruncAvgPool, Trunc QONNX Op changes #1042

Highlights

  • New PTQ algorithms:
    • AWQ #1213
    • MagR #1214
    • QuaRot #1061
    • SpinQuant #1155
    • AutoRound #1064
    • SVDQuant #1210
  • New datatype support
    • Hierarchical scales #1038
  • Initial torch.compile support #1206
  • YAML-based experiments #1116
  • Benchmarking scripts for LLM example #1166
  • New operator support
    • Better SDPA quantization support #1090
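
The rotation-based methods listed above (QuaRot, SpinQuant) build on a simple linear-algebra fact: multiplying a layer's weights by an orthogonal matrix and its input by the transpose leaves the output unchanged, while spreading activation outliers across channels. The sketch below illustrates only that identity in plain Python. It does not use the Brevitas API, and the weight and activation values are hypothetical.

```python
# Illustrative only: plain Python, not the Brevitas API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def max_abs(col):
    """Largest absolute entry of a column vector (list of 1-element rows)."""
    return max(abs(v[0]) for v in col)

# Normalized 4x4 Hadamard matrix: orthogonal, so H @ H^T = I.
H = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]

# Hypothetical weights (2 outputs, 4 inputs) and an activation
# column vector with one outlier channel.
W = [[0.2, -0.1, 0.4, 0.3],
     [-0.5, 0.6, 0.1, -0.2]]
x = [[8.0], [0.1], [-0.2], [0.3]]

W_rot = matmul(W, H)             # rotate the weight's input dimension
x_rot = matmul(transpose(H), x)  # counter-rotate the activation

y_ref = matmul(W, x)
y_rot = matmul(W_rot, x_rot)     # (W H)(H^T x) = W x

print("outputs match:", all(abs(a[0] - b[0]) < 1e-9 for a, b in zip(y_ref, y_rot)))
print("outlier reduced:", max_abs(x_rot) < max_abs(x))
```

In a real pipeline the rotated weights and activations would then be quantized; the implementation in this release is in the pull requests referenced above (#1061, #1155).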

What's Changed

  • Feat (examples/generative): block-based optimization for GPTQ by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1046
  • Fix (learned_round): disable return QuantTensor during float inference by @pablomlago in https://github.com/Xilinx/brevitas/pull/1059
  • Bump onnx from 1.15 to 1.17.0 in /requirements by @dependabot in https://github.com/Xilinx/brevitas/pull/1069
  • Fix (minifloat): correct minifloat computation and tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1067
  • Feat (ptq): adding accumulator-aware extensions to GPxQ by @i-colbert in https://github.com/Xilinx/brevitas/pull/1060
  • Feat: add contributing guidelines by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1075
  • Feat (float): adding new attributes to proxy and quant tensor by @i-colbert in https://github.com/Xilinx/brevitas/pull/1072
  • Feat (accelerate): improved accelerate compatibility by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1065
  • Fix Transformers tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1081
  • Fix (data): updating wikitext2 data utility by @i-colbert in https://github.com/Xilinx/brevitas/pull/1080
  • Fix (groupwise): correct log, groupdim, and scale computation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1071
  • Test (mx): add reference impl for MXFloat by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1068
  • Fix (examples/generative): Fixed argument order for quantize_model by @nickfraser in https://github.com/Xilinx/brevitas/pull/1084
  • Feat (export): qonnx minifloat export by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1070
  • Feat (core): use runtime parameter for scale by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1037
  • Fix (per_group): fixing the per_group sym quantizer by @i-colbert in https://github.com/Xilinx/brevitas/pull/1089
  • Rotation based equalization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1061
  • Fix (examples/llm): fix for main and README by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1092
  • Fix: correct output scale compute by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1077
  • Fix (ptq/rotation): fix for rotation implementation (#1095) by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1095
  • Fix (scaling)!: clamp to avoid inf/nan in forward/backward by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1097
  • Setup: bump python & torch version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1098
  • Feat: Per-Row po2 float ocp by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1102
  • Fix LLM tests by @pablomlago in https://github.com/Xilinx/brevitas/pull/1088
  • Feat (brevitas_examples/llm): remove dependencies from optimum-amd by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1094
  • Feat auto round by @pablomlago in https://github.com/Xilinx/brevitas/pull/1064
  • Fix (hadamard): remove hadamard loading warning by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1108
  • Hierarchical scales by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1038
  • Improvements to learned round by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1107
  • Feat (brevitas_examples/llm): update README by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1109
  • Fix (gpxq): tensor unpacking and Cholesky stabilization by @i-colbert in https://github.com/Xilinx/brevitas/pull/1111
  • Feat (llm): adding more quantizers by @i-colbert in https://github.com/Xilinx/brevitas/pull/1113
  • Feat (llm/learned_round): fast block update by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1110
  • Fix SignSGD docstring by @pablomlago in https://github.com/Xilinx/brevitas/pull/1115
  • Feat (nn/sdpa): quantization of scaled dot-product attention by @nickfraser in https://github.com/Xilinx/brevitas/pull/1090
  • Fix (brevitas_examples/llm): scaling_min_val for fp32 by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1117
  • Feat (scaling): no tracked_parameter_list with individual quantizer by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1112
  • Feat (brevitas_examples/llm): select act_eq alpha by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1121
  • Fix llm tests transformers by @pablomlago in https://github.com/Xilinx/brevitas/pull/1118
  • Fix (float/clamp): Bugfix when unsigned by @nickfraser in https://github.com/Xilinx/brevitas/pull/1132
  • Feat (brevitas_examples/llm): inference_mode support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1129
  • Feat (brevitas_examples/llm): correct scale init with CPU offloading by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1124
  • Feat (brevitas_examples/sdxl): inference_mode + compile by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1133
  • Feat (proxy): flag to enable/disable QT return by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1083
  • Feat (examples/llm): Specify experiments via YAML files by @nickfraser in https://github.com/Xilinx/brevitas/pull/1116
  • test (core/float): Enhanced testing of minifloat formats by @nickfraser in https://github.com/Xilinx/brevitas/pull/1136
  • Eval harness by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1131
  • Fix: pytree warning by @i-colbert in https://github.com/Xilinx/brevitas/pull/1144
  • Fix LLM entry point by @i-colbert in https://github.com/Xilinx/brevitas/pull/1145
  • Fix (scaling/standalone): better switch from runtime stats to param by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1099
  • Fix (proxy): fix groupwise scale/zp caching by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1137
  • Fix (export/inference_mode): correct rounding function by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1146
  • Setup: pin transformers version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1150
  • Feat (mx): unpadding during dequantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1134
  • Feat (brevitas_examples/llm): load from checkpoint by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1151
  • Feat (rotation): equalize across SDPA by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1149
  • Feat (quantization): torch_function based quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1147
  • Setup: bump torch version for LLM tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1154
  • Feat (equalize): enable parametrized rotations by @pablomlago in https://github.com/Xilinx/brevitas/pull/1148
  • Feat (optim): add Cailey SGD optimizer by @pablomlago in https://github.com/Xilinx/brevitas/pull/1153
  • Setup: update pre-commit python version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1158
  • Fix (brevitas_examples/llm): remove unnecessary checkpointing by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1161
  • Feat (zero_point): dynamic groupwise zero point by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1160
  • New rotation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1159
  • Fix (brevitas_examples/llm): equalized module + fx compatibility by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1164
  • Fix (runtime_act): fix negative group_dim handling by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1157
  • Fix (a2q): missing restrict_pre_scaling_impl definition by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1167
  • Feat (equalize): enable rotation matrix optimization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1155
  • Add FP16 support to ptq_evaluate.py and update README argument list by @hkayann in https://github.com/Xilinx/brevitas/pull/1174
  • Feat (brevitas_examples/llm): separate KV Cache quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1165
  • Feat (hadamard): support region expansion by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1178
  • Feat (llm): benchmark for llm entrypoint by @pablomlago in https://github.com/Xilinx/brevitas/pull/1166
  • fix (docs/faq): remove reference to gitter, switch affine quantization to be an example by @nickfraser in https://github.com/Xilinx/brevitas/pull/1183
  • Fix (brevitas_examples/sdxl): correct import for inference_mode by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1185
  • Feat (gpfq): optimizing with lower diagonal matrix formulation by @i-colbert in https://github.com/Xilinx/brevitas/pull/1172
  • Feat (brevitas_examples/llm): better dtype selection by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1186
  • Fix (brevitas_examples/sdxl): faster sdxl inference by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1188
  • fix (examples/benchmark): Fix when run_results.yaml does not exist. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1189
  • Feat (example/common): Added groupwise, float scaled OCP option by @nickfraser in https://github.com/Xilinx/brevitas/pull/1190
  • Fix (examples/llm): default dtype from None to float16 by @pablomlago in https://github.com/Xilinx/brevitas/pull/1191
  • Fix (utils/torch_utils): ensure gradient propagation through pad_to_dim by @pablomlago in https://github.com/Xilinx/brevitas/pull/1194
  • Fix (examples/llm): prevent layernorm_to_rmsnorm option when fused_no_fx by @pablomlago in https://github.com/Xilinx/brevitas/pull/1192
  • Feat (brevitas_examples/sdxl): update mlperf by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1195
  • Feat (brevitas_examples/llm): support for lighteval by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1162
  • Fix (optim/cailey_sgd): fix cailey sgd in float16/bfloat16 by @pablomlago in https://github.com/Xilinx/brevitas/pull/1193
  • Feat (brevitas_examples/stable_diffusion): VAE quantization support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1197
  • Fix (quant_tensors): remove duplication by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1204
  • Fix (brevitas_examples/llm): support MSE with offloaded models by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1196
  • Fix (quant): improvements to quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1207
  • Fix (export/inference_mode): correct handler for dynamic float quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1208
  • Feat: Initial SVDQuant support by @nickfraser in https://github.com/Xilinx/brevitas/pull/1210
  • Feat (equalize): enable parametrized scales by @pablomlago in https://github.com/Xilinx/brevitas/pull/1175
  • Fix (llm/equalize): remove call to _update_weights by @pablomlago in https://github.com/Xilinx/brevitas/pull/1216
  • Local compile support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1206
  • Setup: pin onnxruntime by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1218
  • Fix (quant): clean-up to quantization code by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1219
  • Fix (equalize): dtype fix in activation equalization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1217
  • Feat (example/benchmark): Added script to convert YAML cfgs to "benchmark" configs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1184
  • Support for transformer-based diffusion network by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1211
  • Fix (brevitas_examples/llm): remove deprecated flag by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1225
  • Fix (ex/llm): add missing copyright header by @nickfraser in https://github.com/Xilinx/brevitas/pull/1227
  • Feat (compile): limit activation recompiles by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1222
  • Fix (ex/llm): Added defaults for several arguments. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1238
  • Feat (compile): limit memory utilization with groupwise quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1232
  • Feat (brevitas_examples/diffusion): flux attention quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1221
  • Feat (brevitas_examples/llm): BOS preprocessing for calibration data by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1240
  • Fix (test/ste_ops): fix mock tests by @nickfraser in https://github.com/Xilinx/brevitas/pull/1242
  • Fix (calibrate): correct zero_point init by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1243
  • Feat (examples/generative): add fnuz quantizers by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1244
  • Docs (readme): Update citation by @nickfraser in https://github.com/Xilinx/brevitas/pull/1247
  • Feat (ex/benchmark): Add optional start/end indices by @nickfraser in https://github.com/Xilinx/brevitas/pull/1248
  • Fix (ex/llm): Regenerate template configs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1249
  • Fix (gptq): Fix several edge cases by @nickfraser in https://github.com/Xilinx/brevitas/pull/1252
  • Fix (brevitas_examples/diffusion): workaround for svdquant with SDXL by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1256
  • Setup: fix pre_commit CI by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1264
  • Feat (magr): initial implementation of MagR by @i-colbert in https://github.com/Xilinx/brevitas/pull/1214
  • Fix/Feat (trunc avg pool): Update truncation and average pool behaviour by @nickfraser in https://github.com/Xilinx/brevitas/pull/1042
  • Fix (ex/llm): Fix per-row quant_sdpa broadcastable shape by @nickfraser in https://github.com/Xilinx/brevitas/pull/1254
  • feat (ex/benchmark): Added option to shuffle order of benchmark processes by @nickfraser in https://github.com/Xilinx/brevitas/pull/1268
  • Fix (examples/llm): Fix PPLs by @pablomlago in https://github.com/Xilinx/brevitas/pull/1271
  • Fix (data): bos_processing in pile dataset by @i-colbert in https://github.com/Xilinx/brevitas/pull/1259
  • Feat (llm/eval): remove BOS token by @pablomlago in https://github.com/Xilinx/brevitas/pull/1258
  • Fix (graph/hadamard): .view can fail with functional QuantSDPA by @nickfraser in https://github.com/Xilinx/brevitas/pull/1270
  • Fix (scaling/float): correct dtype for threshold by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1265
  • Fix (runtime_quant): correct priority for act quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1255
  • Fix (quant_sdpa): remove print by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1273
  • Feat (graph/calibrate): refactor DisableEnableQuantization by @pablomlago in https://github.com/Xilinx/brevitas/pull/1257
  • Fix (quant/float): input_view_impl for float_no_scale by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1260
  • Fix (ci): Don't update PyTorch version by @nickfraser in https://github.com/Xilinx/brevitas/pull/1275
  • Feat (brevitas_examples/sdxl): better GPTQ by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1250
  • Feat (ex/llm): bos preprocessing by @pablomlago in https://github.com/Xilinx/brevitas/pull/1277
  • test (ex/llm): Minor fixes to tests. Add rotation tests. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1253
  • Fix (graph/equalize): fix value-output region in SDPA by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1278
  • Feat (graph/calibrate): change quant_status_manager defaults to no-op by @pablomlago in https://github.com/Xilinx/brevitas/pull/1274
  • Fix (core/function): Fix learned round when padding is applied to weights by @nickfraser in https://github.com/Xilinx/brevitas/pull/1235
  • Fix (export/onnx): Improved ONNX export performance by @nickfraser in https://github.com/Xilinx/brevitas/pull/1279
  • Feat (llm/awq): activation-aware weight scaling by @pablomlago in https://github.com/Xilinx/brevitas/pull/1213
  • Docs: update / generate docs for 0.12.0 release by @nickfraser in https://github.com/Xilinx/brevitas/pull/1284
  • Docs: regen notebooks and docs by @nickfraser in https://github.com/Xilinx/brevitas/pull/1285

New Contributors

  • @dependabot made their first contribution in https://github.com/Xilinx/brevitas/pull/1069
  • @hkayann made their first contribution in https://github.com/Xilinx/brevitas/pull/1174

Full Changelog: https://github.com/Xilinx/brevitas/compare/v0.11.0...v0.12.0

Files

Xilinx/brevitas-v0.12.0.zip (3.6 MB)
md5:36c6c023ca02736f01e32653472b39af

Additional details

Related works

Is supplement to
Software: https://github.com/Xilinx/brevitas/tree/v0.12.0