Published October 10, 2024 | Version v0.11.0
Software Open

Xilinx/brevitas: Release v0.11.0

  • 1. @Apple
  • 2. @AMD
  • 3. AMD
  • 4. DeepRender
  • 5. Zama.ai
  • 6. @Quansight
  • 7. @AMD Research Labs
  • 8. @Point72
  • 9. KRI @ Northeastern University
  • 10. National University of Science and Technology
  • 11. unpaired.
  • 12. @zama-ai
  • 13. UC San Diego
  • 14. Paderborn University

Description

Breaking Changes

  • Remove ONNX QOp export (https://github.com/Xilinx/brevitas/pull/917)
  • QuantTensor cannot have empty metadata fields (e.g., scale, bitwidth, etc.) (https://github.com/Xilinx/brevitas/pull/819)
  • Bias quantization now requires the specification of bit-width (https://github.com/Xilinx/brevitas/pull/839)
  • QuantLayers do not expose quant_metadata directly. This is delegated to the proxies (https://github.com/Xilinx/brevitas/pull/883)
  • QuantDropout has been removed (https://github.com/Xilinx/brevitas/pull/861)
  • QuantMaxPool has been removed (https://github.com/Xilinx/brevitas/pull/858)

Highlights

  • Support for OCP/FNUZ FP8 quantization

    • Compatibility with QAT/PTQ, including all current PTQ algorithms implemented (GPTQ, LearnedRound, GPFQ, etc.)
    • Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, etc.)
    • Support for ONNX QDQ export
  • Support for OCP MX quantization

    • Compatibility with QAT/PTQ, including all current PTQ algorithms implemented (GPTQ, LearnedRound, GPFQ, etc.)
    • Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, group size, etc.)
  • New QuantTensor types:

    • FloatQuantTensor: supports OCP FP formats and general minifloat quantization
    • GroupwiseQuantTensor: supports OCP MX formats and general groupwise int/minifloat quantization
  • Support for Channel splitting

  • Support for HQO optimization for zero point

  • Support for HQO optimization for scale (prototype)

  • Improved SDXL entrypoint under brevitas_examples

  • Improved LLM entrypoint under brevitas_examples

    • Compatibility with accelerate
  • Prototype support for torch.compile:

    • See PR https://github.com/Xilinx/brevitas/pull/1006 for an example of how to use it
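
As a rough illustration of what the minifloat and groupwise options in the highlights control, the sketch below computes the largest finite value of a minifloat format from its exponent bit-width, mantissa bit-width, and exponent bias, and derives one shared power-of-two scale per group of values. This is plain Python and does not use the Brevitas API. The function names, the `special_values` argument, and the scale rule are assumptions made for illustration; the rule is a simplification and is neither the exact rule from the OCP MX specification nor Brevitas' implementation.

```python
import math


def minifloat_max(exponent_bits, mantissa_bits, exponent_bias, special_values="ieee"):
    """Largest finite magnitude of a minifloat format (hypothetical helper).

    special_values selects how encodings are reserved:
      "ieee": the all-ones exponent is reserved for inf/NaN (e.g. OCP E5M2)
      "fn":   only all-ones exponent + all-ones mantissa is NaN (e.g. OCP E4M3)
      "fnuz": no encodings reserved at the top of the range (FNUZ variants)
    """
    max_exp_field = 2 ** exponent_bits - 1
    max_mantissa = 2 ** mantissa_bits - 1
    if special_values == "ieee":
        max_exp_field -= 1      # top exponent code is unavailable
    elif special_values == "fn":
        max_mantissa -= 1       # top mantissa code at top exponent is NaN
    elif special_values != "fnuz":
        raise ValueError(f"unknown special_values: {special_values}")
    fraction = 1.0 + max_mantissa / 2 ** mantissa_bits
    return 2.0 ** (max_exp_field - exponent_bias) * fraction


def groupwise_po2_scales(values, group_size, elem_max):
    """One power-of-two scale per contiguous group (simplified sketch).

    Each scale is the smallest power of two such that
    max(|x|) / scale <= elem_max within the group.
    """
    scales = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        if max_abs == 0.0:
            scales.append(1.0)  # any scale works for an all-zero group
        else:
            scales.append(2.0 ** math.ceil(math.log2(max_abs / elem_max)))
    return scales


if __name__ == "__main__":
    e4m3_max = minifloat_max(4, 3, 7, "fn")
    print("OCP E4M3 max: ", e4m3_max)
    print("OCP E5M2 max: ", minifloat_max(5, 2, 15, "ieee"))
    print("FNUZ E4M3 max:", minifloat_max(4, 3, 8, "fnuz"))
    print("scales:", groupwise_po2_scales([1.0, -448.0, 0.5, 896.0], 2, e4m3_max))
```

Changing the exponent bias or the split between exponent and mantissa bits moves the representable range, which is why these parameters are exposed for customization. A smaller group size gives each scale fewer values to cover, at the cost of storing more scales.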

What's Changed

For a more comprehensive list of changes and fixes, see the list below:

  • Enhance: Importing quantized models after bias correction by @costigt-dev in https://github.com/Xilinx/brevitas/pull/868
  • Fix QCDQDecoupledWeightQuantProxyHandlerMixin return args by @costigt-dev in https://github.com/Xilinx/brevitas/pull/870
  • Fix - Speech to text: Create an empty json file by @costigt-dev in https://github.com/Xilinx/brevitas/pull/871
  • Feat (scaling/standalone): flag to retrieve full state dict by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/874
  • Notebooks: makes notebooks deterministic and prints output of asserts by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/847
  • Fix (proxy): revert value tracer change by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/888
  • Fix (proxy): fix for attributes retrieval by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/880
  • Feat (notebook): add example for dynamic quantization to ONNX export by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/877
  • Fix (gpxq): handling empty tensors with GPxQ and adding unit tests by @i-colbert in https://github.com/Xilinx/brevitas/pull/892
  • Fix (ptq): expose uint_sym_act flag and fix issue with minifloat sign by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/898
  • Feat (minifloat): add support for user specified minifloat format by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/821
  • Feat: Add QuantConv3d and QuantConv3dTranspose by @costigt-dev in https://github.com/Xilinx/brevitas/pull/805
  • Add tutorial examples of per-channel quantization by @OscarSavolainenDR in https://github.com/Xilinx/brevitas/pull/867
  • Fix (tests): revert pytest pin by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/903
  • Remove: Remove original_cat workaround by @costigt-dev in https://github.com/Xilinx/brevitas/pull/902
  • Infra: Update issue template by @nickfraser in https://github.com/Xilinx/brevitas/pull/893
  • Pull Request Template by @capnramses in https://github.com/Xilinx/brevitas/pull/885
  • Fix (core): add return in state_dict by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/910
  • Fix (quant_tensor): fix typing and remove unused checks by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/913
  • Fix (nn): removed unused caching in adaptive avgpool2d by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/911
  • Fix (quant_tensor): remove unused checks by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/918
  • Setup: pin ONNX to 1.15 due to ORT incompatibility by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/924
  • Feat (examples): add support for Stable Diffusion XL by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/909
  • Assert all ptq-common bit widths are positive integers by @OscarSavolainenDR in https://github.com/Xilinx/brevitas/pull/931
  • Enhance: Quant Tensor Test by @costigt-dev in https://github.com/Xilinx/brevitas/pull/894
  • Fix (examples/stable_diffusion): README formatting and clarification by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/932
  • Fix (examples/ptq): fix for bitwidth check by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/934
  • Feat: functionalize QuantTensor by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/878
  • Feat (minifloat): cleanup minifloat impl by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/922
  • Fix tests in dev by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/939
  • Feat (proxy): scale computation delegated to bias proxy by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/938
  • Fix (gpxq): adding input quant to process input by @i-colbert in https://github.com/Xilinx/brevitas/pull/943
  • Fix (quant): propagate device and dtype in subinjector by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/942
  • Fix (gpxq): correct variable name by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/944
  • Fix (quant_tensor): fix AvgPool functional implementation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/945
  • Feat (quant_tensor): support for dim() and ndim by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/947
  • Fix (graph/standardize): correct check for Mean to AvgPool by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/948
  • Feat (graph/standardize): default keepdim value by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/950
  • Fix bullet formatting in getting started guide by @timkpaine in https://github.com/Xilinx/brevitas/pull/952
  • Fix (quant/float): correct scaling_impl and float_scaling_impl by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/953
  • Fix/remove-numel - Remove numel is zero check from context manager exit method by @costigt-dev in https://github.com/Xilinx/brevitas/pull/920
  • Feat (examples/ptq): support for dynamic act quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/935
  • Feat (quant_tensor): support for FloatQuantTensor by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/919
  • Fix (examples/llm): Add all rewriters to the list by @nickfraser in https://github.com/Xilinx/brevitas/pull/956
  • Fix (core/quant/float): use eps to avoid log(0) by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/957
  • Fix (test/actions): Excluded torch==1.9.1, platform=macos-latest tests by @nickfraser in https://github.com/Xilinx/brevitas/pull/960
  • Adding FP8 weight export by @costigt-dev in https://github.com/Xilinx/brevitas/pull/907
  • Fix (llm): fix device issue for eval when not using default device by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/949
  • Fix (GPFQ): using random projection for speed up/less memory usage by @fabianandresgrob in https://github.com/Xilinx/brevitas/pull/964
  • Fix (calibrate/minifloat): fix for act calibration by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/966
  • Fix (quant/float): restore fix for log(0) by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/968
  • Setup: pin numpy version by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/974
  • Feat (minifloat): support for FNUZ variants by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/973
  • Fix (core/float): add default for float_scaling_impl by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/972
  • Feat (graph/equalize): upcast during equalization computation by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/970
  • Generative improv by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/965
  • Fix (requirements/setuptools): Set maximum requirement for setuptools by @nickfraser in https://github.com/Xilinx/brevitas/pull/963
  • Fix: Typo fix on SDXL command line args by @nickfraser in https://github.com/Xilinx/brevitas/pull/976
  • Fix (graph/bias_correction): Fix when layer parameters are offloaded to accelerate by @nickfraser in https://github.com/Xilinx/brevitas/pull/962
  • Fix (ptq/bias_correction): remove unnecessary forward pass by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/980
  • Fix (export/qonnx): Fixed symbolic kwargs order. by @nickfraser in https://github.com/Xilinx/brevitas/pull/988
  • Various SDXL quantization fixes by @nickfraser in https://github.com/Xilinx/brevitas/pull/977
  • Fix (brevitas_examples/sdxl): Various fixes by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/991
  • Feat (proxy/parameter_quant): cache quant weights by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/990
  • Docs: Added 0.10.3 release note to README. by @nickfraser in https://github.com/Xilinx/brevitas/pull/993
  • Added some preliminary unit tests to the CNNs 'quantize_model' by @OscarSavolainenDR in https://github.com/Xilinx/brevitas/pull/927
  • Feat (tests): extended minifloat unit tests by @alexredd99 in https://github.com/Xilinx/brevitas/pull/979
  • Fix (proxy/runtime_quant): correct handling of mixed type quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/985
  • docs (readme): Fixed GH actions badges by @nickfraser in https://github.com/Xilinx/brevitas/pull/996
  • Feat: Update LLM entry-point by @nickfraser in https://github.com/Xilinx/brevitas/pull/987
  • Feat: Support for Groupwise (MX) quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/971
  • Feat(graph): better exclusion mechanism by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1003
  • Fix (mx): input view during quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1005
  • Feat (mx): adding padding and transposed support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1007
  • fix (nn/avg_pool): Fix for trunc quant not being applied by @nickfraser in https://github.com/Xilinx/brevitas/pull/1014
  • Fix (nn/conv): Fixed conversion of convolutions when padding_mode='same' by @nickfraser in https://github.com/Xilinx/brevitas/pull/1017
  • Fix (proxy): clean-up by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1011
  • Feat (mx): automatic group_dim in layerwise quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1012
  • fix (nn/conv): Fix regression introduced in #1017 by @nickfraser in https://github.com/Xilinx/brevitas/pull/1019
  • Feat (mx): gptq compatibility and quant tests by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1013
  • Feat (mx): PTQ MX + Float support by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1010
  • Fix (graph/quant): Bugfix in blacklist matching in find_module by @nickfraser in https://github.com/Xilinx/brevitas/pull/1021
  • Test (graph/quantize) Added extra prefix test to layerwise_quantize by @nickfraser in https://github.com/Xilinx/brevitas/pull/1022
  • Test (example/llm): Refactor and add basic tests for the LLM entry-point by @nickfraser in https://github.com/Xilinx/brevitas/pull/1002
  • Feat (examples/sdxl): Updates to SDXL entry-point by @nickfraser in https://github.com/Xilinx/brevitas/pull/1020
  • Feat (gptq): optimizing CPU to GPU memory transfer by @i-colbert in https://github.com/Xilinx/brevitas/pull/1009
  • notebooks: rerun notebooks. by @nickfraser in https://github.com/Xilinx/brevitas/pull/1026
  • HQO for scale/zero point by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/937
  • Test calibration reference by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1031
  • Fix (sdxl): avoid suppressing checkpoint errors by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1034
  • Setup: update checkout version to v3 by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1039
  • Fix (gpxq): correct index for groupwise GPxQ by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1040
  • Fix (llm): small fixes to LLM by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1035
  • Feat (activation_calibration): speed-up by skipping quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1029
  • Fix (proxy): fix for float quant properties is_ocp and is_fnuz by @alexredd99 in https://github.com/Xilinx/brevitas/pull/1028
  • Decoupled PerChannel/PerTensor quantization by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1025
  • Fix (examples/llm): Fix infinite loop in LLM entrypoint with WikiText2 by @pablomlago in https://github.com/Xilinx/brevitas/pull/1044
  • Fix po2 for float quant by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1033
  • Feat (gpfq): adding memory-efficient formulation by @i-colbert in https://github.com/Xilinx/brevitas/pull/1043
  • Fix weights mse by @Giuseppe5 in https://github.com/Xilinx/brevitas/pull/1047

New Contributors

  • @OscarSavolainenDR made their first contribution in https://github.com/Xilinx/brevitas/pull/861
  • @costigt-dev made their first contribution in https://github.com/Xilinx/brevitas/pull/868
  • @timkpaine made their first contribution in https://github.com/Xilinx/brevitas/pull/952
  • @alexredd99 made their first contribution in https://github.com/Xilinx/brevitas/pull/979
  • @pablomlago made their first contribution in https://github.com/Xilinx/brevitas/pull/1044

Full Changelog: https://github.com/Xilinx/brevitas/compare/v0.10.2...v0.11.0

Files

Xilinx/brevitas-v0.11.0.zip (3.4 MB)
md5:8c7206f8b872bbe9e4c5f9f8fd5fb875

Additional details

Related works

Is supplement to
Software: https://github.com/Xilinx/brevitas/tree/v0.11.0 (URL)