UNET Acceleration on FPGA
Description
A comprehensive repository focused on the acceleration of U-Net segmentation models on FPGA platforms. It provides two fully functional deployment flows—one utilizing HLS4ML for PYNQ/Vivado targets and another leveraging Vitis AI (DPU) for Kria KV260. The project is designed for researchers and engineers comparing different FPGA deployment methodologies for medical or general image segmentation.
Key capabilities
- Quantization-Aware Training (QAT): Integration with Brevitas for training FPGA-friendly quantized U-Net models.
- HLS4ML Flow: Complete pipeline for accelerating U-Net on PYNQ-supported boards (e.g., FZ5) using High-Level Synthesis.
- Vitis AI Flow: Deployment pipeline for the Xilinx Kria KV260 using the Deep Learning Processor Unit (DPU).
- Webapp Integration: Ready-to-use web applications for real-time demonstration of inference on both hardware targets.
- Comparative Analysis: A consistent model architecture across both flows enables direct performance, latency, and accuracy benchmarks between HLS4ML and Vitis AI.
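Under QAT, Brevitas inserts fake-quantization into the forward pass so the network learns weights that survive low-bit FPGA arithmetic. The core operation can be sketched in NumPy (a simplified per-tensor symmetric scheme; Brevitas itself provides many more quantizer options):

```python
import numpy as np

def fake_quantize(w, bit_width=4):
    """Simulate symmetric uniform weight quantization: weights are
    rounded to a bit_width-bit signed integer grid and scaled back,
    which is the effect QAT bakes into the trained model."""
    qmax = 2 ** (bit_width - 1) - 1           # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) / qmax          # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, q.astype(np.int8)       # dequantized weights, int codes

w = np.array([0.8, -0.31, 0.05, -1.0])
w_dq, w_int = fake_quantize(w, bit_width=4)   # w_int = [6, -2, 0, -7]
```

During training the rounded values are used in the forward pass while gradients flow through as if rounding were the identity (the straight-through estimator).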
Features & Scope
- Multi-Platform Support: Specifically optimized for FZ5-class boards and the Xilinx Kria KV260.
- Automated Deployment: Includes shell scripts for remote packaging, SCP transfer, and board-side execution.
- Extensible Research: Contains experimental "split model" partitioning attempts for further performance optimization on resource-constrained FPGAs.
- Production-Ready Scripts: Includes DMA drivers, benchmarks, and data-preparation utilities (color-based mask extraction).
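The color-based mask extraction mentioned above turns RGB annotation images into binary segmentation masks for training. A minimal NumPy version might look like this (the tolerance and target color are illustrative, not the repo's exact parameters):

```python
import numpy as np

def color_to_mask(rgb, color, tol=10):
    """Convert an RGB annotation image (H, W, 3) into a binary mask:
    pixels within `tol` of `color` on every channel become foreground (1)."""
    diff = np.abs(rgb.astype(np.int16) - np.array(color, dtype=np.int16))
    return (diff.max(axis=-1) <= tol).astype(np.uint8)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)                 # one red-annotated pixel
mask = color_to_mask(img, (255, 0, 0))  # -> [[1, 0], [0, 0]]
```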
What's included
- Source Code: Core training logic (`fz5_full_model/`), HLS4ML configuration (`hls4ml/`), and Vitis AI compilation scripts (`vitis_ai/`).
- Deployment Tools: Webapp source code for PYNQ and KV260, plus automated deployment scripts (`deploy_fz5.sh`, `deploy.sh`).
- Archived Experiments: Historical PaddlePaddle conversion flow and model-splitting experiments, kept for reference.
- Documentation: Comprehensive README with step-by-step guides for training, quantization, and hardware validation.
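The deployment scripts (`deploy_fz5.sh`, `deploy.sh`) handle remote packaging, SCP transfer, and board-side execution. A Python equivalent of that flow could be sketched as follows (function names, paths, and the board address are illustrative, not the repo's actual script logic):

```python
import pathlib
import subprocess
import tarfile

def package(build_dir: str, archive: str = "unet_deploy.tar.gz") -> pathlib.Path:
    """Bundle the build artifacts (weights, bitstream, webapp) into a tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(build_dir, arcname="unet")
    return pathlib.Path(archive)

def deploy(board: str, archive: pathlib.Path,
           remote: str = "/home/xilinx/unet.tar.gz") -> None:
    """Copy the archive to the board over SCP and unpack it there.

    Assumes passwordless SSH to `board` (e.g. "xilinx@kria.local")."""
    subprocess.run(["scp", str(archive), f"{board}:{remote}"], check=True)
    subprocess.run(["ssh", board, f"tar -xzf {remote} -C ~"], check=True)
```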
Minimum requirements
- OS: Linux (recommended for toolchain compatibility).
- Runtime: Python 3.7+; key dependencies: Brevitas, PyTorch, HLS4ML, OpenCV, NumPy.
- Hardware:
  - Development: FPGA-ready workstation with the Vivado / Vitis / Vitis AI toolchains installed.
  - Deployment: PYNQ-compatible board (FZ5) or Xilinx Kria KV260.
- Network: SSH/SCP access to the target boards.
How to run
- Training: Navigate to `fz5_full_model/`, install `requirements.txt`, and run `python train.py` to generate quantized weights.
- HLS4ML Deployment: Use `deploy_fz5.sh` to package the model, then run the inference scripts or the webapp on the FZ5 board.
- Vitis AI Deployment: Use the scripts in `vitis_ai/` (from `0_...` to `5_...`) to quantize, compile, and deploy the `.xmodel` to the KV260.
- Webapp: Run `python3 app.py` within the respective deployment folders on the target hardware.
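Since both flows end in a board-side Python inference call, the comparative latency benchmarks can be run with a small harness like this (a hypothetical helper, not one of the repo's benchmark scripts; `infer` stands in for whichever inference callable each flow provides):

```python
import statistics
import time

def benchmark(infer, n_warmup=5, n_runs=50):
    """Time a single-image inference callable and report latency in ms,
    the metric used to compare the HLS4ML and Vitis AI flows."""
    for _ in range(n_warmup):      # warm caches / DMA buffers first
        infer()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1e3)
    return {"mean_ms": statistics.mean(samples),
            "p95_ms": sorted(samples)[int(0.95 * len(samples)) - 1]}

stats = benchmark(lambda: sum(range(1000)))  # dummy workload for illustration
```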
How to cite
Cite the Zenodo DOI for this version.
Authors: Denis Kurka, Petr Čermák
Files
- denisuskurka/UNET_ACCEL-zenodo2.zip (63.4 MB, md5:b84879a7355746b466e64f9490ad02f7)
Additional details
Related works
- Is supplement to: Software — https://github.com/denisuskurka/UNET_ACCEL/tree/zenodo2
Software
- Repository URL: https://github.com/denisuskurka/UNET_ACCEL
References
- TensorFlow: Abadi, M., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org
- Keras: Chollet, F., et al. (2015). Keras: The Python Deep Learning API. https://keras.io
- U-Net (Original Paper): Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv preprint arXiv:1505.04597. https://doi.org/10.48550/arXiv.1505.04597
- FINN (Original Paper): Umuroglu, Y., et al. (2017). FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '17), 65–74. https://doi.org/10.1145/3020078.3021744
- FINN-R (Journal Paper): Blott, M., et al. (2018). FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 11(3), 1–23. https://doi.org/10.1145/3242897