
Published December 30, 2025 | Version zenodo2
Software (Open Access)

UNET Acceleration on FPGA

  • University of Ostrava

Description

UNET FPGA Acceleration

A comprehensive repository focused on the acceleration of U-Net segmentation models on FPGA platforms. It provides two fully functional deployment flows: one using HLS4ML for PYNQ/Vivado targets, and one using Vitis AI (DPU) for the Kria KV260. The project is designed for researchers and engineers comparing FPGA deployment methodologies for medical or general image segmentation.

Key capabilities

  • Quantization-Aware Training (QAT): Integration with Brevitas for training FPGA-friendly quantized U-Net models.

  • HLS4ML Flow: Complete pipeline for accelerating U-Net on PYNQ-supported boards (e.g., FZ5) using High-Level Synthesis.

  • Vitis AI Flow: Deployment pipeline for Xilinx Kria KV260 using the Deep Learning Processor Unit (DPU).

  • Webapp Integration: Ready-to-use web applications for real-time demonstration of inference on both hardware targets.

  • Comparative Analysis: Architecture consistency across flows allows for direct performance, latency, and accuracy benchmarks between HLS4ML and Vitis AI.
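To make the Quantization-Aware Training capability concrete, the sketch below shows the uniform "fake quantization" idea that QAT libraries such as Brevitas apply during the forward pass. This is an illustrative, pure-Python simplification, not the repository's actual training code; the function name and parameters are hypothetical.

```python
# Conceptual sketch of uniform fake quantization as used in QAT
# (illustrative only; Brevitas implements this inside its quantized layers).

def fake_quantize(x: float, bits: int = 8, scale: float = 1.0) -> float:
    """Round x onto a signed fixed-point grid with `bits` bits, then dequantize."""
    qmin = -(2 ** (bits - 1))       # e.g. -128 for 8 bits
    qmax = 2 ** (bits - 1) - 1      # e.g. +127 for 8 bits
    q = round(x / scale)            # quantize to the integer grid
    q = max(qmin, min(qmax, q))     # clamp to the representable range
    return q * scale                # dequantize back to float

# Because the forward pass sees the quantized value, the network learns weights
# that stay accurate after conversion to FPGA fixed-point arithmetic, which is
# what makes the HLS4ML and DPU deployments comparable in accuracy.
```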

Features & Scope

  • Multi-Platform Support: Specifically optimized for FZ5-class boards and Xilinx Kria KV260.

  • Automated Deployment: Includes shell scripts for remote packaging, SCP transfer, and board-side execution.

  • Extensible Research: Contains experimental "split model" partitioning attempts for further performance optimization on resource-constrained FPGAs.

  • Production-Ready Scripts: Includes DMA drivers, benchmarks, and data preparation utilities (color-based mask extraction).
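The "color-based mask extraction" utility mentioned above can be sketched as follows. The repository's script most likely uses OpenCV/NumPy; this pure-Python version (with a hypothetical `extract_mask` name and `tol` parameter) only illustrates turning a color-coded label image into a binary mask for one class.

```python
# Hedged sketch of color-based mask extraction: given an image whose pixels are
# (R, G, B) tuples, produce a 0/1 mask marking pixels that match a class color.

def extract_mask(image, target_rgb, tol=0):
    """image: nested list of (R, G, B) tuples; returns a 0/1 mask of the same shape."""
    def matches(px):
        # A pixel matches if every channel is within `tol` of the target color.
        return all(abs(c - t) <= tol for c, t in zip(px, target_rgb))
    return [[1 if matches(px) else 0 for px in row] for row in image]
```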

What's included

  • Source Code: Core training logic (fz5_full_model/), HLS4ML configuration (hls4ml/), and Vitis AI compilation scripts (vitis_ai/).

  • Deployment Tools: Webapp source code for PYNQ and KV260, automated deployment scripts (deploy_fz5.sh, deploy.sh).

  • Archived Experiments: Historical PaddlePaddle conversion flow and model splitting experiments for reference.

  • Documentation: Comprehensive README with step-by-step guides for training, quantization, and hardware validation.

Minimum requirements

  • OS: Linux (recommended for toolchain compatibility).

  • Runtime: Python 3.7+; key dependencies: Brevitas, PyTorch, HLS4ML, OpenCV, NumPy.

  • Hardware:

    • Development: FPGA-ready workstation with the Vivado / Vitis / Vitis AI toolchains installed.

    • Deployment: PYNQ-compatible board (FZ5) or Xilinx Kria KV260.

  • Network: SSH/SCP access to the target boards.

How to run

  1. Training: Navigate to fz5_full_model/, install the dependencies listed in requirements.txt, and run python train.py to generate quantized weights.

  2. HLS4ML Deployment: Use deploy_fz5.sh to package the model, then run the inference scripts or the webapp on the FZ5 board.

  3. Vitis AI Deployment: Use scripts in vitis_ai/ (from 0_... to 5_...) to quantize, compile, and deploy the .xmodel to the KV260.

  4. Webapp: Run python3 app.py within the respective deployment folders on the target hardware.
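The packaging-and-transfer step that scripts like deploy_fz5.sh automate can be sketched as command construction in Python. The archive name, user, host address, and remote directory below are hypothetical placeholders, not values taken from the repository.

```python
# Sketch of an automated deployment step: build the SCP transfer command and the
# board-side unpack/run command. All names and paths here are placeholders.

import shlex

def build_deploy_cmds(archive="model_pkg.tar.gz",
                      user="xilinx", host="192.168.2.99",
                      remote_dir="/home/xilinx/unet"):
    """Return (scp_command, remote_run_command) as shell command strings."""
    scp = f"scp {shlex.quote(archive)} {user}@{host}:{shlex.quote(remote_dir)}/"
    run = (f"ssh {user}@{host} "
           f"'cd {remote_dir} && tar xzf {archive} && python3 app.py'")
    return scp, run
```

In the repository these two commands are wrapped in shell scripts so that one invocation packages the model, copies it over SSH/SCP, and launches the webapp on the board.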

How to cite

Cite the Zenodo DOI for this version.

Authors: Denis Kurka, Petr Čermák

Files

denisuskurka/UNET_ACCEL-zenodo2.zip (63.4 MB)
md5:b84879a7355746b466e64f9490ad02f7

Additional details

References

  • TensorFlow: Abadi, M., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org
  • Keras: Chollet, F., et al. (2015). Keras: The Python Deep Learning API. https://keras.io
  • U-Net (Original Paper): Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv preprint arXiv:1505.04597. https://doi.org/10.48550/arXiv.1505.04597
  • FINN (Original Paper): Umuroglu, Y., et al. (2017). FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '17), 65–74. https://doi.org/10.1145/3020078.3021744
  • FINN-R (Journal Paper): Blott, M., et al. (2018). FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 11(3), 1–23. https://doi.org/10.1145/3242897