
Published December 30, 2025 | Version zenodo2
Software (Open Access)

UNET Acceleration on FPGA

  • University of Ostrava

Description

UNET FPGA Acceleration

A comprehensive repository focused on the acceleration of U-Net segmentation models on FPGA platforms. It provides two fully functional deployment flows: one using HLS4ML for PYNQ/Vivado targets, and one using Vitis AI (DPU) for the Kria KV260. The project is designed for researchers and engineers comparing FPGA deployment methodologies for medical or general image segmentation.

Key capabilities

  • Quantization-Aware Training (QAT): Integration with Brevitas for training FPGA-friendly quantized U-Net models.

  • HLS4ML Flow: Complete pipeline for accelerating U-Net on PYNQ-supported boards (e.g., FZ5) using High-Level Synthesis.

  • Vitis AI Flow: Deployment pipeline for Xilinx Kria KV260 using the Deep Learning Processor Unit (DPU).

  • Webapp Integration: Ready-to-use web applications for real-time demonstration of inference on both hardware targets.

  • Comparative Analysis: Architecture consistency across flows allows for direct performance, latency, and accuracy benchmarks between HLS4ML and Vitis AI.
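To make the Quantization-Aware Training capability concrete, the sketch below shows the uniform "fake quantization" idea that QAT libraries such as Brevitas apply during the forward pass. This is an illustrative, pure-Python simplification, not the repository's actual training code; the function name and parameters are hypothetical.

```python
# Conceptual sketch of uniform fake quantization as used in QAT
# (illustrative only; Brevitas implements this inside its quantized layers).

def fake_quantize(x: float, bits: int = 8, scale: float = 1.0) -> float:
    """Round x onto a signed fixed-point grid with `bits` bits, then dequantize."""
    qmin = -(2 ** (bits - 1))       # e.g. -128 for 8 bits
    qmax = 2 ** (bits - 1) - 1      # e.g. +127 for 8 bits
    q = round(x / scale)            # quantize to the integer grid
    q = max(qmin, min(qmax, q))     # clamp to the representable range
    return q * scale                # dequantize back to float

# Because the forward pass sees the quantized value, the network learns weights
# that stay accurate after conversion to FPGA fixed-point arithmetic, which is
# what makes the HLS4ML and DPU deployments comparable in accuracy.
```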

Features & Scope

  • Multi-Platform Support: Specifically optimized for FZ5-class boards and Xilinx Kria KV260.

  • Automated Deployment: Includes shell scripts for remote packaging, SCP transfer, and board-side execution.

  • Extensible Research: Contains experimental "split model" partitioning attempts for further performance optimization on resource-constrained FPGAs.

  • Production-Ready Scripts: Includes DMA drivers, benchmarks, and data preparation utilities (color-based mask extraction).
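The "color-based mask extraction" utility mentioned above can be sketched as follows. The repository's script most likely uses OpenCV/NumPy; this pure-Python version (with a hypothetical `extract_mask` name and `tol` parameter) only illustrates turning a color-coded label image into a binary mask for one class.

```python
# Hedged sketch of color-based mask extraction: given an image whose pixels are
# (R, G, B) tuples, produce a 0/1 mask marking pixels that match a class color.

def extract_mask(image, target_rgb, tol=0):
    """image: nested list of (R, G, B) tuples; returns a 0/1 mask of the same shape."""
    def matches(px):
        # A pixel matches if every channel is within `tol` of the target color.
        return all(abs(c - t) <= tol for c, t in zip(px, target_rgb))
    return [[1 if matches(px) else 0 for px in row] for row in image]
```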

What's included

  • Source Code: Core training logic (fz5_full_model/), HLS4ML configuration (hls4ml/), and Vitis AI compilation scripts (vitis_ai/).

  • Deployment Tools: Webapp source code for PYNQ and KV260, automated deployment scripts (deploy_fz5.sh, deploy.sh).

  • Archived Experiments: Historical PaddlePaddle conversion flow and model splitting experiments for reference.

  • Documentation: Comprehensive README with step-by-step guides for training, quantization, and hardware validation.

Minimum requirements

  • OS: Linux (recommended for toolchain compatibility).

  • Runtime: Python 3.7+; key dependencies: Brevitas, PyTorch, HLS4ML, OpenCV, NumPy.

  • Hardware:

    • Development: FPGA-ready workstation with the Vivado / Vitis / Vitis AI toolchains installed.

    • Deployment: PYNQ-compatible board (FZ5) or Xilinx Kria KV260.

  • Network: SSH/SCP access to the target boards.

How to run

  1. Training: Navigate to fz5_full_model/, install the dependencies listed in requirements.txt, and run python train.py to generate quantized weights.

  2. HLS4ML Deployment: Use deploy_fz5.sh to package the model, then run the inference scripts or the webapp on the FZ5 board.

  3. Vitis AI Deployment: Use scripts in vitis_ai/ (from 0_... to 5_...) to quantize, compile, and deploy the .xmodel to the KV260.

  4. Webapp: Run python3 app.py within the respective deployment folders on the target hardware.
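The packaging-and-transfer step that scripts like deploy_fz5.sh automate can be sketched as command construction in Python. The archive name, user, host address, and remote directory below are hypothetical placeholders, not values taken from the repository.

```python
# Sketch of an automated deployment step: build the SCP transfer command and the
# board-side unpack/run command. All names and paths here are placeholders.

import shlex

def build_deploy_cmds(archive="model_pkg.tar.gz",
                      user="xilinx", host="192.168.2.99",
                      remote_dir="/home/xilinx/unet"):
    """Return (scp_command, remote_run_command) as shell command strings."""
    scp = f"scp {shlex.quote(archive)} {user}@{host}:{shlex.quote(remote_dir)}/"
    run = (f"ssh {user}@{host} "
           f"'cd {remote_dir} && tar xzf {archive} && python3 app.py'")
    return scp, run
```

In the repository these two commands are wrapped in shell scripts so that one invocation packages the model, copies it over SSH/SCP, and launches the webapp on the board.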

How to cite

Cite the Zenodo DOI for this version.

Authors: Denis Kurka, Petr Čermák

Files

denisuskurka/UNET_ACCEL-zenodo2.zip (63.4 MB)
md5:b84879a7355746b466e64f9490ad02f7

Additional details

References

  • TensorFlow: Abadi, M., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org
  • Keras: Chollet, F., et al. (2015). Keras: The Python Deep Learning API. https://keras.io
  • U-Net (Original Paper): Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv preprint arXiv:1505.04597. https://doi.org/10.48550/arXiv.1505.04597
  • FINN (Original Paper): Umuroglu, Y., et al. (2017). FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '17), 65–74. https://doi.org/10.1145/3020078.3021744
  • FINN-R (Journal Paper): Blott, M., et al. (2018). FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 11(3), 1–23. https://doi.org/10.1145/3242897