Published April 7, 2025 | Version v1
Model | Open

Fusing Events and Frames with Self-Attention Network for Ball Collision Detection

  • ETH Zurich

Description

Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. 
This paper proposes a neural network framework for predicting the collision position and time of an unmanned aerial vehicle with a dynamic object, using only RGB and event-based vision sensors.
The proposed architecture consists of two separate encoder branches, one per modality, followed by a self-attention fusion stage that improves prediction accuracy.
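
For illustration, a minimal PyTorch sketch of such a two-branch, self-attention-fused predictor is shown below. The layer sizes, input channel counts (3 for RGB frames, 2 for a polarity-based event representation), and the four-dimensional output head (3D collision position plus time) are assumptions for the sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class FusionCollisionPredictor(nn.Module):
    """Sketch: per-modality encoders fused by self-attention (assumed sizes)."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # One small convolutional encoder per modality (channel counts assumed).
        self.rgb_encoder = self._make_encoder(in_channels=3, embed_dim=embed_dim)
        self.event_encoder = self._make_encoder(in_channels=2, embed_dim=embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Regression head: hypothetical (x, y, z) collision position + time.
        self.head = nn.Linear(embed_dim, 4)

    @staticmethod
    def _make_encoder(in_channels: int, embed_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(32, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, rgb: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # Encode each modality, then flatten spatial maps into token sequences.
        r = self.rgb_encoder(rgb).flatten(2).transpose(1, 2)        # (B, N_r, D)
        e = self.event_encoder(events).flatten(2).transpose(1, 2)   # (B, N_e, D)
        tokens = torch.cat([r, e], dim=1)                           # (B, N_r+N_e, D)
        fused, _ = self.attn(tokens, tokens, tokens)  # self-attention across modalities
        return self.head(fused.mean(dim=1))                         # (B, 4)
```
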
To facilitate benchmarking, we introduce a multi-modal dataset that enables detailed comparisons of single-modality and fusion-based approaches.
At the same prediction throughput of 50 Hz, the experimental results show that the fusion-based model improves prediction accuracy over single-modality approaches by 1% on average and by 10% for distances beyond 0.5 m, but at the cost of +71% in memory and +105% in FLOPs. Notably, the event-based model outperforms the RGB model by 4% in position error and 26% in time error at a similar computational cost, making it a competitive alternative.
Additionally, we evaluate quantized versions of the event-based models, applying 1- to 8-bit quantization to assess the trade-offs between predictive performance and computational efficiency.
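
As a rough illustration of such bit-width sweeps, the sketch below simulates uniform symmetric weight quantization at a configurable number of bits. The paper does not specify its quantization scheme here, so the symmetric min-max scaling and the sign-based binary special case are assumptions.

```python
import torch

def quantize_tensor(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Simulate uniform symmetric quantization of a weight tensor to `bits` bits.

    Illustrative only; scheme assumed, not taken from the paper.
    """
    if bits == 1:
        # Binary case: sign of the weights, scaled by their mean magnitude.
        return w.sign() * w.abs().mean()
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                       # de-quantize back to float for simulation

# Example usage: simulate 4-bit weights for an existing model's parameters.
# with torch.no_grad():
#     for p in model.parameters():
#         p.copy_(quantize_tensor(p, bits=4))
```
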
These findings highlight the potential of multi-modal perception using RGB and event-based cameras in robotic applications.

Files (132.0 MB)

Name: models.zip
MD5: b3ed4d48b23fb5ae9a3c46c733a15ac5
Size: 132.0 MB

Additional details

Additional titles

Alternative title
Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone

Dates

Accepted
2025-06