Enhanced Multi-Modal UAV Perception using Large Language Models for Autonomous Disaster Reconnaissance

Bhavya Keerthi K; Moumita Mandal

doi:10.5281/zenodo.20442636

Published May 29, 2026 | Version v1

Preprint Open

1. Vellore Institute of Technology University

This paper presents a multi-modal UAV perception framework for autonomous disaster reconnaissance in simulated environments. It integrates LiDAR and Optical Flow Fusion (LOFF) odometry, YOLOv8-based semantic perception, and a narration layer powered by a Large Language Model (LLM).

The proposed framework combines geometric localization, semantic scene understanding, and contextual reasoning to enable intelligent navigation in GPS-denied areas. LOFF combines LiDAR point-cloud alignment with optical flow estimation through Factor Graph Optimization (FGO) to achieve robust, low-drift pose estimation, while YOLOv8 performs real-time object detection and semantic scene analysis.
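As a rough illustration of the fusion idea, the sketch below solves a one-dimensional, linear-Gaussian factor graph as a weighted least-squares problem. It is a toy under stated assumptions, not the authors' LOFF implementation: the function name `fuse_odometry_1d`, the noise parameters, and the restriction to a single axis are all hypothetical, and a real system would optimize full 6-DoF poses with scan-matching and optical-flow factors.

```python
import numpy as np


def fuse_odometry_1d(lidar_deltas, flow_deltas, sigma_lidar, sigma_flow,
                     prior_sigma=1e-3):
    """Fuse two streams of relative displacements along one axis.

    Each pose x[k] is a graph node. A prior factor pins x[0] at 0, and each
    time step contributes two "between" factors, x[k+1] - x[k] = delta,
    one per sensor. All factors are linear with Gaussian noise, so the
    maximum a posteriori estimate is a weighted least-squares solution.
    """
    n = len(lidar_deltas)
    if len(flow_deltas) != n:
        raise ValueError("both sensors must provide one delta per step")

    rows, rhs, weights = [], [], []

    # Prior factor: x[0] = 0 with a tight standard deviation.
    row = np.zeros(n + 1)
    row[0] = 1.0
    rows.append(row)
    rhs.append(0.0)
    weights.append(1.0 / prior_sigma)

    # Between factors from both sensors (hypothetical noise levels).
    for k in range(n):
        for delta, sigma in ((lidar_deltas[k], sigma_lidar),
                             (flow_deltas[k], sigma_flow)):
            row = np.zeros(n + 1)
            row[k] = -1.0
            row[k + 1] = 1.0
            rows.append(row)
            rhs.append(delta)
            weights.append(1.0 / sigma)

    # Whiten each factor by its inverse standard deviation, then solve.
    w = np.array(weights)
    A = np.array(rows) * w[:, None]
    b = np.array(rhs) * w
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With equal noise on both sensors, each fused step is the average of the two measured displacements; lowering `sigma_lidar` relative to `sigma_flow` pulls the estimate toward the LiDAR measurements.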

To improve interpretability, the framework includes an LLM-based narration module that converts structured UAV perception outputs into human-readable situational intelligence, including hazard alerts and navigation recommendations.
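One way such a narration step could be prepared is sketched below: structured perception outputs are serialized into a prompt that an LLM could then turn into a situational summary. The field names (`label`, `confidence`, `range_m`), the confidence threshold, and the prompt wording are assumptions made for illustration; the record does not describe the authors' prompt format or which LLM is used, so no model call is shown.

```python
import json


def build_narration_prompt(pose, detections, min_confidence=0.5):
    """Build a text prompt from structured UAV perception outputs.

    `pose` is a dict of position values in metres and `detections` is a
    list of dicts with hypothetical keys "label", "confidence", "range_m".
    Low-confidence detections are dropped before serialization.
    """
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    payload = {"pose_m": pose, "detections": kept}
    return (
        "You are assisting a disaster-reconnaissance UAV operator.\n"
        "Summarize the scene, list any hazards, and recommend a "
        "navigation action.\n"
        "Use only the facts in the JSON below.\n\n"
        + json.dumps(payload, indent=2)
    )


if __name__ == "__main__":
    example = build_narration_prompt(
        {"x": 12.0, "y": -3.5, "z": 8.0},
        [{"label": "person", "confidence": 0.91, "range_m": 4.2}],
    )
    print(example)
```

Keeping the perception data in a machine-readable payload and instructing the model to use only those facts is a common way to limit unsupported statements in the generated narration.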

The framework is implemented using ROS, Gazebo, and ArduPilot SITL to provide synchronized multi-modal sensor integration and realistic UAV simulation. Experimental evaluations show localization drift below 0.15 m over a 100 m flight path, YOLOv8 person detection accuracy of 95.4% with 92.8% recall at an average inference time of 37 ms per frame, and obstacle detection precision exceeding 96%. These results support the feasibility of integrating contextual language-based reasoning within perception-driven UAV systems.
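For readers who want to relate these figures to their definitions, the helper functions below show the standard formulas for precision, recall, relative drift, and frame rate. The function names are hypothetical and the record does not state how the authors computed their metrics, so this is only a sketch of the conventional definitions.

```python
def precision(true_pos, false_pos):
    """Fraction of reported detections that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of ground-truth objects that were detected."""
    return true_pos / (true_pos + false_neg)


def drift_percent(final_error_m, path_length_m):
    """End-point position error as a percentage of distance travelled."""
    if path_length_m <= 0:
        raise ValueError("path length must be positive")
    return 100.0 * final_error_m / path_length_m


def frames_per_second(latency_ms):
    """Throughput implied by a per-frame inference latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms
```

Under these definitions, 0.15 m of drift over 100 m corresponds to 0.15% of the path length, and 37 ms per frame corresponds to roughly 27 frames per second.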

Files

Enhanced_Multi_Modal_UAV_Perception_using_Large_Language_Models_for_Autonomous_Disaster_Reconnaissance.pdf (795.8 kB)
md5:14d348e07576ef1bf2d6ba7ee3174225
