Penguin-inspired active audition for robot navigation
Description
Abstract
Navigating toward specific acoustic targets in noisy environments is a fundamental capability for autonomous systems. While penguins locate their chicks using compact auditory systems coupled with active movement, most robotic platforms rely on passive microphone arrays that impose hardware trade-offs and depend on single acoustic representations with limited identity discrimination. Here we present a penguin-inspired active audition framework that replaces sensor multiplicity with embodied physical motion. Inspired by penguins decoding dual-syrinx beating vocalizations, our system combines a compact acoustic metastructure with TESSER (Temporal-Spectral Sound Encoding Representation), a framework that dynamically fuses spectral and temporal features. TESSER compresses representations to 0.0025% of the size of conventional acoustic features while increasing inter-class separability margins from 0.08–0.11 in conventional representations to 0.62, enabling sub-second (<1 s) inference. The framework achieves identification accuracies of 98.38% for biological vocalizations and 93.05% across 100 human speakers, while simultaneously localizing sound sources with 93.1% accuracy at 1° resolution. In closed-loop navigation under severe masking conditions, a single-sensor robot successfully tracked mobile biological targets (94.4% success) and high-speed aerial drones (88.9%), and completed targeted human delivery tasks (83.7%). The proposed nature-inspired embodied active audition, combined with the compact acoustic representation, can help autonomous systems identify, localize, and navigate toward acoustic targets under severe acoustic masking.
📂 Deposition Contents & Structure
This repository contains the complete suite of hardware designs, ROS2 environments, AI model training scripts, and experimental datasets necessary to reproduce our findings. It is divided into four main components:
1. Penguin_Robot/
Contains the Docker environment, ROS2 workspaces, and hardware fabrication instructions for running the physical robot.
2. Evaluation_ID_Model/ (Fig. 4)
Contains Jupyter notebooks and datasets for training and evaluating the acoustic Identification (ID) models across Human, Manufacturing, and Penguin domains.
3. Evaluation_Spatial_Map/ (Fig. 5, S21, S22)
Contains datasets and notebooks for testing the Spatial Acoustic Map (ID + SSL) in multi-source environments (evaluating up to 6 speakers, with a maximum of 3 emitting simultaneously).
4. Autonomous_Navigation/ (Fig. 6, 7, Table 1)
Contains the final real-world experimental logs, performance metrics for the robot's autonomous navigation trials, and the drone tracking application code.
🛠️ Part 1: Hardware & System Setup
To run the physical robot or the ROS2 simulation, you must set up the distributed environment bridging a local compute machine (MacBook Pro M3) and the physical robot (Raspberry Pi 4B).
- Robot Structure: The provided CAD (.step file) includes the robot skeleton, acoustic meta-structure, and motor setup details.
- Compute Units: Raspberry Pi 4B (Ubuntu 22.04 LTS recommended) acting as the main robot compute unit, and a Teensy 4.0 embedded microcontroller for hardware processing.
- Software: Requires Docker and Tailscale for VPN bridging.
- AI Model (BEATs): Requires the BEATs_iter3_plus_AS2M.pt checkpoint from the Microsoft UniLM GitHub.
Workspace Setup (Raspberry Pi)
To set up the ROS2 environment on the physical robot, download the codebase from our Zenodo archive and build the workspace:
- Download the Code: Download the penguin_ws folder from our Zenodo archive.
- Create the Workspace: On your Raspberry Pi, create a new directory for the workspace:
    mkdir -p ~/penguin_ws/src
- Copy the Source Files: Copy the penguin_body_pkg folder (and any other source folders from the downloaded Zenodo archive) into the ~/penguin_ws/src/ directory.
- Build the Workspace: Compile the ROS2 packages and source the environment:
    cd ~/penguin_ws
    colcon build --packages-select penguin_body_pkg
    source install/setup.bash
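Before running colcon build, it can help to confirm that the source files were copied into the expected layout. The following is a minimal sketch of such a check, not part of the deposited code. The helper name missing_workspace_items is hypothetical, and it assumes only that each ROS2 package folder contains a package.xml manifest, which is standard for ROS2 packages.

```python
from pathlib import Path


def missing_workspace_items(ws_root, packages=("penguin_body_pkg",)):
    """Return a list of problems found in a colcon workspace layout.

    Hypothetical helper for illustration; an empty list means the
    src/ directory and every listed package folder were found.
    """
    ws_root = Path(ws_root).expanduser()
    problems = []
    src = ws_root / "src"
    if not src.is_dir():
        problems.append(f"missing directory: {src}")
        return problems
    for name in packages:
        pkg = src / name
        if not pkg.is_dir():
            problems.append(f"missing package folder: {pkg}")
        elif not (pkg / "package.xml").is_file():
            # ROS2 packages carry a package.xml manifest at their root.
            problems.append(f"missing manifest: {pkg / 'package.xml'}")
    return problems
```

Calling missing_workspace_items("~/penguin_ws") on the Raspberry Pi and getting an empty list suggests the workspace is ready to build. Any other source folders copied from the archive can be added to the packages argument.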
Execution Summary (see README.md inside Penguin_Robot/ for more details):
- Local Environment: Load the provided ros_humble_3DAR.tar Docker image. Establish a Zenoh bridge (zenoh-bridge-ros2dds) to the robot's Tailscale IP, and launch the AI commander node (penguin_commander.py).
- Robot Hardware: Flash the Teensy 4.0 via Arduino IDE (Penguin_teensy_quite.ino). On the Raspberry Pi, run the Zenoh listener, upload OpenCR firmware, launch the turtlebot3_ros node, and execute the drive and audio nodes.
📊 Part 2: Model Evaluations
A. Identification Across Diverse Acoustic Domains
- Generate Embeddings: Run ID_{Domain}_dataset_loader.ipynb to process raw audio using BEATs and YAMNet.
- Hyperparameter Tuning: Run ID_{Domain}_grid.ipynb to perform grid searches (outputs to grid_search_final_report.csv).
- Visualization: Run ID_{Domain}_grid_result.ipynb to generate confusion matrices and embedding plots.
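For readers who want to inspect predictions outside the notebooks, the sketch below shows how a confusion matrix and overall accuracy can be computed from paired true and predicted labels. It is an illustration in plain Python, not the code used in ID_{Domain}_grid_result.ipynb. The function names and the three speaker labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical labels for illustration only.
labels = ["speaker_a", "speaker_b", "speaker_c"]
y_true = ["speaker_a", "speaker_a", "speaker_b", "speaker_c", "speaker_c"]
y_pred = ["speaker_a", "speaker_b", "speaker_b", "speaker_c", "speaker_c"]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:>10}: {row}")
print(f"accuracy = {accuracy(cm):.2f}")
```

In this toy example one speaker_a sample is mistaken for speaker_b, so four of five samples sit on the diagonal.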
B. Spatial Acoustic Map in Multi-Source Environments
- Specific quality filters were applied (e.g., 3000 Hz spectral centroid filter for Penguins; removal of overlapping background noise in Human datasets).
- Extract Real_dataset.zip, run the respective ID and SSL multi-source notebooks, and use Visualization_preprocess.ipynb to render the 2D spatial acoustic maps (or view the pre-computed CSVs provided).
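To make the spectral centroid filter concrete, the sketch below computes the centroid as the magnitude-weighted mean frequency of a clip and compares it with a threshold. It is a simplified illustration with a naive DFT, suitable only for short signals, and is not the filtering code from the notebooks. The function names are hypothetical, and the direction of the filter (keeping clips at or above 3000 Hz) is an assumption, since the description does not state it.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real-valued signal."""
    n = len(samples)
    weighted, total = 0.0, 0.0
    # Naive DFT over the non-negative frequency bins (0 .. n // 2).
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def passes_centroid_filter(samples, sample_rate, threshold_hz=3000.0):
    # Assumption: a clip is kept when its centroid is at or above the threshold.
    return spectral_centroid(samples, sample_rate) >= threshold_hz


def tone(freq_hz, sample_rate=8000, n=64):
    """Synthetic sine tone used only to exercise the functions above."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

With these definitions, a pure 1000 Hz tone has a centroid very close to 1000 Hz and fails the assumed filter, while a 3500 Hz tone passes it. For real recordings an FFT-based implementation would be the practical choice.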
🎯 Part 3: Real-World Autonomous Navigation & Drone Tracking
This section contains the final Excel (.xlsx) logs detailing the real-world trials, mapping directly to the manuscript's tables and figures.
Data Dictionary for Main Navigation Experiments:
| Column Name | Description |
|---|---|
| Trials | The sequential trial number of the experiment. |
| AI_Latency_ms | Combined inference time (ms) for ID + SSL models running on the local machine. |
| Time | Exact timestamp (YYYY.MM.DD HH:MM) of the trial. |
| Target ID | Designated ground-truth target sound source class name. |
| ID results | Predicted inference result of the ID model (assessing correct 60-degree sector identification). |
| SSL success | Human-evaluated outcome (O / X). O (pass) indicates the robot physically turned toward the target sound direction within an approximate ±15-degree error margin. |
| Success | Overall trial outcome (O / X). Strictly defined as a perfect match between Target ID and ID results, AND an O in SSL success. |
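The Success rule in the data dictionary can be recomputed from the other columns. The sketch below applies it to rows represented as dictionaries keyed by the column names above. Loading the .xlsx logs into that form is assumed to happen elsewhere, the function names are hypothetical, and the example rows are made up for illustration rather than taken from the logs.

```python
def trial_success(row):
    """Success = Target ID matches ID results AND SSL success is 'O'."""
    id_ok = row["Target ID"].strip() == row["ID results"].strip()
    ssl_ok = row["SSL success"].strip().upper() == "O"
    return id_ok and ssl_ok


def success_rate(rows):
    """Fraction of trials that satisfy the Success rule."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(trial_success(r) for r in rows) / len(rows)


# Made-up rows for illustration only.
example_rows = [
    {"Target ID": "penguin_1", "ID results": "penguin_1", "SSL success": "O"},
    {"Target ID": "penguin_1", "ID results": "penguin_1", "SSL success": "X"},
    {"Target ID": "penguin_2", "ID results": "penguin_1", "SSL success": "O"},
    {"Target ID": "penguin_2", "ID results": "penguin_2", "SSL success": "O"},
]
print(f"success rate = {success_rate(example_rows):.2f}")
```

Only the first and last example rows count as successes: the second fails on localization and the third on identification.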
Drone Tracking Application:
In addition to the primary target identification experiments, the repository includes resources for testing the robot in a drone tracking scenario:
- Drone_Application/: Contains the recorded dataset specifically collected for the drone tracking experiments.
- Drone_application.ipynb: Contains the code for training the ID and SSL models specifically adapted for the drone tracking application.
Target Delivery Application:
The repository also includes resources for testing the robot in a target delivery scenario directed by human speech:
- Target_delivery_application/: Contains the recorded dataset and trained models for the target delivery experiments.
- Target_delivery_applcation.ipynb: Contains the code for fine-tuning the ID model and fully training the SSL model, both adapted for target delivery via human speech.
Files (12.5 GB total; includes Autonomous navigation.zip)
| MD5 checksum | Size |
|---|---|
| 3178a167581f44f7ad03e1806a4e437a | 178.6 MB |
| 301a861c1bc187af813d5fc34ed86cc3 | 9.2 GB |
| ff6c95cd8b8974ec974836ade63aedb1 | 1.1 GB |
| b099ccec1f72ad4c327a975dad85cfc9 | 2.0 GB |
| 253ff8c292099d94c5d8374eab3463ea | 6.5 kB |
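Because several of the archives are multiple gigabytes, it may be worth checking each download against its listed MD5 value before extracting it. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file against a checksum, with or without an 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_checksum(path_to_download, "md5:253ff8c292099d94c5d8374eab3463ea") would check the 6.5 kB file, where path_to_download is wherever that file was saved locally.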
Additional details
Software
- Programming language: Python