Published 2026 | Version v1
Software Open

Penguin-inspired active audition for robot navigation

  • 1. Seoul National University

Description

Abstract

Navigating toward specific acoustic targets in noisy environments is a fundamental capability for autonomous systems. While penguins locate their chicks using compact auditory systems coupled with active movement, most robotic platforms rely on passive microphone arrays that impose hardware trade-offs and depend on single acoustic representations with limited identity discrimination. Here we present a penguin-inspired active audition framework that replaces sensor multiplicity with embodied physical motion. Inspired by how penguins decode dual-syrinx beating vocalizations, our system combines a compact acoustic metastructure with TESSER (Temporal-Spectral Sound Encoding Representation), a framework that dynamically fuses spectral and temporal features. TESSER compresses representations to 0.0025% of the size of conventional acoustic features while increasing inter-class separability margins from 0.08–0.11 in conventional representations to 0.62, enabling sub-second (<1 s) inference. The framework achieves identification accuracies of 98.38% for biological vocalizations and 93.05% across 100 human speakers, while simultaneously localizing sound sources with 93.1% accuracy at 1° resolution. In closed-loop navigation under severe masking conditions, a single-sensor robot successfully tracked mobile biological targets (94.4% success) and high-speed aerial drones (88.9%), and completed targeted human delivery tasks (83.7%). The proposed nature-inspired embodied active audition, combined with the compact acoustic representation, can help autonomous systems identify, localize, and navigate toward acoustic targets under severe acoustic masking.

📂 Deposition Contents & Structure

This repository contains the complete suite of hardware designs, ROS2 environments, AI model training scripts, and experimental datasets necessary to reproduce our findings. It is divided into four main components:

1. Penguin_Robot/

Contains the Docker environment, ROS2 workspaces, and hardware fabrication instructions for running the physical robot.

2. Evaluation_ID_Model/ (Fig. 4)

Contains Jupyter notebooks and datasets for training and evaluating the acoustic Identification (ID) models across Human, Manufacturing, and Penguin domains.

3. Evaluation_Spatial_Map/ (Fig. 5, S21, S22)

Contains datasets and notebooks for testing the Spatial Acoustic Map (ID + SSL) in multi-source environments (evaluating up to 6 speakers, with a maximum of 3 emitting simultaneously).

4. Autonomous_Navigation/ (Fig. 6, 7, Table 1)

Contains the final real-world experimental logs, performance metrics for the robot's autonomous navigation trials, and the drone tracking application code.

🛠️ Part 1: Hardware & System Setup

To run the physical robot or the ROS2 simulation, you must set up the distributed environment bridging a local compute machine (MacBook Pro M3) and the physical robot (Raspberry Pi 4B).

  • Robot Structure: The provided CAD (.step file) includes the robot skeleton, acoustic metastructure, and motor setup details.

  • Compute Units: Raspberry Pi 4B (Ubuntu 22.04 LTS recommended) acting as the main robot compute unit, and a Teensy 4.0 embedded microcontroller for hardware processing.

  • Software: Requires Docker and Tailscale for VPN bridging.

  • AI Model (BEATs): Requires the BEATs_iter3_plus_AS2M.pt checkpoint from the Microsoft UniLM GitHub.

Workspace Setup (Raspberry Pi)

To set up the ROS2 environment on the physical robot, download the codebase from our Zenodo archive and build the workspace:

  1. Download the Code: Download the penguin_ws folder from our Zenodo archive.

  2. Create the Workspace: On your Raspberry Pi, create a new directory for the workspace:

    mkdir -p ~/penguin_ws/src
    
  3. Copy the Source Files: Copy the penguin_body_pkg folder (and any other source folders from the downloaded Zenodo archive) into the ~/penguin_ws/src/ directory.

  4. Build the Workspace: Compile the ROS2 packages and source the environment:

    cd ~/penguin_ws
    colcon build --packages-select penguin_body_pkg
    source install/setup.bash

Execution Summary (see README.md inside Penguin_Robot/ for more details):

  1. Local Environment: Load the provided ros_humble_3DAR.tar Docker image. Establish a Zenoh bridge (zenoh-bridge-ros2dds) to the robot's Tailscale IP, and launch the AI commander node (penguin_commander.py).

  2. Robot Hardware: Flash the Teensy 4.0 via the Arduino IDE (Penguin_teensy_quite.ino). On the Raspberry Pi, run the Zenoh listener, upload the OpenCR firmware, launch the turtlebot3_ros node, and execute the drive and audio nodes.

📊 Part 2: Model Evaluations

A. Identification Across Diverse Acoustic Domains

  • Generate Embeddings: Run ID_{Domain}_dataset_loader.ipynb to process raw audio using BEATs and YAMNet.

  • Hyperparameter Tuning: Run ID_{Domain}_grid.ipynb to perform grid searches (outputs to grid_search_final_report.csv).

  • Visualization: Run ID_{Domain}_grid_result.ipynb to generate confusion matrices and embedding plots.
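Once the grid search has written grid_search_final_report.csv, the best configuration can be picked out without opening a notebook. The following is a minimal sketch, not the authors' code: the column layout of the report is not documented here, so the score column name (`accuracy` in the usage comment) is an assumption and must be replaced with a column that actually exists in the file.

```python
import csv


def best_configuration(csv_file, score_column):
    """Return the report row (as a dict) with the highest value in `score_column`.

    `csv_file` is any open text file object; `score_column` is a hypothetical
    column name that the caller must match to the real report header.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or score_column not in reader.fieldnames:
        raise KeyError(f"column {score_column!r} not found in report")
    rows = list(reader)
    if not rows:
        raise ValueError("report contains a header but no result rows")
    # CSV values are strings, so convert the score to float before comparing.
    return max(rows, key=lambda row: float(row[score_column]))


# Usage (assumes the report has a numeric column named "accuracy"):
#   with open("grid_search_final_report.csv", newline="") as handle:
#       print(best_configuration(handle, "accuracy"))
```

Raising on a missing column, rather than guessing one, keeps the helper from silently ranking runs by the wrong metric.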

B. Spatial Acoustic Map in Multi-Source Environments

  • Specific quality filters were applied (e.g., 3000 Hz spectral centroid filter for Penguins; removal of overlapping background noise in Human datasets).

  • Extract Real_dataset.zip, run the respective ID and SSL multi-source notebooks, and use Visualization_preprocess.ipynb to render the 2D spatial acoustic maps (or view the pre-computed CSVs provided).
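To make the spectral centroid filter concrete, the sketch below computes the centroid of one audio frame (the magnitude-weighted mean frequency) and compares it with a 3000 Hz threshold. It is an illustration under stated assumptions rather than the filter used in the notebooks: the description does not say whether clips above or below 3000 Hz were kept, so the `keep_above` flag is hypothetical, and a dependency-free naive DFT stands in for whatever FFT routine the notebooks use.

```python
import cmath
import math

CENTROID_THRESHOLD_HZ = 3000.0  # value quoted for the Penguin dataset


def spectral_centroid_hz(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real-valued frame.

    Uses a naive O(N^2) DFT so the sketch needs no third-party packages;
    use short frames, or swap in an FFT library for real recordings.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0


def passes_centroid_filter(samples, sample_rate,
                           threshold_hz=CENTROID_THRESHOLD_HZ,
                           keep_above=True):
    """Hypothetical keep/drop rule; the direction (`keep_above`) is an assumption."""
    centroid = spectral_centroid_hz(samples, sample_rate)
    return centroid >= threshold_hz if keep_above else centroid < threshold_hz
```

For a pure tone whose frequency falls exactly on a DFT bin, the centroid equals the tone frequency up to floating-point error, which gives a quick sanity check before applying the rule to real clips.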

🎯 Part 3: Real-World Autonomous Navigation & Drone Tracking

This section contains the final Excel (.xlsx) logs detailing the real-world trials, mapping directly to the manuscript's tables and figures.

Data Dictionary for Main Navigation Experiments:

| Column Name | Description |
| --- | --- |
| Trials | The sequential trial number of the experiment. |
| AI_Latency_ms | Combined inference time (ms) for the ID + SSL models running on the local machine. |
| Time | Exact timestamp (YYYY.MM.DD HH:MM) of the trial. |
| Target ID | Designated ground-truth class name of the target sound source. |
| ID results | Predicted inference result of the ID model (assessing correct 60-degree sector identification). |
| SSL success | Human-evaluated outcome (O / X). O (pass) indicates that the robot physically turned toward the target sound direction within an approximate ±15-degree error margin. |
| Success | Overall trial outcome (O / X). O strictly requires a perfect match between Target ID and ID results AND an O in SSL success. |
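The Success rule in the data dictionary can be re-derived from the other columns, which is a useful consistency check when reading the logs. The sketch below assumes each trial has already been loaded into a dict keyed by the column names above (for example, after exporting a sheet to CSV); the helper names and the example rows are made up for illustration and are not taken from the published logs.

```python
def recompute_success(row):
    """Apply the Success rule: exact ID match AND an O in SSL success."""
    id_correct = str(row["Target ID"]).strip() == str(row["ID results"]).strip()
    ssl_ok = str(row["SSL success"]).strip().upper() == "O"
    return "O" if (id_correct and ssl_ok) else "X"


def summarize(rows):
    """Return trial count, success rate, and mean AI latency for a list of rows."""
    total = len(rows)
    if total == 0:
        raise ValueError("no trials to summarize")
    successes = sum(recompute_success(row) == "O" for row in rows)
    mean_latency = sum(float(row["AI_Latency_ms"]) for row in rows) / total
    return {
        "trials": total,
        "success_rate": successes / total,
        "mean_latency_ms": mean_latency,
    }


# Made-up rows for illustration only; NOT values from the deposited .xlsx logs.
example_rows = [
    {"Trials": 1, "AI_Latency_ms": 400.0, "Target ID": "penguin_A",
     "ID results": "penguin_A", "SSL success": "O"},
    {"Trials": 2, "AI_Latency_ms": 500.0, "Target ID": "penguin_A",
     "ID results": "penguin_B", "SSL success": "O"},
    {"Trials": 3, "AI_Latency_ms": 600.0, "Target ID": "penguin_B",
     "ID results": "penguin_B", "SSL success": "X"},
    {"Trials": 4, "AI_Latency_ms": 500.0, "Target ID": "penguin_B",
     "ID results": "penguin_B", "SSL success": "O"},
]

if __name__ == "__main__":
    print(summarize(example_rows))
```

In the made-up rows, trial 2 fails on identification and trial 3 fails on localization, so only two of the four trials count as successes.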

Drone Tracking Application:

In addition to the primary target identification experiments, the repository includes resources for testing the robot in a drone tracking scenario:

  • Drone_Application/: Contains the recorded dataset specifically collected for the drone tracking experiments.

  • Drone_application.ipynb: Contains the code for training the ID and SSL models specifically adapted for the drone tracking application.

Target Delivery Application:

The repository also includes resources for testing the robot in a target delivery scenario directed by human speech:

  • Target_delivery_application/: Contains the dataset recorded for the target delivery experiments and the corresponding trained models.

  • Target_delivery_applcation.ipynb: Contains the code for fine-tuning the ID model and fully training the SSL model, both adapted for target delivery via human speech.

Files

Autonomous navigation.zip

Files (12.5 GB)

| Checksum | Size |
| --- | --- |
| md5:3178a167581f44f7ad03e1806a4e437a | 178.6 MB |
| md5:301a861c1bc187af813d5fc34ed86cc3 | 9.2 GB |
| md5:ff6c95cd8b8974ec974836ade63aedb1 | 1.1 GB |
| md5:b099ccec1f72ad4c327a975dad85cfc9 | 2.0 GB |
| md5:253ff8c292099d94c5d8374eab3463ea | 6.5 kB |
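
Because the largest archive is several gigabytes, it is worth checking each download against the md5 values listed above before extracting it. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the file path in the usage comment is a placeholder for wherever the archive was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with a checksum copied from the file list.

    `expected` may be given with or without the "md5:" prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the downloaded archive):
#   ok = verify_download("path/to/archive.zip",
#                        "md5:3178a167581f44f7ad03e1806a4e437a")
```

Reading in chunks keeps memory use flat even for the 9.2 GB file.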

Additional details

Software

Programming language
Python