Published April 9, 2026 | Version v1
Conference paper Open

Spatialized MotionSoundLocalization: A Self-Supervised Multimodal Approach to Camera Rotation and Sound Source Localization Simulation

Authors/Creators

Description

When a person turns their head, two things happen at once: the visual scene shifts, and sounds appear to come from a different direction. This paper explores whether a machine can learn this relationship on its own, without any human-labeled data. We train two models together: one that estimates how much a camera has rotated between two images, and one that estimates where a sound is coming from using two-channel audio. Because the two models are linked through a shared geometric rule, they teach each other during training. To support this research, we also built a new synthetic dataset of 50,000 audio-visual pairs recorded in simulated indoor environments.
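To make the idea of a shared geometric rule concrete, here is a minimal sketch of one plausible form it could take. The description does not state the actual rule, so everything here is an assumption for illustration: that the source is stationary, that only yaw is considered, that a positive camera rotation lowers the source's camera-relative azimuth by the same amount, and the function names `wrap_angle` and `consistency_loss`, which are hypothetical and not taken from the SMSL repository.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def consistency_loss(rotation: float,
                     azimuth_before: float,
                     azimuth_after: float) -> float:
    """Squared disagreement between the two predictions (hypothetical form).

    Assumption: for a stationary source, rotating the camera by `rotation`
    shifts the camera-relative azimuth by `-rotation`, so
    azimuth_after - azimuth_before + rotation should be zero.
    """
    residual = wrap_angle(azimuth_after - azimuth_before + rotation)
    return residual ** 2


# Illustrative values: the camera turns 30 degrees, and the source moves
# from +10 degrees to -20 degrees relative to the camera.
agreeing = consistency_loss(math.radians(30), math.radians(10), math.radians(-20))

# Same rotation, but the audio model reports no change in azimuth.
disagreeing = consistency_loss(math.radians(30), math.radians(10), math.radians(10))
```

In a sketch like this, neither prediction needs a human label: the loss is small only when the visual rotation estimate and the change in audio azimuth agree, which is one way two models could supervise each other.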

Files

SMSL Paper Improved Final.pdf (4.3 MB)
md5:6ec88cf0128932b6e1a31665ffcbed85

Additional details

Software

Repository URL
https://github.com/deBrian07/SMSL
Programming language
Python, Shell