Published April 9, 2026 | Version v1
Conference paper
Open access
Spatialized MotionSoundLocalization: A Self-Supervised Multimodal Approach to Camera Rotation and Sound Source Localization Simulation
Authors/Creators
Description
When a person turns their head, two things happen at once: the visual scene shifts, and sounds appear to come from a different direction. This paper explores whether a machine can learn this relationship on its own, without any human-labeled data. We train two models jointly: one estimates how far a camera has rotated between two images, and the other estimates the direction of a sound source from two-channel (stereo) audio. Because the two models are linked through a shared geometric constraint, each provides a training signal for the other. To support this research, we also built a new synthetic dataset of 50,000 audio-visual pairs generated in simulated indoor environments.
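The description does not spell out the shared geometric rule. A minimal sketch, assuming the simplest case of a static sound source and a pure yaw rotation: if the camera turns by an angle Δψ, the source's apparent azimuth in the camera frame changes by −Δψ, so any mismatch between the audio branch's azimuth change and the visual branch's rotation estimate can be penalized. The sketch also swaps in a classical cross-correlation time-delay estimator with a far-field two-microphone model as a stand-in for the learned audio model. All function names, the 0.18 m microphone spacing, the 343 m/s speed of sound, and the sign conventions (angles in radians, positive toward the left) are hypothetical and are not taken from the SMSL repository.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, room-temperature air (assumed value)


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def estimate_itd(left, right, sample_rate, max_lag):
    """Hypothetical helper: return the inter-channel time delay (seconds) that
    maximizes the cross-correlation. Positive means the right channel lags
    the left, i.e. the sound reached the left microphone first."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def itd_to_azimuth(itd, mic_distance=0.18, c=SPEED_OF_SOUND):
    """Far-field two-microphone model: sin(azimuth) = c * itd / d.
    Returns azimuth in radians, positive toward the left."""
    s = max(-1.0, min(1.0, c * itd / mic_distance))  # clamp noisy estimates
    return math.asin(s)


def expected_azimuth(azimuth_before, yaw):
    """Static source: a camera yaw of +yaw (toward the left) shifts the
    source's apparent azimuth by -yaw."""
    return wrap_angle(azimuth_before - yaw)


def consistency_loss(azimuth_before, azimuth_after, yaw):
    """Squared angular error between the azimuth observed after the rotation
    (audio branch) and the azimuth implied by the estimated yaw (visual branch)."""
    err = wrap_angle(azimuth_after - expected_azimuth(azimuth_before, yaw))
    return err * err
```

For example, a source 20° to the left combined with a 30° leftward camera turn gives `expected_azimuth(math.radians(20), math.radians(30))` of about −10° (in radians): the source now appears 10° to the right. In a joint training loop, minimizing `consistency_loss` over both branches' predictions is one way the two models could supervise each other without labels; wrapping the error keeps angles near ±180° from producing spuriously large penalties.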
Files
| Name | Size | MD5 checksum |
|---|---|---|
| SMSL Paper Improved Final.pdf | 4.3 MB | 6ec88cf0128932b6e1a31665ffcbed85 |
Additional details
Software
- Repository URL: https://github.com/deBrian07/SMSL
- Programming language: Python, Shell