Published September 5, 2025 | Version 0.1
Dataset
Open
IS3+
Contributors
Annotator:
Researcher:
Supervisors:
Description
IS3+ is an extended version of IS3 with clean audio/image pairs to ensure cross-modality consistency. The dataset contains roughly 4 GB of data.
The dataset contains the following data:
- audio_wav: audio files (.wav)
- gt_segmentation: annotations of image bounding boxes and segmentation masks
- images: images (.jpg)
- IS3_annotation.json: file with image/audio/gt information for every dataset sample.
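As a starting point for working with the contents listed above, the following minimal sketch loads IS3_annotation.json and reports its top-level structure. The internal schema of the annotation file is not described here, so the code deliberately avoids assuming any field names. The extraction folder `is3plus` and the helper names `load_annotations` and `summarize` are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def load_annotations(dataset_root):
    """Load IS3_annotation.json from the extracted dataset folder."""
    path = Path(dataset_root) / "IS3_annotation.json"
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def summarize(annotations):
    """Return (container type name, number of entries, first entry).

    No schema is assumed: the file may hold a list of samples or a
    dict keyed by sample id, so both cases are handled generically.
    """
    if isinstance(annotations, dict):
        first = next(iter(annotations.items()), None)
    elif isinstance(annotations, list):
        first = annotations[0] if annotations else None
    else:
        return type(annotations).__name__, 1, annotations
    return type(annotations).__name__, len(annotations), first


if __name__ == "__main__":
    root = Path("is3plus")  # assumption: folder where the zip was extracted
    if (root / "IS3_annotation.json").exists():
        kind, count, first = summarize(load_annotations(root))
        print(f"Top-level container: {kind} with {count} entries")
        print("First entry:", first)
    else:
        print("IS3_annotation.json not found; adjust `root` to your extraction path.")
```

Inspecting the first entry this way should show which keys point to the files under audio_wav, images, and gt_segmentation before any loader is written against them.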
This work was done as part of the paper "Learning from Silence and Noise for Visual Sound Source Localization".
Paper citation:
@misc{juanola2025learningsilencenoisevisual,
  title={Learning from Silence and Noise for Visual Sound Source Localization},
  author={Xavier Juanola and Giovana Morais and Magdalena Fuentes and Gloria Haro},
  year={2025},
  eprint={2508.21761},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.21761},
}
Files
| Name | Size | Checksum |
|---|---|---|
| is3plus.zip | 3.8 GB | md5:ede4b9d01f100fb286c211fab2c52d43 |
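Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before extracting it. The sketch below uses only the Python standard library. The local path `is3plus.zip` and the helper names `md5_of_file` and `verify` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published on this page for is3plus.zip.
EXPECTED_MD5 = "ede4b9d01f100fb286c211fab2c52d43"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so the whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("is3plus.zip")  # assumption: download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print("is3plus.zip not found; adjust `archive` to your download path.")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest fix.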
Additional details
Software
- Repository URL: https://github.com/xavijuanola/SSL_SaN
- Programming language: Python