IS3+

Morais, Giovana; Juanola Molet, Xavier

doi:10.5281/zenodo.17064608

Published September 5, 2025 | Version 0.1

Dataset Open

1. New York University

Contributors

Annotator: Morais, Giovana¹
Researcher: Juanola Molet, Xavier

Supervisor (2):

1. New York University
2. Pompeu Fabra University

IS3+ is an extended version of IS3 with clean audio/image pairs to ensure cross-modality consistency. The dataset is roughly 4 GB; the downloadable archive, is3plus.zip, is 3.8 GB.

The dataset contains the following data:

audio_wav: audio files (.wav)
gt_segmentation: annotations of image bounding boxes and segmentation masks
images: images (.jpg)
IS3_annotation.json: image, audio, and ground-truth (gt) information for every sample in the dataset.
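
As a quick sanity check after extracting the archive, the contents listed above can be counted and the annotation file opened. The following is a minimal sketch, not part of the dataset's own tooling: the function name `summarize_is3plus` is hypothetical, it assumes the three folders and IS3_annotation.json sit directly under the extraction root, and because the annotation schema is not described here it only reports the top-level JSON type and number of entries.

```python
import json
from pathlib import Path


def summarize_is3plus(root):
    """Count files per folder and report the top-level shape of the annotation file.

    Hypothetical helper: folder and file names come from the dataset
    description, but their location directly under `root` is an assumption.
    """
    root = Path(root)
    summary = {}

    # The extension of the gt_segmentation files is not stated, so match any file.
    for folder, pattern in [
        ("audio_wav", "*.wav"),
        ("images", "*.jpg"),
        ("gt_segmentation", "*"),
    ]:
        directory = root / folder
        if directory.is_dir():
            summary[folder] = sum(1 for p in directory.glob(pattern) if p.is_file())
        else:
            summary[folder] = None  # folder not found where we assumed it would be

    ann_path = root / "IS3_annotation.json"
    if ann_path.is_file():
        with ann_path.open("r", encoding="utf-8") as f:
            annotations = json.load(f)
        # Schema is undocumented here: report only the type and size.
        summary["annotation_type"] = type(annotations).__name__
        summary["annotation_entries"] = (
            len(annotations) if isinstance(annotations, (list, dict)) else None
        )
    else:
        summary["annotation_type"] = None
        summary["annotation_entries"] = None

    return summary
```

Calling `summarize_is3plus("path/to/extracted/archive")` returns a dictionary; a `None` value means the folder or file was not found at the assumed location.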

This dataset was created as part of the paper "Learning from Silence and Noise for Visual Sound Source Localization".

Paper citation:

@misc{juanola2025learningsilencenoisevisual,
title={Learning from Silence and Noise for Visual Sound Source Localization},
author={Xavier Juanola and Giovana Morais and Magdalena Fuentes and Gloria Haro},
year={2025},
eprint={2508.21761},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.21761},
}

Files

Total size: 3.8 GB

Name	MD5	Size
is3plus.zip	ede4b9d01f100fb286c211fab2c52d43	3.8 GB
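
The MD5 checksum listed for is3plus.zip can be used to confirm that a download completed without corruption. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `archive_is_intact` are illustrative, and the local path passed in is whatever location the archive was saved to.

```python
import hashlib

# Checksum published on the dataset page for is3plus.zip.
EXPECTED_MD5 = "ede4b9d01f100fb286c211fab2c52d43"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for a multi-gigabyte archive.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest against the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("is3plus.zip")` should return `True` for an uncorrupted download saved under that name in the working directory.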

Additional details

Repository URL: https://github.com/xavijuanola/SSL_SaN
Programming language: Python

Related paper: https://arxiv.org/abs/2508.21761


DOI: 10.5281/zenodo.17064608
Resource type: Dataset
Publisher: Zenodo
Conference: The 36th British Machine Vision Conference (BMVC), Sheffield, UK, 24-27 November
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.