Published September 5, 2025 | Version 0.1
Dataset Open

IS3+

  • 1. New York University
  • 2. Pompeu Fabra University

Description

IS3+ is an extended version of IS3 with clean audio/image pairs to ensure cross-modality consistency. The dataset contains about 4 GB of data, distributed as a single 3.8 GB zip archive.

The dataset contains the following data:

  • audio_wav: audio files (.wav)
  • gt_segmentation: annotations of image bounding boxes and segmentation masks
  • images: images (.jpg)
  • IS3_annotation.json: file with image/audio/gt information for every dataset sample.
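
After extracting the archive, a quick sanity check of the contents listed above can be sketched as follows. This is a minimal sketch under stated assumptions: the extraction folder name (`is3plus`) is hypothetical, the four entries are assumed to sit directly under that folder, and the internal schema of IS3_annotation.json is not documented here, so the function only reports its top-level type and length instead of reading specific keys. The helper name `summarize_is3plus` is illustrative and not part of the dataset or its repository.

```python
import json
from pathlib import Path


def summarize_is3plus(root):
    """Count files per folder and report the top-level shape of the annotation file.

    Assumes `root` contains audio_wav/, gt_segmentation/, images/ and
    IS3_annotation.json, as listed in the dataset description.
    """
    root = Path(root)
    summary = {
        # Audio clips are described as .wav files.
        "audio_wav": len(list((root / "audio_wav").glob("*.wav"))),
        # Images are described as .jpg files.
        "images": len(list((root / "images").glob("*.jpg"))),
        # The annotation file format inside gt_segmentation is not stated,
        # so every regular file is counted regardless of extension.
        "gt_segmentation": sum(
            1 for p in (root / "gt_segmentation").rglob("*") if p.is_file()
        ),
    }

    with open(root / "IS3_annotation.json", encoding="utf-8") as fh:
        annotation = json.load(fh)

    # The schema is unknown: only record whether it is a list or dict
    # and how many top-level entries it has.
    summary["annotation_type"] = type(annotation).__name__
    summary["annotation_entries"] = (
        len(annotation) if isinstance(annotation, (list, dict)) else None
    )
    return summary


if __name__ == "__main__":
    dataset_root = Path("is3plus")  # hypothetical extraction folder
    if dataset_root.is_dir():
        for key, value in summarize_is3plus(dataset_root).items():
            print(f"{key}: {value}")
    else:
        print(f"{dataset_root} not found; extract is3plus.zip first")
```

Comparing the audio, image, and annotation counts is a cheap way to spot an incomplete extraction before training or evaluation.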

 

This dataset was created as part of the paper Learning from Silence and Noise for Visual Sound Source Localization.

Paper citation:

@misc{juanola2025learningsilencenoisevisual,
      title={Learning from Silence and Noise for Visual Sound Source Localization}, 
      author={Xavier Juanola and Giovana Morais and Magdalena Fuentes and Gloria Haro},
      year={2025},
      eprint={2508.21761},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21761}, 
}

Files

is3plus.zip (3.8 GB)
md5:ede4b9d01f100fb286c211fab2c52d43

Additional details

Software

Repository URL
https://github.com/xavijuanola/SSL_SaN
Programming language
Python