AVS-Synthetic Dataset

Updated 2023-08-22

  1. The paper Annotation-free Audio-Visual Segmentation, which introduces this dataset, was accepted to WACV 2024. The project page is https://jinxiang-liu.github.io/anno-free-AVS/.

  2. We release the code at https://github.com/jinxiang-liu/anno-free-AVS.

  3. Due to technical reasons, some audio clips (for training) are missing from the original audios.zip file. If you downloaded the dataset before August 22, please re-download audios.zip to replace the original one; otherwise, just ignore this message and download the dataset.

  4. If you have any problems, feel free to contact jinxliu#sjtu.edu.cn (replace # with @).

  • Note: the dataset corresponds to the arXiv paper https://arxiv.org/abs/2305.11019v3.

  • The images and masks folders provide the image-mask pairs from LVIS and OpenImages.

  • The audios folder contains 3-second audio clips from VGGSound; please use the center 1-second sub-clip for training and evaluation. The pickle file category_for_vggsound_audios.pkl gives the label of each audio; the labels correspond to the cls_id column in the annotations.csv file used for model training.
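As a minimal sketch of the center-crop step above, the following extracts the middle 1-second sub-clip from a 3-second waveform. The 16 kHz sample rate and mono NumPy-array input are assumptions for illustration, not specified by the dataset notes:

```python
import numpy as np

def center_second(waveform: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Return the center 1-second sub-clip of a (roughly 3-second) waveform.

    The last axis is assumed to be time; sample_rate of 16 kHz is an assumption.
    """
    n = waveform.shape[-1]
    mid = n // 2
    start = max(0, mid - sample_rate // 2)
    return waveform[..., start:start + sample_rate]

# Synthetic 3-second clip at 16 kHz: samples 0..47999.
clip = np.arange(3 * 16000, dtype=np.float32)
center = center_second(clip)  # samples 16000..31999
```

The same slicing applies per-channel for stereo input, since only the last (time) axis is cropped.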

  • The annotations.csv file provides the annotations for every training, validation, and test sample. For the training samples, we do not specify the audios; in practice, just randomly sample a VGGSound audio with the matching cls_id in each epoch to compose the (image, mask, audio) triplet. For the validation and test sets, we designate the VGGSound audio sample for each image-mask pair.
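The per-epoch triplet composition described above can be sketched as follows. The assumption that category_for_vggsound_audios.pkl unpickles to a mapping from audio id to cls_id is mine; adapt the loading code to the actual file layout:

```python
import random
from collections import defaultdict

def build_class_index(audio_labels: dict) -> dict:
    """Group audio ids by cls_id.

    audio_labels is assumed to map audio id -> cls_id, e.g. the unpickled
    contents of category_for_vggsound_audios.pkl (format is an assumption).
    """
    index = defaultdict(list)
    for audio_id, cls_id in audio_labels.items():
        index[cls_id].append(audio_id)
    return dict(index)

def sample_triplet(image_path, mask_path, cls_id, class_index, rng=random):
    """Randomly pair an image-mask training sample with an audio of the same class."""
    audio_id = rng.choice(class_index[cls_id])
    return (image_path, mask_path, audio_id)

# Hypothetical labels for illustration only.
labels = {"a.wav": 3, "b.wav": 3, "c.wav": 7}
index = build_class_index(labels)
triplet = sample_triplet("img.jpg", "mask.png", 3, index)
```

Re-sampling the audio each epoch (rather than fixing one pairing) matches the note above that training audios are not designated per sample.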