The paper Annotation-free Audio-Visual Segmentation, together with this dataset, has been accepted by WACV 2024. The project page is https://jinxiang-liu.github.io/anno-free-AVS/, and the code is released at https://github.com/jinxiang-liu/anno-free-AVS.
Due to technical issues, some audio clips (for training) were missing from the original audios.zip
file. If you downloaded the dataset before August 22nd, please re-download audios.zip
to replace the original one; otherwise, simply ignore this message and download the dataset as usual.
If you have any problems, feel free to contact jinxliu#sjtu.edu.cn (replace # with @).
Note that the dataset corresponds to the arXiv paper https://arxiv.org/abs/2305.11019v3.
The images and masks folders provide the image-mask pairs from LVIS and OpenImages.
The audios folder contains 3-second audio clips from VGGSound; please use the center 1-second sub-clip for training and evaluation. The pickle file category_for_vggsound_audios.pkl
describes the labels of the audios. These labels correspond to the cls_id
column in the annotations.csv
file used for model training.
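As a minimal sketch of the center-crop step, the following shows one way to extract the middle 1-second window from a 3-second clip. The sample rate, array layout, and function name are assumptions for illustration; adapt them to however you load the audio.

```python
import numpy as np

def center_one_second(waveform: np.ndarray, sample_rate: int) -> np.ndarray:
    """Crop the center 1-second window from a mono waveform.

    `waveform` has shape (..., num_samples); the dataset clips are 3 s long,
    so this keeps samples [1s, 2s) of the clip.
    """
    total = waveform.shape[-1]
    start = (total - sample_rate) // 2
    return waveform[..., start:start + sample_rate]

# Hypothetical 3-second clip at an assumed 16 kHz sample rate.
sr = 16000
clip = np.zeros(3 * sr, dtype=np.float32)
center = center_one_second(clip, sr)  # 1-second sub-clip, shape (16000,)
```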
The annotations.csv
file provides the annotations for the training, validation, and testing samples. For the training samples, we do not specify the audios; in practice, simply randomly sample VGGSound audios with the matching cls_id
in each epoch to compose the (image, mask, audio) triplets. For the validation and test sets, we designate a VGGSound audio sample for each image-mask pair.
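The per-epoch sampling for training triplets might be sketched as below. The in-memory dictionaries are hypothetical stand-ins: in practice you would build `audios_by_cls` from category_for_vggsound_audios.pkl and `train_rows` from annotations.csv; the field names are assumptions, not the actual CSV schema.

```python
import random

# Hypothetical stand-in for the cls_id -> audio-file mapping
# (in practice, derived from category_for_vggsound_audios.pkl).
audios_by_cls = {
    0: ["audio_0001.wav", "audio_0002.wav"],
    1: ["audio_0003.wav"],
}

# Hypothetical stand-in for training rows parsed from annotations.csv.
train_rows = [
    {"image": "img_a.jpg", "mask": "mask_a.png", "cls_id": 0},
    {"image": "img_b.jpg", "mask": "mask_b.png", "cls_id": 1},
]

def sample_triplets(rows, audios_by_cls, rng):
    """Pair each (image, mask) with a randomly drawn audio of the same class.

    Called once per epoch so each image sees different audios over training.
    """
    return [
        (r["image"], r["mask"], rng.choice(audios_by_cls[r["cls_id"]]))
        for r in rows
    ]

triplets = sample_triplets(train_rows, audios_by_cls, random.Random(0))
```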