Published June 1, 2013 | Version v1
Conference paper Open

Cross-modal Sound Mapping Using Deep Learning

Description

We present a method for automatic feature extraction and cross-modal mapping using deep learning. Our system uses stacked autoencoders to learn a layered feature representation of the data. Feature vectors from two (or more) different domains are mapped to each other, effectively creating a cross-modal mapping. Our system can either run fully unsupervised, or it can use high-level labeling to fine-tune the mapping according to a user's needs. We show several applications for our method, mapping sound to or from images or gestures. We evaluate system performance both in standalone inference tasks and in cross-modal mappings.
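To make the described pipeline concrete, here is a minimal sketch (not the authors' implementation, and with synthetic stand-in data) of the two-stage idea: an autoencoder per modality learns a compact feature vector, and a small network then maps one latent space onto the other. All layer sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_autoencoder(in_dim, hidden_dim, code_dim):
    # One encoder/decoder pair per modality; stacking more layers would
    # give the deeper, layered representation described in the paper.
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, code_dim))
    decoder = nn.Sequential(nn.Linear(code_dim, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, in_dim))
    return encoder, decoder

# Synthetic "sound" and "image" feature matrices (stand-ins for real features).
sound = torch.randn(256, 64)
image = torch.randn(256, 128)

enc_s, dec_s = make_autoencoder(64, 48, 16)
enc_i, dec_i = make_autoencoder(128, 96, 16)
mapper = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
loss_fn = nn.MSELoss()

# Stage 1: train each autoencoder on reconstruction (fully unsupervised).
for enc, dec, data in [(enc_s, dec_s, sound), (enc_i, dec_i, image)]:
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = loss_fn(dec(enc(data)), data)
        loss.backward()
        opt.step()

# Stage 2: learn the cross-modal mapping between latent codes on paired
# examples; high-level labels could be used to fine-tune this stage.
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    with torch.no_grad():
        z_s, z_i = enc_s(sound), enc_i(image)
    loss = loss_fn(mapper(z_s), z_i)
    loss.backward()
    opt.step()

# Inference: map a new sound feature vector into the image domain.
with torch.no_grad():
    new_sound = torch.randn(1, 64)
    predicted_image = dec_i(mapper(enc_s(new_sound)))
print(predicted_image.shape)  # torch.Size([1, 128])
```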

Files

nime2013_111.pdf (546.8 kB)
md5:d67fc5b70d2ce375f525da84c5a4a37a