Correspondence between audio and visual deep models for musical instrument detection in video recordings

Slizovskaia, Olga; Gómez, Emilia; Haro, Gloria

doi:10.5281/zenodo.1067688

Published October 27, 2017 | Version v1

Conference paper Open

1. Universitat Pompeu Fabra

This work investigates cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address the understanding of the representations learned by convolutional neural networks (CNNs) and study feature correspondence between the audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate existing cross-correlations between neurons from the audio and video CNNs that activate for the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions.
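The neuron-selection and cross-correlation step described in the abstract can be illustrated with a minimal sketch in Python with NumPy. This is not the authors' code. It assumes that per-clip activations of one layer have already been extracted from each network into arrays of shape (n_clips, n_neurons), with rows aligned across the two modalities. The ranking criterion (mean activation over clips of a category), the use of Pearson correlation, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def top_k_neurons(activations, labels, category, k):
    """Indices of the k neurons with the highest mean activation on one category.

    Assumption: "most activated" is taken to mean highest mean activation
    over the clips labelled with `category`.
    """
    mean_act = activations[labels == category].mean(axis=0)
    # argsort is ascending, so take the last k and reverse for descending order.
    return np.argsort(mean_act)[-k:][::-1]


def cross_correlation(audio_act, video_act):
    """Pearson correlation between every audio neuron and every video neuron.

    Both inputs have shape (n_clips, n_neurons) and their rows must refer to
    the same clips. Returns an array of shape (n_audio, n_video).
    """
    a = audio_act - audio_act.mean(axis=0)
    v = video_act - video_act.mean(axis=0)
    denom = np.outer(np.linalg.norm(a, axis=0), np.linalg.norm(v, axis=0))
    # Constant neurons have zero norm; report their correlation as 0.
    denom[denom == 0] = np.inf
    return (a.T @ v) / denom


def category_correspondence(audio_act, video_act, labels, category, k=5):
    """Hypothetical helper: select top-k neurons per modality, then correlate them."""
    audio_idx = top_k_neurons(audio_act, labels, category, k)
    video_idx = top_k_neurons(video_act, labels, category, k)
    corr = cross_correlation(audio_act[:, audio_idx], video_act[:, video_idx])
    return audio_idx, video_idx, corr
```

Calling `category_correspondence(audio_act, video_act, labels, category, k)` returns the selected neuron indices for each modality and a k-by-k matrix whose entry (i, j) is the correlation between the i-th selected audio neuron and the j-th selected video neuron. Large positive entries would suggest pairs of neurons that respond to the same instrument category in both modalities.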

This work is supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

Files

Name	MD5	Size
ISMIR2017SlizovskaiaLBD.pdf	38d31a689ed0a33010c1ba7ab841dbbe	305.5 kB

	All versions	This version
Views	206	206
Downloads	128	128
Data volume	40.0 MB	40.0 MB


Resource type

Conference paper

Publisher

Zenodo

Conference

18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, 23-27 October 2017 (Session LBD)

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 28, 2017
Modified: August 2, 2024
