Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.
Correspondence between audio and visual deep models for musical instrument detection in video recordings
Title | Correspondence between audio and visual deep models for musical instrument detection in video recordings |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Conference Name | 18th International Society for Music Information Retrieval Conference (ISMIR2017, LBD) |
Authors | Slizovskaia O., Gómez E., & Haro G. |
Conference Start Date | 23/10/2017 |
Conference Location | Suzhou, China |
Abstract | This work investigates cross-modal connections between audio and video sources in the task of musical instrument recognition. We also study the representations learned by convolutional neural networks (CNNs) and the feature correspondence between the audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate cross-correlations between the audio and video CNN neurons that activate for the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions. |
preprint/postprint document | http://hdl.handle.net/10230/37216 |
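The neuron-correspondence analysis described in the abstract can be sketched roughly as follows: for one instrument category, pick the most activated units in each branch and correlate their activations across examples. This is a minimal illustration, not the paper's implementation; the activation matrices, labels, and helper names (`top_units`, `cross_correlations`) are hypothetical.

```python
import numpy as np

def top_units(acts, labels, cls, k=5):
    """Indices of the k units with highest mean activation on examples of class `cls`."""
    mean_act = acts[labels == cls].mean(axis=0)
    return np.argsort(mean_act)[-k:]

def cross_correlations(audio_acts, video_acts, labels, cls, k=5):
    """Pearson correlations between the top-k audio units and top-k video units for one class."""
    a_idx = top_units(audio_acts, labels, cls, k)
    v_idx = top_units(video_acts, labels, cls, k)
    a = audio_acts[:, a_idx]  # shape (n_examples, k)
    v = video_acts[:, v_idx]
    # np.corrcoef treats rows as variables; keep the audio-vs-video off-diagonal block
    return np.corrcoef(a.T, v.T)[:k, k:]  # shape (k, k)

# Toy demo with random activations for 3 instrument classes
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
audio_acts = rng.normal(size=(200, 64))
video_acts = rng.normal(size=(200, 128))
corr = cross_correlations(audio_acts, video_acts, labels, cls=1)
print(corr.shape)
```

High off-diagonal correlations would suggest that audio and video units selective for the same instrument co-activate across examples, which is the kind of cross-modal correspondence the paper examines.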