Published June 29, 2021 | Version v1
Conference paper | Open access

Convolutional Networks for Visual Onset Detection in the Context of Bowed String Instrument Performances

  • 1. Institute for Language and Speech Processing (ILSP), Athena R.C., Athens, Greece - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
  • 2. Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
  • 3. Institute for Language and Speech Processing (ILSP), Athena R.C., Athens, Greece
  • 4. School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece

Description

In this work, we employ deep learning methods for visual onset detection, focusing on live music performances involving bowed string instruments. In this context, we take as a source of meaningful information the sequence of the performers’ body movements, especially the bowing motion of the (right) hand. Body skeletons for each video frame are extracted with OpenPose and are then used as input to Temporal Convolutional Neural Networks (TCNs). TCNs prove capable of handling such temporal information by conditioning outputs on an adequately long history (i.e., a variable receptive field), while ensuring highly parallelizable, lightweight computations and a multitude of trainable parameters that provide robustness. As another source of information for our task, we consider the more subtle movements of the (left) hand fingers, which are responsible for pitch changes. Detections in this case rely directly on pixel data from specifically chosen regions of interest. Here, a 2D Convolutional Neural Network (CNN) is applied to the input in order to learn the features to be fed to the TCN. The models were trained and evaluated on single-player string recordings from the University of Rochester Multi-Modal Music Performance (URMP) Dataset. We show that these two approaches provide some complementary information.
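The idea of conditioning each output on an adequately long history can be illustrated with a small sketch of stacked dilated causal convolutions, the building block commonly used in TCNs. This is not the architecture from the paper: the kernel size, the dilation schedule, the all-ones weights, and the function names `receptive_field`, `causal_dilated_conv`, and `tcn_stack` are assumptions chosen only for illustration. The input is a one-dimensional sequence standing in for a single skeleton coordinate tracked over video frames.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of a
    stack of dilated causal convolutions (one layer per dilation)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(signal, weights, dilation):
    """1D causal convolution: out[t] depends only on signal[t],
    signal[t - dilation], signal[t - 2 * dilation], ...
    Frames before the start of the sequence are treated as zeros."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def tcn_stack(signal, weights, dilations):
    """Apply one causal convolution per dilation, feeding each layer's
    output into the next (no nonlinearity or residuals, for brevity)."""
    for d in dilations:
        signal = causal_dilated_conv(signal, weights, d)
    return signal


# Hypothetical configuration: kernel size 3, dilations doubling per layer.
dilations = [1, 2, 4, 8]
weights = [1.0, 1.0, 1.0]
print(receptive_field(len(weights), dilations))  # history length in frames

# An impulse at frame 0 shows how far into the future one frame reaches.
impulse = [1.0] + [0.0] * 39
response = tcn_stack(impulse, weights, dilations)
```

Doubling the dilation at each layer makes the receptive field grow roughly exponentially with depth, while every layer remains a fixed-size convolution that can be computed for all frames in parallel. Changing the kernel size or the dilation list is what makes the receptive field variable.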

Files

SMC_2021_paper_63.pdf (1.4 MB, md5:30720c5aa8725f4e30f36f4b667f76c6)