Convolutional Networks for Visual Onset Detection in the Context of Bowed String Instrument Performances
Authors/Creators
- 1. Institute for Language and Speech Processing (ILSP), Athena R.C., Athens, Greece; School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- 2. Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
- 3. Institute for Language and Speech Processing (ILSP), Athena R.C., Athens, Greece
- 4. School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Description
In this work, we employ deep learning methods for visual onset detection. We focus on live music performances involving bowed string instruments. In this context, we take as a source of meaningful information the sequence of movements of the performer's body, and especially the bowing motion of the (right) hand. Body skeletons for each video frame are extracted with OpenPose and are then used as input to Temporal Convolutional Networks (TCNs). TCNs prove capable of handling such temporal information: they condition each output on a sufficiently long history of frames (i.e., an adjustable receptive field), their computations are lightweight and highly parallelizable, and their many trainable parameters provide robustness. As another source of information for our task, we consider the more subtle movements of the (left) hand fingers, which are responsible for pitch changes. Detections in this case rely directly on pixel data from specifically chosen regions of interest. Here, a 2D Convolutional Neural Network (CNN) is applied to the input in order to learn the features that are fed to the TCN. The models were trained and evaluated on single-player string recordings from the University of Rochester Multi-Modal Music Performance (URMP) Dataset. We show that these two approaches provide some complementary information.
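The temporal model described above can be illustrated with a minimal NumPy sketch of a TCN-style onset detector. It is not the authors' architecture: the kernel size (2), the dilations (1, 2, 4, 8), the hidden width, the ReLU activations, the omission of residual connections, the random (untrained) weights, the input layout (a hypothetical 25 keypoints × (x, y) per frame) and the threshold-based peak picking are all assumptions chosen for illustration. It shows how stacking dilated causal convolutions lets each per-frame output see a history of frames whose length is set by the receptive field.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Number of frames (current one included) that influence each output
    of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x, weights, bias, dilation):
    """Causal 1D convolution over time.

    x:       (T, C_in) input sequence, one row per video frame
    weights: (K, C_in, C_out) kernel
    bias:    (C_out,)
    Output at frame t depends only on frames t, t - d, ..., t - (K - 1) * d.
    """
    T, c_in = x.shape
    K, _, c_out = weights.shape
    pad = (K - 1) * dilation
    # Left-pad with zeros so that no future frames leak into the output
    x_padded = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out)) + bias
    for k in range(K):
        # Tap k looks back (K - 1 - k) * dilation frames
        offset = k * dilation
        y += x_padded[offset:offset + T] @ weights[k]
    return y


def init_params(c_in, hidden, kernel_size, dilations, rng):
    """Random (untrained) weights for illustration only."""
    layers = []
    c = c_in
    for _ in dilations:
        w = rng.normal(0.0, 1.0 / np.sqrt(kernel_size * c), size=(kernel_size, c, hidden))
        layers.append((w, np.zeros(hidden)))
        c = hidden
    return {
        "layers": layers,
        "w_out": rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1)),
        "b_out": np.zeros(1),
    }


def tcn_forward(x, params, dilations):
    """Dilated causal conv layers with ReLU, then a per-frame onset probability."""
    h = x
    for (w, b), d in zip(params["layers"], dilations):
        h = np.maximum(0.0, causal_dilated_conv1d(h, w, b, d))
    logits = h @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))


def pick_onsets(activation, threshold=0.5):
    """Frames that are local maxima of the activation and exceed a threshold."""
    peaks = []
    for t in range(len(activation)):
        left = activation[t - 1] if t > 0 else -np.inf
        right = activation[t + 1] if t + 1 < len(activation) else -np.inf
        if activation[t] >= threshold and activation[t] > left and activation[t] >= right:
            peaks.append(t)
    return peaks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dilations, kernel_size = [1, 2, 4, 8], 2
    T, c_in = 120, 25 * 2  # hypothetical: 120 frames, 25 keypoints x (x, y)
    x = rng.normal(size=(T, c_in))
    params = init_params(c_in, 16, kernel_size, dilations, rng)
    act = tcn_forward(x, params, dilations)
    print(receptive_field(kernel_size, dilations))  # 16
    print(act.shape)  # (120,)
    print(pick_onsets(act))
```

Because every layer only looks backwards in time, the output for frame t never depends on later frames, and all frames are processed in parallel as matrix products. For the finger-based branch, `x` would instead hold per-frame feature vectors produced by a 2D CNN applied to the cropped region of interest; that CNN is not shown here.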
Files
| Name | Size | MD5 checksum |
|---|---|---|
| SMC_2021_paper_63.pdf | 1.4 MB | 30720c5aa8725f4e30f36f4b667f76c6 |