Thesis Open Access
Jordà, Sergi; Haki, Behzad
This dissertation summarizes research on the dualization of rhythm patterns. Rhythm pattern dualization transforms a multi-instrumental rhythm pattern into another pattern composed of at most two instruments, while maintaining coherence and the perceptual essence of the original rhythm. Because the task is novel, we first conduct a comprehensive literature review spanning several disciplines. Drawing on neurology, cognitive science, and psychology, we assemble the foundations for tackling the task. We propose two machine learning models built upon GrooVAE, a model for rhythm humanization recently reported by Google Magenta. The GrooVAE network topology combines the Sequence to Sequence Learning and Variational Autoencoder architectures. We treat dualization as a variation of the dimensionality reduction problem. Accordingly, we aim to obtain the dualized version of a rhythm by modifying the network's architecture so that it creates a reduced intermediary representation during autoencoding. The two models achieve this rhythm compression in different ways. In the first, the Autoencoders model, we reduce the dimensionality of the original GrooVAE network and then collect the hidden state vector values from the first layer of the decoding network. We then train a cluster of autoencoders to find a latent, two-dimensional representation of these h-vectors, which we treat as a dualized version of the input pattern. In the second, the Bottleneck model, we insert a two-dimensional bottleneck layer between the two original layers of the decoder network and treat this two-dimensional bottleneck representation as a dualized version of the input pattern. Finally, we evaluate our models with listening experiments and report the results.
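To make the dimensionality reduction view concrete, the following minimal sketch shows only the shape of the idea: a pattern with several instrument channels per time step is squeezed through a two-unit layer and expanded again. It is not the GrooVAE network or either of the proposed models. The pattern length, the number of instruments, the per-step projection, the random untrained weights, and all function names are assumptions made for illustration.

```python
import math
import random

N_STEPS = 32        # assumption: two bars of 16th notes
N_INSTRUMENTS = 9   # assumption: size of the drum-voice vocabulary
N_DUAL = 2          # two-dimensional bottleneck, as described in the abstract


def make_matrix(rows, cols, rng):
    """Random weight matrix; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def linear(weights, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def dualize(pattern, w_down):
    """Project every time step onto the two-unit bottleneck (hypothetical helper)."""
    return [sigmoid(linear(w_down, step)) for step in pattern]


def reconstruct(dual, w_up):
    """Expand the two-channel representation back to all instruments."""
    return [sigmoid(linear(w_up, step)) for step in dual]


rng = random.Random(0)
w_down = make_matrix(N_DUAL, N_INSTRUMENTS, rng)
w_up = make_matrix(N_INSTRUMENTS, N_DUAL, rng)

# Toy binary onset pattern: N_STEPS rows, one column per instrument.
pattern = [[1.0 if rng.random() < 0.2 else 0.0 for _ in range(N_INSTRUMENTS)]
           for _ in range(N_STEPS)]

dual = dualize(pattern, w_down)    # N_STEPS x 2: the "dualized" representation
recon = reconstruct(dual, w_up)    # N_STEPS x N_INSTRUMENTS
```

In a trained model the weights would be learned so that the reconstruction stays close to the input, which forces the two-channel intermediate to retain as much of the original rhythm as two channels can carry.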