Adaptive tuning using theremin as gestural controller

This work presents an interactive device to control an adaptive tuning and synthesis system. The gestural controller is based on the theremin concept in which only an antenna is used as a proximity sensor. This interactive process is guided by sensorial consonance curves and adaptive tuning related to psychoacoustical studies. We used an algorithm to calculate the dissonance values according to amplitudes and frequencies of a given sound spectrum. The theoretical background is presented followed by interactive composition strategies and sound results.


INTRODUCTION
The theremin is one of the earliest fully electronic musical instruments. It can also be considered the oldest electronic gestural controller and, by extension, a NIME device. It consists of two radio-frequency oscillators and two metal loop antennas; the electric signals are amplified and sent to a loudspeaker.
Since its invention by the Russian Léon Theremin in 1917, there have been several studies attempting to enhance the theremin's musical capabilities (as pointed out in [1]). Many theremin diagrams and circuits for sale can be found on Internet sites such as www.thereminworld.com.
The main idea behind the use of the theremin is to provide gestural input through the antenna proximity sensor, enabling one to play sounds without touching anything. As it is still an unusual musical instrument, its intonation technique is particularly difficult. The sound-generating circuits are still quite modest, with a limited set of timbres, because the instrument retains the original sonority of early electronic sounds. There are some MIDI theremins that control other synthesizers, but these are rather rare and more expensive.
Here we report on research that was first connected to the use of the theremin for improvisation and for the generation of sonic material to be used in electroacoustic compositions. Recently we started a new investigation based on a drastic simplification of the theremin concept: just the antenna proximity sensor connected to the computer, acting as a direct controller that can play any sound from the computer without any other electric circuit or MIDI devices.
In our system, the antenna gestural control is translated to pitch information and sent to an adaptive tuning patch that changes the theremin's tuning according to the sound's spectral distribution. This whole interactive process is guided by sensorial consonance curves and adaptive tuning derived from the psychoacoustic studies of [2][3][4], and can be thought of as a way of simplifying the hard intonation technique of the theremin. Finally, we present the concepts behind the development of an interactive composition, the system's interactive performance and sound results.

THEORETICAL BACKGROUND
Plomp and Levelt [2] developed a model for psychoacoustic dissonance measurement (defined as sensorial dissonance) based on the beating between partials. Their results indicated that the most dissonant interval corresponds to about one fourth of the critical bandwidth.
Based on a formula that approximates Plomp and Levelt's results, Sethares [3] provided an algorithm to calculate the dissonance values of a specified register according to a list of amplitudes and frequencies of a given sound spectrum. The results are displayed as a curve (y dimension = sensorial dissonance, x = musical interval in cents), see Figure 1. We have studied these two approaches in connection with Parncutt [4] and described our research in [5, 6].
Studying Sethares' algorithm [3], we verified that his approximation differs according to the register. We therefore used a different algebraic approximation of the empirical results of Plomp and Levelt [2], defined by Parncutt [4], who worked on a different roughness calculation model [7]. Parncutt's formula provides a constant approximation because it deals with intervals in Barks (the critical band scale), see [2]. As Parncutt's formula needs intervals in Barks, we used a mathematical formula presented in [8] to convert from Hertz to Barks.
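To illustrate how such a curve can be computed, the following Python sketch sums a Plomp and Levelt style dissonance term over all pairs of partials of a spectrum and a transposed copy of itself. The exponential form and the numeric constants follow the parameterisation commonly attributed to Sethares [3]; they are not listed in this paper and are assumed here, as are all function names and the use of the smaller amplitude as the weight of each pair.

```python
import math

# Assumed constants (commonly cited for Sethares' fit of the Plomp-Levelt curve).
D_STAR = 0.24            # point of maximum dissonance on the normalised axis
S1, S2 = 0.0207, 18.96   # register-dependent scaling of the frequency difference
A_DECAY, B_DECAY = 3.51, 5.75


def pair_dissonance(f1, a1, f2, a2):
    """Dissonance contributed by two partials (frequencies in Hz)."""
    s = D_STAR / (S1 * min(f1, f2) + S2)
    diff = abs(f2 - f1)
    # Assumption: weight each pair by the smaller of the two amplitudes.
    return min(a1, a2) * (math.exp(-A_DECAY * s * diff) - math.exp(-B_DECAY * s * diff))


def total_dissonance(partials):
    """Sum the pairwise dissonance over a list of (frequency, amplitude) tuples."""
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(*partials[i], *partials[j])
    return total


def dissonance_curve(partials, max_cents=1250, step_cents=5):
    """Return (cents, dissonance) points for the spectrum against a transposed copy."""
    curve = []
    for cents in range(0, max_cents + 1, step_cents):
        ratio = 2 ** (cents / 1200)
        shifted = [(f * ratio, a) for f, a in partials]
        curve.append((cents, total_dissonance(partials + shifted)))
    return curve


# Usage: six equally loud harmonic partials on a 500 Hz fundamental.
spectrum = [(500.0 * n, 1.0) for n in range(1, 7)]
curve = dissonance_curve(spectrum)
```

For a harmonic spectrum like the one in the usage lines, the curve dips near simple ratios such as the fifth and the octave, which is the behaviour the valleys in Figure 1 depict for an inharmonic spectrum.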
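The Bark-based approach can be sketched as follows. This is an assumption-laden illustration rather than the patch's actual code: the Hertz-to-Bark conversion shown is the widely used Zwicker and Terhardt approximation, which may differ from the formula in [8], and the roughness function is the form commonly attributed to Parncutt, which peaks at a distance of 0.25 Bark and is set to zero beyond 1.2 Barks. All function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Convert Hertz to Barks (assumed: Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def parncutt_roughness(bark_distance):
    """Roughness of two partials separated by bark_distance (assumed form).

    The value is 0 at unison, reaches 1 at 0.25 Bark (one fourth of the
    critical bandwidth) and is treated as 0 from 1.2 Barks upward.
    """
    b = abs(bark_distance)
    if b >= 1.2:
        return 0.0
    x = b / 0.25
    return (math.e * x * math.exp(-x)) ** 2


def pair_roughness(f1_hz, f2_hz):
    """Roughness of two partials given in Hertz."""
    return parncutt_roughness(hz_to_bark(f2_hz) - hz_to_bark(f1_hz))
```

Because the distance is measured in Barks before the roughness function is applied, the same curve shape is used in every register, which is the property that motivated the change of approximation.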

Consonance Curves and Adaptive Tuning
In order to provide a beatless and consonant sound, Hermode Tuning provides a model to calculate and adjust musical intervals according to just intonation, and it can be used in real time to tune a synthesizer (see http://www.hermode.de/). However, this approach does not consider beatless intervals that do not correspond to just intonation, which occur when the partials of a sound are inharmonic.
Sethares [3] also developed an adaptive tuning system in which adaptation occurs according to minimum values of sensorial dissonance curves, either at a specified time schedule or automatically. Based on his model, we developed an adaptive tuning patch in the Pure Data programming environment (see http://puredata.info/). Unlike Sethares, we calculate values to generate scales based on the following criteria: a) most consonant and b) most dissonant intervals according to a specified spectrum. Figure 1 shows a sensorial dissonance plot generated by our patch. A peak in the plot represents a high sensorial dissonance value, and a consonance is represented by a valley.
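The derivation of scale steps from a curve can be sketched as a search for local extrema. The Python fragment below is only an illustration under stated assumptions: the curve is taken to be a list of (cents, dissonance) points, the function names are hypothetical, and flat plateaus are simply ignored.

```python
def find_extrema(curve):
    """Return (valleys, peaks) as lists of intervals in cents.

    curve is a list of (cents, dissonance) points sorted by cents.
    A valley is a strict local minimum, a peak a strict local maximum.
    """
    valleys, peaks = [], []
    for i in range(1, len(curve) - 1):
        prev_d, d, next_d = curve[i - 1][1], curve[i][1], curve[i + 1][1]
        if d < prev_d and d < next_d:
            valleys.append(curve[i][0])
        elif d > prev_d and d > next_d:
            peaks.append(curve[i][0])
    return valleys, peaks


def derive_scale(curve, criterion="consonant"):
    """Pick the scale steps for criterion a) most consonant or b) most dissonant."""
    valleys, peaks = find_extrema(curve)
    return valleys if criterion == "consonant" else peaks


# Usage with a toy curve (values invented purely for illustration).
toy_curve = [(0, 0.0), (100, 0.8), (200, 0.5), (300, 0.3), (400, 0.6),
             (500, 0.2), (600, 0.9), (700, 0.1), (800, 0.4)]
consonant_scale = derive_scale(toy_curve, "consonant")   # [300, 500, 700]
dissonant_scale = derive_scale(toy_curve, "dissonant")   # [100, 400, 600]
```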

INTERACTIVE COMPOSITION SYSTEM
Building on our previous experience with improvisation using the theremin and with the construction of multi-track layered material, our first idea was to replace a frozen tape structure with a real-time sound generation system. We implemented two processes: a) an interactive additive synthesis and pre-recorded samples to replace the tape material and b) the antenna gestural trajectory to control the adaptive tuning and re-ensemble the synthesis mechanism.

Synthesis & Pre-recorded Samples
Figure 2 presents the additive synthesis section of the Pd patch we developed. This patch includes 32 oscillators in parallel, which can all be controlled via MIDI in order to pre-set the partial content and to insert an inharmonic index that controls spectral compression or stretching. The patch also produces synthesis with classical waveforms such as sinusoidal, square and triangle. Such simple sounds are used as a reference to the early electronic sounds usually implemented on theremins.
Another feature is the use of pre-recorded samples in wavetable loops, which makes it possible to play a quite diverse set of sounds. We pre-processed all samples using our patch to calculate the dissonance curve of a given sample (and thereby derive a scale) by performing an FFT analysis and extracting the list of amplitudes and frequencies of partials.
The synthesizer and wavetable loops are independent and preprocessed; they do not belong to the real-time process. This preprocessing stage concerns the construction of a database of sounds and their corresponding scales, derived from their sensorial dissonance curves.
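As a rough illustration of additive synthesis with a stretched or compressed spectrum, consider the Python sketch below. The paper does not define the inharmonic index, so the mapping used here is a hypothetical interpretation: the index, in cents, widens or narrows the octave between partials 1 and 2, and the other partials follow the same power law. Function names and default values are likewise assumptions.

```python
import math


def stretched_partials(f0, n_partials, index_cents=0.0, amp=1.0):
    """Return (frequency, amplitude) pairs for a stretched harmonic series.

    Assumed mapping: partial n lies at f0 * n ** (1 + index_cents / 1200),
    so partial 2 lies (1200 + index_cents) cents above partial 1.
    An index of 0 gives an ordinary harmonic series.
    """
    exponent = 1.0 + index_cents / 1200.0
    return [(f0 * n ** exponent, amp) for n in range(1, n_partials + 1)]


def additive_samples(partials, sample_rate=44100, duration=0.01):
    """Sum one sine oscillator per partial; output is normalised to [-1, 1]."""
    n_samples = round(sample_rate * duration)
    norm = sum(a for _, a in partials) or 1.0
    samples = []
    for k in range(n_samples):
        t = k / sample_rate
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)
        samples.append(value / norm)
    return samples


# Usage: eight equally loud partials from 500 Hz with an index of +7 cents.
partials = stretched_partials(500.0, 8, index_cents=7.0)
signal = additive_samples(partials)
```

A negative index compresses the spectrum in the same way, and the resulting list of partials is exactly the kind of amplitude and frequency data from which a dissonance curve can be calculated.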

Antenna & Adaptive Tuning
As already mentioned, the gestural controller of our system is the antenna (i.e. the simplified theremin). It provides input data for generating trajectories of pitch material. As the user improvises around the antenna and sustains a particular pitch, the adaptive patch can change the tuning automatically in real time, within a specified time in seconds, or at a specified speed (interval in cents per second). The system's parameters for adaptive reaction are: a) the next step in the scale, b) the next valley (maximum consonance) step in the scale or c) the next peak (maximum dissonance) step in the scale, see section 2.1 and Figure 4.
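The adaptive reaction can be sketched as a target selection followed by a rate-limited glide. The Python fragment below is a minimal illustration, not the Pd patch itself. It assumes that "next" means the closest scale step above the current interval, that the full scale is the union of valleys and peaks, and that the glide is updated once per control tick; all names are hypothetical.

```python
import bisect
import math


def pick_target(current_cents, valleys, peaks, mode="step"):
    """Choose the target interval for mode 'step', 'valley' or 'peak'.

    Assumption: the target is the first step strictly above the current
    interval; if there is none, the current interval is kept.
    """
    if mode == "valley":
        steps = sorted(valleys)
    elif mode == "peak":
        steps = sorted(peaks)
    elif mode == "step":
        steps = sorted(set(valleys) | set(peaks))
    else:
        raise ValueError("mode must be 'step', 'valley' or 'peak'")
    i = bisect.bisect_right(steps, current_cents)
    return steps[i] if i < len(steps) else current_cents


def speed_for_duration(current_cents, target_cents, seconds):
    """Speed in cents per second that reaches the target in the given time."""
    return abs(target_cents - current_cents) / seconds


def glide_step(current_cents, target_cents, speed_cents_per_s, dt):
    """Move towards the target for one tick of dt seconds without overshooting."""
    max_move = speed_cents_per_s * dt
    delta = target_cents - current_cents
    if abs(delta) <= max_move:
        return target_cents
    return current_cents + math.copysign(max_move, delta)


# Usage: a voice sustained at 430 cents adapts to the next valley at 1 cent per second.
valleys, peaks = [300, 500, 700], [100, 400, 600]
target = pick_target(430.0, valleys, peaks, mode="valley")   # 500
position = glide_step(430.0, target, 1.0, 0.5)               # 430.5 after half a second
```

Separating target selection from the glide keeps the three reaction modes and the two timing options (fixed duration or fixed speed) independent of each other.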

Composition and Improvisation
The first musical experiments with our system were performed with three antennas connected to the computer via the Arduino board (see http://www.arduino.cc/). The Arduino board is an open-source physical computing platform based on a simple I/O board that can take several inputs from switches and sensors. It can also be easily connected to Pure Data patches as a control input.
Each antenna controls one voice of the patch; two voices are each connected to an adaptive tuning module that adapts its tuning based on its interval relation to a third voice, see Figure 4. The composition described in this section is an ongoing project, and here we discuss musical strategies: a real-time performance proposal and gestural improvisation as a way of interacting with our system. We focused on the wavetable loops and have not yet used the additive synthesis modules for this creative process.
Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA
Figure 4 - Two adaptive tuning modules (one for each voice) processing the adaptation at a slow rate of 1 cent per second. This adaptive tuning section works for both the additive synthesis modules and the wavetable loops.

Sound Recording, Analysis & Processing
We tested our system using pre-recorded samples of harmonics played on stringed instruments such as the violin and viola. As the spectral content of these wavetables is quite simple, combining this material resembles the additive synthesis paradigm. We therefore considered that by combining wavetables of harmonics we could create an interesting sound texture. This idea also relates to the Spectral Music premise that merges the concepts of timbre and harmony [9], which we also presented in a paper [6].
Starting from this spectral premise, we apply our system not only to tune intervals and create chords, but also as a tool to generate complex textures: a dynamic sound motion flowing towards dissonant peaks or consonant valleys, with the scale steps derived from the pre-processing of the wave sample's sensorial dissonance curve.

Real Time Performance
When the wavetable loops are modulated in frequency by the antennas, the performer decides whether there will be an adaptation, whether it will be automatic or not, and whether it will adapt to a consonant (valley) or dissonant (peak) interval. This interactive process defines the structure of the interactive sound material, and was used in the piece entitled "Walking Tune".
As shown in Figure 5, three antennas are distributed around the stage or room, which cannot be too large because each antenna is sensitive only within a radius of about two meters. Two or three string instrument players (violin and viola) interact with the system by moving around the space and playing according to the interactive sound material generated by the patch performer. The musicians play natural and artificial harmonics that blend with the interactive tape sounds, interfering with and transforming the textural dynamics.

Gestural Improvisation
The improvisational concept is simple: the flow of sound texture varies from consonant to dissonant at different speeds, the patch performer determines the rhythm of the process, and the musicians respond to it.
By moving throughout the space, the musicians produce incoming data from their movement and position relative to the antennas, which in turn generates a texture. When they rest at a particular point, the patch performer adapts the texture by generating glissandos towards the next consonant or next dissonant step of the scale. The musicians then improvise their response, producing dissonances (if the texture is in a dissonant configuration) or consonances (in the opposite case). At a desired moment, the musicians start moving again, initiating a new cycle. The piece was written for a patch performer and two or three viola or violin players. The score is written as an interactive graphic in which the players have instructions on how to interact; see http://www.nics.unicamp.br/~porres for more details on the score and sound examples.

DISCUSSION & CONCLUSION
The system provides a simplified and generic use of the theremin in which a musician plays any sound using the computer as a dynamic sound generator. It also retunes the theremin to desired scale steps, which can be used in real time to facilitate the intonation technique.
We drew on the sensorial dissonance model [2] to generate adaptive musical scales based on valleys and peaks (consonance and dissonance steps). We intend to develop our system further by including more accurate procedures that use loudness curves to weight the partials in the calculation of the curves, based on the work of Clarence Barlow [10]. So far, the system has proven successful regarding its real-time performance, and the sound results are satisfactory for the composition application we worked with.
Other applications of our system to composition and improvisation may be pertinent. We plan to work on a sound installation similar to the piece "Walking Tune" presented here. Our intention is to use several antennas distributed in a room open to the listeners' moving interaction. The antennas will not only control the pitch, but also different parameters of the sound and, again, control desired setups that can be used to create an adaptive environment.

Figure 1 - Sensorial dissonance curve spanning 12.5 semitones for a waveform containing eight equally loud partials with an inharmonic index of +7 cents, beginning at 500 Hz (horizontal axis = semitones, vertical axis = sensorial dissonance). The vertical lines represent the derived scale intervals according to the peaks and valleys.

Figure 2 - This module generates additive synthesis waveforms and also contains paths to graphs and other analysis data. Other features include analysis graphics such as the spectral distribution of partials, waveform, audio signal, and others.

Figure 3 - Diagram of the real-time control of the synthesis and wavetables.

Figure 5 - Diagram of the real-time performance of "Walking Tune" using a patch performer and two viola players.