A hybrid method for extended percussive gesture

This paper describes a hybrid method to allow drummers to expressively utilize electronics. Commercial electronic drum hardware is made more expressive by replacing the sample playback "drum brain" with a physical modeling algorithm implemented in Max/MSP. Timbre recognition techniques identify striking implement and location as symbolic data that can be used to modify the parameters of the physical model.

Figure 1: A diagram of the structure of the system used for the drum software.
... must be passively waiting for any implement to hit it, rather than actively tracking the stick with a sensor. A typical physical model of a drum consists of a virtual drumhead, a virtual stick, and a virtual resonator. In this system, the virtual drumhead and virtual stick are replaced with real objects and only the resonator is simulated in software. The gesture is translated directly into sound through audio processing, rather than captured indirectly via sensors.
Timbre recognition is used to determine the timbre of the drum when it is struck and to infer the corresponding gesture. The timbre information is then used to modify the parameters of the physical model (see figure 1). The drum is both an instrument and a metacontroller: it can modify its own controls based on the user's input. This eliminates the need to press a button in order to change presets.
Although there have been many other drum controllers, the work of Roberto Aimi [1] is the most closely related. Aimi used the direct signal processing approach for a realistic interface but opted to use convolution. While many of his results sound like acoustic instruments, convolution somewhat limits both the variety of the sound palette and the ability to morph seamlessly between sounds.

PHYSICAL MODELING
Physical modeling offers a flexible synthesis method with low latency and minimal complexity. Because it uses digital waveguide techniques [4], the software was simple to build and is computationally efficient. The algorithm employed is a one-stage digital waveguide.
The user can control the length of the tube (length of the delay), the damping of the tube (feedback coefficient), and the timbre (coefficients of the filter). Presets may be stored and recalled by the user. The user may freely change the character of the sound using Max/MSP's filtergraph object. Although these changes are not necessarily physically informed, physical-sounding results often arise. Currently, there are presets that sound similar to bells, congas, and cardboard boxes, as well as presets that fall somewhere in between.
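The three controls described above can be illustrated with a minimal sketch of a single delay-line loop. This is not the Max/MSP patch itself: the function name `waveguide`, its parameter names, and the one-pole lowpass filter (standing in for the filter edited with filtergraph) are assumptions made for illustration.

```python
from collections import deque


def waveguide(excitation, delay_samples, feedback, lowpass=0.0):
    """Feed an input signal through one delay-line loop (hypothetical sketch).

    delay_samples -- loop delay in samples (the "tube length")
    feedback      -- loop gain in [0, 1) (the "damping")
    lowpass       -- one-pole filter coefficient in [0, 1) (the "timbre");
                     an assumed stand-in for the filter in the actual patch
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for a decaying loop")
    if not 0.0 <= lowpass < 1.0:
        raise ValueError("lowpass must be in [0, 1)")

    # The deque always holds the last `delay_samples` outputs; index 0 is
    # the output written exactly `delay_samples` steps ago.
    line = deque([0.0] * delay_samples, maxlen=delay_samples)
    filter_state = 0.0
    output = []
    for x in excitation:
        delayed = line[0]
        # One-pole lowpass inside the loop: higher `lowpass` is darker.
        filter_state = (1.0 - lowpass) * delayed + lowpass * filter_state
        y = x + feedback * filter_state
        line.append(y)  # appending drops the oldest sample
        output.append(y)
    return output
```

In this sketch the excitation would be the audio from the drum pad. A longer delay lowers the resonant pitch, a feedback value closer to 1 lengthens the ring, and a larger `lowpass` value makes high frequencies decay faster.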

TIMBRE RECOGNITION
When designing the type of control needed for this project, the following list of gestures and their groupings was generated: The flam stroke is produced when the user strikes with two sticks that hit nearly simultaneously. The radial position refers to the point on the drumhead at which the drum was struck. Since there is only a single sensor in the middle, it is impossible to correlate with another source to find the position in two dimensions. The implement and magnitude categories are self-explanatory.
Marsyas [6] is used to analyze the audio stream and segment it into discrete hits, so that an asynchronous event can be passed on each time the drum is hit. Once a hit is detected, a fixed window of 4096 samples is captured for processing. This window is sent to the feature extraction section of the software, where various digital signal processing algorithms are computed. The resulting feature vector is delivered to multiple classifiers. The output of the classifiers is collected and sent via Open Sound Control.
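The segmentation and windowing steps can be sketched as follows. This does not use the Marsyas API. The amplitude-threshold onset detector, the threshold value, the function names, and the three features shown are all assumptions chosen for illustration; only the 4096-sample window comes from the text.

```python
import math

WINDOW = 4096  # analysis window length stated in the text


def detect_hits(signal, threshold=0.1, window=WINDOW):
    """Return the start index of each hit (assumed threshold detector).

    After a hit is found, the next `window` samples are skipped so that
    one strike produces exactly one event.
    """
    onsets = []
    i = 0
    while i < len(signal):
        if abs(signal[i]) >= threshold:
            onsets.append(i)
            i += window
        else:
            i += 1
    return onsets


def capture_window(signal, onset, window=WINDOW):
    """Copy `window` samples starting at `onset`, zero-padding at the end."""
    frame = list(signal[onset:onset + window])
    frame.extend([0.0] * (window - len(frame)))
    return frame


def extract_features(frame):
    """Return [RMS, zero-crossing rate, temporal centroid] for one frame.

    These are illustrative features, not the feature set used in the paper.
    """
    n = len(frame)
    energy = sum(s * s for s in frame)
    rms = math.sqrt(energy / n)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)
    # Energy-weighted mean position in the frame, normalized to [0, 1).
    centroid = sum(i * s * s for i, s in enumerate(frame)) / energy / n if energy > 0 else 0.0
    return [rms, zcr, centroid]
```

Each detected onset yields one fixed-length frame and one feature vector, which is the unit that would then be handed to the classifiers.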
The features are collected into a feature vector to be classified by a Gaussian classifier provided in Marsyas. A Gaussian classifier models the feature vectors of each class as a single multidimensional Gaussian distribution. This distribution is characterized by the mean and covariance matrix of the training vectors, estimated from the training set. Gaussian classifiers are very easy to train and fast at classification, but they are not particularly accurate when compared to other classifiers. The classification rates decrease drastically if the distribution of the feature vectors is not Gaussian [2].
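The idea of fitting one Gaussian per class can be sketched in a few lines. This is not the Marsyas implementation: the class name `GaussianClassifier` is hypothetical, equal class priors are assumed, and the covariance matrix is restricted to its diagonal (per-feature variances) to keep the example short, which is a simplification of the full covariance described above.

```python
import math


class GaussianClassifier:
    """One Gaussian per class, diagonal covariance (illustrative sketch)."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # avoids division by zero variance
        self.models = {}            # label -> (means, variances)

    def fit(self, vectors, labels):
        """Estimate per-class means and variances from the training set."""
        by_class = {}
        for vector, label in zip(vectors, labels):
            by_class.setdefault(label, []).append(vector)
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                max(sum((x - m) ** 2 for x in col) / n, self.var_floor)
                for col, m in zip(columns, means)
            ]
            self.models[label] = (means, variances)
        return self

    def log_likelihood(self, vector, label):
        """Log density of `vector` under the Gaussian fitted for `label`."""
        means, variances = self.models[label]
        return sum(
            -0.5 * (math.log(2.0 * math.pi * var) + (x - m) ** 2 / var)
            for x, m, var in zip(vector, means, variances)
        )

    def predict(self, vector):
        """Return the label with the highest likelihood (equal priors)."""
        return max(self.models, key=lambda label: self.log_likelihood(vector, label))
```

Training reduces to computing a mean and a variance per feature and class, which is why such classifiers are cheap to train and evaluate. The single-Gaussian assumption is also why accuracy drops when a class's features are not Gaussian distributed, for example when they form two separate clusters.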

RESULTS
Empirical tests indicated that the timbre recognition accuracy was not adequate. This was confirmed when the resulting feature matrices were collected into an .arff file for Weka [7] and the actual accuracy was measured (see table 1). These results show the current effectiveness of the drum and provide a goal for improvement. The magnitude metric is not shown, since it is obtained through direct signal processing. For the classification tests, the two classes for implement are stick and brush, and the two classes for stroke type are normal and rim.
The large analysis window is used because a previous study by the author demonstrated that larger windows increase classification accuracy [5]. The resulting latency is tolerable because the drum is always computing sound from the current audio input. The perceived effect is that a patch change triggered by a strike takes effect on the next strike, rather than the current one.
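To give a sense of the latency the analysis window introduces, the following sketch converts the 4096-sample window into milliseconds. The text does not state the sample rate, so the two rates used here are assumptions, and the function name is hypothetical. The figure covers only the time needed to fill the window, not feature extraction or classification.

```python
WINDOW = 4096  # analysis window length stated in the text


def window_latency_ms(sample_rate_hz, window=WINDOW):
    """Time needed to fill one analysis window, in milliseconds."""
    return 1000.0 * window / sample_rate_hz


# Assumed sample rates; the paper does not specify one.
for rate in (44100, 48000):
    print(rate, "Hz:", round(window_latency_ms(rate), 1), "ms")
```

At an assumed 44.1 kHz this comes to roughly 93 ms, and at 48 kHz roughly 85 ms. A delay of that size would be noticeable if it applied to the sound itself, but here it affects only the patch change.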

CONCLUSIONS
The use of virtual resonators in conjunction with a physical device provides an intuitive interface for drummers. Timbre recognition techniques provide a suitable way of translating audio into symbolic information that the user can employ to control the drum's parameters. Direct translation of gesture through audio provides a very satisfying experience for users because of the obvious connection between gesture and result.