UltraButton: A Minimalist Touchless Multimodal Haptic Button

We present UltraButton, a minimalist touchless button including haptic, audio and visual feedback, costing only $200. While current mid-air haptic devices can be too bulky and expensive (around $2k) to be integrated into simple mid-air interfaces such as point and select, we show how a clever arrangement of 83 ultrasound transducers and a new modulation algorithm can produce compelling mid-air haptic feedback and parametric audio at a minimal cost. To validate our prototype, we compared its haptic output to a commercially available mid-air haptic device through force balance measurements and user-perceived strength ratings and found no significant differences. With the addition of 20 RGB LEDs, a proximity sensor and other off-the-shelf electronics, we then propose a complete solution for a simple multimodal touchless button interface. We tested this interface in a second experiment that investigated user gestures and their dependence on system parameters such as the haptic and visual activation times and heights above the device. Finally, we discuss new interactions and application scenarios for UltraButtons.

I. INTRODUCTION
TOUCHLESS interfaces such as mid-air buttons enable users to interact with systems without needing to physically touch a surface. Driven at first by science fiction movies such as Minority Report or Iron Man, the interest in touchless interfaces has increased in recent years as experimental studies showed that touchscreens in public spaces form a pathogen vector for bacterial and viral propagation [1], [2], [3]. This concern has been exacerbated by the recent Covid-19 pandemic [4], [5].
Despite all this, touchless interfaces are still in their infancy, and their associated interaction paradigms remain limited. For instance, in a simple point and select task, touchless interfaces using a gesture tracking system as their main input modality need to differentiate between "pointing" and "selecting" actions. Thus the usability of touchless systems suffers from both a lack of gesture input standardisation and a lack of haptic feedback, i.e., confirmation of the action to the user [6]. Touchless digital kiosks and large public displays circumvent this issue by relying on advanced visual and audio feedforward and feedback (e.g., visual animation) [7]. Simpler touchless systems may not include such large screens and high definition visuals and instead rely on very basic visual and auditory cues such as LED blinks and audio beeps.
In this paper, our aim is to enhance simple touchless interfaces with ultrasound mid-air haptic feedback [8] and parametric audio [9] capabilities. Mid-air haptic displays have been the focus of numerous studies (over 100 papers to date; see a recent survey [10]). Moreover, mid-air haptic displays are commercially available and can accurately deliver dynamic tactile feedback to users' palms and fingertips at a range of up to 1 m. This is usually achieved by focusing algorithms applied to phased arrays comprising hundreds of ultrasound transducers. Studies have shown that by providing mid-air haptic feedback to infotainment systems in cars [11], digital kiosks and pervasive displays [12], user performance and experience can be improved significantly. Notably, ultrasound phased arrays have recently been able to generate multimodal volumetric displays for visual, tactile and audio presentation using acoustic trapping techniques [13], [14]. While such devices can enhance touchless systems with rich haptic feedback, building them can be expensive due to the large number of transducers needed and the embedded micro-electronics used for manipulating individual phases and amplitudes. This high cost of the current generation of mid-air haptic displays may thus render them unsuitable for small and simple touchless interactive applications.
Despite much progress in the field of mid-air displays, the efforts to date have mostly been geared towards bigger and better [15]. Sometimes, however, less is more. Our goal here is to design and build a minimal-cost (in terms of dollars, power, and compute) mid-air haptic button that has a haptic strength similar to commercial alternatives yet remains practical and functional for simple touchless interaction scenarios. Our approach is guided by a simplification of the driving circuitry, a fixed-in-space mid-air haptic focal point, and a reduction in the number of transducers used, while still maintaining the ability to deliver a multimodal output (visual, auditory and haptic feedback). To that end, we introduce a new and simple design for the generation of haptic buttons in mid-air, the UltraButton, the features and design of which we think can influence future touchless interfaces and market directions.
This paper describes the system and methods for creating an interactive mid-air button and its evaluations. The main contributions of this paper are:
1) A low-cost hardware design for creating a mid-air haptic UltraButton (see Fig. 1(a)).
2) A novel haptic algorithm for creating perceivable mid-air haptic sensations.
3) Multiple quantitative and qualitative evaluations of our multimodal mid-air haptic system.
4) An exploration of the use cases enabled by the UltraButton.

A. Ultrasonic Haptic Devices
Ultrasonic mid-air haptic devices are based on a nonlinear phenomenon called acoustic radiation pressure [16]. A high sound pressure level is generated by focusing acoustic waves emanating from multiple sources, while constructive interference at the focus is achieved through the electronic control of the amplitudes and phases of the ultrasonic transducers. Modulating the focus (or foci) in time and/or space and at the right frequency causes perceptible vibrations on the skin, which has since been termed mid-air haptics [8], [17]; a technology commercialised by Ultrahaptics (now Ultraleap) since 2014. Applications of mid-air haptics include automotive human machine interfaces [11], wireless power transfer [18], digital signage [12], and augmented, virtual, and mixed reality (AR/VR/MR) [19], [20], [21]. A comprehensive review article was recently published on this topic [10]. Other modulation and sound field synthesis techniques can make use of similar hardware to generate levitating holographic displays [22] and parametric directional audio [9], [13].
The most commonly used hardware design of ultrasonic mid-air haptic technology is based on rectilinear arrays: a square grid of 200-300 ultrasonic transducers placed on a flat PCB. Larger or multiple array designs have also been constructed, offering larger interaction regions [23]. Another approach to increasing interaction volume is to mount a standard-sized array on a robotic system that enables fast pan and tilt rotations [24], or indeed just mount it on the front of a VR headset [25]. Another hardware variant is presented in [26], where a modified transducer layout resembling a Fibonacci spiral arrangement reduces acoustic grating lobes (i.e., secondary unwanted focal points). All of these systems tend to suffer to varying degrees from a combination of drawbacks, including complex installation, large size, complex electronic control, the need for a powerful host PC, high power requirements, and finally, high cost to build, assemble and deploy.

B. Virtual Buttons Using Haptic Feedback
Virtual buttons have been investigated in multiple scenarios with different tactile feedback technologies. Nashel and Razzaque [27] proposed a vibration propagation technique to convey the button's location, its functions, and its activation. When the path of the user's finger is in contact with the area of the virtual button, the screen sends a pulse to indicate it is on top of a button. A different sensation is sent if the finger stays for a long period inside the button region. Kim and Lee [28] investigated the relation between haptic feedback in virtual buttons based on the force graph of a physical button, and developed a method to provide feedback at multiple instances of the force graph.
Mid-air haptic virtual buttons have been studied by Rümelin et al. [29]. They investigated a single virtual button for a tap gesture interaction. They focused on short ultrasound stimuli and the variation of the frequency range. Marchal et al. [30] suggested adjusting the intensity of the button to emulate a change in its perceived stiffness. Another more sophisticated approach was developed by Ito et al. [31]. A mid-air dual-button was developed based on dividing the area of interaction into two layers. The top layer sends a sensation different from that of the bottom layer.
Other approaches include combining mid-air haptic displays with other technology. For instance, Ozkul et al. investigated complementing mid-air haptic feedback with auditory stimuli for application to a light switch button [32]. Finally, Freeman et al. suggested combining mid-air haptics with simple LED-based visual feedforward, to guide hand movement during interaction (e.g., selection gesture) and then deliver haptic feedback [33].

III. ULTRABUTTON OVERVIEW
The UltraButton combines visual, tactile and sound features embedded in and generated by a single device while using a minimal number of transducers and electronic complexity.
A single fixed focal point (FP) is generated in space, approximately 10 cm above the device along its centre axis, using a novel concentric ring arrangement of transducers. Then, a novel low-cost algorithm is applied for adding modulation onto the FP such that it is able to generate parametric audio sounds and haptic feedback. Finally, a proximity sensor is used to identify user input such as a hand-tap gesture and an LED strip is used to provide visual feedback and feedforward. All this is encapsulated in a single PCB plus a microcontroller logic board (the dimensions of the device are 150 mm in length and 230 mm in width) as shown in Fig. 1. The transducers' arrangement is contained inside a circular area of 120 mm diameter. Due to our minimalist approach, our prototype bill of materials (BOM) cost remains below $200, which is one order of magnitude lower than that of current commercially available mid-air haptic displays.

A. Ultrasound Transducer Arrangement
At the most basic level, to produce a focused ultrasonic field, one needs simply to drive a set of ultrasound transducers in such a way that every element contributes constructively at a specific point in space. Most ultrasound-based mid-air haptic displays rely on a collection of individually controlled ultrasound transducers. This allows for the flexibility to adjust the driving phase of each element so as to make the output constructive at any desired location but comes at the cost of complex and expensive driving electronics. To alleviate these problems, one can constrain the haptic point position and design a simpler ultrasound array accordingly. Instead of adjusting the driving phase electronically, we assume a single drive signal and adjust the location of the transducers to achieve the desired constructive interference. The simplest way to achieve this is to assemble a concave-array where the array represents a section of a sphere of radius z and all the transducers on its surface are pointing inward. With such an arrangement, the transducers are all equidistant to the sphere centre and therefore interfere constructively at the focus location.
While such a concave-array can easily be produced using 3D printing and manually placing and connecting the transducers to the driving electronics [34], it remains impractical to integrate in other systems or to mass-produce.
Keeping the idea of a fixed haptic point, we suggest the use of a flat PCB with transducers arranged along concentric rings (see Fig. 2(B)) such that a high pressure focus is formed above the centre of the rings (see Fig. 2(A)). This transducer arrangement carries many simplifying benefits. First, since the distance to the desired central FP from each ring is the same, any one ring will naturally add constructively at the focus location. Second, it is possible to choose the ring radii in such a way that a common driving signal can be used for all rings.
The radius of each additional ring can be calculated by incrementing the distance from the focus to each ring by one ultrasound wavelength. Thus, additional rings at the correct incremental radii will add acoustic pressure to the FP. We note that the acoustic pressure contribution to the FP from a transducer in an outer ring is less than that from a more centrally located transducer due to the distance attenuation of the wave. However, outer rings will have more transducers and may therefore contribute more pressure to the FP in aggregate. The desired FP height z can be adjusted up or down by changing the radii of the rings. Transducer packing density on the PCB can be further increased by inverting the phase of every other ring by manually alternating the transducer polarity, thus effectively applying a $\pi$ phase shift and allowing the distance of concentric rings to the FP to be separated by multiples of half a wavelength while still using the same driving signal. We thus separate transducers into two groups, each with a reversed polarity, such that alternating rings belong to the same group.
This can be understood geometrically in the diagram of Fig. 2(A), whereby the radius of the $n$th concentric ring is defined by the innermost ring of transducers $r_0$ and satisfies
$$\sqrt{z^2 + r_n^2} = \sqrt{z^2 + r_0^2} + n\frac{\lambda}{2},$$
where $\sqrt{z^2 + r_n^2}$ is the distance between the intended FP at height $z$ and the $n$th ring of radius $r_n$. Rearranging the above equation for $r_n$ we arrive at an expression for the appropriate radius which results in a single focus at $z$:
$$r_n = \sqrt{\left(\sqrt{z^2 + r_0^2} + n\frac{\lambda}{2}\right)^2 - z^2}.$$
To decide on how many transducer rings to physically include in the design of the UltraButton, one needs to be able to calculate the pressure produced at the focus and ensure that it is high enough, e.g., 155 dB SPL. To do so, one can start by calculating the complex pressure $P_t(p_z)$ at a point $p_z$ due to a piston source emitter [35] at point $p_t$ using
$$P_t(p_z) = P_{\mathrm{ref}}\,\frac{2 J_1(ka\sin\theta_{zt})}{ka\sin\theta_{zt}}\,\frac{e^{i\left(k\,d(p_z, p_t) + \phi_t\right)}}{d(p_z, p_t)},$$
where $P_{\mathrm{ref}}$ is a constant that is defined by transducer amplitude, $d(x, y)$ is the Euclidean distance between points $x$ and $y$, the transducer directivity function is defined by $\frac{2 J_1(ka\sin\theta_{zt})}{ka\sin\theta_{zt}}$, where $J_1$ is the Bessel function of the first kind of order one, $k = 2\pi/\lambda$ is the wave-number, $a$ is the transducer radius, $\theta_{zt}$ is the polar angle between points $p_z$ and $p_t$, and $\phi_t$ is the initial phase of the transducer, here set to $0$ or $\pi$ depending on the parity of $n$. Finally, to calculate the total pressure $P(p_z)$ generated by the ring layout design (or any layout in fact) at the focus at $p_z$, one must compute the summation of the contribution of each transducer $t \in [1, T]$ and take its absolute value:
$$P(p_z) = \left|\sum_{t=1}^{T} P_t(p_z)\right|.$$
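The ring construction can be checked numerically. The following is a minimal sketch, assuming the half-wavelength relation $\sqrt{z^2 + r_n^2} = \sqrt{z^2 + r_0^2} + n\lambda/2$ described in the text. The inner radius `R0_MM` is a hypothetical value chosen for illustration, and `MAX_RADIUS_MM` is an assumed bound derived from the 120 mm array diameter minus one transducer radius; neither is taken from the actual PCB layout. The sketch ignores directivity and distance attenuation and only verifies that every ring arrives in phase at the focus.

```python
import cmath
import math

Z_MM = 100.0          # focal height above the PCB (about 10 cm, from the text)
WAVELENGTH_MM = 8.575  # wavelength at 40 kHz (from the text)
R0_MM = 15.0          # hypothetical innermost ring radius (assumption)
MAX_RADIUS_MM = 55.0  # assumed limit: 60 mm array radius minus 5 mm transducer radius


def ring_radii(z, r0, wavelength, max_radius):
    """Return ring radii whose path length to the focus grows by half a
    wavelength per ring, stopping at the first ring beyond max_radius."""
    d0 = math.hypot(z, r0)  # distance from the innermost ring to the focus
    radii = []
    n = 0
    while True:
        d_n = d0 + n * wavelength / 2.0
        r_n = math.sqrt(d_n * d_n - z * z)
        if r_n > max_radius:
            break
        radii.append(r_n)
        n += 1
    return radii


def phasor_at_focus(z, r_n, n, wavelength):
    """Unit phasor of the carrier arriving at the focus from ring n.
    Odd rings carry an extra pi shift from their reversed polarity."""
    k = 2.0 * math.pi / wavelength
    return cmath.exp(1j * (k * math.hypot(z, r_n) + (n % 2) * math.pi))


if __name__ == "__main__":
    for n, r in enumerate(ring_radii(Z_MM, R0_MM, WAVELENGTH_MM, MAX_RADIUS_MM)):
        ph = cmath.phase(phasor_at_focus(Z_MM, r, n, WAVELENGTH_MM))
        print(f"ring {n}: radius {r:.2f} mm, phase at focus {ph:.4f} rad")
```

All rings report the same phase at the focus, which is the condition for constructive interference with a single drive signal and alternating polarity.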
To generate the acoustic fields and calculate $P(p_z)$ we chose to use properties from the muRata MA40S4S transducer specifications sheet as these transducers can reliably produce a large amount of sound pressure (20 Pascals at a distance of 30 cm), operate at $f_c = 40$ kHz ($\lambda = 8.575$ mm), have a half-power beam-width of 60°, and a radius of $a = 5$ mm. Finally, the transducer array design placement needs to also consider the physical radius of the transducers since this affects the number of transducers that can be packed in each ring, but also where other electronic components will be placed on the PCB such as a proximity sensor for detecting user input and LEDs for visual feedback. Using this approach, we found that the layout obtained in Fig. 2 can produce a peak acoustic pressure of 2000 Pa, and averages to 152.75 dB SPL using Amplitude Modulation (AM), and 154 dB using Two Frequency Modulation (2FM) defined in Section IV.
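As a rough consistency check on these figures, the sketch below converts an RMS pressure to dB SPL using the standard 20 µPa reference. It assumes that the 2000 Pa value is the peak amplitude of the modulated carrier at the focus, and that the RMS-to-peak ratios are $\sqrt{3/16}$ for AM (a raised-cosine envelope on a sine carrier) and $1/2$ for 2FM (two equal carriers beating). These ratios are assumptions made for illustration, not values stated for the device.

```python
import math

P_REF_AIR_PA = 20e-6  # standard reference pressure for dB SPL in air


def spl_db(p_rms_pa):
    """Sound pressure level in dB re 20 uPa for a given RMS pressure."""
    return 20.0 * math.log10(p_rms_pa / P_REF_AIR_PA)


PEAK_PA = 2000.0                        # peak pressure quoted in the text
AM_RMS_FACTOR = math.sqrt(3.0 / 16.0)   # assumed RMS/peak ratio for AM
TWO_FM_RMS_FACTOR = 0.5                 # assumed RMS/peak ratio for 2FM

SPL_AM = spl_db(PEAK_PA * AM_RMS_FACTOR)
SPL_2FM = spl_db(PEAK_PA * TWO_FM_RMS_FACTOR)

if __name__ == "__main__":
    print(f"AM : {SPL_AM:.2f} dB SPL")
    print(f"2FM: {SPL_2FM:.2f} dB SPL")
```

Under these assumptions the two levels come out at about 152.7 dB and 154.0 dB, in line with the values quoted above.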

B. Time of Flight Optical Sensor
To detect the presence and distance of the user's hand in front of the UltraButton device, we use the VL53L0X time-of-flight (ToF) proximity sensor by STMicroelectronics. The sensor contains a 940 nm laser source which is invisible and rated eye-safe, and a matching sensor that can measure the absolute range from 30 mm to 1 m in its default mode of operation. For optimal tracking, we placed the sensor at the middle of the device, i.e., at the centre of the concentric rings and thus right under the mid-air haptic focus. The hand-to-device distance is computed by the microcontroller as the Euclidean distance between the device centre and the output of the VL53L0X sensor plus a small offset to account for the sensor height.

C. LED Strip
To provide visual feedback before, during, or after user interactions with the UltraButton device, we have included a multi-colour LED strip soldered onto the PCB in the space between the first and second ring of transducers. This allows the UltraButton to provide users with additional visual information as discussed further in Section VII.

D. Microcontroller
To control the operations of the UltraButton, a driver board has been assembled composed of a Teensy 3.2 microcontroller that generates two digital periodic signals with the phase defined by the two groups of transducers. The amplifier driving the transducers is fixed at 20 V and another 5 V power supply is used to power the microcontroller, the proximity sensor and the LEDs. The microcontroller board does not need to be connected to a computer for sending phases to the array elements. This feature makes the device easy to use and integrate. The microcontroller makes use of 1 GPIO or 2 GPIOs to drive the transducers using the Amplitude Modulation or 2-Frequency Modulation, respectively. An additional 2 GPIOs are used to communicate with the proximity sensor and 1 GPIO is used to control the LED strip. Therefore, out of the 23 GPIOs available on the Teensy 3.2, at least 18 remain unused. The extra GPIOs can be used to connect to additional peripherals, including communication peripherals such as a Bluetooth dongle. This last possibility is explored further in the application Section VII.

IV. MODULATION TECHNIQUES
In this section, we describe two algorithms producing a haptically perceivable FP at a short distance above the device, namely, Amplitude Modulation (AM) and Two Frequency Modulation (2FM). We then describe how to modulate an audio signal to produce directional audio, and discuss audible noise artefacts and health & safety considerations associated with the UltraButton.

A. Amplitude Modulation
Amplitude Modulation (AM) is the most commonly used technique for mid-air tactile display and for generating parametric audio [9]. It modulates the ultrasound pressure intensity between 0 and 1 at a given periodic frequency while keeping the FP position fixed in space. In Fig. 3(A), one can observe the simplicity of this technique and how a phase shift is applied to the carrier frequency at the different groups of transducers.
The AM driving technique is based on the superposition of two waves: the carrier signal, which is a high frequency signal of, e.g., $f_c = 40$ kHz in our case, and the modulating signal, which is around, e.g., $f_m = 200$ Hz for mid-air haptics, and may vary for parametric audio. The equations characterising the AM technique are thus: where $A_m \in [0, 1]$ and $A_c$ are the amplitudes of the modulating and carrier signals, respectively. The Root-Mean-Square of the amplitude modulated signal $Y_{AM}$ is equal to $\sqrt{\tfrac{3}{16}}\,\tfrac{A_c A_m}{4} \approx 0.43\,\tfrac{A_c A_m}{4}$.

B. Two Frequency Modulation
Using two frequency modulation (2FM) is an alternative and novel method that can generate a modulated FP that is haptically perceivable to the skin receptors. The 2FM technique is based on the sum of two waves with nearby but different carrier frequencies $f_1 = f_c + \delta f$ and $f_2 = f_c - \delta f$. When these two carriers interfere, a "beat frequency" effect develops and produces the frequency $f_{beat} = |f_1 - f_2| = 2\delta f$ (see Fig. 3(B)). By setting the beat frequency at the same value as the modulation frequency in the AM technique (i.e., $f_m = 2\delta f$), we modulate the FP amplitude in a similar way to the AM technique, which will "feel" the same to the user (see Section V-B). We note that beat frequencies have been extensively studied and used in a number of wave applications; however, this is the first time they are used for mid-air haptics. The equations characterising the 2FM technique are thus: where $f_1$ is the signal frequency of the first group and $f_2$ is the signal frequency of the second group, which for UltraButton are placed on different rings on the PCB as described previously. The Root-Mean-Square of the amplitude modulated signal $Y_{2FM}$ is equal to $\tfrac{1}{2}\cdot\tfrac{A_c}{2} = \tfrac{A_c}{4}$. Therefore, to obtain the equivalent AM frequency of 200 Hz at the FP, one should choose $f_1 = 40100$ Hz and $f_2 = 39900$ Hz when using 40 kHz resonant transducers like the MA40S4S. Note that both these frequencies are close enough to the resonant frequency (less than 1% variation), therefore minimising any loss in output, and are compatible with the transducer ring arrangement. After submission of this paper for review, Mizutani et al. [36] suggested driving multiple arrays at different frequencies to produce a haptic sensation. We remark that UltraButton leverages multiple frequency modulation to produce a haptic sensation at the circuit level of the system (see Fig. 3(b)).
Fig. 3. (a) Since the two groups are reverse polarised (shown in yellow and purple), a $\pi$ phase shift is naturally applied to the carrier frequency to produce a focus. The focus is then modulated by an envelope frequency (e.g., 200 Hz for haptics). (b) 2FM algorithm: Slightly different signals are sent to each transducer group.
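The beat effect follows from the identity $\sin A + \sin B = 2\sin\frac{A+B}{2}\cos\frac{A-B}{2}$: the sum of two carriers at $f_c \pm \delta f$ equals a carrier at $f_c$ multiplied by a slow envelope $2\cos(2\pi\,\delta f\,t)$, whose magnitude repeats at $2\delta f$. The sketch below checks this numerically for the 40100 Hz and 39900 Hz pair. The unit amplitudes and the function names are assumptions made for illustration; they are not the normalisation used for $Y_{2FM}$.

```python
import math

F_C = 40_000.0   # carrier frequency in Hz (from the text)
DELTA_F = 100.0  # half of the desired 200 Hz beat frequency
F1 = F_C + DELTA_F  # 40100 Hz, first transducer group
F2 = F_C - DELTA_F  # 39900 Hz, second transducer group


def two_carrier_sum(t, amplitude=1.0):
    """Sum of two equal-amplitude carriers at F1 and F2 (assumed form)."""
    return amplitude * (math.sin(2 * math.pi * F1 * t)
                        + math.sin(2 * math.pi * F2 * t))


def beat_product(t, amplitude=1.0):
    """Equivalent product form: carrier at F_C times a slow cosine envelope."""
    return (2 * amplitude * math.cos(2 * math.pi * DELTA_F * t)
            * math.sin(2 * math.pi * F_C * t))


def envelope(t, amplitude=1.0):
    """Magnitude of the slow envelope; it repeats every 1 / (2 * DELTA_F)."""
    return abs(2 * amplitude * math.cos(2 * math.pi * DELTA_F * t))


if __name__ == "__main__":
    print("beat frequency:", F1 - F2, "Hz")
    for i in range(5):
        t = i * 1.3e-4
        print(f"t={t:.5f}s sum={two_carrier_sum(t):+.6f} "
              f"product={beat_product(t):+.6f}")
```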
Finally, we note that the 2FM scheme drives each transducer at full power, resulting in maximal utilization of each transducer's output, unlike the AM scheme which has an effective 50% duty cycle (see Fig. 5(C)). However, as each transducer is driven at full power, a continuous and prolonged mid-air haptic FP might result in self-heating of the transducers. This should be less of a problem at low duty-cycles, e.g., for a mid-air button-like tap where a short burst of high intensity pressure is generated.

C. Haptic Feedback
The acoustic radiation force produced by a FP of 155 dB SPL produces around 1 mm of skin indentation [37]. For the FP to result in a perceptible tactile vibration, a modulated signal between 5 Hz and 1000 Hz is necessary; further restricting this range to 50-300 Hz makes it more likely to be felt [8], [38], with lower/higher frequencies corresponding to rougher/smoother tactile sensations [39]. As discussed above, the UltraButton can generate sufficient acoustic radiation force and a perceptible tactile modulation at the FP using either the AM or the 2FM scheme. The acoustic field generated by the device is shown in Fig. 4. The circular symmetry of the transducer layout manifests itself as a signature in the acoustic field (see right picture in Fig. 4), while the high acoustic pressures that surround the FP are an unwanted and unavoidable side-effect of the UltraButton transducer layout; they are, however, below our tactile perception threshold.
The pressure field, however, is not enough to explain if a focus is perceivable by a human hand. To see this, one needs to also simulate the temporal variation of pressures due to $Y_{AM}$ and $Y_{2FM}$ along with their Fourier spectra as shown in Fig. 5. Note that the Fourier spectra of the two modulation schemes are quite different, with 2FM having a more efficient energy distribution. Despite this, their resulting acoustic fields and the temporal pressure variations are indeed very similar. Although a formal user study is yet to be conducted, the two modulation schemes feel very similar, if not identical. In Section V we will show that both algorithms can be perceived as equally strong for all the test forces.
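The difference between the two spectra can be illustrated with a small numerical sketch. It assumes a raised-cosine AM envelope, $\tfrac{1}{2}(1 + \cos 2\pi f_m t)\sin 2\pi f_c t$, and a 2FM signal made of two equal carriers at $f_c \pm f_m/2$; both waveforms, the sample rate and the function names are assumptions made for illustration rather than the exact definitions of $Y_{AM}$ and $Y_{2FM}$. A single-frequency DFT is evaluated at a few frequencies of interest.

```python
import cmath
import math

F_C = 40_000.0   # carrier frequency in Hz
F_M = 200.0      # modulation (envelope) frequency in Hz
FS = 200_000.0   # sample rate in Hz (assumption, above Nyquist for 40.2 kHz)
DURATION = 0.02  # seconds; makes every probed frequency an exact DFT bin
N = int(round(FS * DURATION))


def am_signal(t):
    """Assumed AM waveform: sine carrier with a raised-cosine envelope."""
    return 0.5 * (1 + math.cos(2 * math.pi * F_M * t)) * math.sin(2 * math.pi * F_C * t)


def two_fm_signal(t):
    """Assumed 2FM waveform: two half-amplitude carriers at F_C +/- F_M / 2."""
    return 0.5 * (math.sin(2 * math.pi * (F_C + F_M / 2) * t)
                  + math.sin(2 * math.pi * (F_C - F_M / 2) * t))


def line_amplitude(signal, freq):
    """Amplitude of the spectral line at freq via a single-bin DFT."""
    acc = sum(signal(n / FS) * cmath.exp(-2j * math.pi * freq * n / FS)
              for n in range(N))
    return 2 * abs(acc) / N


if __name__ == "__main__":
    for f in (39_800.0, 39_900.0, 40_000.0, 40_100.0, 40_200.0):
        print(f"{f:8.0f} Hz  AM={line_amplitude(am_signal, f):.3f}  "
              f"2FM={line_amplitude(two_fm_signal, f):.3f}")
```

With these assumed waveforms, AM places energy in the carrier and two sidebands at $f_c \pm f_m$, while 2FM places all of it in two lines at $f_c \pm f_m/2$ with no component at $f_c$.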

D. Audible Sounds and Noise
Parametric audio is the well-known phenomenon whereby audible sound is produced from ultrasound through nonlinear mixing in the air [40], [41], [42]. Westervelt shows that, to first order, the mixing sound generated by two coincident sound waves is proportional to the product of their pressures and the square of their difference frequency [40]. This is a volumetric effect whereby the larger the volume of air with different frequencies traveling co-linearly in it, the more mixing sound will be produced. Together, this yields the directed-audio effect from ultrasonic end-fire arrays modulated with an audio signal [42]. In that case, a large area of transducers all produce the same AM content, producing a multi-frequency wavefront which mixes as it propagates. Since the end-fire array is typically large compared to the wavelength, the ultrasound remains collimated for long distances.
Fig. 4. Simulated acoustic field pressure with a focus at $z = 10$ cm. On the left is a cross-section from the device side, while on the right is a cross-section along the $z = 10$ cm plane.
The UltraButton has enough acoustic pressure to generate parametric audio, which starts to occur at approximately 135 dB SPL. By modulating the transducers with an audio signal (either with amplitude modulation or a more sophisticated single sideband technique), it can act as a small speaker. Because the array is configured to focus, rather than create a collimated beam like an end-fire array, it will not have the same beam-like properties but can still produce a noticeable amount of audio localised above the device as if emanating from a point source. The AM audible signal overlaid onto the ultrasound carrier can produce a variety of sounds, beeps, clicks, voices and even music; however, the quality tends to deteriorate and distort for low-pitch sounds.
More important than its ability to create audible sound is the system's ability to prevent audible sound while generating mid-air haptics. Rapid changes to the acoustic field can cause unwanted audible noise [43]. This can be understood as a product of the increased efficiency of nonlinear mixing at higher frequencies and rapid changes that inexorably include higher modulation frequencies. Since the UltraButton only consists of a single driving signal, optimizing that signal to be as smooth as possible comes at a lower cost than for a similar effort in an individually-driven phased array. This can be done by increasing the accuracy (bit-depth) of a PWM driving signal or using an analog system. For the prototype presented here, a simple M4 microcontroller is already able to generate a PWM signal with 10-bits of resolution.
The 2FM scheme produces even further reduction of unwanted audible noise by reducing the volume of space where multiple frequencies are co-linear and able to mix. In the 2FM scheme, any one transducer is only producing a single frequency of ultrasound and therefore, alone, is not producing any parametric audio. Only as the wavefronts arrive at the focus is there any possibility of nonlinear mixing. Even then, this volume is limited in size as the waves quickly converge, focus, and then diverge. The net result is that the 2FM scheme produces noticeably less audio noise (usually heard as a small buzz) when compared to the AM scheme while producing nearly identical haptic feel.

E. Safety in Mid-Air Haptic Feedback
When designing mid-air haptics one also needs to consider safety guidelines and best practices relating to high intensity ultrasound and potential hearing damage. High intensity ultrasonic arrays of transducers working at 40 kHz have been studied in several papers [44], [45] to examine the acoustic energy exposure levels experienced by a user during interaction with a mid-air haptic FP. These studies note that the pressure away from the location of the FP drops rapidly, typically by 20+ dB by the time it reaches the user's head. Furthermore, they show that exposure to up to 120 dB SPL at the ear over a period of 5 to 10 minutes induces no change in hearing sensitivity. Additionally, the international guideline provided by the ACGIH and adopted by the U.S. Occupational Safety and Health Administration (OSHA) recommends a maximum limit of 145 dB at the ear. UltraButton produces up to 154 dB SPL at the FP, but this will drop to 134 dB SPL or less by the time it reaches the user's ear. Furthermore, UltraButton's ultrasound transducers are only activated for a short amount of time (150 ms click burst) and the proximity sensor controls when the device is on. Hence, we can affirm that UltraButton is safe for the user's hearing.

V. EVALUATION OF HAPTIC FEEDBACK
The UltraButton relies on the premise that the novel transducer spatial arrangement generates comparable acoustic pressure at the focal point (FP) as other ultrasound mid-air haptic devices. Hence, the force applied to the user's skin should be comparable, inducing haptic stimuli of equivalent perceptual strength. To test this premise, we have evaluated the haptic feedback of UltraButton against that of a commercially available ultrasound mid-air haptic device, namely a Stratos Explore from Ultraleap Ltd. First, we registered the force generated at the FP by the UltraButton and by the Stratos Explore development kit across a range of input intensities using a precision scale microbalance. Then, we ran a quantitative user study in which participants rated the perceived strength of the FP produced by either device at various force levels.

A. Focal Point Generated Force
In this experiment, we measured the force generated at the FP by UltraButton and by the Stratos Explore development kit consisting of 256 transducers (16×16 rectilinear phased array) using a precision scale (KERN PCB 2500-2). To isolate the FP acoustic pressure from the ambient acoustic pressure, we positioned a foam board with a circular hole of ~20 mm diameter a few centimetres above the precision scale. The foam board was fixed and suspended (non grounded) just over the balance scale, thus blocking any acoustic force except that of the FP. Further, we placed a small cylindrical pillar of 20 mm diameter on top of the precision scale, with its top surface aligned with the foam board. The ultrasound devices were positioned upside-down (transducers facing down) 10 cm above the foam board and were aligned with the pillar so that the FP centre matched the pillar surface centre. The obtained setup is represented in Fig. 6(a).
Then, we measured the force generated by each device for intensity inputs ranging from 0.1 to 1 in steps of 0.1. The Stratos Explore device generated an AM point at 200 Hz, while the UltraButton generated a 2FM point at 200 Hz. Each measurement was repeated five times and averaged before being reported in Fig. 6(b). The results show that both devices generate comparable forces up to intensity 0.8, beyond which the two curves diverge. This divergence was expected as the higher number of transducers in the Stratos Explore enables the creation of focal points at a much higher acoustic pressure.

B. User Study
Based on previous works [46], [47], the forces shown in Fig. 6(b) are above the tactile perception threshold for ultrasound mid-air haptics, which is approximately 0.04 gf. However, to be sure that participants could perceive the haptic stimuli from the two devices, we chose force values for our user study that were well above that threshold but below the point where the two curves in Fig. 6(b) diverge. We therefore restricted the study to forces ranging from 0.08 gf to 0.12 gf, in steps of 0.01 gf. In our study, we compared the perceived strength of 2FM haptics using UltraButton and AM haptics using Stratos Explore. However, since the 2FM technique has a slightly different envelope from the traditional AM technique (as discussed in Section IV-C), which could affect the tactile perception of the generated haptics, we used the results from Fig. 6(b) to adjust the output intensities so that the two devices produced equivalent forces during the comparison. Specifically, for UltraButton we used the forces measured on the precision balance, as they already matched the range of chosen forces, whilst for the Stratos Explore we fitted the precision balance measurements to a quadratic model (R² = 0.98) and predicted the intensity values needed to produce the test forces. Finally, we ran a magnitude estimation task comparing the perceptual performance of the two ultrasound devices.
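As an illustration of the intensity prediction step, the following minimal sketch inverts a quadratic force model to find the input intensity that yields a target force. The coefficients `A`, `B`, and `C` are hypothetical placeholders, not the fitted values from our measurements, and the function name is introduced here for illustration only.

```python
import math

# Hypothetical quadratic model: force_gf = A*I**2 + B*I + C, with I in [0, 1].
# These coefficients are placeholders, NOT the values fitted in the paper.
A, B, C = 0.10, 0.05, 0.0


def intensity_for_force(a, b, c, target_force):
    """Return the intensity in [0, 1] at which a*I**2 + b*I + c equals target_force."""
    if a == 0:
        if b == 0:
            raise ValueError("model is constant; it cannot be inverted")
        roots = [(target_force - c) / b]
    else:
        disc = b * b - 4.0 * a * (c - target_force)
        if disc < 0:
            raise ValueError("target force is not reachable with this model")
        root = math.sqrt(disc)
        roots = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    valid = [r for r in roots if 0.0 <= r <= 1.0]
    if not valid:
        raise ValueError("no solution inside the intensity range [0, 1]")
    return min(valid)


# Test forces used in the user study (gf).
TEST_FORCES_GF = [0.08, 0.09, 0.10, 0.11, 0.12]
INTENSITIES = [intensity_for_force(A, B, C, f) for f in TEST_FORCES_GF]
```

Restricting the solution to the interval [0, 1] discards the non-physical root of the quadratic, which is why the function fails loudly when the requested force lies outside the range the model can produce.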
1) Participants: A total of 23 participants took part in this study (age μ = 31.6, σ = ±4.6). They had normal or glasses/lens-corrected vision and no history of neurological or psychological disorders. Upon arrival, participants were asked to read the information sheet and sign a consent form before the task was explained to them. Further, all the procedural steps were indicated on the experiment GUI.
2) Procedure: The procedure is summarised in Fig. 7. Participants sat in front of the setup illustrated in Fig. 7(A) with their left hand facing downwards on a dedicated hole (gap).
Beneath it, the two devices, UltraButton and Stratos Explore, were positioned on a moving platform that was hidden from the participants. Participants were also required to wear headphones playing white noise to mask device and environmental noise. Hence, participants could neither see nor hear the mid-air haptic devices or the moving platform while they operated. We followed a magnitude estimation procedure in which we presented pairs of stimuli composed of a fixed reference and a comparison stimulus. The reference was rendered by the Stratos Explore and was set at 0.1 gf, which corresponds to the middle value of the range of test forces chosen for this experiment (0.08 to 0.12 gf). The comparison stimulus was, each time, one of the five forces to rate for UltraButton or the Stratos Explore, and the comparisons were presented in a randomised order. In total, we tested five forces for each of the two ultrasonic devices, corresponding to 0.08, 0.09, 0.1, 0.11, and 0.12 gf. Prior to the experimental phase, participants were informed that the reference stimulus had a fixed arbitrary value of 100. After the reference stimulus, a second (comparison) stimulus was delivered, and participants were asked to rate it relative to the reference. Therefore, if the comparison stimulus felt twice as strong, a value of 200 was entered; if it was perceived as half as strong, a value of 50 was entered, and so on. Before each one-second haptic stimulus (i.e., reference and comparison), participants heard a 500 ms "beep" through their headphones to focus their attention. We employed a within-participant design with three repeated measurements for each force, for a total of 5 (forces) × 2 (devices) × 3 (repetitions) = 30 stimuli.
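The 5 × 2 × 3 design can be sketched as a randomised trial list. This is a minimal illustration, assuming a simple full shuffle of all comparison stimuli; the function name and the use of a seed are choices made here, and the sketch does not describe the software actually used in the study.

```python
import itertools
import random

FORCES_GF = [0.08, 0.09, 0.10, 0.11, 0.12]
DEVICES = ["UltraButton", "Stratos Explore"]
REPETITIONS = 3

# The reference is always rendered by the Stratos Explore at 0.1 gf and rated as 100.
REFERENCE = ("Stratos Explore", 0.10, 100)


def build_trials(seed=None):
    """Return a shuffled list of (device, force_gf) comparison stimuli."""
    rng = random.Random(seed)  # a seed makes the order reproducible
    trials = [
        (device, force)
        for device, force in itertools.product(DEVICES, FORCES_GF)
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)
    return trials
```

Calling `build_trials(seed=1)` yields 30 comparison stimuli, each of the ten device and force combinations appearing exactly three times.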
3) Results: Fig. 8 shows a box plot of the ratings for the five forces tested, colour-coded and grouped by the two devices. A Shapiro-Wilk test indicated that our data deviated significantly from a normal distribution (p < 0.001). We therefore carried out multiple Wilcoxon tests to explore the differences in the perceived strength of the tested forces between UltraButton and the Stratos Explore device. Each level of the variable force is summarized in Table I. For UltraButton force levels, there were five different force combinations that were perceived differently by the participants. Moreover, for the Stratos Explore, there is a non-significant difference between forces 0.11 and 0.12 gf (p = 0.061).
None of the comparisons was statistically significant (p > 0.05). In other words, participants perceived the stimuli of the two devices as equally strong for all the tested forces. Further, to explore whether participants were able to feel a change between the different force levels within the same device, we ran two Friedman tests, one for UltraButton and one for the Stratos Explore. Both tests confirmed a statistical difference in the perceived strength between the different force levels (p < 0.001). The data show that UltraButton ratings have a significantly higher variance than those of the Stratos Explore. This may be caused by differences in the FP rendered by the two devices, which could in turn influence the user's perceived force.

Fig. 6. a) Setup used to measure the FP force of the two ultrasonic mid-air haptic devices. b) Plot of the force measured as a function of the FP intensity for the two ultrasonic devices.

Fig. 7. a) Experimental setup. Participants placed their left hand onto the gap. A linear actuator was positioned on one of the two ultrasound devices under the participant's palm. The setup was hidden by a black cloth. b) Experimental procedure used for the user study. Participants could feel a first reference stimulus, then there was a second stimulus that they had to rate in comparison to the reference.
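For readers unfamiliar with the Friedman test, the sketch below computes its textbook statistic from a table of ratings (one row per participant, one column per force level). It omits the tie correction that statistics packages usually apply, and the function names are introduced here for illustration; it is not the analysis script used for the results above.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(ratings):
    """Friedman chi-square statistic (no tie correction) for n rows x k conditions."""
    n = len(ratings)
    k = len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Under the null hypothesis, the statistic approximately follows a chi-square distribution with k − 1 degrees of freedom, so large values indicate that at least one force level was rated differently from the others.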

VI. EVALUATION OF ULTRABUTTON AS A SYSTEM
We performed a second experiment to investigate the functionality of the UltraButton as an interactive system composed of an array of transducers, a ToF sensor, and an LED strip. For our evaluation, we selected 12 mid-air buttons with varying height threshold (four heights, from 10 to 150 mm) and haptic burst duration (three values, from 50 to 300 ms). In all cases, the LEDs flashed for 100 ms. Beyond usability, we aimed to understand user preferences across these two calibration parameters. We chose a limited set of values to prevent participants from getting used to the task and automatically repeating the same push action for all the buttons. We chose easily differentiable feedback activation onset heights, from near to far from the FP, with a click-like haptic duration (50 ms), a duration equal to that of the flashing LEDs (100 ms), and a longer one (300 ms). Further optimization is possible but is beyond the scope of this paper.
1) Participants: Ten participants were recruited (age μ = 31.7, σ = ±5.37). Upon arrival, they were asked to read and sign a consent form before the experiment task was explained to them.
2) Setup and Procedure: A laptop and the UltraButton were placed on a desk in a quiet room, along with a chair for participants to sit on during the study. No headphones were used, as the FP sound was not audible (see Section IV-D). All participants were right-handed, by chance, so the device was placed on the laptop's right side. The laptop screen displayed the task instructions and a trial counter from 1 to 12 for each block. Participants could take a short break between blocks. Participants were instructed to press the mid-air button located just above the UltraButton as if they were approaching a physical button, and to freely move their right hand above the UltraButton system as they thought best. The ToF sensor would register their action and the system would then provide haptic and visual feedback (no audio). When they thought they had successfully pushed the mid-air button, they were instructed to press the keyboard space-bar to proceed to the next trial. The laptop played a 'beep' sound at the beginning of each trial, after which the participant could start performing the push action. Following the study, the researcher conducted a semi-structured interview to investigate the participants' experience with the system. The whole procedure lasted approximately 15 min per participant. A simple interaction diagram is shown in Fig. 10.
3) Study Parameters: The participants tested 12 different realizations of the UltraButton. In all cases, the haptic FP location was fixed at 100 mm (the algorithm sends the same phase delay per concentric ring, which will arrive at 100 mm at the centre of the device creating a 200 Hz modulation) and would activate as soon as the ToF sensor detects the user's hand crossing the feedback onset height threshold. Each UltraButton realization had a different haptic feedback duration (50, 100, and 300 ms) and a different feedback activation onset height (10, 60, 100, and 150 mm above the FP location). All these combinations were tested in random order and repeated three times, giving 36 trials per participant.
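The per-ring delay mentioned above can be illustrated with a standard focusing calculation: every transducer on a ring of radius r is at distance sqrt(r² + h²) from a focal point at height h on the device axis, so rings closer to the axis are delayed until the wavefront from the outermost ring catches up. The sketch below is a generic illustration under stated assumptions, not our firmware: the speed of sound (343 m/s), the 40 kHz carrier, and the ring radii in the usage note are all assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature (assumed)
CARRIER_HZ = 40_000.0   # assumed carrier frequency of the transducers


def ring_delays(ring_radii_m, focal_height_m=0.100, c=SPEED_OF_SOUND):
    """Time delay (s) per ring so that all wavefronts reach the on-axis FP together."""
    paths = [math.hypot(r, focal_height_m) for r in ring_radii_m]
    longest = max(paths)
    # The ring with the longest path fires first (zero delay); the others wait.
    return [(longest - p) / c for p in paths]


def ring_phases(ring_radii_m, focal_height_m=0.100, c=SPEED_OF_SOUND, f=CARRIER_HZ):
    """The same delays expressed as carrier phase offsets in [0, 2*pi)."""
    delays = ring_delays(ring_radii_m, focal_height_m, c)
    return [(2.0 * math.pi * f * d) % (2.0 * math.pi) for d in delays]
```

For hypothetical ring radii of 0, 20, and 40 mm, `ring_delays([0.0, 0.02, 0.04])` assigns the largest delay to the central transducer and zero delay to the outermost ring.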

4) Results: We analysed participants' pushing and releasing behaviour by focusing on the minimum distance reached by their hand while pushing the buttons and on the time spent completing the interaction. We grouped the participants' behaviour by the four different feedback activation onset heights tested. The ToF time-series data were pre-processed to filter out any sensor anomalies and then fitted to a parabolic curve. Finally, data were averaged over the 10 participants for each of the four feedback activation onset heights. Note that the raw data were already very close to a parabola. Fig. 9 shows the resulting parabolas, one per haptic duration, for each of the four feedback activation onset heights.
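The parabola fitting step can be sketched as an ordinary least-squares fit of z(t) = a·t² + b·t + c to a hand-height trace, from which the minimum distance and the time at which it is reached follow from the vertex. This is a minimal illustration that solves the normal equations directly; the function names are introduced here, and it is not the analysis code used to produce Fig. 9.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough distinct time samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_parabola(ts, zs):
    """Least-squares coefficients (a, b, c) of z = a*t**2 + b*t + c."""
    s = [sum(t ** k for t in ts) for k in range(5)]  # s[k] = sum of t**k
    normal = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [
        sum(z * t * t for t, z in zip(ts, zs)),
        sum(z * t for t, z in zip(ts, zs)),
        sum(zs),
    ]
    a, b, c = solve_linear(normal, rhs)
    return a, b, c


def vertex(a, b, c):
    """Time and height of the parabola's extremum (the minimum when a > 0)."""
    if a == 0:
        raise ValueError("not a parabola")
    return -b / (2.0 * a), c - b * b / (4.0 * a)
```

A positive leading coefficient a corresponds to a push-and-release gesture, where the hand descends, reaches its lowest point at the vertex, and then retreats.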
At first visual inspection of Fig. 9, we observe that the haptic duration did not have an appreciable influence on the minimum distance reached by the participants' hand when pushing the mid-air buttons, since all curves in each sub-figure reach a similar lowest point. In addition, there is only a small proportional trend between haptic duration and task time. To test this, we performed a repeated-measures ANOVA within each group, which did not reveal any significant differences, either for the minimum distance reached by participants' hand or for the time to complete the task (p > 0.05). Further, we considered differences between the feedback activation onset heights across the 12 test variants of the UltraButton. We observed statistically significant differences for both the minimum distance from the device reached by participants' hand (χ² = 19.320, p < 0.001) and the task time (χ² = 8.040, p = 0.04). For the minimum distance, the differences were between the buttons whose feedback activation onset height was set to 10 vs 100, 10 vs 150, and 60 vs 150 mm, with the smaller feedback activation height leading to a smaller minimum hand distance from the device. The only significant difference time-wise was between the buttons whose feedback was activated at 10 vs 100 mm. Overall, we observe that participants, despite the feedback being activated at different heights, tended to continue the hand movement until they were near the FP location at 100 mm, even if the LEDs had already turned off by that point. We note that while the FP centre is at z = 100 mm, the high-intensity acoustic field of the FP stretches up to 130-150 mm, as seen in the simulations of Fig. 4. Indeed, the haptics were perceivable at that range but felt stronger closer to the FP centre. Thus, we argue that the haptics played a more significant role in the participants' hand motion than the LEDs.
Finally, we summarise the most relevant points extracted from the interviews with the participants. Nine participants reported preferring the button whose feedback activation onset height and haptics were at 10 cm from the device. This confirms and explains the behaviour we observed in the previous paragraph (i.e., the participants prefer to feel stronger haptics and be at a more natural distance from the system). Eight participants reported preferring longer haptic sensations, stating that this "makes the sensation more perceivable, and it provides a higher degree of confidence in understanding that the action was successful". All the participants mentioned they relied equally on the LEDs and the haptics, even if five of them reported that when they could not feel the haptics, they felt the action was "weird", as if they had not completed the task successfully. All the participants thought they would use the mid-air haptic button in a real scenario, if available, mainly motivated by hygienic reasons. Some participants commented that they would prefer a more refined design or dev kit rather than a research prototype. We also noticed an interesting effect where three participants mentioned that they perceived the LED duration as varying with the haptic duration, indicating a prevailing effect of haptics on visual time perception.

VII. INTERACTIONS AND APPLICATIONS
The UltraButton is a minimalist touchless button device that supports a plethora of multimodal interactions through its input and output sensors and microcontroller connectivity. Namely, the present device detects simple gesture input such as a tap and double-tap using the onboard proximity sensor. Visual, audible, and haptic feedback can be pre-programmed and flashed onto the microcontroller, and can be threshold- or variability-triggered by such user gesture inputs, or time-delayed accordingly. The proximity sensor can also use the estimated hand-to-device distance to provide feedforward information (e.g., to guide, prime, or inform the interaction) using one or many of the available modalities, which can be multiplexed in time to create a sequence of interactive experiences. Note that audio and haptics cannot be triggered simultaneously. An example of a touchless multimodal button tap interaction is shown in Fig. 11.

Fig. 9. Participants' behaviour when pressing the 12 buttons grouped by the four LED activation heights tested. The red dashed line represents the LED activation height. Overall, it appears the participants' hand reached for the haptic sensation fixed at 100 mm.

Fig. 10. Experimental setup. We designed 12 buttons which combined four feedback activation onset heights (feedback onset height ∈ {10 mm, 60 mm, 100 mm, 150 mm}) from the distance of the FP and three haptic sensation times (haptics time ∈ {50 ms, 100 ms, 300 ms}). In the experiment, participants had to perform a push action in mid-air. The haptic sensation was always at the same height (100 mm) from the centre of the device.
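The tap and double-tap detection described at the start of this section could be sketched as a threshold crossing on the proximity sensor readings. The following is a minimal illustration only: the 100 mm threshold in the usage note, the 0.5 s double-tap window, and the function names are assumptions made here and do not describe the firmware of our prototype.

```python
def detect_taps(samples, threshold_mm):
    """Return the times at which the hand crosses below the threshold.

    samples is a sequence of (time_s, distance_mm) proximity readings.
    """
    taps = []
    below = False
    for time_s, distance_mm in samples:
        if not below and distance_mm <= threshold_mm:
            below = True   # downward crossing: register one tap
            taps.append(time_s)
        elif below and distance_mm > threshold_mm:
            below = False  # hand retreated: re-arm the detector
    return taps


def classify_taps(tap_times, double_tap_window_s=0.5):
    """Group consecutive taps closer than the window into double-tap events."""
    events = []
    i = 0
    while i < len(tap_times):
        is_double = (
            i + 1 < len(tap_times)
            and tap_times[i + 1] - tap_times[i] <= double_tap_window_s
        )
        if is_double:
            events.append(("double_tap", tap_times[i]))
            i += 2
        else:
            events.append(("tap", tap_times[i]))
            i += 1
    return events
```

Re-arming the detector only after the hand retreats above the threshold prevents a hand that hovers near the threshold from registering repeated taps on every sample.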
Each of the three modalities available to the UltraButton (visual, audio, and haptics) has a rich and easy-to-understand design space. The LEDs can change colour (Red, Green, Blue), adjust their brightness, and turn on and off independently. Audible sounds (beeps, clicks, voice, and music) can be generated using parametric audio modulation techniques, although the sound quality deteriorates for low-pitch sounds. Finally, the mid-air haptic FP, which is fixed in space, can vary its intensity or blink on and off at different rates to emulate a button click's temporal force profile (usually lasting about 100 ms) or to indicate some notification of functionality. The possible combinations are therefore many, providing a wide design space for user experience designers to tailor to the applications at hand.
The UltraButton can find applications in various settings. This is facilitated by its small footprint (≈100 cm²), its extensive microcontroller input/output connectivity, its low cost (≈$100-200 depending on bulk order), and its low power requirements (≈25 W). The UltraButton can be battery-powered for mobile applications, connected to the internet through a WiFi or Bluetooth dongle, or chained to many UltraButton devices to form an UltraPanel. With public touch surfaces such as touchscreens, elevator panels, ATMs, and pedestrian call buttons under scrutiny for being pathogen-spreading hubs [3], [48], [49], UltraButton offers a compelling alternative solution.
Multiple UltraButton devices can be assembled and integrated into control panels, for example, an elevator panel as in Fig. 1(C). The interaction design of such interfaces must be carefully thought out, designed, and tested. As a proof-of-concept for the elevator example, one could consider using just two UltraButtons for the up and down call buttons, with easily recognisable visuals and sounds to assist in the interaction. Different colours can be used for the up and down buttons; they could change before and after a tap interaction and indicate the current floor or the desired direction of travel (e.g., down). Simple beep or click sounds can be generated just after the interaction, while haptic feedback can be presented during the interaction on the user's palm or fingertip. A demo prototype of an accessible elevator using commercial mid-air haptic devices was proposed in [50]. Similar setups can be assembled for light switches, push-to-exit doors, water fountains, sanitary paper and liquid soap dispensers, and other simple interfaces in public spaces.
Various fun game applications can also be devised with UltraButton before being deployed in location-based entertainment (LBE) venues. For instance, a touchless Whac-A-Mole game could be created using multiple UltraButtons arranged in a grid and made to light up at random, to be tapped/whacked in mid-air. As we observed in Section VI, activating the LEDs at different times changes users' perception and may cause them to miss the target, which could make the game more enjoyable. Such a solution would support widespread public usage without worrying about cross-user contamination and spreading disease.
Finally, the multimodal feedforward and feedback capabilities afforded by the UltraButton can guide and help keep a user's hand steady at a set mid-air location and pose while image authentication algorithms run in the background [51].

VIII. CONCLUSION
We have presented UltraButton, a minimalist touchless multimodal haptic button. Our prototype implementation (see Fig. 1(A)) utilises 83 ultrasound transducers and produces perceivable mid-air haptic feedback and sound source at 10 cm above the device. UltraButton also provides visual feedback through 20 LEDs soldered onto a single PCB alongside the ultrasound transducers and a proximity sensor. The whole system is controlled via a microcontroller and makes use of low complexity commodity electronics resulting in a total bill of materials (BOM) that costs under $200, unlike full-blown mid-air haptic and multimodal displays which utilise phased ultrasound arrays that can cost a lot more to manufacture. Its core enabling feature is its ability to deliver simple mid-air haptic sensations in addition to audible feedback such as a button "click" at short distances from the device. The user can trigger them via basic gesture inputs detected by the onboard proximity sensor. To that end, we have described a simple but novel ultrasound modulation driver signal (2FM) capable of inducing mid-air tactile sensations and one audio modulation technique for generating directional sound playback.
To evaluate UltraButton, we ran two formal experiments comparing the haptic feedback (i.e., the acoustic radiation force of a focal point at 10 cm above the device surface) generated by UltraButton and a commercially available mid-air haptic display (i.e., Stratos Explore from Ultraleap Ltd.). First, we used a precision scale to measure the acoustic radiation force generated at the FP and found that UltraButton can generate forces well above the perception threshold and comparable with those of the Stratos Explore device. Secondly, we designed a user study using a magnitude estimation procedure to evaluate the perceived strength of the mid-air haptic feedback generated with our novel 2FM algorithm using UltraButton against the feedback generated with the more traditional AM algorithm using the Stratos Explore device. The study showed that, at equal force outputs, there were no statistically significant differences between the perceived haptic effect of the two algorithms and devices, and therefore both algorithms produce haptic feedback that is perceived with equal strength. Finally, a third user study was designed to evaluate the whole system by creating 12 different mid-air buttons. This set of buttons varied the height at which the LEDs were activated and the duration of the haptic sensation. We found that visuo-haptic feedback influenced the hand trajectory during button press gestures. The post-study interview revealed a preference for mid-air haptic and LED activation height to be congruent when the activation height is closest to the FP location.
UltraButton offers a low-cost, low-footprint, yet versatile solution for enabling haptic feedback on touchless interfaces. The multimodality of the UltraButton, along with its connectivity, feedforward, feedback, and multiplexing capabilities, presents HCI and UX designers with a rich but simple tool to understand and experiment with in order to create novel touchless interfaces and applications. In our paper, we discussed some ideas such as an elevator panel (see Fig. 1(C)), games, and hygienic public interfaces. We hope that this work can inspire and guide future studies, applications, integrations, and implementations of touchless multimodal interfaces. That said, it should be noted that many simplifying trade-offs had to be made to arrive at UltraButton, such as giving up the versatility and range afforded by phased array solutions, which can generate multiple FPs at multiple locations in 3D space.
Finally, we would like to stress that each design step of our approach (i.e., layout and driving signal) has been described thoroughly in this paper and uses solely off-the-shelf electronics, hence facilitating the reproduction and adaptation of UltraButton-like devices by the community. Therefore, we hope that our studies will pave the way to a whole new ecosystem of UltraButton-like devices and their integration into many multimodal mid-air haptic interfaces.