A system-level view of optimizing high-channel-count wireless biosignal telemetry

In this paper we perform a system-level analysis of a wireless biosignal telemetry system. We analyze each major system component (e.g., analog front end, analog-to-digital converter, digital signal processor, and wireless link), considering physical, algorithmic, and design limitations. Since there is a wide range of applications for wireless biosignal telemetry systems, each with its own set of requirements for key parameters (e.g., channel count, power dissipation, noise level, number of bits, etc.), our analysis is equally broad. The net result is a set of plots in which the power dissipation of each component, and of the system as a whole, is plotted as a function of the number of channels for different architectural strategies. These results are also compared to existing implementations of complete wireless biosignal telemetry systems.


I. INTRODUCTION
There is a growing need for many-channel yet low-power wireless biosignal telemetry systems. Such systems were initially developed for, and are presently used with, large animal models (e.g., non-human primates). Recently, however, more researchers need such systems for much smaller animal models (e.g., rodents) and with longer operational lifetimes (days). As a result, there is an even greater demand for high-channel-count (>100) solutions that operate at very low power (<5 mW). The sections below describe our system-level analysis and optimization of each component and of the system as a whole.

A. Specifications
The system described in this paper is shown in Fig. 1. The data is sensed at the electrode and passed through a number of gain-controlled amplifiers. The output is multiplexed to a time-interleaved analog-to-digital converter (ADC). Optimized digital circuitry then performs spike detection, sorting, and clustering, or passes the data stream directly to the transmitter. Optionally, spike detection may be performed by analog circuitry [1]. Before wireless transmission, the data bit-stream is packetized, with the insertion of control data and training sequences. The bit-stream is then modulated, encoded, and upconverted to radio frequency before being passed to the power amplifier and transmitted from the antenna.

III. OPTIMIZATION
In the following sub-sections we describe our optimization analysis of each major block in a wireless biosignal telemetry system.

A. Analog Preamplifiers
The analog preamplifiers are needed to condition the sensed biosignals before digitization. This typically entails a voltage gain of $10^3$ to $10^4$, which is needed to match the full scale of the ADC, and bandpass filtering from approximately 1 Hz to 6 kHz [2], which is needed to reject out-of-band noise and provide anti-aliasing for the ADC. The exact values of the gain and filter band are, of course, application dependent. If needed, the gain and filtering can be distributed across several stages, where each stage consists of a capacitively coupled amplifier (Fig. 2). The total input-referred noise of the entire amplifier is dominated by the first stage, as its gain suppresses the noise of subsequent stages.
The level of input-referred noise that is acceptable for a given application is the most important specification to set when optimizing an analog amplifier for minimum power and area. The noise of the amplifier is determined largely by the noise of the first stage, and in particular by the noise of the input transistors. A number of amplifier-optimization studies have been performed (e.g., by Harrison [3], Chae [4], Kim [5], and Wattanapanitch [6]). Furthermore, instead of directly quantizing the signal amplitude with an ADC, other approaches are possible, such as delta modulation [7], pulse-width modulation [8], and wavelet transforms [9]. This paper focuses on the ADC-based approach.
A typical front-end-amplifier design is shown in Fig. 2. It comprises an operational amplifier (OA), capacitors $C_1$ and $C_2$ to block the dc offset of the electrolyte-electrode interface and fix the gain at $C_1/C_2$, and resistor $R_B$ to set the dc operating point and low-frequency cutoff. Parasitic capacitance $C_{in}$ is due to the input capacitance of the OA.
The main source of noise in the amplifier, which originates from MOSFET thermal and flicker noise, is represented as a voltage source $v_{n,amp}$. In the analysis presented here, thermal noise is considered but flicker noise is ignored. The noise $v_{n,amp}$ is inversely proportional to the square root of the bias current $I_D$. However, increasing the bias current also requires a larger device, which increases the parasitic input capacitance $C_{in}$. When $C_{in}$ approaches $C_1$, the amplifier loading causes the effective noise at the input terminal $V_{in}$ to increase. Simply increasing $C_1$ to reduce the relative loading of $C_{in}$, and hence the amplifier noise, is unattractive because it increases the silicon area. For small bias currents, the parasitic loading is insignificant, but the overall noise is high due to $v_{n,amp}$. We therefore expect an optimum design that balances these effects, as explained in the following paragraphs.
Mathematically, the amplifier noise $v_{n,amp}$ from the circuit devices (i.e., MOSFETs, resistors) is given by Eq. 1, where $\beta$ is a proportionality constant that relates the noise $v_{n,amp}$ to the bias current $I_D$, $V_T$ is the thermal voltage (26.8 mV at 37 °C), $BW$ is the amplifier bandwidth, and $\kappa$ is the subthreshold parameter. We assume the input device is in weak inversion [10], [11].
Eq. 2 quantifies the effect of parasitic loading on the squared input-referred noise $v_{n,in}^2$, where $C_{in}$ is the input capacitance of the amplifier, which is equal to $\alpha \cdot I_D$, and $\alpha$ is a proportionality constant that relates the input capacitance to the bias current. Using Eqs. 1 and 2, we obtain the minimum noise $v_{n,in}$ as a function of bias current $I_D$ and capacitor sizes (which is similar to Chae's approach [4]). The minimum achievable $v_{n,in}$ for a given $C_1$ is given by Eq. 3, where $\gamma$ is the MOSFET noise coefficient, $L$ is the length of the channel, $q$ is the electron charge, $\mu$ is the charge mobility in the channel, $IC$ is the inversion coefficient, and $K_{AMP}$ is set by the amplifier architecture. Low input capacitance and good transconductance efficiency ($g_m/I_D$) are obtained by setting $IC$ around 0.1. Using Eqs. 1 and 2 (plotted in Fig. 3), we estimate that an input-referred noise of 2 µV requires a minimum current of approximately 1.4 µA and a value of $C_1$ greater than 14 pF. We also see from Fig. 3 that a noise level of 2 µV is achievable with a minimum capacitance of 1.4 pF at a bias current of 5 µA. Thus a 3.6-times variation in current (i.e., 5 µA/1.4 µA) trades against a 10-times variation in capacitor area (i.e., 14 pF/1.4 pF). This example illustrates the tradeoff between area and power, and how it impacts high-channel-count systems. Equation 3 shows how $C_1$ sets the noise, and hence the supply current via Eqs. 1 and 2. Since $C_2$ is much smaller than $C_1$ for gains greater than 10, $C_2$ can be adjusted to set the appropriate gain without significantly changing the power dissipation of the amplifier.
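The shape of this tradeoff can be explored with a short numerical sketch. It does not reproduce the paper's Eqs. 1 to 3, whose exact forms are not available in this text. Instead it assumes the simplest forms consistent with the prose: $v_{n,amp}^2 = \beta / I_D$, $C_{in} = \alpha \cdot I_D$, and a capacitive-divider loading factor $(C_1 + C_2 + C_{in})/C_1$. The constants `ALPHA` and `BETA` are hypothetical placeholders picked to land in the same range as the operating points quoted above; they are not taken from the paper.

```python
import math

# Hypothetical constants for illustration only (not values from the paper).
ALPHA = 0.28e-6  # F/A: assumed input capacitance per unit bias current
BETA = 5.6e-18   # V^2*A: assumed amplifier noise power times bias current


def input_noise_vrms(i_d, c1, c2, alpha=ALPHA, beta=BETA):
    """Input-referred noise (V rms) under the assumed model."""
    v_amp = math.sqrt(beta / i_d)      # assumed: noise falls as 1/sqrt(I_D)
    c_in = alpha * i_d                 # parasitic capacitance grows with bias
    loading = (c1 + c2 + c_in) / c1    # assumed capacitive-divider penalty
    return v_amp * loading


def optimum_bias(c1, c2, alpha=ALPHA):
    """Bias current minimizing the assumed model (where C_in = C1 + C2)."""
    return (c1 + c2) / alpha


def minimum_noise_vrms(c1, c2, alpha=ALPHA, beta=BETA):
    """Closed-form minimum of input_noise_vrms over the bias current."""
    return 2.0 * math.sqrt(alpha * beta * (c1 + c2)) / c1


if __name__ == "__main__":
    for c1 in (1.4e-12, 14e-12):
        c2 = c1 / 100.0  # assumed closed-loop gain of 100
        i_opt = optimum_bias(c1, c2)
        v_min = minimum_noise_vrms(c1, c2)
        print(f"C1 = {c1 * 1e12:.1f} pF: optimum I_D = {i_opt * 1e6:.2f} uA, "
              f"minimum noise = {v_min * 1e6:.2f} uV")
```

Under these assumptions the minimum noise for a given $C_1$ scales as $1/\sqrt{C_1}$, which is one way to see why a smaller input capacitor has to be paid for with more bias current.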

B. Analog-to-Digital Converters
Low-power ADCs are a critical part of many applications, and as such, several examples exist in the literature that are suitable for use in biosignal data-acquisition systems. The required ADC specifications vary widely according to application, but typically range from 8 to 12 bits, with a sampling rate of 1 to 30 kHz. To estimate the power, recent literature was surveyed; the results are shown in Fig. 4. The ADC area can also be estimated from the survey (not shown). Successive-approximation ADCs are an attractive architecture because their high power efficiency, moderate speed, and medium resolution match the needs of biosignal acquisition.
The performance of ADCs can be normalized according to a figure of merit (FoM) given by
$$\mathrm{FoM} = \frac{P}{2^{B} \cdot f_S} \qquad (4)$$
where $P$ is the power dissipation, $B$ is the number of bits, and $f_S$ is the sampling rate. A low FoM indicates that little energy is expended for each conversion of a sample. Lines of constant FoM are also shown in Fig. 4. Several ADCs achieve close to 100 fJ per conversion-step, with two close to 10 fJ/conv-step. We conservatively use 100 fJ/conv-step as our benchmark for estimating ADC power.
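As a worked example, the benchmark can be turned into a power estimate by inverting the figure of merit. The sketch below assumes the conventional definition $\mathrm{FoM} = P / (2^B \cdot f_S)$, so that $P = \mathrm{FoM} \cdot 2^B \cdot f_S$. The function name and the sample operating points are illustrative choices within the 8 to 12 bit and 1 to 30 kHz ranges mentioned above.

```python
def adc_power_watts(fom_j_per_step, bits, sample_rate_hz):
    """Estimate ADC power from a figure of merit: P = FoM * 2^B * f_S."""
    return fom_j_per_step * (2 ** bits) * sample_rate_hz


FOM_BENCHMARK = 100e-15  # 100 fJ/conv-step, the conservative benchmark in the text

if __name__ == "__main__":
    # Illustrative operating points, not values from the ADC survey.
    for bits, fs in [(8, 1e3), (10, 30e3), (12, 30e3)]:
        p = adc_power_watts(FOM_BENCHMARK, bits, fs)
        print(f"{bits} bits at {fs / 1e3:g} kS/s: {p * 1e9:.4g} nW")
```

Because the estimate scales linearly with $f_S$, a single time-interleaved ADC serving $N$ channels would be evaluated at $N$ times the per-channel sampling rate.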

C. Digital Signal Processing
There are two aspects worthy of discussion regarding digital signal processing. First, how much signal processing should be performed before transmitting data off-chip? Second, how is the chosen system implemented efficiently?
To address the question of on-chip processing, several algorithms were chosen as candidates: raw data streaming, adaptive differential pulse code modulation (ADPCM) [12], [13], spike detection, feature extraction, and clustering [14]. All of these options, apart from raw streaming, are lossy to varying degrees; that is, information from the original recording is lost, either through lossy compression or by discarding waveform details. Table I shows the power per channel for the different options, together with the corresponding reduction in per-channel data rate. Immense reductions in data rate can be achieved, which in turn eases the load on the transmitter. The amount of data compression depends on the spike firing rate.
The three main tasks of spike processing are (1) detection and alignment, (2) feature extraction, and (3) clustering. In conventional circuits, only detection is performed on chip, with feature extraction and clustering delegated to off-chip hardware. However, feature extraction and clustering greatly reduce the amount of data required to represent a spike, and hence can reduce the power required for the transmitter. This leads to lower overall system power, and allows a large number of channels to be recorded.
[Caption: Power estimates obtained from Synopsys for NEO, Spike Output mode. The total power $P_{total}$ is divided into switching power $P_{switching}$ and leakage power $P_{leakage}$.]
Of the algorithms investigated in [14], the following were shown to offer an efficient trade-off between hardware complexity and performance: (1) the nonlinear energy operator (NEO) for detection, (2) maximum derivative for alignment, and (3) discrete derivatives for feature extraction.
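To make the detection step concrete, the following is a minimal software sketch of NEO-based spike detection. It uses the standard discrete NEO, $\psi[n] = x[n]^2 - x[n-1] \cdot x[n+1]$. The thresholding rule (a scaled mean of $\psi$) and the scale factor of 8 are assumptions made for illustration; they are not stated in this paper, and the sketch is not the hardware implementation described here.

```python
def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbors and are set to 0.
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def detect_spikes(x, scale=8.0):
    """Return indices where the NEO output rises above scale * mean(NEO).

    The threshold rule and the default scale are illustrative assumptions.
    """
    psi = neo(x)
    threshold = scale * sum(psi) / len(psi)
    hits = []
    for n in range(1, len(psi)):
        # Report only the upward crossing so each event is counted once.
        if psi[n] > threshold and psi[n - 1] <= threshold:
            hits.append(n)
    return hits


if __name__ == "__main__":
    # Synthetic trace: flat baseline with one sharp deflection at sample 30.
    trace = [0.0] * 60
    trace[29], trace[30], trace[31] = 1.0, 4.0, 1.0
    print(detect_spikes(trace))
```

The NEO emphasizes samples that are both large and rapidly changing, which is why it needs only two multiplications and one subtraction per sample.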
To implement these algorithms, a Matlab/Simulink-based graphical design environment, with the Synplify DSP blockset, was used. This Simulink model provided a bit-true, cycle-accurate representation of the design. This design process also avoids design re-entry since both the hardware design and test vectors are auto-generated from the tool.
Another key advantage, due to the automation, is swift evaluation of different architectural trade-offs.
Furthermore, to reduce the power and area of the design, three circuit techniques were used. First, supply-voltage scaling was used to minimize the transition energy of the logic, with an optimum value around 0.55 V in a 1-V technology. Next, parallel data streams were sequentially fed to a single hardware block running at higher speed (Figs. 5 and 6). This reduced the silicon area and also the leakage power of the design. As the design is register dominated, 16× interleaving resulted in only a 35% decrease in area. Lastly, an automated wordlength-reduction tool was used to optimize the wordlengths throughout the design.
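The benefit of supply-voltage scaling can be illustrated with the usual first-order model in which switching energy is proportional to $C V^2$. The sketch below covers only that switching component. The location of the optimum near 0.55 V also depends on leakage and on the longer gate delay at low supply, which this sketch does not model. The helper for the interleaved clock rate and the 24 kS/s sample rate in the usage lines are illustrative assumptions.

```python
def switching_energy_ratio(v_scaled, v_nominal):
    """Ratio of dynamic (C * V^2) switching energy at two supply voltages."""
    return (v_scaled / v_nominal) ** 2


def interleaved_clock_hz(channels_per_block, sample_rate_hz):
    """Clock rate of one shared block that serves several channels in turn."""
    return channels_per_block * sample_rate_hz


if __name__ == "__main__":
    ratio = switching_energy_ratio(0.55, 1.0)
    print(f"Switching energy at 0.55 V relative to 1 V: {ratio:.4f}")
    # 24 kS/s per channel is a hypothetical sample rate for illustration.
    clock = interleaved_clock_hz(16, 24e3)
    print(f"Clock for 16x interleaving at 24 kS/s: {clock / 1e3:.0f} kHz")
```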
Combined, these techniques lead to an estimated power of 2 µW per channel for a feature-extraction digital signal processor.

D. Wireless Transmitter
When designing a wireless telemetry link, one must consider the physical limitations of the transmitter, the channel (i.e., the medium between the transmitter and the receiver), and the receiver. A full wireless-link budget, which calculates the required transmitter power level, starts at the output of the transmitter and ends with demodulated data from the receiver. An expression for the required transmitted power level as a function of the critical physical limitations involved is given by Eq. 5. The first group of terms represents the noise generated by a 50-ohm resistor ($R_S$) in a matched RF system, which takes into account the impedance at the antenna. The second term is the noise figure $NF$, which is the ratio of the total noise at the output of the receiver to the noise contribution due to a 50-ohm resistor passed through the receiver. The third term is the $SNR$ required for decoding the digital data with a bit-error rate of less than $10^{-6}$. Although this error rate may seem high, conventional coding strategies can be used to reduce the error rate to a level required for a given application [16]. The last term of the numerator is the bandwidth of the communication channel. The terms of the denominator involve critical components of the communication channel. The path loss $PL$ represents the reduction of transmitted power as a function of distance from the transmitter. The Rayleigh-fading margin $RFM$ takes into account the changes in received power due to the constructive or destructive overlap of signals arriving from multiple paths (i.e., multipath interference). The transmitter antenna gain $G_{TX}$ takes into account the impact of the antenna design on its ability to efficiently transmit power to the channel. Similarly, the receiver antenna gain $G_{RX}$ takes into account the impact of the design of the receiver antenna on its ability to receive power from the channel. This term also includes gain achieved through multiple-input multiple-output (MIMO) strategies when used, although we expect that in this application there will be only a single input (i.e., SIMO) [16].
Ultimately, the multipath issue imposes a limit on the maximum data rate that can be achieved for a given communication channel. A transmitted signal may take multiple paths to the receiver. As a result, there is a spread in arrival times of a given transmitted signal at the receiver. The symbol length is the duration of time used to transmit a unique representation of a bit pattern. The symbol length must be significantly greater than the spread in arrival times; typically, a factor of 10 is considered acceptable. The delay spread (i.e., the rms value of arrival times at the receiver) of a typical room is approximately 20 ns [17], [18]. Given the 10× design rule of thumb, the symbol length must be at least 200 ns (i.e., the symbol rate must be less than $5 \times 10^6$ symbols/s). By encoding two bits into each symbol, the maximum data rate is 10 Mbps, at the cost of transmitter complexity.
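The arithmetic behind this limit is short enough to write out. The sketch below applies the 10× rule of thumb to the 20-ns delay spread and two bits per symbol, all taken from the paragraph above; the function names are illustrative.

```python
def max_symbol_rate(delay_spread_s, margin=10.0):
    """Highest symbol rate whose symbol length is margin * delay spread."""
    symbol_length_s = margin * delay_spread_s
    return 1.0 / symbol_length_s


def max_bit_rate(delay_spread_s, bits_per_symbol, margin=10.0):
    """Bit-rate ceiling for a given delay spread and modulation order."""
    return bits_per_symbol * max_symbol_rate(delay_spread_s, margin)


if __name__ == "__main__":
    rate = max_bit_rate(delay_spread_s=20e-9, bits_per_symbol=2)
    print(f"Maximum data rate: {rate / 1e6:.1f} Mbps")
```

Raising the number of bits per symbol raises the ceiling proportionally, which is the transmitter-complexity tradeoff noted in the text.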
Numerical values for each component of the link budget are given in Table II (which shows Eq. 5 in log form) and are either directly calculated or taken from the literature. The result of this analysis is that the minimum power delivered by the transmitter to the channel must be at least 12.6 µW (-19 dBm). Of course, this value depends on the selection of the modulation scheme, the design of the individual components, and the specific needs of the application. Implementing an efficient transmitter to deliver the required output power is still an active research topic. Scaling from previous work [19] indicates that less than 3 mW is feasible.
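A log-form link budget of the kind summarized in Table II can be sketched as a sum of decibel terms. The version below is an assumption about the structure rather than a reproduction of Eq. 5 or Table II: it takes the noise floor as $kT \cdot BW$ for a matched system, treats path loss and fading margin as positive dB quantities that add to the requirement, and subtracts the antenna gains. The 290 K reference temperature, the function names, and the sign convention are all choices made for this sketch.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def watts_to_dbm(p_watts):
    """Convert power in watts to dBm (decibels relative to 1 mW)."""
    return 10.0 * math.log10(p_watts / 1e-3)


def dbm_to_watts(p_dbm):
    """Convert power in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def required_tx_power_dbm(nf_db, snr_db, bandwidth_hz, path_loss_db,
                          fading_margin_db, g_tx_dbi, g_rx_dbi, temp_k=290.0):
    """Assumed log-form link budget.

    Losses and margins are positive dB values that raise the required
    transmit power; antenna gains lower it.
    """
    noise_floor_dbm = watts_to_dbm(BOLTZMANN_J_PER_K * temp_k * bandwidth_hz)
    return (noise_floor_dbm + nf_db + snr_db
            + path_loss_db + fading_margin_db
            - g_tx_dbi - g_rx_dbi)


if __name__ == "__main__":
    # Check the unit conversion quoted in the text: 12.6 uW in dBm.
    print(f"12.6 uW = {watts_to_dbm(12.6e-6):.1f} dBm")
```

The conversion helper confirms that 12.6 µW and -19 dBm describe the same power level; the budget function needs the Table II entries, which are not reproduced here, to give a meaningful result.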

E. Total System Power
Now that we have power estimates for the main blocks of the wireless telemetry system (Fig. 1), we can choose the optimal system configuration. For this analysis, we use neural spikes as the signal of interest. Several system configurations (Fig. 7), including three DSP methods, are explored in the following section.

IV. RESULTS
The five configurations are (1) raw data, (2) analog detection, (3) digital detection, (4) feature extraction, and (5) clustering. In raw-data mode, each channel is digitized and directly transmitted without any additional processing. This mode results in the highest transmitter data rate. In analog-detection mode, analog spike detection is used to gate the ADC, which reduces the power because the ADC is on only a fraction of the time, and only spike data is transmitted. Digital-detection mode, which uses the digitized samples to detect a spike, requires that the ADC operate continuously. However, a digital detector can be implemented with lower power than an analog implementation, and again only spike data is transmitted. Feature-extraction mode entails calculating waveform characteristics of the detected spike and transmitting only these spike features. Finally, clustering mode determines a best match between a detected spike and a library of spikes (user-configured or trainable), and transmits a short spike ID.
For a system with raw streaming (no on-chip DSP, as shown in Fig. 7a), a maximum of 52 channels is implementable within the 10-Mbps limit, which is short of our 100-channel target. The power for each block is shown in Figs. 8 and 9 as a function of the number of channels. A burst-mode transmitter is assumed, in which the transmitter and frequency synthesizer are turned on every $T_{latency}$ seconds, transmit a packet of data from the recording channels, and then shut down again. After the synthesizer is enabled, it takes $T_{start}$ seconds for its output to stabilize. Hence there is a finite amount of wasted energy due to the synthesizer start-up [20]. This can be reduced by using a faster-starting synthesizer or a longer buffer (leading to longer latency). For a low number of channels (bottom axis) or a low data rate (top axis), the synthesizer can be switched off after the packet is sent. At a higher number of channels (or data rate), the synthesizer remains on, because its start-up time is greater than the time before the next packet. The analog front end (preamplifiers and ADC) runs continuously, and the power amplifier (PA) is on only while a packet is being transmitted. The total power for 52 channels is 5.8 mW, or 110 µW per channel. As shown in Fig. 8, for data rates below 2.5 Mbps, the synthesizer power, due to its slow start-up, is significant compared to the PA power (and hence the transmitted signal power).
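The burst-mode behavior described above can be captured in a simple duty-cycle model. This is a sketch under stated assumptions, not the model used to generate Figs. 8 and 9: the PA is taken to be on for the fraction of time equal to the data rate divided by the link rate, and the synthesizer is on for the start-up time plus the packet time, staying on continuously once that no longer fits within one latency period. All numerical values in the usage lines are hypothetical.

```python
def burst_tx_average_power(data_rate_bps, link_rate_bps, t_latency_s,
                           t_start_s, p_synth_w, p_pa_w):
    """Average synthesizer plus PA power for an assumed burst-mode model."""
    pa_duty = data_rate_bps / link_rate_bps   # fraction of time sending packets
    if pa_duty > 1.0:
        raise ValueError("data rate exceeds link rate")
    t_packet_s = pa_duty * t_latency_s        # packet duration per latency period
    # The synthesizer needs start-up time plus packet time; if that does not
    # fit in one latency period, it simply stays on.
    synth_duty = min(1.0, (t_start_s + t_packet_s) / t_latency_s)
    return synth_duty * p_synth_w + pa_duty * p_pa_w


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the two regimes.
    for rate in (0.5e6, 2.5e6, 9.5e6):
        p = burst_tx_average_power(rate, link_rate_bps=10e6, t_latency_s=1e-3,
                                   t_start_s=100e-6, p_synth_w=1e-3, p_pa_w=1e-3)
        print(f"{rate / 1e6:.1f} Mbps: {p * 1e3:.3f} mW")
```

In this model the start-up overhead is a fixed cost per packet, so it dominates at low data rates, which is consistent with the observation that synthesizer power is significant relative to PA power at the low end of the data-rate axis.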
Introducing DSP to detect a spike and transmitting only spike data (Fig. 7b) reduces the amount of transmitted data. Compared to the raw system (Fig. 7a), the DSP power required is negligible, the data rate is reduced to 20% of its raw value, and the total power is reduced to 54%. With this, approaching 100 channels is now possible using a 3.2-Mbps link. In this analysis a 100-Hz spike detection rate was assumed. Lower rates would provide even greater power savings.
We see that digital processing can be implemented with low power, which greatly reduces the amount of data to be transmitted. This relaxes the requirements of the transmitter, or allows a higher number of channels to be implemented.

V. CONCLUSION
The optimization of several key blocks of a wireless telemetry system has been described. The optimization of amplifier noise and area, as a function of capacitor size and bias current, has been described. State-of-the-art ADCs have been reviewed and shown to have sufficiently low power dissipation to be compatible with a low-power biosignal telemetry system. Digital signal processing, at both the algorithm and circuit levels, has been discussed, along with strategies for minimizing power (i.e., selecting an NEO algorithm to maintain reliable spike detection, voltage scaling, and pipelining with time-interleaving).
We have shown a system design for wireless telemetry that enables a large number of channels. Previously published work is shown in Fig. 10. Analog implementations [21], [22] are suitable for low channel counts, and have higher power compared to the other systems. Spike-detection systems [23], [24] show high channel counts and/or lower power per channel. Finally, spike sorting [25] demonstrates the potential (and even necessity) for increasing local digital signal processing to facilitate a low-power system. The estimates for our proposed system show the feasibility of a high-channel-count system (>400 with feature extraction and clustering), while maintaining low power (<10 mW).