Function approximation with spiked random networks

This paper examines the function approximation properties of the "random neural network" model, or GNN. The output of the GNN can be computed from the firing probabilities of selected neurons. We consider a feedforward Bipolar GNN (BGNN) model, which has both "positive and negative neurons" in the output layer, and prove that the BGNN is a universal function approximator. Specifically, for any f ∈ C([0,1]^s) and any ε > 0, we show that there exists a feedforward BGNN which approximates f uniformly with error less than ε. We also show that, after an appropriate clamping operation on its output, the feedforward GNN is a universal function approximator as well.


Introduction
The theory of function approximation by neural networks is a necessary underpinning to many applications such as pattern recognition, data compression, time series prediction, and adaptive control by neural networks. In applications, the main design objective is often to find a network which is a good approximator to some desired input-output mapping. However, in addition to the conventional notion of approximation, neural networks are valued especially for their ability to generalize, i.e. to use information they have learned in order to synthesize similar but non-identical input-output mappings under novel circumstances. The desired mapping can be presented to the network via a set of examples, as is often the case in supervised learning, or by a time series (when neural nets are used for prediction), or even by the observation of an unknown dynamical system. (This author's work was supported by the Office of Naval Research under grant number N00014-97-1-0112.)

Many models of neural networks have been mathematically demonstrated to be universal approximators; related results include proofs for the conventional multilayer perceptron (MLP) [4], the radial basis function (RBF) neural network [10], the fuzzy neural network [18], the wavelet neural network [19], and the rational function neural network [14]. Approximation theory [2] deals with the approximation of a function f(X) of an input vector X by some other function F(w, X) having a fixed number of parameters denoted by the vector w. The parameters w are chosen so as to achieve the best possible approximation of the function f; for instance, one may choose w so as to minimize some norm or error criterion ||f(X) − F(w, X)||. Since neural networks are used in so many applications, it is desirable that a neural network be a universal approximator in some useful sense: it should at least approximate any continuous function on a compact set to an arbitrary degree of accuracy. Obviously, one can consider that the neural network's inputs are described by X, while the parameters w are the network's "weights". On the other hand, it is well known [3] that the accurate approximation of a function class can lead to poor "generalization" capability, and generalization is the major attractive property of neural networks. Intuitively, both approximation and generalization are desirable properties under different circumstances.

This paper is devoted to the approximating capabilities of spiked networks, which have not received much attention in the literature. Spiking models (see for instance [15,16]) are of great importance because they represent more closely the manner in which signals are transmitted in real (biological) neural networks, where they often are voltage (action potential) spikes rather than analog levels, although we must recognize that no single mathematical model is known to capture all aspects of neuronal signaling. Historically, spiking behavior in neurons has been described by the well-known Hodgkin-Huxley equations [1]. However, the stochastic nature of spiking behavior has been recognized by many authors and discussed in various publications (e.g. [15,16]).
The neural network model we discuss in this paper [5,6,7,8,12] ("Gelenbe's neural network" or GNN) is a novel biophysically inspired spiked model which differs substantially from existing deterministic models (e.g. the MLP) and random models (e.g. the Boltzmann machine or the models described above). It is based on a direct point-process representation, with a discrete state-space internal state representation of each interconnected neuron [5,6,7,8]. The rich mathematical structure of the network, in terms of an infinite set of Chapman-Kolmogorov equations, leads to a compact closed-form solution for the network state in both the feedforward and the recurrent case [5,12], which in turn yields efficient numerical algorithms. Typically, a spiked stochastic model will include some internal representation of each neuron's state, and a probabilistic representation of successive firing times as a function of state (see for instance [16], p. 65); additionally, rules need to be given about the manner in which the internal state changes when excitatory and inhibitory spikes are received, and about how the internal state changes after firing. In our model, contrary to other models which are available, the internal state is a non-negative integer; it rises or falls depending on the excitatory or inhibitory nature of incoming spikes, and it drops each time the neuron fires. Inter-firing intervals are exponentially distributed (as in some of the models discussed in [16]). In our model the representation of a recurrent network results in a system of coupled Chapman-Kolmogorov equations [24] for a continuous-time Markov chain with a countably infinite number of states.
Recent work on a variety of significant applications of this model [9,12,17,21,20,22,23] has shown that the GNN is able to efficiently carry out desirable functions such as optimization, learning, associative memory, and pattern recognition. It has been successfully applied to classical NP-hard combinatorial optimization problems such as the minimum vertex covering problem [11,17], the traveling salesman problem [13], and the Steiner problem in networks [23], where its main advantage is that it can easily be solved numerically, without the lengthy Monte Carlo simulations which are needed when similar problems are addressed with the Hopfield network or the Boltzmann machine.
The GNN learning algorithm has been described previously [12] and has been used extensively (e.g. [21,20,22]). It is based on gradient-descent minimization and is of complexity O(n^3) for a fully recurrent n-neuron network. The ability of the GNN to generalize after being trained with this learning algorithm has been demonstrated in several significant applications. For instance, a GNN trained to compress a specific image ("Lena") has been shown to compress efficiently (high compression ratio, high signal-to-noise ratio) long video sequences containing images the network had never been shown [20,21]. Another network, trained on some 20 to 30 samples of 5-by-5-pixel image elements in brain magnetic resonance images, has been able to accurately segment large images into areas of grey matter, white matter and cerebrospinal fluid [22]. This paper therefore addresses a different aspect: although many applications have testified to the efficiency and accuracy of the GNN's learning ability, a theoretical investigation of its power as a universal approximator is also needed.
The remainder of this paper is organized as follows. In Section 2, a brief introduction to the GNN and to the bipolar GNN [9] (BGNN) is presented. The output of a GNN is computed from the firing probabilities of selected output neurons, and these obviously have a range between 0 and 1. One approach to approximating functions with full range on R using the GNN obtains output values greater than 1 by taking the "average potential" [6,8] of a neuron as the output value, while negative values are obtained by considering networks with "negative and positive neurons" [9]. In this paper we study two extensions of the feedforward GNN with expanded network output range. In Section 3, we prove the approximation capability of these extended GNN models by showing that they are universal function approximators. The last section presents some conclusions.

The GNN and Related Models

Consider a GNN [6,8,12] with n neurons in which "positive" and "negative" signals circulate. This is a continuous-time, discrete state-space model. The i-th neuron's state is represented at any time t by its "potential" k_i(t) ≥ 0, which is a non-negative integer. Positive signals represent excitation and effect a (+1) operation on the potential of the neuron at which they arrive. Negative signals represent inhibition and effect a (−1) operation on the potential of the neuron at which they arrive, if that potential is positive; otherwise they have no effect. If the potential of neuron i is positive, it may "fire" in a random sequence, with inter-firing times being independent, exponentially distributed random variables of rate r(i) ≥ 0, sending signals out to other neurons or to the outside of the network. When a neuron fires, its potential also drops by 1. Signals arriving at a neuron may be exogenous (coming from outside sources). Exogenous excitatory signals arrive at neuron i in a Poisson stream of rate Λ(i). Similarly, exogenous inhibitory signals arrive at neuron i in a
Poisson stream of rate λ(i). All these Poisson streams, for all values of i = 1, ..., n, are independent of each other. When neuron i fires, the emitted signal goes to neuron j as an excitatory (or positive) signal with probability p+(i,j), or as a negative signal with probability p−(i,j), or it departs from the network with probability d(i). We will have:

Σ_{j=1}^{n} [p+(i,j) + p−(i,j)] + d(i) = 1, i = 1, ..., n. (1)

To simplify the notation, in the sequel we will write:

ω+(i,j) = r(i) p+(i,j), (2)
ω−(i,j) = r(i) p−(i,j). (3)

Let λ+(i) and λ−(i) denote the average arrival rates of positive and negative signals to each neuron i, respectively. The key results about the GNN developed in [6,8,12] are summarized below.

Theorem 1. (Proposition 1 in the Appendix of [12]) There always exists a solution with λ+(i) ≥ 0, λ−(i) ≥ 0 to the equations:

λ+(i) = Λ(i) + Σ_j q_j ω+(j,i), (4)
λ−(i) = λ(i) + Σ_j q_j ω−(j,i), (5)

for i = 1, ..., n, where

q_i = λ+(i) / (r(i) + λ−(i)). (6)

The second key result concerns the specific form taken by the stationary joint probability of the network state.

Theorem 2. (Theorem 1 of [6]) For an n-neuron GNN, let the vector of neuron potentials at time t be k(t) = (k_1(t), k_2(t), ..., k_n(t)), and let k = (k_1, k_2, ..., k_n) be an n-vector of non-negative integers. Then, if the q_i in (6) satisfy 0 ≤ q_i < 1, the stationary joint probability of the network state

p(k) = lim_{t→∞} Prob{k(t) = k} (7)

is given by:

p(k) = Π_{i=1}^{n} (1 − q_i) q_i^{k_i}. (8)

Note that if the conditions of Theorem 2 are satisfied, then p(k_i) = lim_{t→∞} Prob{k_i(t) = k_i}, the stationary probability of the state of neuron i, is given by:

p(k_i) = (1 − q_i) q_i^{k_i}, (9)

and

q_i = lim_{t→∞} Prob{k_i(t) > 0}. (10)
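As a concrete illustration (ours, not part of the original development), the flow equations (4)-(6) can be solved numerically by simple fixed-point iteration; the function name and the two-neuron network below are illustrative assumptions:

```python
import numpy as np

def gnn_fixed_point(Lambda, lam, r, W_plus, W_minus, iters=200):
    """Iterate the GNN flow equations (4)-(6) to a fixed point.

    Lambda, lam : exogenous excitatory / inhibitory Poisson rates per neuron
    r           : firing rates
    W_plus[j,i] = r(j) p+(j,i),  W_minus[j,i] = r(j) p-(j,i)
    Returns q, the stationary probabilities q_i = lim P{k_i(t) > 0}.
    """
    q = np.zeros(len(r))
    for _ in range(iters):
        lp = Lambda + q @ W_plus    # lambda+(i) = Lambda(i) + sum_j q_j w+(j,i)
        lm = lam + q @ W_minus      # lambda-(i) = lambda(i) + sum_j q_j w-(j,i)
        q = lp / (r + lm)           # equation (6)
    return q

# Two-neuron feedforward chain: neuron 0 sends excitatory spikes to neuron 1.
r = np.array([1.0, 1.0])
Lambda = np.array([0.5, 0.0])
lam = np.array([0.0, 0.2])
W_plus = np.array([[0.0, 1.0], [0.0, 0.0]])   # w+(0,1) = r(0) p+(0,1) = 1
W_minus = np.zeros((2, 2))
q = gnn_fixed_point(Lambda, lam, r, W_plus, W_minus)
```

For a feedforward network the iteration settles layer by layer: here q_0 = 0.5/r(0) and q_1 = q_0 ω+(0,1)/(r(1) + λ(1)).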

The Bipolar GNN (BGNN)
In order to represent bipolar patterns and reinforce the associative memory capabilities of the GNN, Gelenbe, Stafylopatis and Likas [9] extended the original model by introducing two types of nodes: positive and negative neurons. The Bipolar GNN (BGNN) can also be viewed as the coupling of two complementary standard GNN models.
In the BGNN the two types of neurons have opposite roles. A positive neuron behaves exactly as a neuron in the original GNN. A negative neuron has a completely symmetrical behavior: only negative signals can accumulate at such a neuron, and the role of positive signals is to eliminate negative signals which have accumulated in a negative neuron's potential. A positive signal arriving at a negative neuron i cancels a negative signal (adds +1 to the neuron's negative potential), and has no effect if k_i = 0. With respect to Theorems 1 and 2, this extension is mathematically equivalent to the original GNN described above, as will be summarized below. In the sequel we shall show that the BGNN is a universal approximator for continuous functions.
In the BGNN, the emission of signals from a positive neuron is the same as in the original GNN. Similarly, a negative neuron may emit negative signals. A signal leaving negative neuron i arrives at neuron j as a negative signal with probability p+(i,j) and as a positive signal with probability p−(i,j). Also, a signal departs from the network upon leaving neuron i with probability d(i). All other assumptions and notation remain as in the original model.
Let us consider a BGNN with n nodes. Since negative signals account for the potential of negative neurons, we use negative values of k_i when neuron i is a negative neuron. Taking into account the distinction between positive and negative neurons, Theorems 1 and 2 can be restated as follows for the BGNN. The flow of signals in the network is described by the following equations:

λ+(i) = Λ(i) + Σ_{j∈P} q_j ω+(j,i) + Σ_{j∈N} q_j ω−(j,i), (11)
λ−(i) = λ(i) + Σ_{j∈P} q_j ω−(j,i) + Σ_{j∈N} q_j ω+(j,i), (12)

where we denote by P and N the sets of positive and negative neurons respectively, and

q_i = λ+(i) / (r(i) + λ−(i)), i ∈ P, (13)
q_i = λ−(i) / (r(i) + λ+(i)), i ∈ N. (14)

It can be shown that a non-negative solution {λ+(i), λ−(i), i = 1, ..., n} exists to the above equations. If q_i < 1, i = 1, ..., n, then the steady-state joint probability distribution of the network state is given by [9]:

p(k) = Π_{i=1}^{n} (1 − q_i) q_i^{|k_i|}, (15)

where the quantity q_i is the steady-state probability that node i is "excited". Note the |k_i| exponent in the above product form, since the k_i's can be positive or negative, depending on the polarity of the i-th neuron.
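A minimal numerical sketch (our own, with illustrative rates and function name) of how equations (11)-(14) can be iterated for a mixed network of positive and negative neurons:

```python
import numpy as np

def bgnn_fixed_point(Lambda, lam, r, W_plus, W_minus, negative, iters=200):
    """Iterate the BGNN flow equations (11)-(14) to a fixed point.

    negative[i] is True when neuron i is a negative neuron; a signal
    emitted by a negative neuron arrives with its sign reversed.
    """
    q = np.zeros(len(r))
    # Rows belonging to negative neurons contribute with swapped sign (11)-(12).
    into_plus = np.where(negative[:, None], W_minus, W_plus)
    into_minus = np.where(negative[:, None], W_plus, W_minus)
    for _ in range(iters):
        lp = Lambda + q @ into_plus
        lm = lam + q @ into_minus
        # Equations (13) for i in P and (14) for i in N.
        q = np.where(negative, lm / (r + lp), lp / (r + lm))
    return q

# Positive neuron 0 sends negative signals to negative neuron 1,
# which accumulate as (negative) potential and "excite" it.
negative = np.array([False, True])
r = np.array([1.0, 1.0])
Lambda = np.array([0.5, 0.0])
lam = np.array([0.0, 0.0])
W_minus = np.array([[0.0, 1.0], [0.0, 0.0]])  # w-(0,1) = r(0) p-(0,1)
W_plus = np.zeros((2, 2))
q = bgnn_fixed_point(Lambda, lam, r, W_plus, W_minus, negative)
```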

Feedforward GNN Models for Function Approximation
All feedforward models considered in this section are guaranteed to have a unique solution for the q_i, i = 1, ..., n, as a result of Theorems 2 and 3 of [8]; from now on we do not revisit this issue. Consider a continuous function f : [0,1]^s → R of an input vector X = (x_1, ..., x_s). Since a [0,1]^s → R^w function can always be separated into a group of w functions [0,1]^s → R, we will only consider outputs in one dimension.
For most of this section we will concentrate on networks with a single input x ∈ [0,1]. Our purpose is to make sure that the various technical steps we take, which are essentially simple but require attention to detail, are clearly understood by the reader. The case where the input to the network is X = (x_1, ..., x_s) and we are approximating a continuous function f : [0,1]^s → R is addressed in Section 3.1.
To approximate f, we will construct s-input, 1-output, L-layer feedforward GNNs. We will use the index (l, i) for the i-th neuron in the l-th layer. Furthermore, when we need to specify this, we will denote by M_l the number of neurons in the l-th layer.
In the L-th or output layer there is only one neuron. As suggested in [6], we can use the output function

A_{L,1} = q_{L,1} / (1 − q_{L,1}), (17)

whose physical meaning is the average potential of the output neuron, as the output of the network. In this manner, we will have A_{L,1} ∈ [0, +∞), rather than just q_{L,1} ∈ [0, 1).
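The identification of (17) with the average potential follows from the geometric stationary distribution of Theorem 2; the short sketch below (function name ours) checks E[k] = q/(1−q) against a truncated geometric sum:

```python
def average_potential(q):
    """Average potential A = q/(1-q) of a neuron with firing probability q.

    By Theorem 2 the stationary potential is geometric, p(k) = (1-q) q^k,
    so E[k] = sum_k k (1-q) q^k = q/(1-q).
    """
    return q / (1 - q)

# Cross-check against a (truncated) expectation of the geometric law.
q = 0.8
mean_k = sum(k * (1 - q) * q**k for k in range(2000))
```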
This completes the proof of the Lemma. Q.E.D.
The following Lemma shows how an arbitrary polynomial of the form (18) with non-negative coefficients can be realized by a feedforward GNN.

Lemma 4. Let P+(x) be a polynomial of the form (18) with the restriction that c_v ≥ 0, v = 0, 1, ..., m. Then there exists a feedforward GNN with a single output neuron (O) such that:

q_O = P+(x) / (1 + P+(x)), (24)

so that the average potential of the output neuron is A_O = P+(x).
Proof: The proof is by construction. Let C_MAX be the largest of the coefficients in P+(x) and write P̄(x) = P+(x)/C_MAX. Let c̄_j = c_j/C_MAX ≤ 1, so that each term c̄_j (1/(1+x))^j in P̄(x) is no greater than 1, j = 1, ..., m. We now take m networks of the form of Lemma 2, with r(j,1) = 1, j = 1, ..., m, and output values

q_{j,1} = (1/(1+x))^j, (25)

and connect them to the new output neuron (O) by setting the probabilities p+((j,1), O) = c̄_j/2, p−((j,1), O) = c̄_j/2. Furthermore, we set an exogenous positive and negative signal arrival rate Λ(O) = λ(O) = c̄_0/2 and a firing rate r(O) = 1/(2 C_MAX) for the output neuron. We now have:

q_O = (c̄_0/2 + Σ_{j=1}^{m} q_{j,1} c̄_j/2) / (1/(2 C_MAX) + c̄_0/2 + Σ_{j=1}^{m} q_{j,1} c̄_j/2). (26)

Multiplying the numerator and the denominator on the right-hand side of this expression by 2 C_MAX yields

q_O = P+(x) / (1 + P+(x)), (27)

which completes the proof of the Lemma. Q.E.D.
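The algebra of this construction can be checked numerically. In the sketch below (our own; the function name is hypothetical), the subnetwork outputs q_{j,1} = (1/(1+x))^j are taken as given rather than built from Lemma 2:

```python
def lemma4_output(coeffs, x):
    """Numerically evaluate the Lemma 4 construction for
    P+(x) = c_0 + sum_j c_j (1/(1+x))^j, with all c_j >= 0.

    coeffs = [c_0, c_1, ..., c_m]; returns q_O, which should equal
    P+(x) / (1 + P+(x)).
    """
    c0, cs = coeffs[0], coeffs[1:]
    C = max(coeffs)                                # C_MAX
    cb0 = c0 / C
    cbs = [c / C for c in cs]                      # scaled coefficients <= 1
    q_sub = [(1.0 / (1.0 + x)) ** (j + 1) for j in range(len(cs))]
    # Arrival rates at the output neuron: exogenous c̄_0/2 plus the
    # subnetwork streams q_{j,1} * r(j,1) * c̄_j/2 (here r(j,1) = 1).
    lam_plus = cb0 / 2 + sum(q * cb / 2 for q, cb in zip(q_sub, cbs))
    lam_minus = cb0 / 2 + sum(q * cb / 2 for q, cb in zip(q_sub, cbs))
    r_O = 1.0 / (2 * C)                            # firing rate of (O)
    return lam_plus / (r_O + lam_minus)            # equation (6)

# P+(x) = 1 + 2/(1+x); at x = 1, P+ = 2, so q_O = 2/3 and A_O = 2.
qO = lemma4_output([1.0, 2.0], 1.0)
```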
We now prove another technical lemma which will be of use in establishing the approximating power of the "clamped GNN" discussed below.

Lemma 5. Consider a term of the form x/(1+x)^v, for 0 ≤ x ≤ 1 and any v = 1, ..., m. There exists a feedforward GNN with a single output neuron (v+1, 1) and input x ∈ [0,1] such that

q_{v+1,1} = x / (1+x)^v. (28)

The proof is very similar to that of Lemma 2 and will be omitted.
Finally, we state without proof another lemma, very similar to Lemma 4, but which uses terms of the form x/(1+x)^v to construct polynomials. Its proof uses Lemma 5 and follows exactly the same lines as that of Lemma 4.
Lemma 6. Let P_o(x) be a polynomial of the form

P_o(x) = c_0 + c_1 x/(1+x) + ... + c_m x/(1+x)^m, 0 ≤ x ≤ 1, (29)

with non-negative coefficients, i.e. c_v ≥ 0, v = 0, 1, ..., m. Then there exists a feedforward GNN with a single output neuron (O) such that:

q_O = P_o(x) / (1 + P_o(x)), (30)

so that the average potential of the output neuron is A_O = P_o(x).
The technical results given above pave the way for the use of the Bipolar GNN (BGNN) for approximating continuous functions.

Theorem 3. For any continuous function f : [0,1] → R and any ε > 0, there exists a BGNN with one positive output neuron (O,+), one negative output neuron (O,−), input variable x, and output variable

y(x) = A_{O,+} + A_{O,−}, (31)

where

A_{O,+} = q_{O,+} / (1 − q_{O,+}), (32)
A_{O,−} = −q_{O,−} / (1 − q_{O,−}), (33)

such that sup_{x∈[0,1]} |f(x) − y(x)| < ε. We will say that the BGNN's output uniformly approximates f(x).
The proof is a direct application of Lemmas 1 and 4 and is omitted.
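To illustrate Theorem 3 concretely (this is our own numerical sketch; it uses a least-squares fit in place of the Weierstrass construction, and the function name is hypothetical), one can fit a polynomial in u = 1/(1+x), split its coefficients by sign between the two output neurons, and read off y(x) = A_{O,+} + A_{O,−}:

```python
import numpy as np

def bgnn_approximate(f, degree=8, n_grid=200):
    """Sketch of Theorem 3: fit f on [0,1] by a polynomial in u = 1/(1+x),
    then split it into two polynomials with non-negative coefficients,
    realized by the positive and negative output neurons (Lemma 4)."""
    x = np.linspace(0.0, 1.0, n_grid)
    u = 1.0 / (1.0 + x)
    V = np.vander(u, degree + 1, increasing=True)   # columns u^0 ... u^degree
    c, *_ = np.linalg.lstsq(V, f(x), rcond=None)
    c_pos, c_neg = np.maximum(c, 0), np.maximum(-c, 0)
    P_plus, P_minus = V @ c_pos, V @ c_neg          # both have coeffs >= 0
    q_pos = P_plus / (1 + P_plus)                   # firing prob. of (O,+)
    q_neg = P_minus / (1 + P_minus)                 # firing prob. of (O,-)
    y = q_pos / (1 - q_pos) - q_neg / (1 - q_neg)   # A_{O,+} + A_{O,-}
    return x, y

# A target taking both signs on [0,1], exactly representable in the basis.
f = lambda t: 1.0 / (1.0 + t)**2 - 0.4
x, y = bgnn_approximate(f)
```

Note that A_{O,+} − |A_{O,−}| reduces algebraically to P_plus − P_minus, i.e. to the fitted polynomial itself.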
Clearly, we can consider that these two GNNs constitute one network with two output neurons, and we have y(x) = c + P_+(x) + P_o(x) = P(x), completing the proof. Q.E.D.
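One way to obtain the decomposition y(x) = c + P_+(x) + P_o(x) used above (our reconstruction; the elided steps of the proof may differ) relies on the identity 1 − (1/(1+x))^j = Σ_{v=1}^{j} x/(1+x)^v: each negatively-weighted term contributes a constant, absorbed into the clamping constant c, plus Lemma 6 terms with non-negative coefficients:

```python
def clamped_decomposition(c_coeffs, x):
    """Sketch of the clamped-GNN decomposition P = c + P_plus + P_o.

    For u = 1/(1+x), a term c_j u^j with c_j < 0 is rewritten as
    c_j + |c_j| * (1 - u^j) = c_j + |c_j| * sum_{v=1..j} x/(1+x)^v,
    so the remainder has only non-negative coefficients.
    """
    u = 1.0 / (1.0 + x)
    c = sum(cj for cj in c_coeffs if cj < 0)        # clamping constant (<= 0)
    P_plus = sum(cj * u**j                          # Lemma 4 part
                 for j, cj in enumerate(c_coeffs) if cj >= 0)
    P_o = sum(-cj * sum(x / (1 + x)**v for v in range(1, j + 1))
              for j, cj in enumerate(c_coeffs) if cj < 0)  # Lemma 6 part
    return c, P_plus, P_o

# P(x) = 1 - 2u + 0.5u^2 at x = 0.5 (u = 2/3): P = -1/9.
c, P_plus, P_o = clamped_decomposition([1.0, -2.0, 0.5], 0.5)
```

Both P_plus and P_o are then realizable by feedforward GNNs with non-negative coefficients, and their average potentials plus c reproduce P(x).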

Function Approximation for s Input Variables
Now that the process for approximating a one-dimensional continuous function with the BGNN or the CGNN is well understood, let us consider the case of multiple inputs and a continuous function f : [0,1]^s → R. Our version of the Weierstrass theorem (Lemma 1), in the case of s inputs, states that there is a polynomial

P(X) = Σ_{m_1 ≥ 0, ..., m_s ≥ 0, Σ_{v=1}^{s} m_v ≤ m} c(m_1, ..., m_s) Π_{v=1}^{s} 1/(1+x_v)^{m_v}, (36)

with coefficients c(m_1, ..., m_s), which approximates f uniformly. The basic result for approximating f using a BGNN then calls upon the following technical result, which is simply an extension of Lemma 2.
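A sketch (ours) of the multivariate basis of equation (36); the function name and the least-squares check against f(x_1, x_2) = x_1 x_2 are illustrative assumptions:

```python
import numpy as np
from itertools import product

def multivariate_basis(X, m):
    """Columns prod_v (1/(1+x_v))^{m_v} over all exponent tuples with
    m_v >= 0 and sum_v m_v <= m, as in equation (36).

    X has shape (n_points, s); returns an (n_points, n_terms) matrix.
    """
    U = 1.0 / (1.0 + X)
    cols = [np.prod(U ** np.array(mv), axis=1)
            for mv in product(range(m + 1), repeat=X.shape[1])
            if sum(mv) <= m]
    return np.column_stack(cols)

# Illustrative check: fit f(x1, x2) = x1 * x2 on a grid by least squares.
g = np.linspace(0.0, 1.0, 15)
X = np.array([[a, b] for a in g for b in g])
V = multivariate_basis(X, 8)
target = X[:, 0] * X[:, 1]
coeffs, *_ = np.linalg.lstsq(V, target, rcond=None)
err = np.max(np.abs(V @ coeffs - target))
```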
In a similar manner we have the following result, which generalizes Theorem 4.

Theorem 6. For any continuous function f : [0,1]^s → R and any ε > 0, there exists a GNN with two output neurons (O,1), (O,2) and a constant c, called the clamping constant, yielding a function y(X) = A_{O,1} + A_{O,2} + c which approximates f uniformly on [0,1]^s with error less than ε.

Conclusions
The neural network model [6,8,12] discussed in this paper is a novel spiked model which represents more closely the manner in which signals are transmitted in a biophysical neural network, where they mostly travel as voltage (action potential) spikes rather than as fixed analog levels. The model has a closed-form solution for the network state even in the recurrent case, which leads to efficient computational techniques. It has been applied to many real-world problems [9,11,12,13,17,21,20,22,23] and has been shown to carry out useful tasks such as optimization, learning, associative memory and pattern recognition efficiently. However, the general approximating properties of this network had not previously been established. In this paper we have studied the approximation of arbitrary continuous functions on [0,1]^s using this network model, and we have shown that two extended versions of the model (the clamped GNN and the bipolar GNN) have this property. The proofs are by construction of appropriate GNNs which realize the approximating polynomials based on Weierstrass' theorem. We do not address here the minimal representations of these approximating networks, although we present straightforward canonical constructions of the appropriate GNNs in each case, based upon sub-networks which implement the terms of the approximating polynomial. Future work will address the design of other canonical structures, such as three-layer GNNs, for function approximation.