Nonnegative autoencoder with simplified random neural network

This paper proposes new nonnegative (shallow and multi-layer) autoencoders by combining the spiking Random Neural Network (RNN) model, the network architecture typically used in the deep-learning area, and a training technique inspired by nonnegative matrix factorization (NMF). The shallow autoencoder is a simplified RNN model, which is then stacked into a multi-layer architecture. The learning algorithm is based on the weight-update rules of NMF, subject to the nonnegative probability constraints of the RNN. The autoencoders equipped with this learning algorithm are tested on typical image datasets, including the MNIST, Yale face and CIFAR-10 datasets, as well as on 16 real-world datasets from different areas. The results of these tests yield the desired high learning and recognition accuracy. In addition, numerical simulations of the stochastic spiking behavior of this RNN autoencoder show that it can be implemented in a highly-distributed manner.

clustering in [26,28] and presented simple update rules for orthogonal NMF. Wang [25] provided a comprehensive review of recent progress in the NMF area. This paper first exploits the quasi-linear structure of the RNN equations. Using it in the feed-forward case, an RNN-based shallow nonnegative autoencoder is constructed. Then, this shallow autoencoder is stacked into a multi-layer feed-forward autoencoder following the network architectures used in the deep-learning area [16,17,19]. Since the connecting weights in the RNN are products of firing rates and transition probabilities, they are subject to the constraints of nonnegativity and that the sum of probabilities is no larger than 1; these are called the RNN constraints in this paper. Consequently, conventional gradient descent is not applicable for training such an autoencoder. By adapting the update rules from nonnegative graph embedding, which can be viewed as a variant of NMF, applicable update rules are developed for the autoencoder that satisfy the first RNN constraint of nonnegativity. For the second RNN constraint, we impose a check-and-adjust procedure on the iterative learning process of the learning algorithms. The minibatch training procedure of stochastic gradient descent (SGD) is also adapted into the algorithms. The efficacy of the nonnegative autoencoders equipped with these learning algorithms is verified via numerical experiments on both typical image datasets, including the MNIST [29], Yale face [30] and CIFAR-10 [31] datasets, and 16 real-world datasets in different areas from the UCI machine learning repository [32]. Then, we simulate the spiking behaviors of the RNN-based autoencoder, where the simulation results conform well with the corresponding numerical results, thereby demonstrating that this nonnegative autoencoder can be implemented in a highly-distributed and parallel manner.

A quasi-linear simplified random neural network
An arbitrary neuron in the RNN can receive excitatory or inhibitory spikes from external sources, in which case they arrive according to independent Poisson processes. Excitatory or inhibitory spikes can also arrive at a given neuron from other neurons, in which case they arrive when the sending neuron fires, which happens only if that neuron's internal state is positive (i.e. the neuron is excited); the inter-firing intervals of a neuron v are exponentially distributed random variables with rate r_v ≥ 0. Since the firing times depend on the internal state of the sending neuron, the arrival process of spikes from other neurons is not in general Poisson. From the preceding assumptions, it was proved in [3] that for an arbitrary N-neuron RNN, which may or may not be recurrent (i.e. containing feedback loops), the steady-state probability that any cell h, located anywhere in the network, is excited is given by the expression:

q_h = min( (λ^+_h + Σ_{v=1}^N q_v r_v p^+_{v,h}) / (r_h + λ^-_h + Σ_{v=1}^N q_v r_v p^-_{v,h}), 1 ),   (1)

for h = 1, ..., N, where p^+_{v,h} and p^-_{v,h} are the probabilities that cell v sends excitatory or inhibitory spikes to cell h, and λ^+_h, λ^-_h are the external arrival rates of excitatory and inhibitory spikes to neuron h. Note that min(a, b) is an element-wise operation whose output is the smaller of a and b. In [3], it was shown that the system of N non-linear equations (1) has a unique solution.
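The fixed point of equation (1) can be computed by simple iteration. The following sketch is our own illustration, not code from the paper; the function name and the convergence tolerance are assumptions:

```python
import numpy as np

def rnn_steady_state(rates, lam_plus, lam_minus, P_plus, P_minus,
                     n_iter=200, tol=1e-10):
    """Fixed-point iteration for the steady-state excitation
    probabilities q_h of equation (1), for an N-neuron RNN."""
    N = len(rates)
    q = np.zeros(N)
    for _ in range(n_iter):
        # total arrival rates of excitatory / inhibitory spikes at each neuron
        arr_plus = lam_plus + (q * rates) @ P_plus
        arr_minus = lam_minus + (q * rates) @ P_minus
        q_new = np.minimum(arr_plus / (rates + arr_minus), 1.0)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q
```

For a feed-forward network the iteration settles in as many passes as there are layers; for recurrent topologies it iterates until the change falls below the tolerance.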
Before adapting the RNN as a non-negative autoencoder (Section 3), we will simplify the recurrent RNN model into the feed-forward structure shown in Figure 1. The simplified RNN has an input layer and a hidden layer. The V input neurons receive excitatory spikes from the outside world, and they fire excitatory spikes to the H hidden neurons.
Let us denote by q̄_v the probability that the vth input neuron (v = 1, ..., V) is excited and by q_h the probability that the hth hidden neuron (h = 1, ..., H) is excited. According to [1] and (1), they are given by q̄_v = min(Λ̄^+_v / r̄_v, 1) and q_h = min(Λ^+_h / r_h, 1), where the quantities Λ̄^+_v and Λ^+_h represent the total average arrival rates of excitatory spikes, and r̄_v and r_h represent the firing rates of the neurons. Neurons in this model interact with each other in the following manner, where h = 1, ..., H and v = 1, ..., V.
• The vth input neuron receives excitatory spikes from the outside world with rate x_v ≥ 0.
• When the vth input neuron fires, it sends excitatory spikes to the hth hidden neuron with probability p^+_{v,h} ≥ 0. Clearly, Σ_{h=1}^H p^+_{v,h} ≤ 1.
• When the hth hidden neuron fires, it sends excitatory spikes outside the network.
Let us denote w_{v,h} = p^+_{v,h} r̄_v. For simplicity, let us set the firing rates of all neurons to r̄_v = r_h = 1, so that w_{v,h} = p^+_{v,h}. Using the fact that q_h and q̄_v are probabilities, we can write:

q̄_v = min(x_v, 1),  q_h = min(Σ_{v=1}^V q̄_v w_{v,h}, 1),   (2)

subject to Σ_{h=1}^H w_{v,h} ≤ 1. We can see from (2) that this simplified RNN is quasi-linear; we therefore call the network shown in Figure 1 a quasi-linear RNN (LRNN).
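The forward pass of equation (2) can be written in a few lines. This is our own minimal sketch (the function name is an assumption); it clips the inputs and the linear map to the probability range, as (2) requires:

```python
import numpy as np

def lrnn_forward(X, W):
    """Quasi-linear RNN layer of equation (2): q = min(q_bar W, 1),
    where q_bar = min(x, 1) and each row of W sums to at most 1
    (the RNN constraint)."""
    assert np.all(W >= 0) and np.all(W.sum(axis=1) <= 1 + 1e-12)
    Q_bar = np.minimum(X, 1.0)      # input-layer excitation probabilities
    return np.minimum(Q_bar @ W, 1.0)
```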

Shallow non-negative LRNN autoencoder
We add an output layer with O neurons on top of the hidden layer of the LRNN shown in Figure 1 to construct a shallow non-negative LRNN autoencoder. Let q_o denote the probability that the oth output neuron is excited; the output neurons interact with the LRNN in the following manner, where o = 1, ..., O.
• When the hth hidden neuron fires, it sends excitatory spikes to the oth output neuron with probability w̄_{h,o} ≥ 0, where Σ_{o=1}^O w̄_{h,o} ≤ 1.
The shallow LRNN autoencoder is then described by

q̄_v = min(x_v, 1),  q_h = min(Σ_{v=1}^V q̄_v w_{v,h}, 1),  q_o = min(Σ_{h=1}^H q_h w̄_{h,o}, 1),   (3)

where O = V, and the input, hidden and output layers are respectively the visual, encoding and decoding layers.
Consider a nonnegative dataset X = [x_{d,v}], where D is the number of instances, each instance has V attributes, and x_{d,v} is the vth attribute of the dth instance.
We import X into the input layer of the LRNN autoencoder. Let q̄_{d,v}, q_{d,h} and q_{d,o} respectively denote the values of q̄_v, q_h and q_o for the dth instance.
Then, (3) can be rewritten in the following matrix form:

Q̄ = min(X, 1),  Q = min(Q̄W, 1),  Q̂ = min(QW̄, 1),   (4)

where W = [w_{v,h}] and W̄ = [w̄_{h,o}]. The problem for the autoencoder to learn the dataset X can be described as

arg min_{W, W̄} ‖X − Q̂‖²   (5)

subject to the RNN constraints. To solve this problem we use multiplicative update rules for W and W̄, given in (6) and (7), which are simplified from Liu's work [33]; there, the symbol (·)_{v,h} denotes the element in the vth row and hth column of a matrix. Note that, to avoid the division-by-zero problem, zero elements in the denominators of (6) and (7) are replaced with tiny positive values (e.g., "eps" in MATLAB). After each update, adjustments are made so that W and W̄ satisfy the RNN constraints. The procedure to train the shallow LRNN autoencoder (4) is given in Algorithm 1, where the operation max(W) produces the maximal element in W (used to rescale the weights so that they satisfy the RNN constraints), and the operations W ← W/max(XW) and W̄ ← W̄/max(HW̄) normalize the weights to reduce the number of neurons that are saturated.
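Since the exact rules (6)-(7) from Liu [33] are not reproduced here, the sketch below illustrates the general mechanism with standard NMF-style multiplicative updates for X ≈ HW̄ with H = min(min(X,1)W, 1) — an assumption of ours, not the paper's exact rules. Each weight is multiplied by a nonnegative ratio, which preserves nonnegativity; zero denominators are guarded by a tiny eps; and a check-and-adjust step rescales any row whose sum exceeds 1:

```python
import numpy as np

EPS = np.finfo(float).eps  # tiny positive value, like MATLAB's "eps"

def multiplicative_step(X, W, W_bar):
    """One illustrative multiplicative update in the spirit of (6)-(7).
    These are classic NMF-style rules (our assumption); the paper's
    exact rules follow Liu [33]."""
    Xc = np.minimum(X, 1.0)
    H = np.minimum(Xc @ W, 1.0)                     # encoding-layer states
    # decoder update: classic NMF rule for X ~ H @ W_bar
    W_bar = W_bar * (H.T @ X) / np.maximum(H.T @ H @ W_bar, EPS)
    # encoder update, treating min(.) as inactive off saturation (assumption)
    num = Xc.T @ X @ W_bar.T
    den = Xc.T @ (Xc @ W) @ (W_bar @ W_bar.T)
    W = W * num / np.maximum(den, EPS)
    return W, W_bar

def enforce_rnn_constraints(W):
    """Check-and-adjust: rescale any row whose sum exceeds 1."""
    s = W.sum(axis=1, keepdims=True)
    return np.where(s > 1.0, W / np.maximum(s, EPS), W)
```

Because the ratios are nonnegative, the first RNN constraint holds automatically after every step; `enforce_rnn_constraints` handles the second.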
Algorithm 1 Procedure for training a shallow nonnegative LRNN autoencoder (4)
  Randomly initialize W and W̄ so that they satisfy the RNN constraints
  while the terminal condition is not satisfied do
    for each minibatch X̂ do
      update W with (6) and W̄ with (7)
      adjust W and W̄ so that they satisfy the RNN constraints
    end for
  end while

Multi-layer non-negative LRNN autoencoder
We stack multiple LRNNs to build a multi-layer non-negative LRNN autoencoder. Suppose the multi-layer autoencoder has a visual layer, M encoding layers and M decoding layers (M ≥ 2), connected in series with excitatory weights W_m and W̄_m, where m = 1, ..., M. We import a dataset X into the visual layer. Let Q̄ denote the state of the visual layer, Q_m the state of the mth encoding layer and Q̄_m the state of the mth decoding layer. Then, the multi-layer LRNN autoencoder is described by

Q̄ = min(X, 1),  Q_1 = min(Q̄W_1, 1),  Q_m = min(Q_{m−1}W_m, 1),
Q̄_1 = min(Q_M W̄_1, 1),  Q̄_m = min(Q̄_{m−1}W̄_m, 1),   (8)

with m = 2, ..., M. The RNN constraints for (8) are W_m ≥ 0, W̄_m ≥ 0, and that the sum of each row in W_m and W̄_m is no larger than 1, where m = 1, ..., M. The problem for the multi-layer LRNN autoencoder (8) to learn the dataset X can be described as

arg min_{W_m, W̄_m} ‖X − Q̄_M‖²   (9)

subject to the RNN constraints, where m = 1, ..., M. The procedure to train the multi-layer non-negative LRNN autoencoder (8) is given in Algorithm 2.
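The forward pass of (8) can be sketched as follows, assuming the decoder mirrors the encoder (function and variable names are ours):

```python
import numpy as np

def multilayer_forward(X, Ws, W_bars):
    """Forward pass of the multi-layer LRNN autoencoder (8):
    M encoding layers followed by M decoding layers."""
    Q = np.minimum(X, 1.0)          # visual layer
    for W in Ws:                    # encoding layers W_1 .. W_M
        Q = np.minimum(Q @ W, 1.0)
    for W_bar in W_bars:            # decoding layers W_bar_1 .. W_bar_M
        Q = np.minimum(Q @ W_bar, 1.0)
    return Q                        # reconstruction, compared with X in (9)
```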
To avoid loading the whole dataset into computer memory, we can also use Algorithm 3 to train the autoencoder, with update rules (10) and (11), where m = 2, ..., M and the operation ⊙ denotes the element-wise product of two matrices. To avoid the division-by-zero problem, zero elements in the denominators of (10) and (11) are replaced with tiny positive values.

Yale face: This database (http://vision.ucsd.edu/content/yale-face-database) contains 165 gray-scale images of 15 individuals. Here we use the pre-processed dataset from [30], where each image is resized to 32 × 32 (1024 pixels).
UCI real-world datasets: In addition to image datasets, we also conduct numerical experiments on 16 real-world datasets from different areas, obtained from the UCI machine learning repository [32]. The names, attribute numbers and instance numbers of these datasets are listed in Table 1.

Convergence and reconstruction performance
Results of MNIST: Let us first test the convergence and reconstruction performance of the shallow non-negative LRNN autoencoder. We use the structures 784 → 100 and 784 → 50 (for simplicity, we use the encoding part to represent an autoencoder) and the MNIST dataset for the experiments. The whole training dataset of 60,000 images is used for training. Figure 2(a) shows the curves of training error (mean square error) versus the number of iterations, where, in each iteration, a minibatch of size 100 is handled. Then, we use a multi-layer non-negative LRNN autoencoder with structure 784 → 1000 → 500 → 250 → 50, and the corresponding curve of training error versus iterations is also given in Figure 2(a). It can be seen from Figure 2(a) that the reconstruction errors of the LRNN autoencoders equipped with the developed algorithms converge well for the different structures. In addition, the lowest errors of the shallow and multi-layer autoencoders are respectively 0.0204 and 0.0190. These results show that, for the same encoding dimension, the performances of the shallow and multi-layer structures are similar for this dataset.

Results of Yale face: The corresponding training-error curves are given in Figure 2(b). For this dataset, the shallow autoencoder seems more stable than the multi-layer one.
Results of CIFAR-10: Attribute values of the dataset are also divided by 255 for normalization to the range [0, 1]. The structures used are 3072 → 150 and 3072 → 1000 → 500 → 150. Both the training and testing datasets (60,000 images in total) are used for training the autoencoders. The minibatch size is chosen as 100. The results are given in Figure 2(c). We can see that the reconstruction errors for both structures converge as the number of iterations increases. In addition, the lowest reconstruction errors of the shallow and multi-layer autoencoders are the same (0.0082). These results, together with those for the MNIST and Yale face datasets (Figures 2(a) to 2(c)), verify the good convergence and reconstruction performance of both the shallow and multi-layer LRNN autoencoders for handling image datasets.

Results of UCI datasets: The training-error curves for the 16 real-world datasets are shown in Figure 3. We see that the reconstruction errors generally decrease as the number of iterations increases. These results also demonstrate the efficacy of the nonnegative LRNN autoencoders equipped with the training algorithms.

Simulating the spiking random neural network
The advantage of a spiking model, such as the LRNN autoencoder, lies in its highly-distributed nature. In this section, rather than performing numerical calculations, we simulate the stochastic spiking behaviors of the LRNN autoencoder. The simulation is based on the numerical experiment of Subsection 5.2. Specifically, in Subsection 5.2 we construct an LRNN autoencoder of structure 784 → 100 (with appropriate weights found), which has three layers: the visual layer (784 neurons), the hidden layer (100 neurons) and the output layer (784 neurons). First, an image with 28 × 28 = 784 attributes is taken from the MNIST dataset. Each visual neuron receives excitatory spikes from outside the network in a Poisson stream whose rate is the corresponding attribute value in the image. When activated, the visual neurons fire excitatory spikes to the hidden neurons according to a Poisson process with rate 1 (so that w_{v,h} = p^+_{v,h}). When the vth visual neuron fires, the spike goes to the hth hidden neuron with probability p^+_{v,h}, or it goes outside the network with probability 1 − Σ_{h=1}^H p^+_{v,h}. When activated, the hidden neurons fire to the output layer in a similar manner, according to w̄_{h,o}. The firing rate of the output neurons is 1, and their spikes go outside the network with probability 1.
In the simulation, we call it an event whenever a spike arrives from outside the network or a neuron fires. During the simulation, we observe the potential (the level of activation) of each neuron once every 1,000 events. Let k_{i,b} represent the bth observation of the ith neuron. We estimate the average potential of the ith neuron, denoted by k̄_i, simply by averaging the observations, i.e., k̄_i ≈ (Σ_{b=1}^B k_{i,b})/B. Let q_i denote the probability that the ith neuron is activated. The relation between q_i and k̄_i is known to be k̄_i = q_i/(1 − q_i). Then, the value of q_i can be estimated during the simulation as

q_i ≈ k̄_i / (1 + k̄_i).

In Figure 4, we visualize the estimated values of q_i for all neurons in the different layers after 10,000, 100,000 and 1,000,000 events of the simulation. For comparison, the numerical results from Subsection 5.2 are also given in Figure 4. At the beginning, only the simulation results of the visual layer are close to the numerical results. As time evolves, the simulation results of the hidden and output layers become more and more similar to their corresponding numerical results. These results demonstrate that the LRNN autoencoders have the potential to be implemented in a highly distributed and parallel manner.
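The estimator above can be sketched in a few lines; inverting k̄ = q/(1 − q) gives q = k̄/(1 + k̄) (function name ours):

```python
import numpy as np

def estimate_excitation(observations):
    """Estimate the activation probability q_i of each neuron from its
    sampled potentials k_{i,b}: average them to get k_bar_i, then invert
    k_bar = q / (1 - q) to obtain q = k_bar / (1 + k_bar)."""
    k_bar = np.asarray(observations, dtype=float).mean(axis=-1)
    return k_bar / (1.0 + k_bar)
```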

Conclusions
New nonnegative autoencoders (the shallow and multi-layer LRNN autoencoders) have been proposed based on the spiking RNN model; they adopt the feed-forward multi-layer network architecture of the deep-learning area. To comply with the RNN constraints of nonnegativity and that the sum of probabilities is no larger than 1, learning algorithms have been developed by adapting weight-update rules from the NMF area. Numerical results on typical image datasets, including the MNIST, Yale face and CIFAR-10 datasets, and on 16 real-world datasets from different areas, have verified the robust convergence and reconstruction performance of the LRNN autoencoders. In addition to the numerical experiments, we have conducted simulations of the autoencoder in which its stochastic spiking behaviors are simulated. The simulation results conform well with the corresponding numerical results, demonstrating that the LRNN autoencoder can be implemented in a highly distributed and parallel manner.