Towards zero-configuration condition monitoring based on dictionary learning

Condition-based predictive maintenance can significantly improve overall equipment effectiveness provided that appropriate monitoring methods are used. Online condition monitoring systems are customized to each type of machine and need to be reconfigured when conditions change, which is costly and requires expert knowledge. Basic feature extraction methods limited to signal distribution functions and spectra are commonly used, which makes it difficult to automatically analyze and compare machine conditions. In this paper, we investigate the possibility of automating the condition monitoring process by continuously learning a dictionary of optimized shift-invariant feature vectors using a well-known sparse approximation method. We study how the feature vectors learned from a vibration signal evolve over time when a fault develops within a ball bearing of a rotating machine. We quantify the adaptation rate of the learned features and find that this quantity changes significantly in the transitions between normal and faulty states of operation of the ball bearing.


INTRODUCTION
Condition monitoring of machine elements is used to detect faults, reduce machine downtime and improve overall equipment effectiveness, for example by condition-based predictive maintenance. The requirements on the methods employed to achieve that go beyond fault detection, in particular in terms of prediction of faults [1] and detection of abnormal operational conditions. Early detection and characterization of emerging faults is a challenging problem because there are many variables that affect the operation of the machine and the characteristics of the fault. Maintenance operations rely on time and frequency domain features for diagnosis [1]. Expert knowledge is often needed to interpret the features and make decisions, which makes the process difficult to automate. Furthermore, condition monitoring methods are typically tuned to the application, the operating conditions and the type and location of the fault. Therefore, such methods are expensive to maintain when machines have varying characteristics and evolve over time, for example as a consequence of maintenance and repair, which limits the scalability of the approach. Also, it is difficult to predict all failure modes. Similarly, approaches based on traditional pattern recognition methods require substantial amounts of labeled training data and the resulting methods are limited to the conditions for which the method was designed and trained [2].
Sparse representation of signals has attracted considerable interest in the last decade [3,4,5,6]. One type of sparse representation can be obtained by modeling signals as a linear superposition of noise and a small number of atomic waveforms (atoms) of particular shapes, amplitudes and shifts, so-called shift-invariant sparse coding [7,8]. Using an approach known as dictionary learning, the atoms can also be optimized to the signal [3,6,9], so that each atom represents a structural feature of the signal, for example a feature excited by a particular physical process. Such approximations are of increasing interest in signal processing, with applications including denoising, source coding, source separation and signal acquisition. The problem of finding such sparse representations and optimal atoms is NP-hard in general. Therefore, suboptimal strategies based on convex relaxation, nonconvex (often gradient-based) local optimization or greedy search are used in practice. Liu et al. [10] investigate the possibility that faults in a machine can be identified with multiclass linear discriminant analysis using dictionaries of atoms that are optimized to sets of signals corresponding to different fault conditions of a rotating machine.
In this paper we complement the study by Liu et al. by investigating how one dictionary of atoms changes over time in an online condition monitoring scenario. The dictionary is continuously optimized to a vibration signal, measured from a machine, that evolves from a normal state of operation to faulty conditions. We otherwise use a similar method for dictionary learning that is suited for online monitoring [11], and vibration signals from the same dataset [12]. The work presented here is novel because it focuses on online monitoring and the continuous evolution of an automatically learned dictionary, rather than supervised learning of multiple dictionaries for each fault condition. We demonstrate that deviations from the normal state of the machine can in principle be detected by monitoring the learned dictionary over time. We define an evolution rate for the atoms in a dictionary and demonstrate that this rate decreases to low values after some time of adaptation, and that it increases significantly when faults are introduced in the system. The resulting atoms are also useful for further classification and diagnosis of the condition [10,11]. We find that some atoms characterize the vibration of the machine in both normal and abnormal operational conditions, while other waveforms are clearly associated with the faults. These preliminary results indicate that online monitoring of a learned dictionary is a potentially useful approach to zero-configuration fault detection. The approach also provides atoms representing inherent structural features in the signal that can be used for diagnosis and prediction.

SPARSE CODING AND DICTIONARY LEARNING
The model [11] used here was developed by Smith and Lewicki [13], and it is inspired by earlier work on sparse visual coding [14]. Smith and Lewicki discovered that atoms learned from speech data closely resemble cochlear impulse response functions (revcor filters), which indicates that speech is adapted to the ear [13]. Our working hypothesis is that features that characterize machines can be learned in a similar manner. The model decomposes a signal, $x(t)$, as a linear superposition of noise and atomic waveforms with compact support

$$x(t) = \sum_{m=1}^{M} \sum_{i=1}^{N_m} a_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t). \tag{1}$$

The functions $\phi_m(t)$ are atoms that represent morphological features of the signal, $M$ indicates the total number of such atoms and $\epsilon(t)$ is the noise. The variable $N_m$ refers to the number of instances of atom $\phi_m$, and the temporal position and amplitude of the $i$-th instance of atom $\phi_m$ are denoted by $\tau_{m,i}$ and $a_{m,i}$, respectively. The set of $M$ atoms defines a dictionary

$$\Phi = \{\phi_1, \ldots, \phi_M\}. \tag{2}$$

The values of $\tau_{m,i}$ and $a_{m,i}$ are determined with a matching pursuit (MP) algorithm [15,16], with maximum a posteriori (MAP) optimization [17] for the dictionary update. At each iteration $n$, MP is used to decompose the signal in the following steps:
1. initialization: $n = 0$, $R_0(t) = x(t)$;
2. calculate the cross-correlations between the residual, $R_n(t)$ (initially the signal, $x(t)$), and all shift-invariant atoms, $\phi_m(t)$. The coefficients, $a_{m,i}$, take the form
$$a_{m,i} = \langle R_n(t), \phi_m(t - \tau_{m,i}) \rangle, \tag{3}$$
where the temporal position, $\tau_{m,i}$, is determined by the cross-correlation
$$\tau_{m,i} = \arg\max_{\tau, m} \left| \langle R_n(t), \phi_m(t - \tau) \rangle \right|; \tag{4}$$
3. update the residual
$$R_{n+1}(t) = R_n(t) - a_{m,i}\, \phi_m(t - \tau_{m,i}); \tag{5}$$
4. if the signal-to-residual ratio (SRR) or the sparsity (number of samples over $n$) reaches a predefined threshold the decomposition process stops, otherwise $n$ is incremented and steps 2 to 4 are repeated;
5. each atom of the dictionary, $\Phi$, is updated with a gradient procedure outlined below;
6. continue with the decomposition of the next set of (partially overlapping) signal samples.
The problem of learning the dictionary, $\Phi$, is the main challenge and opportunity of this approach, and it makes the approach fundamentally different from traditional condition-monitoring approaches.
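The decomposition steps above can be sketched in Python with NumPy as follows. This is a minimal illustration and not the implementation used in this work: the function name `matching_pursuit`, the event tuple layout `(m, tau, a)`, the `max_events` cap standing in for the sparsity threshold, and the use of the absolute cross-correlation to select atoms are assumptions made for the example. Atoms are assumed to be unit-norm arrays shorter than the signal window.

```python
import numpy as np

def matching_pursuit(x, atoms, max_events, srr_db=12.0):
    """Greedy shift-invariant MP sketch; returns (events, residual).

    events is a list of (m, tau, a): atom index, shift in samples, amplitude.
    Atoms are assumed to be unit-norm 1-D float arrays shorter than x.
    """
    residual = np.asarray(x, dtype=float).copy()   # R_0(t) = x(t)
    signal_energy = float(np.dot(residual, residual))
    events = []
    for _ in range(max_events):
        best = None
        for m, phi in enumerate(atoms):
            # Inner product of the residual with atom m at every valid shift.
            corr = np.correlate(residual, phi, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (m, tau, float(corr[tau]))
        m, tau, a = best
        # Residual update: subtract the selected atom instance.
        residual[tau:tau + len(atoms[m])] -= a * atoms[m]
        events.append(best)
        res_energy = float(np.dot(residual, residual))
        # Stop when the signal-to-residual ratio reaches the threshold.
        if res_energy == 0.0 or 10.0 * np.log10(signal_energy / res_energy) >= srr_db:
            break
    return events, residual
```

The exhaustive search over all atoms and shifts in every iteration is chosen for readability; a practical implementation would typically cache the cross-correlations and update them only locally after each subtraction.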
The goal is to automatically calculate an optimal set of atomic waveforms, $\phi_m$, in the dictionary, $\Phi$, for a particular signal domain. The solution to this problem can be obtained by rewriting Eq. (1) in probabilistic form

$$p(x|\Phi) = \int p(x|a,\Phi)\, p(a)\, da \tag{6}$$
$$\approx p(x|\hat{a},\Phi)\, p(\hat{a}), \tag{7}$$

where $\hat{a}$ is the maximum a posteriori (MAP) estimate of $a$ that is generated by the MP [13]. The prior of the amplitude, $p(a)$, is defined to promote sparse coding in terms of statistically independent atoms [14], and the likelihood, $p(x|a,\Phi)$, is assumed to be Gaussian. This results in a learning algorithm that involves gradient ascent on the approximate log data probability [13]

$$\frac{\partial}{\partial \phi_m} \log(p(x|\Phi)) = \frac{1}{\sigma_\epsilon^2} \sum_i \hat{a}_{m,i}\, [x - \hat{x}]_{\tau_{m,i}}, \tag{8}$$

where $[x - \hat{x}]_{\tau_{m,i}}$ is the residual of the decomposition over the extent of atom $\phi_m$ at position $\tau_{m,i}$ and $\sigma_\epsilon^2$ is the noise variance. The gradient of each atom in the dictionary is proportional to the sum of residuals corresponding to the MP activations of that atom. In order to use the gradient for optimization we introduce a learning rate, or step-size parameter, $\eta$. Eq. (8) becomes

$$\Delta \phi_m = \frac{\eta}{\sigma_\epsilon^2} \sum_i \hat{a}_{m,i}\, [x - \hat{x}]_{\tau_{m,i}}. \tag{9}$$

This means that the effective learning rate of an atom depends on how often it is activated, which implies that atoms can learn at different rates and that atoms that are never activated do not learn at all. Several improvements of this methodology have been proposed, including methods to enforce orthogonality in the MP. Such methods improve the reconstruction accuracy significantly for noiseless signals, but the effect on denoising performance is moderate. Our method is comparable to that used by Liu et al. [10] and is motivated by the relatively low complexity and simplicity of the algorithm, which allows for online condition monitoring experiments in embedded systems.
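The gradient step described above can be illustrated with a short Python/NumPy sketch. It assumes that the decomposition has produced a list of events `(m, tau, a)` (atom index, shift in samples, amplitude) and a final residual; the function name `update_atoms`, the default values of `eta` and `noise_var`, and the renormalization after each step are assumptions made for this example rather than details of the implementation used here.

```python
import numpy as np

def update_atoms(atoms, events, residual, eta=0.01, noise_var=1.0):
    """One gradient-ascent step per atom, followed by renormalization.

    atoms: list of 1-D float arrays; events: list of (m, tau, a) tuples;
    residual: 1-D array left over after the decomposition.
    """
    grads = [np.zeros_like(phi) for phi in atoms]
    for m, tau, a in events:
        # Residual segment aligned with this activation, weighted by amplitude.
        grads[m] += a * residual[tau:tau + len(atoms[m])]
    updated = []
    for phi, g in zip(atoms, grads):
        phi_new = phi + (eta / noise_var) * g
        # Keep atoms at unit norm so that amplitudes remain comparable.
        updated.append(phi_new / np.linalg.norm(phi_new))
    return updated
```

Atoms without any events in the current window receive a zero gradient and are returned unchanged, which is the behaviour noted in the text: atoms that are not activated do not learn.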
We are interested in quantitative changes of the learned atoms resulting from changing conditions in a rotating machine. Skretting [18] proposes a dictionary distance measure as a means to quantify the similarity between two dictionaries. This approach is useful for diagnostic purposes but has limitations in an online monitoring scenario because only a subset of the atoms may change when a fault emerges, possibly resulting in high dictionary similarity. Therefore, we define the following evolution rate for each atom

$$\dot{\phi}_a = 1 - \max_b \langle \phi_a(t), \phi_b(t - \delta) \rangle, \tag{10}$$

where $\phi_a(t)$ is an atom of dictionary $\Phi$ at time $t$ and $\phi_b(t - \delta)$ is the corresponding atom at a previous point in time, $t - \delta$. This quantity is calculated for each atom and it indicates how quickly individual atoms are changing. A value of zero means no change at all, while a value close to one means that an atom is uncorrelated with the most similar atom in the past.
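A possible way to compute such an evolution rate is sketched below in Python with NumPy. The sketch assumes that the rate is one minus the maximum normalized cross-correlation between the current atom and any atom of the earlier dictionary; the function name `evolution_rate`, the use of the absolute value, and the search over all relative shifts (to tolerate atoms that have drifted or grown in length) are assumptions made for this example.

```python
import numpy as np

def evolution_rate(atom_now, past_atoms):
    """Return 1 - max normalized cross-correlation with any past atom."""
    a = atom_now / np.linalg.norm(atom_now)
    best = 0.0
    for past in past_atoms:
        b = past / np.linalg.norm(past)
        # Full cross-correlation handles shifts and different atom lengths.
        corr = np.correlate(a, b, mode="full")
        best = max(best, float(np.max(np.abs(corr))))
    return 1.0 - best
```

Calling `evolution_rate(phi, dictionary_past)` for every atom `phi` of the current dictionary, with `dictionary_past` being a snapshot stored $\delta$ earlier, gives one value per atom that is close to zero for atoms that have stopped adapting.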

CHARACTERIZATION OF ROTATING MACHINE WITH FAULT IN ROLLING ELEMENT BEARING
We apply the MP with dictionary learning approach to vibration data from a rotating machine, provided by the Bearing Data Center at Case Western Reserve University [12]. The vibration data was generated with a test rig consisting of an electric motor, a torque transducer, a dynamometer and a ball bearing supporting the motor shaft. An accelerometer located at the drive end of the motor is used to record the vibration data at a rate of 12,000 samples per second. During data acquisition, the load varies between 0 HP and 3 HP, and the motor speed varies correspondingly from 1800 to 1730 rpm. We consider three different datasets in order to mimic the appearance and growth of a defect in the bearing, thereby simulating the evolution of the machine from a normal to a faulty state of operation. First, MP with dictionary learning is applied to 120 minutes of vibration data corresponding to a normal, non-faulty bearing. This is referred to as the baseline (BL) case and the resulting atoms are illustrated in Figure 1. Next, the atoms are further adapted to 120 minutes of data corresponding to a faulty bearing with a 7 mil (0.178 mm) diameter fault on the inner race. We refer to this as the IR7 case and the resulting atoms are also illustrated in Figure 1. Finally, the IR7 atoms are further adapted to 120 minutes of vibration data corresponding to a faulty bearing with a 14 mil (0.356 mm) diameter fault on the inner race (IR14).
The vibration data is processed with our Matlab implementation of Smith and Lewicki's algorithm [13]. The dictionary initially contains sixteen atoms, each fifty samples long, with values sampled from a zero-mean Gaussian distribution. Dictionary learning is carried out using a signal window of 5 seconds duration (60,000 samples). The windows are sampled randomly from the different load and rpm cases, thereby simulating a time-varying load on the rotating machine. Atoms are allowed to grow in length when the tail RMS exceeds a threshold [13] and are always normalized. MP is stopped when the data rate has been reduced by one order of magnitude or when the SRR reaches 12 dB. The dictionaries resulting from the BL, IR7 and IR14 cases are shown in Figure 1, each including the sixteen atomic waveforms obtained at the end of the 120 minute adaptation time for each case. All waveforms are normalized and have the same scale. Each panel in Figure 1 illustrates one atom for the BL case (top), IR7 case (middle) and IR14 case (bottom). Atoms 1, 2 and 4 reach approximately stationary conditions after 120 minutes. Atoms 9, 10, 12, 13, 14, 15 and 16 change over time and make it possible to distinguish the BL and IR7 cases. The difference between the IR7 and IR14 cases is evident from the time evolution of atoms 9, 10, 12 and 14. Furthermore, the differences between atoms 3, 5, 6, 7 and 8 distinguish the BL and IR14 cases. Table 1 shows the center frequencies of the atoms in the three cases, calculated as the mean value of the power spectral density of each atom. The evolution rate (rate of change) of the atoms reveals changes in the characteristics of the rotating machine that are associated with the introduction of a fault in the bearing. Figure 2 shows the evolution rate of all the atoms in the dictionary as defined by Eq. (10), using $\delta = 10$ minutes.
Atom 3 stops evolving when the IR7 case is introduced after 120 minutes. This is represented by the disappearing bold line between 120 and 240 minutes and is a consequence of the vanishing event rate, see Table 1. The center frequency of atom 3 is nearly identical in the BL and IR7 cases, see Table 1. Atom 3 continues to adapt after 240 minutes when the IR14 case is introduced. This is in agreement with Figure 1, which shows that atom 3 is similar for the BL and IR7 cases, while it has a different shape in the IR14 case. Atom 13 is inactive during the BL case, as indicated by the vanishing event rate in Table 1, but it starts to adapt in the IR7 case and eventually attains an impulse-like shape. In contrast, atom 2 adapts in the BL case and thereafter remains unchanged, see Figure 1. The center frequencies and event rates listed in Table 1, the evolution rate displayed in Figure 2 and the dictionary illustrated in Figure 1 provide complementary information about the three different operational conditions of the machine.
In Figure 3 we present a scatter plot of atom event rates versus the center frequency for the three cases listed in Table 1. It is evident that atoms with a lower center frequency occur in the BL case, while the cases including a bearing fault (IR7 and IR14) result in adaptation and activation of atoms with higher center frequencies. Furthermore, a comparison between the IR7 and IR14 cases reveals differences in the event rates associated with some of the atoms. In summary, these results indicate that changes in the operational conditions and characteristics of a rotating machine can be automatically detected using unsupervised dictionary learning. Further work is required in the development of reliable measures for change detection during continuous monitoring of a rotating machine, including methods to avoid false positives associated with variations in the operation of the machine.
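For reference, one way to obtain a center frequency for an atom is sketched below in Python with NumPy. The sketch interprets the center frequency as the mean frequency weighted by the power spectral density of the atom (the spectral centroid); this interpretation, the function name `center_frequency` and the periodogram estimate of the spectrum are assumptions made for the example. The default sampling rate of 12,000 samples per second follows the data acquisition described above.

```python
import numpy as np

def center_frequency(atom, fs=12000.0):
    """PSD-weighted mean frequency (spectral centroid) of an atom, in Hz."""
    # Periodogram estimate of the power spectral density (unnormalized).
    psd = np.abs(np.fft.rfft(atom)) ** 2
    freqs = np.fft.rfftfreq(len(atom), d=1.0 / fs)
    return float(np.sum(freqs * psd) / np.sum(psd))
```

The normalization of the periodogram cancels in the ratio, so the result depends only on the shape of the spectrum and not on the amplitude of the atom.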

DISCUSSION
We investigate the possibility of automatically characterizing a rotating machine and detecting when faults appear in the machine by monitoring a dictionary of learned atomic waveforms. We find that the shape, frequency and repetition characteristics of the atoms depend on the operational conditions of the machine considered here. Furthermore, we define the rate of change of atoms (the atom evolution rate) and illustrate that it can be useful for automatic detection of faults. These results motivate further experiments with more realistic failure modes and varying operational conditions. Further work is required to investigate and develop reliable measures for automatic change detection, possibly using a complementary knowledge base including atoms learned from similar machines with known operational conditions. In addition, deep learning extensions can be investigated for classification and prediction purposes. Dictionary learning offers a novel approach to online condition monitoring, which unlike most traditional techniques requires few assumptions about the machine and the structure of the signal. Further work is needed to study the usefulness of the method under more realistic conditions, such as varying speed and load and gradual fault evolution. A method that requires a minimum of configuration is needed to enable scalable condition monitoring in the era of the Internet of Things.

ACKNOWLEDGMENTS
This work is supported by SKF, the Kempe Foundations, and the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), grant IG2011-2025.