Inference of spatiotemporal processes over graphs via kernel kriged Kalman filtering

Inference of space-time signals evolving over graphs emerges naturally in a number of network science related applications. A frequently encountered challenge pertains to reconstructing such dynamic processes given their values over a subset of vertices and time instants. The present paper develops a graph-aware kernel-based kriged Kalman filtering approach that leverages the spatio-temporal dynamics to allow for efficient online reconstruction, while also coping with dynamically evolving network topologies. Laplacian kernels are employed to perform kriging over the graph when spatial second-order statistics are unknown, as is often the case. Numerical tests with synthetic and real data illustrate the superior reconstruction performance of the proposed approach.


I. INTRODUCTION
A number of applications involve data that can be efficiently represented as node attributes over social, economic, sensor, communication, and biological networks [1]. An inference problem that often emerges is to predict the attributes of all nodes in the network given the attributes of a subset of nodes. Such a task is of paramount importance in applications where collecting the attributes at all nodes is prohibitive, as is the case when sampling massive graphs or when the attribute of interest is of a sensitive nature, such as the transmission of HIV in a social network. This problem has been formulated as extrapolation or reconstruction of a function or signal on a graph [1], [2]. Extrapolation typically leverages smoothness of the attributes with respect to the graph, meaning that connected nodes have similar attribute values. Oftentimes, the aforementioned networks and attributes evolve over time. The space-time dynamics of such time-varying graph functions should be properly modeled to achieve accurate reconstruction over space and time.
Reconstruction of time-invariant graph functions has attracted great attention in recent years. The community of signal processing on graphs mainly adopts the so-called bandlimited model, which postulates that the signal of interest lies in a B-dimensional subspace related to the graph topology [3], [4], or assumes that the signal can be sparsely represented on an over-complete dictionary [5]. On the other hand, the machine learning community advocates estimators that exploit the aforementioned notion of smoothness [6], [7]. Interestingly, most estimators considered by both communities can be seen as special cases of kernel-based estimators [2].

(The work of V. N. Ioannidis and G. B. Giannakis was supported by ARO grant W911NF-15-1-0492 and NSF grants 1343248, 1442686, and 1514056.)
On the other hand, reconstruction of time-varying graph functions has typically been tackled by assuming that the function of interest changes slowly over time. Distributed reconstruction methods are reported in [8] and [9]. However, they rely on the bandlimited model, whose effectiveness in capturing the dynamics of real-world graph functions may not hold. A kernel-based Kalman filter that captures multiple forms of spatiotemporal dynamics through space-time kernels was explored in [10]; it mainly relies on smoothness, however, and does not explicitly model the underlying state dynamics. Moreover, there are cases where the wanted function exhibits markedly different behaviors over space and time, which existing approaches cannot account for. To circumvent this limitation, a kriged Kalman filter for graphs is introduced in this paper.
Kriging has been traditionally employed to interpolate stationary spatial processes that take values over subsets of the Euclidean space [11, Ch. 3]. Kriging essentially performs linear minimum mean-square error (LMMSE) estimation. To accommodate time-evolving fields, [12] introduced the kriged Kalman filter (KrKF), which affords low-complexity online spatial prediction. A reduced-dimension version of the KrKF was introduced in [13] by expanding the spatio-temporal process as a linear combination of basis functions and applying the Kalman filter (KF) to the expansion coefficients.
Kriging was extended in [14] to estimate path delays over IP networks modeled by time-evolving functions defined on the edges of a graph. Building on [14], [15] exploits temporal dynamics through the KrKF for estimating network delays. However, [15] adopts a random walk model on a static graph and therefore cannot capture general spatial dynamics. All these KrKF approaches require knowledge of the spatial statistics, which are furthermore assumed fixed over time.
The main contribution of this paper is to extend the KrKF to the prediction of general spatiotemporal processes that evolve over dynamic graphs whose topology may change over time. The resulting estimator is capable of promoting smoothness over time through a state-space model, and smoothness over space through kriging. A generalization of the latter based on Laplacian kernels is introduced to cope with uncertainty in the spatial statistics of the process. The computational complexity of the proposed algorithm is linear in the number of time samples, rendering it attractive for online and big data applications.
The rest of the paper is structured as follows. Sec. II formulates the problem and presents the proposed model. Sec. III presents the kernel KrKF (KKrKF) for graphs. The numerical experiments in Sec. IV demonstrate the benefits of the proposed method. Finally, Sec. V provides some closing remarks.
Notation: Scalars are denoted by lowercase, column vectors by bold lowercase, and matrices by bold uppercase letters. Superscripts ⊤ and † respectively denote transpose and pseudo-inverse; E[x] stands for the expectation of the random vector x; 1_N for the N × 1 all-one vector; δ[·] for the Kronecker delta; and diag{x} corresponds to a diagonal matrix with the entries of x on its diagonal. Finally, if A is a matrix and x a vector, then ‖x‖²_A := x^⊤A^{−1}x and ‖x‖²_2 := x^⊤x.

II. MODELING AND PROBLEM FORMULATION
Consider a time-varying graph G[t] = (V, E[t]) with vertex set V := {v_1, ..., v_N} and an N × N adjacency matrix A[t], whose (n, n')-th entry A_{n,n'}[t] is the nonnegative edge weight connecting vertices v_n and v_{n'} at time t. The graphs in this paper are undirected and have no self-loops, which respectively imply that A[t] = A^⊤[t] and A_{n,n}[t] = 0, ∀t, n. The Laplacian matrix is defined as L[t] := diag{A[t]1_N} − A[t], and is known to be positive semidefinite [1]. A time-varying graph function (or signal) is a map f : V × T → R, where T := {1, 2, ...} is the set of time indices. Specifically, f_n[t] := f(v_n, t) represents an attribute value at node n and time t, e.g., the closing price of the n-th stock on the t-th day.
The function values at time t can be compactly arranged in the vector f[t] := [f_1[t], ..., f_N[t]]^⊤. At time t, the function is observed over a subset of vertices S[t] ⊆ V of cardinality S[t], yielding the noisy samples

y[t] = S[t] f[t] + e[t]    (1)

where S[t] is an S[t] × N selection matrix comprising the rows of I_N indexed by S[t], and e[t] is zero-mean observation noise with covariance σ_e² I. The process of interest is modeled as the superposition

f[t] = f_ν[t] + f_χ[t]    (2)

where {f_ν[t]}_t are zero-mean with covariance Σ_ν[t], temporally uncorrelated, and capture only spatial dependencies, while {f_χ[t]}_t are spatio-temporally colored obeying the state equation

f_χ[t] = P[t] f_χ[t−1] + η[t]    (3)

where P[t] is an N × N transition matrix, and η[t] is zero-mean state noise with covariance Σ_η[t]; η[t] is assumed uncorrelated with f_ν[t] and e[t], which are in turn also uncorrelated with each other. Given the model described by (1)-(3), the goal of this paper is to reconstruct f[t] for every t from the samples {y[τ]}_{τ=1}^t.

Remark 1. In the field of geostatistics, f_ν[t] models the so-termed small-scale spatial fluctuations, while f_χ[t] corresponds to the so-called trend. The decomposition (2) is often dictated by the sampling interval: whereas f_χ[t] captures slow dynamics relative to the sampling interval, fast variations are modeled with f_ν[t]. Examples motivating (2) include network delay prediction [15], where f_χ[t] represents the queuing delay while f_ν[t] the propagation, transmission, and processing delays. Likewise, when predicting prices across different stocks, f_χ[t] captures the daily evolution of the stock market, which is correlated across stocks and time samples, while f_ν[t] describes unexpected changes, such as a daily drop of the stock market due to political statements, which are considered uncorrelated over time.
Remark 2. The state transition matrix P[t] can be selected in accordance with the prior information available. Simplicity in estimation motivates the random walk model, where P[t] = αI_N with α > 0. On the other hand, adherence to the graph prompts the selection P[t] = αA[t], in which case (3) amounts to a graph-constrained vector autoregressive model; see, e.g., [17].
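To make the model concrete, the following minimal Python sketch simulates (1)-(3) with the graph-driven transition P[t] = αA[t] of Remark 2. All constants (graph size, noise levels, sampling set) are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, alpha, sigma_e = 5, 50, 0.2, 0.1

# Symmetric adjacency with no self-loops (kept static here for simplicity).
A = rng.random((N, N))
A = np.triu(A, 1)
A = A + A.T

f_chi = np.zeros(N)
signals, observations = [], []
for t in range(T):
    P = alpha * A                                        # graph-driven transition (Remark 2)
    f_chi = P @ f_chi + 0.05 * rng.standard_normal(N)    # state equation (3)
    f_nu = rng.standard_normal(N)                        # temporally uncorrelated component
    f = f_nu + f_chi                                     # decomposition (2)
    S = np.eye(N)[:3]                                    # sample the first 3 vertices
    y = S @ f + sigma_e * rng.standard_normal(3)         # noisy samples (1)
    signals.append(f)
    observations.append(y)
```

With α = 0.2 the spectral radius of αA stays below one here, so the state recursion is stable.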

III. KERNEL KRIGED KALMAN FILTERING FOR GRAPHS
A. Kriged Kalman Filter
The KrKF algorithm was first introduced for the prediction of processes evolving over continuous fields, as typically arises in geostatistics [13]. In contrast, this section reviews the KrKF for processes f[t] that evolve over a graph [15], where estimation is performed in two steps. In the first step, an estimate f̂_χ[t|t] is obtained from the measurements {y[τ]}_{τ=1}^t using the traditional Kalman filter (KF) [16, Ch. 3], with the unknown f_ν[t] lumped in the observation noise. In the second step, f_ν[t] is estimated through the kriging predictor [11], which is given by the LMMSE estimator

f̂_ν[t] = Σ_ν[t] S^⊤[t] (S[t] Σ_ν[t] S^⊤[t] + σ_e² I)^{−1} (y[t] − S[t] f̂_χ[t|t])    (4)

where Σ_ν[t] := E[f_ν[t] f_ν^⊤[t]]. A challenge associated with the KrKF is that the kriging predictor (4) requires knowledge of Σ_ν[t]. The next subsection reviews graph Laplacian kernels, and proposes a generalization of the kriging predictor that does not require knowledge of the underlying spatial statistics.

B. Kernel Kriged Kalman Filter
After recognizing that the kriging predictor is a special case of the kernel ridge regression (KRR) estimator, our KKrKF algorithm is presented.
Kernel ridge regression seeks an estimate of a graph function f_ν given the observations ψ = Sf_ν + e. The argument [t] is dropped to reflect that a single time instant is considered. The KRR estimate of f_ν is given by [2]

f̂_ν = arg min_f (1/S) ‖ψ − Sf‖²_2 + µ ‖f‖²_K = K S^⊤ (S K S^⊤ + µS I)^{−1} ψ    (6)

where µ is a user-selected regularization parameter and K is an N × N positive definite kernel matrix, whose (n, n')-th entry encodes some notion of similarity between v_n and v_{n'} [2], [6], [7], [18]. Notice that the KRR estimator (6) reduces to the kriging predictor (4) if µS = σ_e² and Σ_ν = K. As a result, (6) generalizes (4) in the sense that f_ν can be deterministic, so long as it belongs to the reproducing kernel Hilbert space generated by the prescribed K. Rather than minimizing the LMMSE criterion, the resulting KRR estimator can account for the underlying graph through a judicious selection of K.
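The equivalence just invoked can be checked numerically. The sketch below (illustrative dimensions and names) evaluates the closed-form KRR estimate and the kriging predictor, which coincide once K = Σ_ν and µS = σ_e²:

```python
import numpy as np

rng = np.random.default_rng(1)
N, S_card, sigma_e2 = 6, 4, 0.04

# A valid covariance/kernel matrix (assumed known here purely for illustration).
B = rng.standard_normal((N, N))
Sigma_nu = B @ B.T + np.eye(N)
K = Sigma_nu                               # the condition Sigma_nu = K

S = np.eye(N)[:S_card]                     # selection matrix
psi = rng.standard_normal(S_card)          # observations psi = S f_nu + e

# Kriging / LMMSE predictor, cf. (4):
krig = Sigma_nu @ S.T @ np.linalg.solve(
    S @ Sigma_nu @ S.T + sigma_e2 * np.eye(S_card), psi)

# KRR estimate, cf. (6), with mu chosen so that mu * S = sigma_e^2:
mu = sigma_e2 / S_card
krr = K @ S.T @ np.linalg.solve(
    S @ K @ S.T + mu * S_card * np.eye(S_card), psi)

assert np.allclose(krig, krr)              # the two estimates coincide
```

The closed form in (6) follows from the representer theorem together with the push-through identity (S^⊤S + µS K^{−1})^{−1}S^⊤ = KS^⊤(SKS^⊤ + µS I)^{−1}.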
Laplacian kernels have been widely used [2], [7] to promote the smoothness embodied in the graph topology. For a given Laplacian matrix with eigendecomposition L = U diag{λ} U^⊤, a Laplacian kernel is defined as [7]

K := r^†(L) := U diag{r^†(λ_1), ..., r^†(λ_N)} U^⊤    (8)

where r : R → R is a monotonically increasing function and † effects the pseudo-inversion r^†(λ_n) := 1/r(λ_n) whenever r(λ_n) ≠ 0, and r^†(λ_n) := 0 otherwise. Table I summarizes common choices of r(·), which can be selected to promote a certain structure in the so-called graph Fourier transform of f_ν [1], [2], [7]. To sum up, one can obtain f̂_ν through (4) after replacing Σ_ν with a Laplacian kernel.
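As a concrete sketch (with illustrative names), the Laplacian kernel can be formed from the eigendecomposition of L; the diffusion choice of Table I is used below. Since r(λ) = exp{σ²λ/2} is strictly positive, the resulting kernel is symmetric and positive definite.

```python
import numpy as np

def laplacian_kernel(A, r):
    """Laplacian kernel via pseudo-inversion of r applied to the eigenvalues of L."""
    L = np.diag(A.sum(axis=1)) - A                     # L = diag{A 1_N} - A
    lam, U = np.linalg.eigh(L)                         # L = U diag(lam) U^T
    r_lam = r(lam)
    inv = np.where(r_lam > 1e-12, 1.0 / r_lam, 0.0)    # pseudo-inverse of r(lam)
    return U @ np.diag(inv) @ U.T

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
sigma2 = 1.0
K = laplacian_kernel(A, lambda lam: np.exp(sigma2 * lam / 2))  # diffusion kernel
```

The same helper accommodates a time-varying topology by simply passing A[t] per slot.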
Interestingly, for a class of random f_ν there exists a function r(·) such that the LMMSE and KRR estimators yield the same estimate. These graph signals are deemed graph stationary in [19], [20], and their covariance matrix is diagonalizable by the eigenvectors of L.

Proposition 1. If f_ν is a graph stationary signal on G = (V, A), and the Laplacian L = U diag{λ} U^⊤ has distinct eigenvalues, then the covariance matrix Σ_ν of f_ν is a Laplacian kernel for some r(·), and the estimates (4) and (6) coincide.

The proposed KKrKF is summarized as Algorithm 1. This online estimator, with complexity O(N³) per t, tracks the temporal variations of the signal of interest through (3), and promotes desired properties, such as smoothness over the graph, by judiciously selecting the Laplacian kernel. Different from existing approaches, our KKrKF takes into account the underlying graph structure in estimating f_ν[t] as well as f_χ[t]. Furthermore, by using the Laplacian matrix L[t] in (8), it can also accommodate dynamic graph topologies.
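A minimal per-slot sketch of how the two steps of the KKrKF could be implemented is given below, assuming the kernel K stands in for the unknown Σ_ν both in the KF observation-noise covariance (where f_ν is lumped) and in the kriging step. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def kkrkf_step(y, S, P, K, Sigma_eta, sigma_e2, f_chi, M):
    """One time slot of the KKrKF (sketch): KF update on f_chi, then kriging of f_nu."""
    n_obs = S.shape[0]
    # KF prediction through the state equation (3).
    f_pred = P @ f_chi
    M_pred = P @ M @ P.T + Sigma_eta
    # KF correction, with f_nu lumped into the observation noise.
    R = S @ K @ S.T + sigma_e2 * np.eye(n_obs)
    G = M_pred @ S.T @ np.linalg.solve(S @ M_pred @ S.T + R, np.eye(n_obs))
    f_chi = f_pred + G @ (y - S @ f_pred)
    M = (np.eye(len(f_chi)) - G @ S) @ M_pred
    # Kernel kriging of the residual, cf. (4) with Sigma_nu replaced by K.
    f_nu = K @ S.T @ np.linalg.solve(S @ K @ S.T + sigma_e2 * np.eye(n_obs),
                                     y - S @ f_chi)
    return f_nu + f_chi, f_chi, M       # estimate of f[t], plus the KF state
```

The per-slot cost is dominated by the N × N matrix products, in line with the O(N³) complexity per t noted above.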

IV. SIMULATIONS
This section describes tests on synthetic and real graph functions over dynamic graphs, which demonstrate the superior performance of the KKrKF over competing alternatives. The tests compare the following reconstruction algorithms: (i) the least mean-squares (LMS) algorithm in [9] with step size µ_LMS; (ii) the distributed least-squares reconstruction (DLSR) algorithm [8] with step sizes µ_DLSR and β_DLSR (both LMS and DLSR can track slowly time-varying B-bandlimited graph signals); (iii) the B-bandlimited instantaneous estimator (BL-IE), which applies the estimator in [3], [4] per slot t; and (iv) Algorithm 1 with the following configuration: a diffusion kernel (cf. Table I) with parameter σ; a state noise covariance Σ_η[t] = s_η Σ̄_η with parameter s_η > 0, where Σ̄_η := N̄N̄^⊤ is positive definite and N̄ ∈ R^{N×N} is a random matrix with standardized Gaussian entries; and a transition matrix P[t]. The performance of the aforementioned approaches is evaluated in terms of the normalized mean-square error

NMSE[τ] := E[‖S_c[τ](f[τ] − f̂[τ])‖²_2] / E[‖S_c[τ] f[τ]‖²_2]

where the expectation is taken over the sample locations and the realizations of Σ̄_η, and S_c[τ] is an (N − S[τ]) × N matrix comprising the rows of I_N whose indices are not in S[τ]. For all the tests, the sampling set is chosen uniformly at random without replacement over V and kept constant over time; that is, S[t] = S, ∀t.
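The evaluation metric restricts the error to the unsampled vertices. A small sketch of this computation (illustrative values, single realization rather than an expectation):

```python
import numpy as np

def nmse(f_true, f_hat, unsampled):
    """NMSE over the unsampled vertices for one realization."""
    Sc = np.eye(len(f_true))[unsampled]    # rows of I_N whose indices are not sampled
    err = Sc @ (f_true - f_hat)
    return np.sum(err**2) / np.sum((Sc @ f_true)**2)

f_true = np.array([1.0, 2.0, 3.0, 4.0])
f_hat  = np.array([1.0, 2.0, 3.0, 2.0])    # vertex 3 poorly estimated
print(nmse(f_true, f_hat, [2, 3]))          # → 0.16; error counted only off the sampling set
```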
The first real dataset contains timestamped messages exchanged over an online social network between students at the University of California, Irvine [21] over a period of 90 days, corresponding to 3 months. The sampling interval is one day. A network was created where A_{n,n'}[t], for t = 30(k−1)+1, ..., 30k, counts the number of messages exchanged between students n and n' in the k-th month. The resulting topology changes across months. A subset of N = 310 users for which A[t] corresponds to a connected graph ∀t was selected. At each time t, f[t] was generated by adding a temporally uncorrelated B-bandlimited component with B = 5 and a spatio-temporally correlated component. Specifically, the bandlimited component is a linear combination of {u_i[t]}_{i=1}^5 with coefficients that are standardized Gaussian distributed for all t, where {u_i[t]}_{i=1}^5 are the eigenvectors associated with the 5 smallest eigenvalues of the Laplacian matrix at time t. Function f_n[t] is therefore smooth with respect to the graph and can be interpreted, e.g., as the time that the n-th student spends on the specific social network during the t-th day.
The first experiment justifies the proposed decomposition by assessing the impact of dropping each term on the right-hand side of (2). Fig. 1 depicts the NMSE over the time index for the KKrKF; the Kalman filter (KF) estimator, which results from setting f̂_ν[t|t] = 0 for all t in the KKrKF and therefore does not exploit spatial information; and kernel kriging (KKr), to which the KKrKF reduces if f̂_χ[t|t] = 0 for all t, and which therefore does not exploit temporal information. As observed, the novel algorithm, which accounts for both terms, efficiently captures the spatial as well as the temporal dynamics over time-evolving topologies.
The second dataset is provided by the National Climatic Data Center [22], and comprises hourly temperature measurements at N = 109 measuring stations across the continental United States in 2010. A time-invariant graph was constructed as in [10], based on geographical distances. The value f_n[t] represents the temperature recorded at the n-th station and t-th sample. The sampling interval is one hour for the first experiment and one day for the second.
Next, the performance of the different reconstruction algorithms is evaluated in tracking the temperature values. Fig. 2 depicts the true temperature along with the estimates of the different algorithms for a station n that is not sampled, i.e., n ∉ S, with S = 40. Clearly, the novel algorithm accurately tracks the temperature by exploiting spatial and temporal information. On the other hand, DLSR and LMS cannot capture the fast signal variations. Finally, Fig. 3 compares the NMSE of all considered approaches for S = 40, showcasing the superior NMSE performance of Algorithm 1. As observed, the KKrKF captures the spatio-temporal dynamics and outperforms the existing alternatives.

V. CONCLUSIONS
This paper introduced an online estimator to reconstruct dynamic processes over dynamic graphs. In this context, the function to be estimated was decomposed into two parts: one capturing the spatial dynamics while being uncorrelated over time, and the other jointly modeling spatiotemporal dynamics. A novel kernel kriged Kalman filtering approach was developed that leverages Laplacian kernels for reconstructing the spatial component. The algorithm was evaluated on synthetic as well as real-data scenarios, and performed markedly better than existing alternatives. Future work includes distributed implementations, multi-kernel approaches for optimal selection of r(·), and data-driven learning of P[t].

TABLE I: Common transformation functions.

Kernel                               | Function r(λ)
Diffusion [6]                        | r(λ) = exp{σ²λ/2}
Laplacian regularization [1], [7]    | r(λ) = 1 + σ²λ