Signal processing on kernel-based random graphs

We present the theory of sequences of random graphs and their convergence to limit objects. Sequences of dense random graphs are shown to converge to their limit objects in both their structural properties and their spectra. The limit objects are bounded symmetric functions on $[0,1]^2$. These kernel functions define an equivalence class and thus identify collections of large random graphs that are spectrally and structurally equivalent. As the spectrum of the graph shift operator defines the graph Fourier transform (GFT), the behavior of the spectrum of the underlying graph has a great impact on the design and implementation of graph signal processing operators such as filters. The spectra of several graph limits are derived analytically and verified with numerical examples.


I. INTRODUCTION
Graphs are one of the most important tools to model and investigate the complex relations in modern networks. For example, graphs have been used to model sensor networks [1], biological networks [2], and social networks [3]. Recently, much work has been done within the signal processing community towards the development of a theory of signal processing on graphs [4], [5]. The objects of study in graph signal processing are real-valued signals defined on the nodes of a graph. As the graph is a discrete domain, the typical notions of shifts in time and space need to be revisited [4], [3]. Thus, a graph shift operator is introduced which captures the structure of the network. A fundamental assumption of graph signal processing is that the structure of the network has some explanatory power for how signals evolve on its nodes. Specifically, the eigenvectors of the shift operator define the graph Fourier basis and its eigenvalues the graph frequencies, leading to the definition of the fundamental graph Fourier transform (GFT).
As the graph structure influences how signals defined on its vertices evolve, several questions related to the structure of the graph naturally arise. Firstly, as most graph signal processing applications depend on the spectrum of the graph, how do changes in the network topology influence the spectrum of the graph? Moreover, if the graph is random, and possibly time-varying, is its spectrum stable? This paper investigates a class of dense random graph models and their properties with a view towards signal processing on large and possibly time-varying graphs. Specifically, the notion of convergence of sequences of random graphs is introduced [6], [7]. This class of random graphs is shown to converge to a limit object [6]. This limit object is a symmetric function supported on $[0,1]^2$. This symmetric function induces an operator whose spectrum coincides with that of the random graph [8]. The central argument of this paper is thus twofold.
First, given that it is the spectrum of the shift operator which is used to perform graph signal processing tasks, and sequences of random graphs converge spectrally to a limit object, an entire equivalence class of graphs may be treated in exactly the same way with respect to signal processing tasks on those graphs. Second, if the graph is time-varying, then so long as the graph is both large enough and varies in accordance with the underlying kernel model, the spectrum of the shift operator is stable. Moreover, bounds on how much a graph induced by a uniform random subsampling scheme can structurally differ from the larger graph have been derived. This provides valuable insight into topology identification, where the natural question of how much one can subsample a graph and still infer its structure remains open.
Throughout this paper bold upper-case letters denote matrices, bold lower-case letters denote vectors, K is a base field, upper-case letters denote constants, lower-case letters denote variables, $(\cdot)^T$ is the transpose operator, and calligraphic letters denote graphs and their limit objects.

Consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \dots, v_N\}$ is the set of vertices or nodes, and $\mathcal{E}$ is the set of edges, with $(v_i, v_j) \in \mathcal{E}$ if $v_i$ and $v_j$ are connected. A graph signal $\mathbf{x} \in \mathbb{R}^N$ is observed on the vertices of the graph. Several different operators have been proposed as "shift" operators. For undirected graphs, the shift operators are symmetric, and capture the graph structure in that $S_{ij} = 0$ if vertices $i$ and $j$ are not connected, and some real number if they are connected. Examples are the scaled adjacency matrix, $\frac{1}{N}\mathbf{A}$, and the normalized Laplacian $\mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2}$, where $\mathbf{L}$ is the Laplacian matrix and $\mathbf{D}$ is the degree matrix. Since the shift operators are symmetric, they admit a full complement of orthonormal eigenvectors and associated eigenvalues, $\mathbf{S} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$. The graph Fourier transform (GFT) is then defined as $\hat{\mathbf{x}} = \mathbf{Q}^T\mathbf{x}$, and the inverse GFT is similarly defined as $\mathbf{x} = \mathbf{Q}\hat{\mathbf{x}}$.
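As a minimal sketch of the transform pair just defined, the following NumPy snippet builds a small random undirected graph, uses the scaled adjacency matrix as the shift operator, and computes the GFT and its inverse. The graph size, the random seed, and all variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # illustrative graph size (assumption)

# Hypothetical example graph: symmetric 0/1 adjacency matrix without self-loops.
upper = np.triu(rng.random((N, N)) < 0.5, k=1)
A = (upper | upper.T).astype(float)

S = A / N                    # scaled adjacency matrix used as the shift operator
lam, Q = np.linalg.eigh(S)   # S = Q diag(lam) Q^T; Q is orthonormal since S is symmetric

x = rng.standard_normal(N)   # a graph signal, one value per vertex
x_hat = Q.T @ x              # graph Fourier transform
x_rec = Q @ x_hat            # inverse graph Fourier transform
```

Because the shift operator is symmetric, `np.linalg.eigh` is the natural choice here: it returns real eigenvalues and an orthonormal eigenvector matrix, so the inverse transform is simply multiplication by `Q`.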
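One of the fundamental tasks in graph signal processing is filtering. A graph filter $\mathbf{H}$ is an operator which acts to attenuate or amplify the graph Fourier coefficients $\hat{x}_k$. The general form of a graph filter implemented in a universal fashion is
$$\mathbf{H}\mathbf{x} = \sum_{n=1}^{N} H(\lambda_n) \langle \mathbf{x}, \mathbf{q}_n \rangle \mathbf{q}_n, \tag{1}$$
where $H(\lambda_n) \triangleq \langle \mathbf{H}\mathbf{x}, \mathbf{q}_n \rangle / \langle \mathbf{x}, \mathbf{q}_n \rangle$. Graph filters can also be approximately implemented in a distributed fashion through a matrix polynomial approach. We can define a graph finite impulse response (FIR) filter with $K$ coefficients as
$$\mathbf{H} = \sum_{k=0}^{K-1} h_k \mathbf{S}^k. \tag{2}$$
The polynomial fitting problem can be stated as
$$H(\lambda_n) = \sum_{k=0}^{K-1} h_k \lambda_n^k, \quad n = 1, \dots, N, \tag{3}$$
and rewriting (3) in matrix-vector notation we obtain
$$\hat{\mathbf{h}} = \boldsymbol{\Psi}\mathbf{h}, \tag{4}$$
where $\mathbf{h}$ is the vector of polynomial coefficients, $\hat{\mathbf{h}}$ is the vector representation of $H(\lambda_n)$, and $\boldsymbol{\Psi}$ is an $N \times K$ Vandermonde matrix with rows generated by $\lambda_n$. Assuming that the $\lambda_n$ are distinct, the rank of $\boldsymbol{\Psi}$ is $\min\{N, K\}$, and thus depending on the dimensions $N$ and $K$ the solution to (4) can be given by the inverse or the pseudo-inverse.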
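The polynomial fitting step can be sketched as follows, assuming a Vandermonde matrix with one row per eigenvalue and a least-squares (pseudo-inverse) solve. The function names, the path-graph example, and the threshold response are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def fit_fir_coefficients(lam, response, K):
    """Least-squares fit of K polynomial coefficients to a desired spectral response."""
    # Row n of Psi is [1, lam_n, lam_n^2, ..., lam_n^(K-1)].
    Psi = np.vander(lam, N=K, increasing=True)
    # lstsq returns the exact solution when Psi is square and invertible,
    # and the pseudo-inverse solution otherwise.
    h, *_ = np.linalg.lstsq(Psi, response, rcond=None)
    return h

def apply_fir(S, h, x):
    """Compute sum_k h[k] * S^k x using only repeated shifts of the signal."""
    y = np.zeros_like(x)
    shifted = x.copy()
    for hk in h:
        y += hk * shifted
        shifted = S @ shifted
    return y

# Hypothetical example: path graph on 5 nodes, whose eigenvalues are distinct.
N = 5
A = np.diag(np.ones(N - 1), 1)
A = A + A.T
S = A / N
lam, Q = np.linalg.eigh(S)

response = (lam > 0.1).astype(float)     # illustrative response: pass eigenvalues above 0.1
h = fit_fir_coefficients(lam, response, K=N)

x = np.arange(1.0, N + 1.0)
y_fir = apply_fir(S, h, x)               # distributed (polynomial) implementation
y_spectral = Q @ (response * (Q.T @ x))  # same filter applied in the spectral domain
```

With $K = N$ and distinct eigenvalues the fit is exact, so the two outputs agree up to rounding; for $K < N$ the polynomial filter only approximates the desired response.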
In order to specify the desired filter response, it is clear from (1)-(4) that knowledge of the range of eigenvalues of the shift operator is required. One can design a filter response without knowledge of the eigenvalues of $\mathbf{S}$, but not with confidence that the filter will have the desired effect. For example, if one were to design a high-pass filter with a cutoff higher than the largest eigenvalue of the shift operator, the effect would not be that of a "high-pass" filter.
In the following section we introduce a general class of dense random graph models. It is shown that even though the graphs are random, they converge in an appropriate sense to limit objects which determine both the structure of the graph and its spectrum. These models can be estimated from an observed graph, and thus provide valuable information for the implementation of graph signal processing algorithms.

A. Kernel-based Models
Signal processing on graphs is inspired by and endeavors to work on real-world networks. Much literature has been dedicated to the modeling of real-world networks with random graph models. The internet, social networks, mutual citation, and the spread of disease can all be modeled with random graphs. However, these networks are typically not well modeled by the classical Erdős-Rényi random graphs [10], [12]. Real-world networks typically have heavy-tailed degree distributions, exhibit "small-world" phenomena, tend to be clustered, and have neighborhood density higher than the average edge density [10], [11], [12]. This has given rise to new, more general random graph models, one of which is the class of kernel-based models [13]. These are characterized by the use of a symmetric kernel function to generate probabilities controlling the formation of edges between nodes.
A kernel random graph is a triple $\mathcal{G}(N, \mathcal{W}, \mu)$ where $N$ is the number of vertices in the graph, $\mathcal{W} : [0,1]^2 \to [0,1]$ is a symmetric measurable function, and $\mu$ is a random variable defined on $[0,1]$. To generate a graph from this triple, a sample $x_i$ is drawn from $\mu$ for each vertex $v_i$. For each pair of vertices $v_i$ and $v_j$, the realizations $(x_i, x_j)$ are then treated as a coordinate and mapped to a probability by the symmetric function $\mathcal{W}$. This probability is then used to perform a Bernoulli trial to establish whether the two vertices are connected, in a manner analogous to the formation of an Erdős-Rényi graph. The measurable symmetric function $\mathcal{W}$ is called a graphon by Lovász and Szegedy. In this paper, we assume that $\mu \sim U[0,1]$.
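The generation procedure can be sketched as below, assuming one uniform latent sample per vertex and a vectorized kernel function. The function name `sample_kernel_graph`, the example kernel, and the graph size are hypothetical and serve only to illustrate the sampling steps.

```python
import numpy as np

def sample_kernel_graph(N, W, rng):
    """Draw one undirected graph on N vertices from the kernel model with mu ~ U[0,1]."""
    x = rng.uniform(0.0, 1.0, size=N)        # one latent sample per vertex
    P = W(x[:, None], x[None, :])            # edge probability for every vertex pair
    coin = rng.uniform(size=(N, N))
    upper = np.triu(coin < P, k=1)           # one Bernoulli trial per unordered pair
    A = (upper | upper.T).astype(float)      # symmetrize; no self-loops
    return A, x

def example_kernel(x1, x2):
    """Illustrative symmetric kernel with values in [0, 1] (assumption for this sketch)."""
    return np.exp(-(x1 + x2))

rng = np.random.default_rng(1)
N = 500
A, x = sample_kernel_graph(N, example_kernel, rng)
edge_density = A.sum() / (N * (N - 1))       # fraction of vertex pairs that are connected
```

Only the strict upper triangle of the Bernoulli trials is used, so each unordered pair of vertices is decided exactly once and the adjacency matrix is symmetric by construction.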
Conversely, the adjacency matrix of a graph $\mathcal{G}$ induces a graphon in the following way. For any weighted graph with vertex weights $\alpha_n$ and edge weights $\beta_{ij}$, normalize the vertex weights such that $\sum_n \alpha_n = 1$. Then, to construct a symmetric function, for each element $a_{ij}$ of the adjacency matrix place a rectangle of area $\alpha_i \times \alpha_j$ in the corresponding position of $[0,1]^2$, on which the symmetric function takes the constant edge weight $\beta_{ij}$. This graphon is denoted $\mathcal{W}_{\mathcal{G}_n}$.

B. Graph Sequences and Convergence
Consider a sequence of random graphs $(\mathcal{G}_n)$ with $|V(\mathcal{G}_n)| \to \infty$ drawn from a kernel model $\mathcal{G}(N, \mathcal{W}, \mu)$. A natural question is whether this sequence converges to any particular object. If so, then while the graph in question is still random, its emergent properties would be deterministic. The graph properties that we consider in this paper are homomorphism densities. The homomorphism density of a simple graph $\mathcal{F}$ with $K$ nodes in a given graph $\mathcal{G}$ with $N$ nodes is defined as
$$t(\mathcal{F}, \mathcal{G}) = \frac{\hom(\mathcal{F}, \mathcal{G})}{\hom(\mathcal{F}, K_N)}, \tag{5}$$
where $\hom(\mathcal{F}, \mathcal{G})$ is the number of adjacency-preserving maps (homomorphisms) $V(\mathcal{F}) \to V(\mathcal{G})$, and $K_N$ is the complete graph on $N$ nodes. Roughly speaking, $t(\mathcal{F}, \mathcal{G})$ is a ratio between the number of copies of $\mathcal{F}$ in $\mathcal{G}$ and the number of copies of $\mathcal{F}$ in the complete graph on $N$ nodes. The graph homomorphism densities of different graphs $\mathcal{F}$ in $\mathcal{G}$ provide structural information about the graph. For example, if $\mathcal{F}$ is a single edge, then $t(\mathcal{F}, \mathcal{G})$ is the edge density of $\mathcal{G}$.
We can further use the symmetric functions $\mathcal{W}$ to define a limit object of the graph sequence $(\mathcal{G}_n)$. Let $\mathcal{W}$ be a bounded symmetric function $\mathcal{W} : [0,1]^2 \to [0,1]$, and $\mathcal{F}$ be a simple graph with $V(\mathcal{F}) = \{1, \dots, K\}$. Then the graph homomorphism limit of $\mathcal{F}$ in $\mathcal{W}$ is defined as
$$t(\mathcal{F}, \mathcal{W}) = \int_{[0,1]^K} \prod_{(i,j) \in E(\mathcal{F})} \mathcal{W}(x_i, x_j) \, dx_1 \cdots dx_K. \tag{6}$$
The following two theorems, proven in [7], form the fundamental link between convergent graph sequences and graphons, and hence the link between observed graphs and explanatory graphon models.
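Theorem 1. For every convergent graph sequence $(\mathcal{G}_n)$ there exists a bounded measurable symmetric function $\mathcal{W} : [0,1]^2 \to [0,1]$ such that $t(\mathcal{F}, \mathcal{G}_n) \to t(\mathcal{F}, \mathcal{W})$ for every simple graph $\mathcal{F}$.
Theorem 2. The graph sequence $(\mathcal{G}(n, \mathcal{W}, \mu))$ is convergent with probability 1, and its limit object is the function $\mathcal{W}$.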
Thus the graphon itself can be used to study the properties of the graph sequence. As will be observed in the next section, these are not limited to structural properties of the graph, as we can use the graphon to calculate the spectra of the graphs in $(\mathcal{G}_n)$ as $n \to \infty$.
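To make the homomorphism densities of this section concrete, the sketch below computes them for two small pattern graphs, a single edge and a triangle, in a graph drawn from a constant kernel. It assumes the normalization by the complete graph used in the text, where $\hom(K_2, K_N) = N(N-1)$ and $\hom(K_3, K_N) = N(N-1)(N-2)$; the function names and parameters are illustrative assumptions.

```python
import numpy as np

def edge_density(A):
    """t(F, G) for F a single edge: hom(F, G) = sum of A = 2 |E(G)|."""
    N = A.shape[0]
    return A.sum() / (N * (N - 1))

def triangle_density(A):
    """t(F, G) for F a triangle: hom(F, G) = trace(A^3)."""
    N = A.shape[0]
    return np.trace(A @ A @ A) / (N * (N - 1) * (N - 2))

# Graph drawn from the constant kernel W = p (illustrative parameters).
rng = np.random.default_rng(2)
N, p = 300, 0.5
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)

t_edge = edge_density(A)          # the limit value for W = p is p
t_triangle = triangle_density(A)  # the limit value for W = p is p**3
```

For the constant kernel the integral defining the limit density factorizes, giving $p^{|E(\mathcal{F})|}$, so the two sampled densities should be close to $p$ and $p^3$ respectively for large $N$.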
IV. HILBERT-SCHMIDT OPERATORS
Definition 1. Let $X$ and $Y$ be intervals in $\mathbb{R}$ and $\mathcal{W} : X \times Y \to \mathbb{R}$ be a measurable function. Then $\mathcal{W}$ is a Hilbert-Schmidt kernel if $\int_X \int_Y |\mathcal{W}(x, y)|^2 \, dy \, dx < \infty$.
Graphons, as defined in the previous section, are clearly Hilbert-Schmidt kernel functions, being both bounded and measurable. Hilbert-Schmidt kernel functions induce bounded integral operators $L^2(Y) \to L^2(X)$ on the space of square-integrable functions. In the case of graphons, they induce an integral operator $T : L^2[0,1] \to L^2[0,1]$,
$$(Tf)(x_1) = \int_0^1 \mathcal{W}(x_1, x_2) f(x_2) \, dx_2. \tag{7}$$
Since the graphons are required to be symmetric, the operators they induce are self-adjoint. This, combined with the fact that all Hilbert-Schmidt operators are compact [9], implies that the spectrum of the operator (7) consists of at most countably many real-valued eigenvalues, with 0 as the only possible accumulation point. The eigenvalues of (7) and corresponding eigenfunctions can be found by solving the resolvent equation
$$\int_0^1 \mathcal{W}(x_1, x_2) f(x_2) \, dx_2 = \lambda f(x_1) \tag{8}$$
for $f(x)$ and $\lambda$, given the kernel $\mathcal{W}$. Since we know from Theorems 1-2 that graph sequences converge to their limit object, and that every convergent graph sequence has a limit object, knowledge of the spectrum of (7) provides knowledge of the spectrum of the graphs in $(\mathcal{G}_n)$, particularly when the number of vertices becomes large. Indeed, it is not difficult to see from the induced graphons $\mathcal{W}_{\mathcal{G}_n}$ that for "unweighted" graphs, the spectrum of $\mathcal{W}$ is the limit of the spectrum of the normalized adjacency matrix of $\mathcal{G}_n$ as $n \to \infty$. Theorem 11.53 in [12] gives a formal statement and proof of this fact. Similarly, the degree matrix is also determined by the graphon, and thus the spectrum of the graph Laplacian can also be investigated through the resolvent equation. We restrict our attention to the normalized adjacency matrix in this paper.
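When the resolvent equation cannot be solved by hand, the spectrum of the integral operator can be approximated numerically. The following is a minimal sketch assuming a midpoint-rule discretization of $[0,1]$ (a Nyström-type approximation, which is a swapped-in numerical technique and not part of the derivations in this paper); `graphon_spectrum` and `constant_kernel` are hypothetical names.

```python
import numpy as np

def graphon_spectrum(W, M=500):
    """Approximate eigenvalues of (Tf)(x1) = int_0^1 W(x1, x2) f(x2) dx2.

    The interval [0, 1] is replaced by M midpoints, so the operator becomes
    the symmetric M x M matrix W(x_i, x_j) / M.
    """
    x = (np.arange(M) + 0.5) / M
    T = W(x[:, None], x[None, :]) / M
    return np.sort(np.linalg.eigvalsh(T))[::-1]   # eigenvalues, largest first

def constant_kernel(x1, x2, p=0.9):
    """Constant kernel W = p, evaluated on broadcastable coordinate arrays."""
    return p * np.ones(np.broadcast(x1, x2).shape)

lam = graphon_spectrum(constant_kernel)
```

For a constant kernel the discretized operator has rank one, so a single eigenvalue equal to $p$ appears and all remaining eigenvalues are zero up to rounding.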

V. SPECTRA OF RANDOM GRAPH MODELS
In this section we solve the resolvent equation (8) for several candidate graphons which can be used to describe various random graph models.

A. Erdős-Rényi
Erdős-Rényi graphs are graphs in which the edges are determined by independent and identically distributed Bernoulli trials parameterized by an "edge" probability $p$. Thus, the kernel graph model $\mathcal{G}(N, \mathcal{W}, \mu)$ that describes this is $\mathcal{G}(N, p, \mu)$, where $\mu$ can be any probability distribution with support on $[0,1]$ without changing the resulting graph sequence $(\mathcal{G}_n)$. Substituting this kernel into the resolvent equation (8) we obtain
$$p \int_0^1 f(x_2) \, dx_2 = \lambda f(x_1). \tag{9}$$
There are two possibilities regarding the integral in (9). If $\int_0^1 f(x_2) \, dx_2 = 0$, then $\lambda = 0$ and any function which integrates to 0 is a corresponding eigenfunction. Otherwise, $\int_0^1 f(x_2) \, dx_2 = c \neq 0$ and
$$f(x_1) = \frac{p \, c}{\lambda}. \tag{10}$$
From (10) it is clear that any constant function $c \cdot 1(x_1)$ is an eigenfunction corresponding to the eigenvalue $\lambda = p$. Requiring that $\|f(x)\| = 1$ establishes uniqueness of the eigenfunction. Thus the multiplicity of the eigenvalue $\lambda = p$ is 1.

B. Block-Stochastic Models
With very high probability, the Erdős-Rényi model does not produce communities or heterogeneous degree distributions [12]. To better model these features of real-world networks, block-stochastic models (BSM) have been proposed [14]. Consider the extreme case where we wish to model two communities with average edge density $p_1$ and $p_2$ within the communities respectively, and no connections between the communities. Let $\eta_1$ and $\eta_2$ denote the proportion of nodes in each community. Then a corresponding graphon to this block-stochastic model could be a constant $p_1$ on the square $[0, \eta_1]^2$, a constant $p_2$ on the square $[\eta_1, 1]^2$, and 0 elsewhere. Note that this graphon is not unique: any symmetric permutation of it will be equivalent. Assume that $\mu \sim U[0,1]$; then any symmetric function which takes the constant value $p_1$ on an area equal to $\eta_1^2$ and $p_2$ on an area $(1 - \eta_1)^2$, while 0 elsewhere, will produce an equivalent graph sequence. Thus, graphons form an equivalence class.
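We can build on the spectral analysis of the Erdős-Rényi model to analyze the spectrum of the block-stochastic model. Substituting the proposed kernel $\mathcal{W}$ into (8), and defining the indicator functions $\mathbb{1}_{\eta_1}(x)$ and $\mathbb{1}_{\eta_2}(x)$ of the intervals $[0, \eta_1]$ and $(\eta_1, 1]$ respectively, we obtain
$$p_1 \mathbb{1}_{\eta_1}(x_1) \int_0^{\eta_1} f(x_2) \, dx_2 + p_2 \mathbb{1}_{\eta_2}(x_1) \int_{\eta_1}^{1} f(x_2) \, dx_2 = \lambda f(x_1). \tag{11}$$
To analyse (11) we need to consider several cases. Define $I_1 = \int_0^{\eta_1} f(x_2) \, dx_2$ and $I_2 = \int_{\eta_1}^{1} f(x_2) \, dx_2$. If $I_1 \neq 0$ and $I_2 = 0$, then $\mathbb{1}_{\eta_1}(x_1)$ is an eigenfunction corresponding to the eigenvalue $\lambda = p_1 \cdot \eta_1$. Similarly, if $I_1 = 0$ and $I_2 \neq 0$, then $\mathbb{1}_{\eta_2}(x_1)$ is an eigenfunction corresponding to the eigenvalue $\lambda = p_2 \cdot \eta_2$. Trivially, 0 is also an eigenvalue, corresponding to functions for which $I_1$ and $I_2$ both evaluate to 0. The question then arises of what happens when both $I_1$ and $I_2$ are non-zero, and not necessarily equal. This would lead to candidate eigenfunctions which are constant on the two intervals $[0, \eta_1]$ and $[\eta_1, 1]$, with eigenvalues distinct from $p_1 \eta_1$ and $p_2 \eta_2$. However, this leads to a contradiction, as eigenfunctions of a self-adjoint operator corresponding to distinct eigenvalues must be orthogonal, whereas
$$\langle f, \mathbb{1}_{\eta_1} \rangle = \int_0^{\eta_1} f(x_2) \, dx_2 = I_1 \neq 0. \tag{12}$$
Thus there are only two non-zero eigenvalues, corresponding to the eigenfunctions $c \cdot \mathbb{1}_{\eta_1}$ and $c \cdot \mathbb{1}_{\eta_2}$.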
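The two non-zero eigenvalues of the block-stochastic graphon can be checked numerically. The sketch below assumes a midpoint discretization of the unit square and uses illustrative parameter values $\eta_1 = 0.3$, $p_1 = 0.9$, $p_2 = 0.2$; the function name `block_graphon` is hypothetical.

```python
import numpy as np

def block_graphon(x1, x2, eta1, p1, p2):
    """Two disconnected communities: p1 on [0, eta1]^2, p2 on (eta1, 1]^2, 0 elsewhere."""
    in_first = (x1 <= eta1) & (x2 <= eta1)
    in_second = (x1 > eta1) & (x2 > eta1)
    return p1 * in_first + p2 * in_second

eta1, p1, p2 = 0.3, 0.9, 0.2
M = 1000
x = (np.arange(M) + 0.5) / M                       # midpoints of M equal subintervals

# Discretized integral operator: W(x_i, x_j) / M.
T = block_graphon(x[:, None], x[None, :], eta1, p1, p2) / M
lam = np.sort(np.linalg.eigvalsh(T))[::-1]          # eigenvalues, largest first

predicted = sorted([p1 * eta1, p2 * (1 - eta1)], reverse=True)
```

The two leading entries of `lam` can then be compared with `predicted`, and the remaining entries with zero, which mirrors the case analysis of the integrals $I_1$ and $I_2$.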

C. Exponential Model
Block-stochastic models are useful for modeling real-world networks which have distinct communities with relatively homogeneous edge density within the communities and little to no connection between the communities. In some networks this model may be misspecified in that the degree distribution may be heterogeneous. In such a case, where there is a continuous spread of vertex degrees throughout the network, a continuous graphon model makes sense. Consider the symmetric exponential graphon $\mathcal{W}(x_1, x_2) = e^{-(\beta_1 (x_1 + x_2) + \beta_0)}$. Then
$$e^{-(\beta_1 x_1 + \beta_0)} \int_0^1 e^{-\beta_1 x_2} f(x_2) \, dx_2 = \lambda f(x_1), \tag{13}$$
implying that $f(x_1) = e^{-\beta_1 x_1}$ is an eigenfunction. Substituting this function into the integral in (13) one obtains
$$\lambda e^{-\beta_1 x_1} = e^{-(\beta_1 x_1 + \beta_0)} \int_0^1 e^{-2 \beta_1 x_2} \, dx_2 \tag{14}$$
$$= \frac{1}{2\beta_1} \left(1 - e^{-2\beta_1}\right) e^{-(\beta_1 x_1 + \beta_0)}, \tag{15}$$
from which it is clear that $\frac{1}{2\beta_1}(1 - e^{-2\beta_1}) e^{-\beta_0}$ is the corresponding eigenvalue. Observing that $\int_0^1 e^{-\beta_1 x_2} f(x_2) \, dx_2$ is the inner product between $e^{-\beta_1 x_2}$ and $f^*(x_2)$ (where $(\cdot)^*$ denotes the adjoint of a function) in $L^2[0,1]$, we can then conclude that there are infinitely many eigenfunctions corresponding to $\lambda = 0$, as there are infinitely many functions in $L^2[0,1]$ orthogonal to any given function in that space.
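The eigenpair of the exponential graphon can be verified directly by numerical quadrature. This is a sketch under stated assumptions: the parameter names `beta1` and `beta0` are labels assumed here for the two graphon parameters, the values $1$ and $0$ are illustrative, and the integral is approximated with a midpoint rule.

```python
import numpy as np

def exp_graphon(x1, x2, beta1, beta0):
    """Exponential graphon exp(-(beta1 * (x1 + x2) + beta0))."""
    return np.exp(-(beta1 * (x1 + x2) + beta0))

def predicted_eigenvalue(beta1, beta0):
    """Closed-form non-zero eigenvalue (1 - exp(-2 beta1)) / (2 beta1) * exp(-beta0)."""
    return (1.0 - np.exp(-2.0 * beta1)) / (2.0 * beta1) * np.exp(-beta0)

beta1, beta0 = 1.0, 0.0
M = 2000
x = (np.arange(M) + 0.5) / M                  # midpoint quadrature nodes on [0, 1]

f = np.exp(-beta1 * x)                        # candidate eigenfunction
# Midpoint-rule approximation of (Tf)(x1) = int_0^1 W(x1, x2) f(x2) dx2.
Tf = exp_graphon(x[:, None], x[None, :], beta1, beta0) @ f / M

lam = predicted_eigenvalue(beta1, beta0)
residual = np.max(np.abs(Tf - lam * f))       # should be near zero if (lam, f) is an eigenpair
```

A small `residual` indicates that applying the operator to $e^{-\beta_1 x_1}$ reproduces the same function scaled by the closed-form eigenvalue, up to quadrature error.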
The exponential graphon has the benefit that the homomorphism density (6) is easily computable in closed form for any simple graph $\mathcal{F}$, and depends only on the parameters $\beta_1$ and $\beta_0$:
$$t(\mathcal{F}, \mathcal{W}) = e^{-|E(\mathcal{F})| \beta_0} \prod_{i \in V(\mathcal{F})} \frac{1 - e^{-d_i \beta_1}}{d_i \beta_1}, \tag{16}$$
where $d_i$ is the degree of node $i$ in the graph $\mathcal{F}$. To arrive at (16) from (6) note that, for the chosen graph $\mathcal{F}$, the product $\prod_{(i,j) \in E(\mathcal{F})} \mathcal{W}(x_i, x_j)$ contains the graphon function for nodes $i$ and $j$ if there is an edge between these nodes in the graph. Noting that the exponent is separable yields the constant term $e^{-|E(\mathcal{F})| \beta_0}$. Further noting that
$$\int_0^1 e^{-d_i \beta_1 x_i} \, dx_i = \frac{1 - e^{-d_i \beta_1}}{d_i \beta_1},$$
the derivation of (16) is complete.
The homomorphism densities are testable graph parameters in that for every $\epsilon > 0$ there exists an integer $k_0$ such that for all graphs $\mathcal{G}$ on $k > k_0$ nodes, a random set $X$ of $k_0$ vertices in $\mathcal{G}$ satisfies
$$\mathbb{P}\left( |t(\mathcal{F}, \mathcal{G}) - t(\mathcal{F}, \mathcal{G}[X])| > \epsilon \right) < \epsilon,$$
where $\mathcal{G}[X]$ is the induced graph on the set $X$. We direct the reader to Theorem 5.1 in [11] for the proof of this statement. Thus, for graphs of size at least $k$, a random subset of $k_0$ vertices induces a subgraph which has approximately the same homomorphism densities as the observed graph $\mathcal{G}$ for every simple graph $\mathcal{F}$. Moreover, probabilistic bounds on the error of $t(\mathcal{F}, \mathcal{G}[X])$ as a function of the size of the sample $X$ have been derived [12], and efficient algorithms for the testing of graph properties have been developed [15], [16]. Thus, by subsampling the observed large graph $\mathcal{G}$, and adjusting $\beta_0$ and $\beta_1$ in (16) to fit the observed homomorphism densities, an estimate of $\mathcal{W}$ can be obtained.
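The closed-form density for the exponential graphon can be sketched and cross-checked against a direct numerical evaluation of the defining integral. The parameter names `beta1` and `beta0`, their values, and the function name are assumptions made for this illustration; the closed form as written requires every node of the pattern graph to have degree at least one.

```python
import numpy as np

def hom_density_exponential(degrees, beta1, beta0):
    """Closed-form t(F, W) for the exponential graphon, given the degree sequence of F."""
    d = np.asarray(degrees, dtype=float)   # every node of F must have degree >= 1
    n_edges = d.sum() / 2.0                # handshake lemma: sum of degrees = 2 |E(F)|
    per_node = (1.0 - np.exp(-d * beta1)) / (d * beta1)
    return np.exp(-n_edges * beta0) * np.prod(per_node)

beta1, beta0 = 1.0, 0.5                    # illustrative parameter values
t_edge = hom_density_exponential([1, 1], beta1, beta0)         # F = single edge
t_triangle = hom_density_exponential([2, 2, 2], beta1, beta0)  # F = triangle

# Cross-check by midpoint quadrature of the defining integral.
M = 400
x = (np.arange(M) + 0.5) / M
Wg = np.exp(-(beta1 * (x[:, None] + x[None, :]) + beta0))
t_edge_quad = Wg.sum() / M**2                        # double integral of W(x1, x2)
t_triangle_quad = np.trace(Wg @ Wg @ Wg) / M**3      # triple integral of W12 * W23 * W31
```

Because the density depends on the pattern graph only through its degree sequence, fitting the two parameters to observed densities needs only a couple of small pattern graphs, such as the edge and the triangle used here.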

VI. SIMULATION RESULTS
Here we verify the findings of the previous section by generating large graphs from the specified kernel graph models and calculating their eigenvalues. For each kernel we generate graphs with $N = 1000$ vertices, and plot the results of 100 trials in a log-histogram showing the positive eigenvalues of the normalized adjacency matrix $\frac{1}{N}\mathbf{A}$.

A. Erdős-Rényi
Fig. 1 shows the positive spectrum of the normalized adjacency matrix corresponding to the kernel graph model $\mathcal{G}(1000, 0.9)$. It can be observed that there is one eigenvalue that is significantly different from 0, namely $\lambda = p = 0.9$. The observed values of this eigenvalue range from 0.8966 to 0.9005.

B. Block-Stochastic Models
Fig. 2 shows the positive spectrum corresponding to a block-stochastic model in which there are two distinct communities. In the first community, which contains $\eta_1 = 0.3$ of the vertices in the network, the probability of an edge between two vertices is $p_1 = 0.9$. In the second community, which contains $\eta_2 = 0.7$ of the vertices, the probability of edge connection is $p_2 = 0.2$. As was predicted in the previous section, there are two tightly located clusters of non-zero eigenvalues at approximately $\eta_1 p_1 = 0.3 \cdot 0.9 = 0.27$ and $\eta_2 p_2 = 0.7 \cdot 0.2 = 0.14$. The spread of the eigenvalues is wider for the community with fewer nodes, despite a much higher probability of edge formation.

C. Exponential Model
With $\beta_1 = 1$, the previous section's findings predict that there should be one non-zero eigenvalue $\frac{1}{2}(1 - e^{-2}) = 0.4323$. We observe a tight cluster around this value in the positive spectrum.
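A single trial of this kind of experiment can be sketched as follows. The sketch assumes a smaller graph ($N = 500$) and one trial per model rather than the 100 trials at $N = 1000$ reported above; the kernel functions, their parameter names, and `top_eigenvalues` are hypothetical names introduced for illustration only.

```python
import numpy as np

def top_eigenvalues(N, W, rng, k=3):
    """Sample one kernel random graph and return the k largest eigenvalues of A / N."""
    x = rng.uniform(size=N)                               # latent samples, mu ~ U[0, 1]
    P = W(x[:, None], x[None, :])                         # pairwise edge probabilities
    upper = np.triu(rng.uniform(size=(N, N)) < P, k=1)    # one Bernoulli trial per pair
    A = (upper | upper.T).astype(float)
    lam = np.linalg.eigvalsh(A / N)                       # ascending order
    return lam[::-1][:k]

def er_kernel(x1, x2, p=0.9):
    """Constant kernel of the Erdős-Rényi model."""
    return p * np.ones(np.broadcast(x1, x2).shape)

def exp_kernel(x1, x2, beta1=1.0, beta0=0.0):
    """Exponential kernel; beta1 and beta0 are assumed parameter names."""
    return np.exp(-(beta1 * (x1 + x2) + beta0))

rng = np.random.default_rng(3)
N = 500
lam_er = top_eigenvalues(N, er_kernel, rng)     # leading value expected near p = 0.9
lam_exp = top_eigenvalues(N, exp_kernel, rng)   # leading value expected near 0.4323
```

In each case the leading eigenvalue should sit near the value predicted from the graphon, while the remaining eigenvalues stay close to zero and shrink further as $N$ grows.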

VII. CONCLUSIONS
The theory of sequences of dense random graphs and their convergence to limit objects has been introduced, and its consequences for signal processing on graphs have been explored. The spectra of several random graph models were derived from the induced integral operators of their limit objects. Numerical results show a close correspondence between the predicted eigenvalues of the graph limit and those of observed random graphs generated from the corresponding random graph model. Simulation results support the conclusion that random graphs of sufficient size which are generated from the same model can be treated as equivalent.