Faster Approximate Distance Queries and Compact Routing in Sparse Graphs

A distance oracle is a compact representation of the shortest distance matrix of a graph. It can be queried to approximate shortest paths between any pair of vertices. Any distance oracle that returns paths of worst-case stretch $(2k-1)$ must require space $\Omega(n^{1 + 1/k})$ for graphs of $n$ nodes. The hard cases that enforce this lower bound are, however, rather dense graphs with average degree $\Omega(n^{1/k})$. We present distance oracles that, for sparse graphs, substantially break the lower bound barrier at the expense of higher query time. For any $1 \leq \alpha \leq n$, our distance oracles can return stretch 2 paths using $O(m + n^2/\alpha)$ space and stretch 3 paths using $O(m + n^2/\alpha^2)$ space, at the expense of $O(\alpha m/n)$ query time. By setting appropriate values of $\alpha$, we get the first distance oracles that have size linear in the size of the graph, and return constant stretch paths in non-trivial query time. The query time can be further reduced to $O(\alpha)$, by using an additional $O(m \alpha)$ space for all our distance oracles, or at the cost of a small constant additive stretch. We use our stretch 2 distance oracle to present the first compact routing scheme with worst-case stretch 2. Any compact routing scheme with stretch less than 2 must require linear memory at some nodes even for sparse graphs; our scheme, hence, achieves the optimal stretch with non-trivial memory requirements. Moreover, supported by large-scale simulations on graphs including the AS-level Internet graph, we argue that our stretch-2 scheme would be simple and efficient to implement as a distributed compact routing protocol.


Introduction
A distance oracle is a compact representation of the shortest distance matrix of a graph. It can be queried to retrieve distances and paths (of corresponding length) between any pair of nodes in the graph. Besides their fundamental connection to the all-pair shortest path problem, there are two main applications of distance oracles. First, techniques from distance oracles have been applied in designing compact routing schemes [14], where routers must require limited memory to store forwarding tables and yet route along short paths. Second, distance oracles have been used to analyze large-scale social networks [10]: efficiently computing shortest distances and paths between users in social networks is both important and non-trivial. These networks often contain millions of nodes and billions of edges, making it expensive (if not impossible) to store the shortest distance matrix on machines with limited random access memory. To make computations feasible, distance oracles with smaller size and bounded small error on distances returned are desired. Indeed, a fundamental trade-off in constructing a distance oracle is between its size and its stretch: the worst-case ratio of the distance returned by the distance oracle to the actual shortest distance between the two vertices. For general graphs, the optimal space/stretch trade-off was achieved by Thorup and Zwick [15]: their distance oracle, for any graph with $n$ vertices and for any integer $k \ge 2$, is of size $O(kn^{1+1/k})$ and returns paths with stretch $2k-1$ in time $O(k)$. However, the hard instances for the matching lower bound are rather dense graphs, with average degree $\Omega(n^{1/k})$. For instance, to prove a space lower bound of $\Omega(n^2)$ for stretch 2, their proof uses a graph with $\Theta(n^2)$ edges; for stretch 3, the proof uses a graph with $\Omega(n^{3/2})$ edges. The lower bound essentially states that there exist graphs that are incompressible: if a certain stretch is desired, then the size of the distance oracle is lower bounded by the number of edges in the specially-constructed graph.
Thus, classic distance oracle results may be quite far from optimal for sparse graphs: graphs with low average degree $\Delta$. The notion of sparsity is a little tricky: for stretch 2, graphs with average degree $\Delta = o(n)$ are said to be sparse; for any integer $k \ge 2$ and stretch at most $2k-1$, graphs with $\Delta = o(n^{1/k})$ are said to be sparse. This is of key interest since real-world graphs are sparse, with degrees much closer to logarithmic than polynomial in $n$. For instance, letting $\Delta = c \log_2 n$, empirically, $c \approx 0.6$ for an AS-level map of the Internet [13], $c \approx 0.4$ for a router-level map of the Internet [13], and $c \approx 1.34, 0.65, 1.21, 5.10, 29.9$ for social networks Cyworld, Testimonial, Orkut, MySpace, and Facebook, respectively [2,7].

Our contributions
Distance oracles. This paper presents distance oracles that, for sparse graphs, substantially break the classic space/stretch trade-off barrier, albeit at the cost of increased query time. For instance, in dense graphs, retrieving distances of stretch 2 and 3 in constant time requires space $\Theta(n^2)$ and $\Theta(n^{3/2})$ respectively [15]; larger query time cannot help reduce space and/or stretch. We present distance oracles that have size linear in the size of the graph and return stretch 2 and 3 distances in sub-linear query time; this significantly improves upon the earlier known constructions for the case of real-world networks where average degree is logarithmic in the number of nodes. Moreover, we demonstrate that our approach allows a surprisingly large fraction of source-destination pairs to retrieve exact distances and shortest paths.
More specifically, we introduce several new distance oracles which respectively improve stretch and space in comparison with the distance oracle of Thorup and Zwick [15]. Let $1 \le \alpha \le n$ and let $k$ be any positive integer. Then, for weighted undirected graphs with $n$ vertices and average degree $\Delta$, we present distance oracles that return stretch 2 distances using space $O(n\Delta + n^2/\alpha)$ and stretch $(4k-1)$ distances using space $O(n\Delta + (n/\alpha)^{1+1/k})$; both these distance oracles require $O(\alpha\Delta)$ query time. The query time can be further reduced to $O(\alpha)$ using an additional $O(n\alpha\Delta)$ space or a small constant additive stretch (see Table 1 in §2).
For example, for the realistic case of $\Delta = \Theta(\mathrm{polylog}(n))$, special cases of our two results yield schemes for retrieving stretch 2 distances using space $\tilde{O}(n^{3/2})$, and stretch 3 distances using space $\tilde{O}(n)$, at the expense of $\tilde{O}(\sqrt{n})$ query time. Out of theoretical interest, we note that our distance oracles highlight the fact that in the regime of sparse graphs, for any fixed stretch, there may be an infinite number of Pareto-optimal design points: one can smoothly trade off the query time to reduce the space requirements; in contrast, for dense graphs, there is exactly one optimal design point for any fixed stretch.
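As a quick check of these special cases, the bounds can be recovered by direct substitution. The sketch below assumes $\alpha = \sqrt{n}$ and $\Delta = \Theta(\mathrm{polylog}(n))$; the choice of $\alpha$ is inferred here as the value that reproduces the stated bounds, and the stretch-3 line uses the $(4k-1)$ oracle with $k = 1$.

```latex
\begin{align*}
\text{stretch 2:}\quad
  & O\!\left(n\Delta + n^2/\alpha\right) = O\!\left(n\Delta + n^{3/2}\right) = \tilde{O}(n^{3/2}),\\
\text{stretch 3 } (k = 1):\quad
  & O\!\left(n\Delta + (n/\alpha)^{2}\right) = O\!\left(n\Delta + n\right) = \tilde{O}(n),\\
\text{query time:}\quad
  & O(\alpha\Delta) = O(\sqrt{n}\,\Delta) = \tilde{O}(\sqrt{n}).
\end{align*}
```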
Compact routing. Thorup and Zwick [14] designed compact routing schemes for their distance oracles. Their scheme requires $\tilde{O}(\sqrt{n})$ memory at each node in the network and routes along paths that have stretch 3. No compact routing schemes are known for stretch less than 3 for general graphs; in fact, it is known that even for extremely sparse graphs, any compact routing scheme that routes along paths of stretch less than 2 must use $\Omega(n)$ memory at some nodes in the network [5]. Hence, the best we can hope for is compact routing schemes with stretch 2 and larger.
For graphs with average degree $\Delta = o(n)$, we present the first compact routing scheme with the optimal stretch. The scheme requires $O(\alpha\Delta + n/\alpha)$ memory at each router and routes along paths of worst-case stretch 2. Besides being the first compact routing scheme (for general graphs) with provably optimal stretch, our compact routing scheme has a particular property: it can be implemented on top of any implementation of the Thorup-Zwick scheme using a handshaking scheme (a surprisingly lightweight end-to-end exchange of very few packets) and a small amount of processing to set up a new end-to-end connection with worst-case stretch 2. Using a distributed protocol [11] to construct the Thorup-Zwick compact routing scheme (with the parameters set appropriately), we get a distributed name-independent compact routing scheme for our distance oracles with roughly the same space requirements.
In summary, our results represent a step towards characterizing the space/stretch/time trade-off for approximate distance queries in sparse graphs, and yield a simple, practical way to improve stretch in compact routing protocols. We complement our theoretical results with extensive simulations on empirical networks. Interestingly, we find that in the Internet AS-level topology, our stretch-2 scheme finds shortest paths for 99.98% of the source-destination pairs, compared with 34.4% using [15].
Roadmap. We start our discussion with related work (§2) and set up the notation used in the paper (§3). In §4, we give an overview of the main techniques used in the design of our distance oracles. One of the challenges in designing distance oracles with the minimalistic assumption on the number of edges in the graph is to handle skewed degree distributions; in §5, we show how to handle this challenge. In particular, we prove that in the context of designing distance oracles, average-degree-bounded graphs are no harder than maximum-degree-bounded graphs; this allows us to restrict our attention to maximum-degree-bounded graphs in the rest of the paper. We present distance oracles for stretch 2 in §6 and for stretch 3 and higher in §7. We show how to improve the space/query time trade-off in our distance oracles using a small (constant) additive stretch in §8. In §9, we design compact routing schemes for our distance oracles and describe how to implement our schemes in a distributed fashion. We evaluate the performance of our distance oracles and compact routing schemes on a number of synthetic and real-world graph datasets; the evaluation results and analysis are presented in §10. Finally, we close the paper by outlining a number of open problems in §11.
Bibliographic note. This paper extends and improves upon the results presented in the conference version of this paper [1]. In particular, to bound the worst-case stretch, this version presents conditions (Lemma 1) that are tighter than those used in the conference version, and uses the new conditions to improve the query time without increasing the size of the distance oracles (updated §6 and §7). The improvement in query time also leads to improved compact routing schemes (updated §9). In addition, this version also presents distance oracles for additive stretch (new §8). Finally, we have significantly simplified the presentation and proofs when compared to the conference version (essentially rewritten §6 and §7).

Related Work
In this section, we discuss the known lower and upper bounds for the space/stretch trade-off in the approximate distance query problem for the regime of sparse graphs.
Lower bounds. For general graphs, Thorup and Zwick [15] showed (subject to a conjecture of Erdős) that achieving (integer) stretch $(2k-1)$ requires $\Omega(kn^{1+1/k})$ space. Their proof is information-theoretic, essentially showing that for any constant stretch, there exist graphs that require storing as many bits as the number of edges in the graph. For example, proving that stretch 2 requires $\Omega(n^2)$ space uses a graph with $\Theta(n^2)$ edges; for stretch 3, the proof uses a graph with $\Theta(n^{3/2})$ edges.
There is no hope of this proof technique being helpful in the sparse case. In particular, for graphs with $m = n\Delta$ edges, this technique will only show that achieving any constant stretch value requires $\Omega(n\Delta)$ bits. This much space is entirely acceptable for sparse graphs, and in fact, can permit retrieval of shortest paths, simply by storing the original graph and running Dijkstra's algorithm for each query. Of course, this takes time $O(n\Delta)$ per query. Thus, in the context of distance oracles with super-constant query time, the cases of dense and sparse graphs are quite different. In the dense case the key is to compress the graph while ensuring that sufficient information remains to return low-stretch distances. In the sparse case the graph need not be compressed, but the trade-off with query time becomes critical.
Very little is known about this trade-off space for sparse graphs. First, Sommer et al. [12] show that any distance oracle that returns stretch $t$ paths in time $\alpha$ requires space $n^{1+\Omega(1/(\alpha t))}$. For distance oracles with constant query time, this gives a lower bound of space $n^{1+\Omega(1/t)}$ for any stretch $t$. However, if we allow $\Omega(\log n)$ query time, their result implies a trivial lower bound of $\Omega(n \log n)$ for any constant stretch. Second, Pǎtraşcu and Roditty [9] prove that if a widely believed conjecture about the hardness of set intersection queries holds, then retrieving stretch 2 paths in constant time requires a distance oracle of size $\Omega(n\sqrt{n\Delta})$. For the case of $\Omega(\log n)$ query time, as in our schemes, no non-trivial lower bounds are known.
For stretch 3 and larger, the lower bounds for distance oracles also hold for compact routing schemes [14]; consequently, these are tight only for dense graphs. It is shown in [4, 5, 6] that any compact routing scheme with stretch less than 2 must require $\Omega(n \log n)$ memory at some nodes in the network; this bound holds even for extremely sparse graphs. The compact routing scheme for our distance oracle, hence, achieves the optimal stretch with non-trivial memory requirements at each node.

Upper bounds. A detailed comparison of our results with previously known upper bounds on distance oracles for general graphs is presented in Table 1. Very recently, Pǎtraşcu and Roditty [9] obtained a distance oracle that returns stretch 2 paths in constant time with $O(\Delta^{1/3} n^{5/3})$ space. These queries are faster than our stretch-2 scheme, but the distance oracle has larger size for $\alpha > (n/\Delta)^{1/3}$. For general sparse graphs, no other results are known.
For unweighted graphs, the only known $o(n^2)$ size distance oracle with approximation ratio 2 is again due to the recent result of Pǎtraşcu and Roditty [9]. Their distance oracle requires space $O(n^{5/3})$ and returns approximate distances in constant time. As earlier, our distance oracle requires less space but higher query time when compared to their distance oracle.
In terms of upper bounds for compact routing schemes, we note that the only known results are by Thorup and Zwick [14] for stretch 3. No compact routing schemes with worst-case stretch less than 3 are known. Although we believe that it may be possible to design compact routing schemes for the distance oracle of Pǎtraşcu and Roditty [9], it is not clear whether this can be done in a distributed fashion. Our compact routing schemes, on the other hand, can be constructed in a distributed fashion and have worst-case stretch bounded by 2.

Notations and Definitions
Throughout the paper, we let $G = (V, E)$ be a connected, undirected graph with $n = |V|$ nodes and $m = |E|$ edges. Unless mentioned otherwise, $G$ is assumed to be weighted with each edge assigned a non-negative weight.
For any node $v \in V$, we denote by $N(v)$ the set of all the neighbors of $v$. For any set $V' \subset V$, we denote by $N(V')$ the set of all the neighbors of nodes in $V'$. We let $\deg(v)$ denote the number of neighbors of node $v$, that is, $\deg(v) = |N(v)|$. The graph is said to be maximum-degree-bounded (or, $\Delta$-degree bounded) if for all nodes $v \in V$, $\deg(v) \le \Delta$. We say that the graph is average-degree-bounded (or, has average degree $\Delta$) if $2m/n \le \Delta$.
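For any pair of nodes $u, v \in V$, let $d(u, v)$ be the length of the shortest path between $u$ and $v$ in $G$ and let $\delta(u, v)$ be the length of the path returned by the distance oracle. The distance oracle is said to return stretch $t$ paths if for every pair of nodes $u, v \in V$, $d(u, v) \le \delta(u, v) \le t \cdot d(u, v)$.
We will let $L \subset V$ denote a distinguished set of "landmark" nodes chosen by our algorithms. For any node $v \in V$, we denote by $\ell(v)$ the nearest neighbor of $v$ in $L$ (i.e., the node $a \in L$ that minimizes $d(v, a)$, with ties broken arbitrarily). The ball of $v$, $B(v)$, is the set of nodes $w \in V$ for which $d(v, w) < d(v, \ell(v))$; we call $r_v = d(v, \ell(v))$ the ball radius of $v$. The vicinity of $v$, $\Gamma(v) = B(v) \cup N(B(v))$, is the ball of $v$ together with the neighbors of the nodes in the ball.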

Overview of our schemes
Our distance oracle for stretch 2 is conceptually similar to the stretch 3 distance oracle of Thorup and Zwick [15]. For a given graph, they construct a set of nodes, known as landmarks, such that each node has a landmark in its ball. The distance oracle stores, for each node, the distance to each node in its ball and to its closest landmark; the landmarks store distances to all nodes in the graph. When queried for the distance between nodes $u$ and $v$, the query algorithm checks if $v$ is in the ball of $u$. If it is, then the exact distance is returned using information stored in the distance oracle; if not, the distance $d(u, \ell(u)) + d(\ell(u), v)$ is returned, which by the triangle inequality has stretch at most 3.
Intuitively, the cases that attain worst-case stretch in their distance oracle are the ones for which the destination $v$ is just outside the ball of the source $u$. For such source-destination pairs, we exploit the idea of ball-vicinity intersection. Upon receiving a query, we search for nodes in $B(u) \cap \Gamma(v)$. Finding such nodes takes some time; but if any such node $w$ exists, we can return the distance $d(u, w) + d(w, v)$ using information stored in the distance oracle. If $B(u) \cap \Gamma(v) = \emptyset$, the nodes must be relatively distant, giving us a lower bound on the exact distance between $u$ and $v$. Using this lower bound, we show that a path via the landmark node has stretch 2. We need to store the vicinities of the nodes for some of our distance oracles; but if the graph is sparse, we show that this does not increase the space requirements significantly.
The above distance oracle requires large space since it stores (a) shortest paths from the landmarks to all other nodes; and (b) the vicinities of every node. To avoid the first requirement, our distance oracles for stretch 3 and larger store the exact distances only between all pairs of landmarks. This uses significantly less space; for instance, in a graph with $n$ nodes, storing shortest paths between every pair of $\sqrt{n}$ landmarks requires space at most linear in the size of the graph. To overcome the second requirement, our distance oracles compute the vicinities on the fly during a query; we show that for sparse graphs, this can be done in sublinear time. If the vicinities of $u$ and $v$ intersect, the exact distance is returned. If not, a low stretch path can be retrieved by concatenating the paths from $u$ to $\ell(u)$, from $\ell(u)$ to $\ell(v)$, and finally from $\ell(v)$ to $v$. This scheme can be generalized to further reduce space at the expense of increased stretch: rather than storing shortest paths between landmarks, we approximate these distances with the schemes of [15].

Average-degree Bounded Graphs are no harder than Maximum-degree Bounded Graphs
Our distance oracles achieve an improved space/stretch trade-off with the minimalistic assumption on graph sparsity, that is, the total number of edges in the graph. One of the challenges in designing distance oracles with such a minimalistic assumption is to handle the skewed degree distribution of nodes in the graph. In this section, we show that in the context of designing distance oracles, average-degree-bounded graphs are no harder than maximum-degree-bounded graphs.
In particular, assume that we have a distance oracle that is of size $O(S)$ and returns stretch-$s$ paths in $O(T)$ time for any $\Delta$-degree bounded graph on $n$ nodes, where $S$ and $T$ are functions of $n$, $\Delta$ and $s$. For any fixed stretch $s$ and fixed $\Delta$, we require that $S$ and $T$ grow by at most a constant factor when $n$ grows by a constant factor, which is true for all functions $S$ and $T$ of interest since $S = O(n^2)$ and $T = O(n^2)$ for any non-trivial distance oracle. We show that this oracle can be used to build a distance oracle of size $O(S)$ that returns stretch-$s$ paths on a graph with average degree $\Delta$ in at most $O(T)$ time.
Let $G = (V, E)$ be a connected graph with average degree $\Delta$. Given $G$, we will first create a $\Delta$-degree bounded graph $G_\Delta = (V_\Delta, E_\Delta)$. Then, we show how the oracle can be used on $G_\Delta$ to return stretch-$s$ paths on $G$.
Finally, for each pair $v_i, v_{i+1}$, we create an edge in $E_\Delta$ of weight 0.
In order to answer an approximate distance query for any pair of nodes $u, v \in V$, we use the oracle to answer the approximate distance query between $u_1, v_1 \in V_\Delta$ in $G_\Delta$; let the length of the path returned by the data structure be $\delta'$. We output the distance $\delta'$ as an approximate distance for the pair of nodes in $G$.
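The reduction can be sketched in code as follows. This is a minimal illustration rather than the exact construction: the function name `bound_degree`, the 0-based copy indices, and the rule that the $j$-th edge of a node is attached to copy $\lfloor j/\Delta \rfloor$ are assumptions made for the example. Under these assumptions every copy carries at most $\Delta$ original edges plus at most two zero-weight chain edges.

```python
import math

def bound_degree(adj, delta):
    """Split high-degree nodes into chains of zero-weight-linked copies.

    adj:   {node: {neighbor: weight}} for an undirected graph.
    delta: target degree bound (assumed >= 1).
    Returns a graph over (node, i) copies in the same dict-of-dicts form.
    """
    # Node v is emulated by ceil(deg(v) / delta) copies (at least one).
    copies = {v: max(1, math.ceil(len(nbrs) / delta)) for v, nbrs in adj.items()}
    out = {(v, i): {} for v, c in copies.items() for i in range(c)}

    # Consecutive copies of the same node are joined by weight-0 edges.
    for v, c in copies.items():
        for i in range(c - 1):
            out[(v, i)][(v, i + 1)] = 0
            out[(v, i + 1)][(v, i)] = 0

    # Assumption: the j-th edge of v (in iteration order) goes to copy j // delta.
    slot = {v: {x: j // delta for j, x in enumerate(nbrs)} for v, nbrs in adj.items()}
    for v, nbrs in adj.items():
        for x, w in nbrs.items():
            out[(v, slot[v][x])][(x, slot[x][v])] = w
    return out

# Star with six leaves: the center has degree 6, so delta = 2 yields 3 copies.
star = {"c": {f"x{i}": 1 for i in range(6)}}
star.update({f"x{i}": {"c": 1} for i in range(6)})
bounded = bound_degree(star, 2)
```

A query for $u, v$ in $G$ would then be answered between the first copies, here `(u, 0)` and `(v, 0)`; the zero-weight chains leave distances between first copies unchanged.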
State and Query Time. We first prove that asymptotically, the size of the data structure and the query time are not increased due to the reduction. Fix some stretch $s$.
All we need to show is that the number of nodes in $G_\Delta$ is within a constant factor of the number of nodes in $G$.
Proof: The degree boundedness is trivial from the construction. We prove the claim regarding the number of nodes. The reduction implies that $|V_\Delta| = \sum_{v \in V} \lceil \deg(v)/\Delta \rceil \le \sum_{v \in V} (1 + \deg(v)/\Delta) = n + 2m/\Delta \le 2n$, where the last step uses $2m/n \le \Delta$.
In §8, we discuss how this reduction can be intuitively interpreted when it is incorporated into the algorithm of the next section, so that the algorithm runs directly on $G$ rather than $G_\Delta$. In the rest of the paper, we restrict our attention to $\Delta$-degree bounded graphs only.

Distance oracle for stretch 2
In this section, we present our distance oracles that return distances and paths of worst-case stretch 2. Throughout the section, we assume that the graph is a $\Delta$-degree bounded graph; the discussion in §5 then immediately gives us distance oracles for graphs with average degree $\Delta$. For any fixed $1 \le \alpha \le n$, our first distance oracle has size $O(n\Delta + n^2/\alpha)$ and returns stretch-2 distances in $O(\alpha\Delta)$ time. In §6.4, we show how to further reduce the query time to $O(\alpha)$ using an additional $O(n\Delta\alpha)$ space. Our oracles, similar to the oracles in [9,15], also allow retrieving paths in constant time per hop.

Constructing the distance oracle
Let $G = (V, E)$ be a $\Delta$-degree bounded graph. The construction begins by sampling each node independently at random with probability $1/\alpha$, creating a set $L$ of sampled "landmark" nodes. The distance oracle stores:
• For each node $v \in V$, a hash table containing its neighbors $N(v)$.
• For each node $v \in L$, a hash table containing the shortest distance to every other node in $G$.
• For each node $v \in V \setminus L$, $\ell(v)$ and the "ball radius" $r_v = d(v, \ell(v))$.
This completes the preprocessing of the graph and construction of the distance oracle. We start by proving the bound on the size of the distance oracle. Note that our construction of the distance oracle is randomized; the construction gives us a distance oracle that has size $O(n\Delta + n^2/\alpha)$ in expectation. Using a Chernoff bound, all our results hold with high probability at the cost of a logarithmic factor in size. However, as discussed below, our query algorithm is deterministic; that is, it never outputs distances with stretch more than 2. Hence, a distance oracle with worst-case size $O(n\Delta + n^2/\alpha)$ can be constructed using a Las Vegas algorithm; see [15] for details.
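A minimal sketch of this preprocessing step, assuming the graph is given as a dict of dicts with positive weights and that the ball radius is $r_v = d(v, \ell(v))$. The names `build_oracle` and `dijkstra`, the `seed` parameter, and the fallback that forces one landmark when the random sample is empty are assumptions made for illustration. For simplicity the sketch records $\ell(v)$ and $r_v$ for every node, so landmarks simply get radius 0.

```python
import heapq
import random

def dijkstra(adj, src):
    """Exact shortest distances from src in a weighted undirected graph."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist

def build_oracle(adj, alpha, seed=0):
    """Sample landmarks with probability 1/alpha and store the oracle tables."""
    rng = random.Random(seed)
    landmarks = {v for v in adj if rng.random() < 1.0 / alpha}
    if not landmarks:  # assumption: force one landmark so ell(v) is defined
        landmarks = {rng.choice(sorted(adj))}
    # Each landmark stores its distance to every node: O(n^2 / alpha) expected.
    table = {l: dijkstra(adj, l) for l in landmarks}
    nearest, radius = {}, {}
    for v in adj:
        nearest[v] = min(landmarks, key=lambda l: table[l][v])
        radius[v] = table[nearest[v]][v]  # ball radius r_v = d(v, ell(v))
    # The adjacency lists themselves account for the O(n * Delta) term.
    return {"adj": adj, "table": table, "nearest": nearest, "radius": radius}

# Example: a cycle on 12 nodes with unit weights.
cycle = {i: {(i - 1) % 12: 1, (i + 1) % 12: 1} for i in range(12)}
oracle = build_oracle(cycle, alpha=3)
```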

Answering distance queries
The algorithm QUERY-2(u, v) to approximate the distance between nodes $u$ and $v$ is shown in Algorithm 1. Suppose the query asks for the distance between nodes $u$ and $v$. The algorithm starts by running a shortest path algorithm that stops once $u$ and $v$ have computed their vicinities; such an algorithm (a modified version of the one presented in [15], for instance) takes time $O(\alpha\Delta)$. This can be done since the graph is stored in the distance oracle (in the form of neighbors for each node). Both $u$ and $v$ temporarily store their vicinities, and the distances to each node in their vicinity, in a hash table.
The algorithm then checks whether $v \in \Gamma(u)$ or $u \in \Gamma(v)$, in which case it directly reads $d(u, v)$ from the hash table maintained at $u$ or $v$ respectively. If $v \notin \Gamma(u)$ and $u \notin \Gamma(v)$, the algorithm performs a ball-vicinity intersection check: it queries each of the nodes $w \in B(u)$ and checks if $w \in \Gamma(v)$. If at least one such $w$ is found, it returns the minimum of $d(u, w) + d(w, v)$ over all such $w$. If there is no such $w$, the algorithm queries $u$ and $v$ for their ball radii $r_u$ and $r_v$. If $r_u < r_v$, the algorithm returns $d(u, \ell(u)) + d(\ell(u), v)$; otherwise, it returns $d(v, \ell(v)) + d(\ell(v), u)$.
Algorithm 1 QUERY-2(u, v): answering approximate distance queries with stretch 2.
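The query steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a transcription of Algorithm 1: it assumes positive edge weights, $r_v = d(v, \ell(v))$, $B(v) = \{w : d(v, w) < r_v\}$ and $\Gamma(v) = B(v) \cup N(B(v))$, and the names `explore`, `build`, and `query2` are hypothetical. Landmarks are passed in explicitly so that the example is deterministic.

```python
import heapq

INF = float("inf")

def explore(adj, src, radius=INF):
    """Dijkstra from src that settles only nodes at distance < radius.

    Returns (ball, vicinity): exact distances of settled nodes, and path
    lengths (upper bounds on distance) for settled nodes and their neighbors.
    """
    vicinity, ball, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in ball:
            continue  # stale heap entry
        if d >= radius:
            break  # every remaining node lies outside the ball
        ball[x] = d
        for y, w in adj[x].items():
            if d + w < vicinity.get(y, INF):
                vicinity[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return ball, vicinity

def build(adj, landmarks):
    """Landmark distance tables, nearest landmark and ball radius per node."""
    table = {l: explore(adj, l)[0] for l in landmarks}
    nearest = {v: min(landmarks, key=lambda l: table[l][v]) for v in adj}
    radius = {v: table[nearest[v]][v] for v in adj}
    return table, nearest, radius

def query2(adj, oracle, u, v):
    """Return a path length between d(u, v) and 2 * d(u, v)."""
    table, nearest, radius = oracle
    if u in table:  # landmarks know the distance to every node
        return table[u][v]
    if v in table:
        return table[v][u]
    ball_u, vic_u = explore(adj, u, radius[u])
    _, vic_v = explore(adj, v, radius[v])
    # Direct hits: v in the vicinity of u, or u in the vicinity of v.
    best = min(vic_u.get(v, INF), vic_v.get(u, INF))
    # Ball-vicinity intersection: one hash lookup per node in B(u).
    for w, d_uw in ball_u.items():
        best = min(best, d_uw + vic_v.get(w, INF))
    # Landmark path through the endpoint with the smaller ball radius.
    a, b = (u, v) if radius[u] <= radius[v] else (v, u)
    return min(best, radius[a] + table[nearest[a]][b])

# Path 0-1-...-7 with unit weights, a heavy chord (0, 3) and a chord (2, 6).
edges = [(i, i + 1, 1) for i in range(7)] + [(0, 3, 10), (2, 6, 2)]
graph = {i: {} for i in range(8)}
for a, b, w in edges:
    graph[a][b] = graph[b][a] = w
oracle = build(graph, landmarks=[1, 6])
```

Taking the minimum over all candidates, instead of returning at the first hit, is a choice made in this sketch: vicinity entries for neighbors of the ball are path lengths rather than guaranteed shortest distances, so an early return could exceed stretch 2 on graphs with heavy edges such as the chord (0, 3).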

Analysis
We now analyze the query time and stretch for the above distance oracle and the query algorithm.
Query time. We start by analyzing the worst-case query time for the query algorithm. Using a standard argument as in [15, Lemma 3.2], for any node $v$ we have that $E[|B(v)|] = O(\alpha)$. Since the graph is $\Delta$-degree bounded for some fixed $\Delta$, we have that the vicinity size is bounded in expectation as follows: $E[|\Gamma(v)|] \le (\Delta + 1) \cdot E[|B(v)|] = O(\alpha\Delta)$. To start with, the query algorithm requires constructing hash tables containing the vicinities of the source and the destination. We note that computing the balls for the source and the destination using any of the standard shortest path algorithms takes time $O(\alpha\Delta)$ for a $\Delta$-degree bounded graph. Since the neighbors of each node are stored in the distance oracle, creating a hash table containing the vicinity takes an additional $O(\alpha\Delta)$ time. In the next step, the query algorithm checks for the ball-vicinity intersection; this takes an additional $O(\alpha)$ time: for each element in the ball of $u$, it takes $O(1)$ time to check whether it is contained in the hash table containing the vicinity of $v$. Hence, the total query time is bounded by $O(\alpha\Delta)$.

Stretch
We obtain an upper bound of 2 on the stretch of the distance between the nodes returned by QUERY-2(u, v).The proof uses the following lemma.

Lemma 1 (Ball-vicinity intersection lemma). For any pair of nodes $u, v \in V$, if $d(u, v) < r_u + r_v$, then $B(u) \cap \Gamma(v) \neq \emptyset$.
Since $v_{i_0}$ lies on the shortest path between $u$ and $v$, we get that $d(u, v) = d(u, v_{i_0}) + d(v_{i_0}, v)$.
The above lemma shows that if two nodes $u$ and $v$ are "close", there must exist some node which lies in the vicinity of both $u$ and $v$. A word of caution though: the above lemma has to be interpreted in the right way, since two nodes having a ball-vicinity intersection can be significantly far away. This is because the nodes in the vicinity of any node $u$ are not necessarily the $O(\alpha\Delta)$ closest nodes of $u$; while balls possess this structure, vicinities contain the neighbors of the nodes in the balls, thereby destroying any meaningful "distance based" interpretation of the vicinities. We now prove the bound on stretch.
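The argument behind the lemma can be sketched as follows. The sketch is a reconstruction under stated assumptions, not the proof verbatim: it assumes strictly positive edge weights and the definitions $r_x = d(x, \ell(x))$, $B(x) = \{w : d(x, w) < r_x\}$ and $\Gamma(x) = B(x) \cup N(B(x))$; the path labels $v_0, \ldots, v_k$ are introduced here for illustration.

```latex
Suppose $d(u, v) < r_u + r_v$. Since $r_u \le d(u, v)$ whenever $v \in L$
(and symmetrically for $u$), the hypothesis forces $u, v \notin L$, so
$r_u, r_v > 0$, $u \in B(u)$ and $v \in B(v)$.

Let $u = v_0, v_1, \ldots, v_k = v$ be a shortest path and let $i_0$ be the
largest index with $d(u, v_{i_0}) < r_u$, so that $v_{i_0} \in B(u)$.
If $i_0 = k$, then $v \in B(u) \cap B(v) \subseteq B(u) \cap \Gamma(v)$.
Otherwise $d(u, v_{i_0+1}) \ge r_u$, and
\begin{align*}
d(v, v_{i_0+1}) &= d(u, v) - d(u, v_{i_0+1}) \\
                &< (r_u + r_v) - r_u = r_v ,
\end{align*}
so $v_{i_0+1} \in B(v)$ and its neighbor $v_{i_0}$ lies in
$N(B(v)) \subseteq \Gamma(v)$. Hence $v_{i_0} \in B(u) \cap \Gamma(v)$.
```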

For any pair of nodes $u, v \in V$ with $d(u, v) < r_u + r_v$, QUERY-2(u, v) returns the exact distance between $u$ and $v$.
Proof: If $d(u, v) < r_u + r_v$, we have using Lemma 1 that there must be at least one node $x \in B(u) \cap \Gamma(v)$. The algorithm reads the "exact" distance $d(u, x)$ from the hash table maintained at node $u$ and the "exact" distance $d(v, x)$ from the hash table maintained at node $v$. From the proof of the ball-vicinity intersection lemma, we note that among all such nodes $x \in B(u) \cap \Gamma(v)$, there must be at least one node which lies on the shortest path between $u$ and $v$, and this node minimizes the distance returned by the algorithm, resulting in stretch 1.
For the case when $d(u, v) \ge r_u + r_v$, we show that our scheme results in a stretch of at most 2.

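Theorem 1. For any pair of nodes $u, v \in V$ with $d(u, v) \ge r_u + r_v$, QUERY-2(u, v) returns a distance of stretch at most 2.
Proof: Without loss of generality, assume that $r_u \le r_v$. Then, the condition in the statement implies that $d(u, v) \ge r_u + r_v \ge 2 r_u$. By the triangle inequality, we have that $d(\ell(u), v) \le d(\ell(u), u) + d(u, v) = r_u + d(u, v)$, so the returned distance satisfies $d(u, \ell(u)) + d(\ell(u), v) \le 2 r_u + d(u, v)$. Using the lower bound of $2 \cdot r_u$ on the distance between $u$ and $v$, we get the desired bound of 2 on stretch.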

Storage versus computation
The distance oracle presented above allows one to smoothly trade off query time to reduce the size of the distance oracle (by varying $\alpha$). In particular, this gives us distance oracles that require space linear in the size of the graph (by setting $\alpha = n^2/m$), which may be of independent interest. Moreover, for specific values of $\alpha$, one can also avoid running a shortest path algorithm without using additional space. More specifically, for $1 \le \alpha \le \sqrt{n}$, one can store the balls of each node within the distance oracle at no additional cost, since the $O(n\alpha)$ expected space for the balls is dominated by the $O(n^2/\alpha)$ term. For these values of $\alpha$, one can also design efficient compact routing schemes; we present such schemes in §9.
However, for other values of $\alpha$ and/or for even slightly dense graphs, there are two potential issues with the above distance oracle. First, the query algorithm requires running a (restricted) shortest path algorithm for each distance query; and second, for some fixed space, the query time can be high. For instance, for graphs with $\Delta = \Theta(n^{1/4})$, the distance oracle with $O(n^{7/4})$ space requires $O(n^{1/2})$ query time per query for stretch 2 distances.
In this subsection, we show how to reduce the query time for our distance oracle; for specific values of $\Delta$ and $\alpha$, this may increase the size of the distance oracle. In particular, we show how to reduce the query time to $O(\alpha)$ (a reduction by a factor of $\Delta$ when compared to the distance oracle above) using an additional $O(n\alpha\Delta)$ space. For the instance discussed above, for example, we will still be able to retrieve stretch 2 distances using $O(n^{7/4})$ space, using only $O(n^{1/4})$ query time.
In order to reduce the query time, we note that the above distance oracle requires computing vicinities for the nodes on the fly, leading to $O(\alpha\Delta)$ query time. Hence, in our new distance oracle, we preprocess the graph and store the vicinities within the distance oracle. Storing the vicinities, however, requires $O(\alpha\Delta)$ space for each node and hence, the size of the new distance oracle is $O(n\alpha\Delta + n^2/\alpha)$. The reduction in query time comes from the fact that if the vicinities are already stored within the distance oracle, checking for ball-vicinity intersection takes $O(\alpha)$ time: for each element in the ball of $u$, it takes $O(1)$ time to check whether it is contained in the hash table containing the vicinity of $v$. Since we do not require running any shortest path algorithm, the query time of the algorithm is bounded by the time taken to check for ball-vicinity intersection and hence, is reduced to $O(\alpha)$.
We remark that the two distance oracles presented above may achieve different sets of operating points within the space/query time trade-off. For example, as discussed above, for graphs with $\Delta = \Theta(n^{1/4})$, it is not possible to retrieve stretch 2 paths using $O(n^{7/4})$ space and $o(n^{1/2})$ query time using the first distance oracle; the second distance oracle does achieve these new operating points. On the other hand, it is not possible to achieve linear space using the second distance oracle. Nevertheless, the second distance oracle, besides having a lower query time, has the additional benefit that the query algorithm does not run a shortest path algorithm for each query.
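The operating points in this example can be checked by direct substitution. The following worked calculation takes $\Delta = \Theta(n^{1/4})$ and $\alpha = n^{1/4}$; the value of $\alpha$ is inferred here from the stated $O(n^{7/4})$ space bound.

```latex
\begin{align*}
\text{first oracle:}\quad
  & \text{space } O(n\Delta + n^2/\alpha) = O(n^{5/4} + n^{7/4}) = O(n^{7/4}),
  && \text{query } O(\alpha\Delta) = O(n^{1/2});\\
\text{second oracle:}\quad
  & \text{space } O(n\alpha\Delta + n^2/\alpha) = O(n^{3/2} + n^{7/4}) = O(n^{7/4}),
  && \text{query } O(\alpha) = O(n^{1/4}).
\end{align*}
```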

Discussion
Implications of the average-to-max-degree-bound reduction. The results in this section combined with the reduction of §5 immediately give us distance oracles of size $O(n\Delta + n^2/\alpha)$ and $O(n\Delta\alpha + n^2/\alpha)$, which for any graph with at most $O(n\Delta)$ edges, return stretch-2 paths in $O(\alpha\Delta)$ and $O(\alpha)$ time, respectively. We show how to incorporate the reduction into the algorithm in a way that yields intuition and eases implementation.
Specifically, let $G$ be the graph with average degree $\Delta$. The reduction implies that each node $v$ in $G$ which has degree $\deg(v) > \Delta$ effectively "emulates" $\lceil \deg(v)/\Delta \rceil$ nodes in $G_\Delta$. Now consider constructing the distance oracle presented in this section. While sampling nodes for the landmark set $L$, the node $v$ is now sampled with probability $(1/\alpha) \cdot \lceil \deg(v)/\Delta \rceil$, that is, with probability that is proportional to the degree of $v$. Moreover, due to Claim 1, the size of $B(v)$ remains unchanged asymptotically ($B(v)$ and hence $\Gamma(v)$ may change, but not their size). Thus, the implications of the reduction are simple: just sample each node $v$ in the graph with probability $(1/\alpha) \cdot \lceil \deg(v)/\Delta \rceil$ rather than probability $1/\alpha$. In other words, rather than sampling nodes uniformly at random, they are sampled with probability proportional to their degree.
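The degree-proportional sampling rule can be sketched as below. The function names `landmark_probability` and `sample_landmarks`, the `seed` parameter, and capping the probability at 1 (needed when $\lceil \deg(v)/\Delta \rceil > \alpha$) are assumptions made for this illustration.

```python
import math
import random

def landmark_probability(degree, delta, alpha):
    """Probability that a node of the given degree becomes a landmark.

    Implements (1/alpha) * ceil(degree/delta), capped at 1 (the cap is an
    assumption of this sketch).
    """
    return min(1.0, math.ceil(degree / delta) / alpha)

def sample_landmarks(adj, delta, alpha, seed=0):
    """Sample landmarks directly on G with degree-proportional probability."""
    rng = random.Random(seed)
    return {v for v, nbrs in adj.items()
            if rng.random() < landmark_probability(len(nbrs), delta, alpha)}

# Star with six leaves: the center is three times as likely to be sampled
# as a leaf when delta = 2.
star = {"c": {f"x{i}": 1 for i in range(6)}}
star.update({f"x{i}": {"c": 1} for i in range(6)})
landmarks = sample_landmarks(star, delta=2, alpha=10)
```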
An optimization. Although the worst-case stretch for our distance oracle is 2, we can apply simple heuristics to improve the stretch in practice. Recall that the worst-case stretch in our distance oracle occurs for source-destination pairs $u, v$ for which $B(u) \cap \Gamma(v) = \emptyset$; the query may return a path, for instance, $u \to \ell(u) \to v$, that is of stretch 2. The main observation is that for such source-destination pairs, there may exist a $w \in \Gamma(u)$ for which the length of the path $u \to w \to \ell(w) \to v$ is less than that of the path $u \to \ell(u) \to v$. The approximate distance query can then be answered by the distance oracle as the minimum of the distances retrieved by checking all $w \in \Gamma(u)$ (see §9 for implementation details). Since checking the length of the paths $u \to w \to \ell(w) \to v$ for all $w \in \Gamma(u)$ takes (asymptotically) the same time as checking the ball-vicinity intersection, the heuristic does not increase the query time, with potential improvements in stretch of retrieved paths. Indeed, we show in §10 that this heuristic increases the number of source-destination pairs that retrieve shortest paths by almost 25%.

Distance oracles for stretch 3 and larger
In this section, we present distance oracles that return paths of worst-case stretch (4k − 1), for any positive integer k. For any fixed 1 ≤ α ≤ n, the distance oracle is of size O(n∆ + (n/α)^(1+1/k)) and returns stretch-(4k − 1) distances in O(α∆) time for any graph with O(n∆) edges; the paths can then be retrieved in constant time per hop. In particular, for k = 1, we get a distance oracle of size O(n∆ + n^2/α^2) that returns stretch-3 distances in O(α∆) time. As earlier, the query time can be further reduced to O(α) using an extra O(nα∆) space. While we present bounds that hold in expectation, our distance oracles can be derandomized using a Las Vegas algorithm.

Constructing the distance oracle
Let G = (V, E) be a ∆-degree bounded graph. Fix some 1 ≤ α ≤ n and some integer k > 0. Our construction of the distance oracle begins by sampling each node independently at random with probability 1/α, creating a set L of sampled nodes. We now create a complete graph G′ with the nodes in L as the node set; for each pair l_1, l_2 ∈ L, the weight of the edge (l_1, l_2) is the shortest-path distance between l_1 and l_2 in G. We run the approximate distance oracle of Thorup and Zwick on G′ to construct a distance oracle for G′ that stores (2k − 1)-approximate shortest paths between every pair of nodes in L. Our distance oracle stores the oracle for G′ as a sub-data structure. Furthermore, it also stores, for each node v ∈ V, its set of neighbors N(v), its closest landmark node ℓ(v) and the ball radius r_v.
Claim 4. The size of the distance oracle is O(n∆ + (n/α)^(1+1/k)), in expectation.

Proof: Note that E[|L|] = O(n/α) and hence, using the results in [15], the size of the distance oracle for G′ is O((n/α)^(1+1/k)). Furthermore, storing N(v) for each node v requires an additional O(n∆) space in total; storing ℓ(v) and r_v requires an additional O(1) space per node. Hence, the size of the distance oracle is O(n∆ + (n/α)^(1+1/k)), in expectation.
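The first steps of the construction can be sketched as below. This is an illustration under stated assumptions: the graph is connected and given as an adjacency list, the landmark set is supplied by the caller, `build_landmark_graph` is a hypothetical name, and the sketch stops at the complete graph G′ rather than running the Thorup-Zwick construction on it.

```python
import heapq

def dijkstra(adj, src):
    """Exact single-source distances; adj[v] is a list of (neighbor, weight) pairs."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

def build_landmark_graph(adj, landmarks):
    """Returns (G', nearest landmark ℓ(v), ball radius r_v) for a connected graph."""
    from_landmark = {l: dijkstra(adj, l) for l in landmarks}
    # Complete graph G' on L: edge weights are shortest-path distances in G.
    g_prime = {a: {b: from_landmark[a][b] for b in landmarks if b != a}
               for a in landmarks}
    nearest, radius = {}, {}
    for v in adj:
        l = min(landmarks, key=lambda x: from_landmark[x][v])
        nearest[v], radius[v] = l, from_landmark[l][v]
    return g_prime, nearest, radius
```

The neighbor sets N(v) are already available as `adj`, so the only remaining component is the sub-oracle built on `g_prime`.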

Answering distance queries
Let QUERYTZ(u, v) be the query algorithm for the Thorup-Zwick scheme [15] that returns (2k − 1)-approximate distances between nodes u and v. The query algorithm for our distance oracle is shown in Algorithm 2.
Suppose the query asks for the distance between nodes u, v ∈ V. The algorithm, as in Algorithm 1, starts by running a shortest path algorithm that stops once the two nodes u and v have computed their vicinities and the shortest distances to nodes in their vicinities. This can be done since the graph is stored in the distance oracle (in the form of an adjacency list), and requires O(α∆) time using a modified version of the algorithm presented in [15]. Both u and v temporarily store this information in a hash table.
If v ∈ Γ(u) or u ∈ Γ(v), the algorithm returns the exact distance d(u, v) from the hash table at u or v, respectively. If v ∉ Γ(u) and u ∉ Γ(v), the algorithm checks for ball-vicinity intersection, that is, for each node w ∈ B(u), the algorithm checks whether w ∈ Γ(v). If at least one such w is found, the algorithm returns the minimum of d(u, w) + d(v, w) over all such w. If no such w exists, the algorithm returns d(u, ℓ(u)) + QUERYTZ(ℓ(u), ℓ(v)) + d(v, ℓ(v)), since the Thorup-Zwick sub-oracle is defined only on pairs of landmarks. Finally, the hash tables are deleted from nodes u and v.
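The case analysis of the query can be sketched as follows. This is not Algorithm 2 itself: balls and vicinities are assumed to be already available as dictionaries (the oracle computes them on the fly), and `landmark_query` is a hypothetical stand-in for QUERYTZ on pairs of landmarks.

```python
def query(u, v, ball, vicinity, landmark, radius, landmark_query):
    """ball[x]:     dict w -> d(x, w) for w in B(x)
    vicinity[x]: dict w -> d(x, w) for w in Γ(x)
    landmark[x]: ℓ(x);  radius[x]: d(x, ℓ(x))
    landmark_query(a, b): approximate distance between landmarks a and b
    """
    # Exact answers when one endpoint lies in the other's vicinity.
    if v in vicinity[u]:
        return vicinity[u][v]
    if u in vicinity[v]:
        return vicinity[v][u]
    # Ball-vicinity intersection: best meeting node w in B(u) ∩ Γ(v).
    common = [w for w in ball[u] if w in vicinity[v]]
    if common:
        return min(ball[u][w] + vicinity[v][w] for w in common)
    # Fallback: route through the two nearest landmarks.
    return radius[u] + landmark_query(landmark[u], landmark[v]) + radius[v]
```

Only the last branch can return a distance of stretch larger than 2.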

Analysis
In terms of the query time, we note that the above query algorithm is very similar to the query algorithm for our distance oracles with stretch 2; indeed, the only difference is the final step of the query algorithm, that is, when B(u) ∩ Γ(v) = ∅. Since checking ball-vicinity intersection is the bottleneck in terms of query time, we get, using arguments similar to those in §6, that the query time for the above query algorithm is O(α∆). Moreover, since the definitions of balls and vicinities for the above distance oracle are exactly the same as those in §6, using exactly the same proofs as in §6, we get the following claims:

Claim 6. For any pair of nodes u and v, if d(u, v) < r_u + r_v, QUERY(u, v) returns the exact distance between u and v.
When d(u, v) ≥ r_u + r_v, we have two cases similar to those in the proof of Theorem 1. For the first case, when B(u) ∩ Γ(v) ≠ ∅, we get the following claim, the proof of which follows from the proof of Theorem 1:

Claim 7. For any pair of nodes u and v, if d(u, v) ≥ r_u + r_v and B(u) ∩ Γ(v) ≠ ∅, QUERY(u, v) returns a distance of stretch at most 2 between u and v.
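The only remaining case is when d(u, v) ≥ r_u + r_v and B(u) ∩ Γ(v) = ∅; for this case, we prove a worst-case stretch bound of (4k − 1), that is, the algorithm QUERY(u, v) returns, in the worst case, a distance estimate of stretch (4k − 1) between u and v. In this case the returned estimate is r_u + δ′ + r_v, where δ′ is the Thorup-Zwick estimate of d(ℓ(u), ℓ(v)). Since the Thorup-Zwick estimate has stretch (2k − 1) and, by the triangle inequality, d(ℓ(u), ℓ(v)) ≤ r_u + d(u, v) + r_v, the returned estimate is at most 2k · (r_u + r_v) + (2k − 1) · d(u, v). Using r_u + r_v ≤ d(u, v), this is at most (4k − 1) · d(u, v), which we set out to prove.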

Discussion
We close the section with remarks along the lines of §6.4 and §6.5. First, using ideas similar to those in §6.4, the query time for the above distance oracle can be reduced to O(α) by storing the vicinities within the distance oracle; this requires an additional O(nα∆) space. This allows us to achieve new points within the space/query time trade-off, although it is no longer possible to have size linear in the size of the graph.
Next, although we assumed that the input graph is ∆-degree bounded, the results from §5 imply that our results generalize to graphs with average degree ∆. As in §6.5, we only need to sample nodes for inclusion in the landmark set with probability proportional to the degree of the node, rather than sampling them uniformly at random.
Finally, an optimization similar to that in §6.5 is again possible for the algorithm described in this section. While checking for ball-vicinity intersection, when u queries each of the nodes w ∈ B(u), it could also query for its distance to ℓ(w) and combine this with d(ℓ(w), ℓ(v)) and d(v, ℓ(v)) for a potentially better estimate of the distance between u and v. Again, the distance oracle can then answer the approximate distance query with the minimum of all the distances retrieved by querying the nodes in the ball of the source, resulting in improved stretch in practice without any asymptotic increase in the query time.

Distance oracles with additive stretch
In this section, we show that the space/query time trade-off in our distance oracles from §6 and §7 can be further improved at the cost of a small additive stretch. In particular, let G be an unweighted graph and let u, v be a pair of nodes at distance d; then, for any fixed 1 ≤ α ≤ n, we design:
• a distance oracle of size O(nα + n^2/α) that returns a distance of at most 2d + 1 in O(α) time;
• a distance oracle of size O(nα + (n/α)^(1+1/k)) that returns a distance of at most (4k − 1)d + 2k in O(α) time, for any integer k > 0.
The results can be generalized to weighted graphs without any increase in space or query time. As earlier, while the bounds presented hold in expectation, our construction algorithm can be derandomized using a Las Vegas algorithm.
The main observation that allows us to design these distance oracles is captured in the following lemma, which presents a lower bound on distance between the source and the destination when the query algorithm, rather than checking for ball-vicinity intersection as in Lemma 1, checks only for ball-ball intersection:

Lemma 2 (Ball-ball intersection). For any pair of nodes u, v ∈ V, let w_uv be the weight of the heaviest edge along the shortest path between u and v. If B(u) ∩ B(v) = ∅, the distance between u and v is lower bounded as d(u, v) ≥ r_u + r_v − w_uv.
Proof: Assume that B(u) ∩ B(v) = ∅ and let P = (u, x_1, x_2, . . ., v) be the shortest path between u and v. Let i_0 = max{i | x_i ∈ P ∩ B(u)}, w = x_{i_0} and w′ = x_{i_0+1}. By definition, w′ ∉ B(u) and hence, d(u, w′) ≥ r_u. Furthermore, since w and w′ are neighbors and w ∈ B(u), we have that d(u, w) ≥ r_u − w_uv. Furthermore, since B(u) ∩ B(v) = ∅, w ∉ B(v), leading to the fact that d(v, w) ≥ r_v; since w is on the shortest path between u and v, we have that d(u, v) = d(u, w) + d(w, v) ≥ r_u + r_v − w_uv.
Lemma 2 suggests that if the query algorithms from §6 and §7 were to perform a ball-ball intersection check rather than a ball-vicinity intersection check, the loss in stretch can be bounded by an additive term that depends on the heaviest edge weight along the shortest path between the source and the destination. In contrast to the distance oracles of §6 and §7, performing ball-ball intersection requires neither storing the vicinities nor computing them on the fly; the query is now performed only on the balls of each node, leading to improvements in space and/or query time.

Constructing the distance oracles
Let G = (V, E) be a ∆-degree bounded graph. The construction begins by sampling each node independently at random with probability 1/α, creating a set L of sampled "landmark" nodes.
Distance oracle for additive stretch 1. The distance oracle is similar to the one in §6.4. It stores, for each node v ∈ L, a hash table containing the shortest distance to every other node in G and, for each node v ∈ V \ L, distances to nodes in its ball, its landmark node ℓ(v) and the "ball radius" r_v.
To bound the size of the distance oracle, we note that we have O(n/α) landmarks in expectation, requiring O(n^2/α) space to store distances to every other node in the graph. Furthermore, each node has O(α) nodes in its ball and hence, storing distances to these nodes requires O(nα) space; storing ℓ(v) and r_v for each node v requires an additional O(1) space. Hence, the total space requirements are O(nα + n^2/α), in expectation.
Distance oracle for additive stretch 2k. The distance oracle is again very similar to the one in §7. First, a complete graph on the nodes in L is computed, where the weight of each edge is equal to the shortest distance between the two nodes. The distance oracle stores, as a sub-data structure, the Thorup-Zwick distance oracle that returns stretch-(2k − 1) distances for the complete graph over the nodes in L. In addition, it stores, for each node v ∈ V \ L, distances to nodes in its ball, its landmark node ℓ(v) and the "ball radius" r_v.
Recall that the expected number of nodes in the landmark set is O(n/α) and hence, the size of the Thorup-Zwick sub-data structure is O((n/α)^(1+1/k)). Furthermore, since the size of the ball for each node is O(α), the additional space required is O(α) for each node. The overall size of the distance oracle is, hence, O(nα + (n/α)^(1+1/k)), in expectation.
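To get a feel for the two space bounds, the sketch below evaluates them with all hidden constants set to 1, which is an assumption made purely for illustration; the function name is hypothetical.

```python
def additive_oracle_sizes(n, alpha, k):
    """Entry counts for the two additive-stretch oracles, with big-O constants set to 1."""
    balls = n * alpha                              # O(α) ball entries per node
    exact_landmarks = n * n / alpha                # O(n/α) landmarks, n distances each
    tz_on_landmarks = (n / alpha) ** (1 + 1 / k)   # Thorup-Zwick oracle on |L| = n/α nodes
    return balls + exact_landmarks, balls + tz_on_landmarks
```

For n = 10,000 and α = 100, the first oracle balances its two terms at 10^6 entries each, whereas for k = 1 the second oracle is dominated by the ball term.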

Query algorithms and analysis
The query algorithms for the above distance oracles are similar to their respective query algorithms from §6 and §7, with the only change being that they perform a ball-ball intersection check rather than a ball-vicinity intersection check. Regarding the query time, we note that since the balls for each node are stored within the distance oracles, checking for ball-ball intersection requires O(α) time, leading to the claimed bound on the query time (all other operations require constant time).
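A minimal sketch of the query for the first (additive stretch 1) oracle follows. The names and the dictionary layout are hypothetical; the fallback routes through the landmark of the endpoint with the smaller ball radius, matching the case analysed in the proof of Theorem 3.

```python
def query_ball_ball(u, v, ball, landmark, radius, landmark_dist):
    """ball[x]:          dict w -> d(x, w) for w in B(x), stored for non-landmarks x
    landmark[x]:      ℓ(x);  radius[x]: d(x, ℓ(x))
    landmark_dist[l]: dict node -> exact d(l, node), stored for each landmark l
    """
    # Landmarks store exact distances to every node.
    if u in landmark_dist:
        return landmark_dist[u][v]
    if v in landmark_dist:
        return landmark_dist[v][u]
    # Ball-ball intersection: O(α) hash lookups.
    common = [w for w in ball[u] if w in ball[v]]
    if common:
        return min(ball[u][w] + ball[v][w] for w in common)
    # Fallback: go through the landmark of the endpoint with the smaller radius.
    a, b = (u, v) if radius[u] <= radius[v] else (v, u)
    return radius[a] + landmark_dist[landmark[a]][b]
```

No vicinity is consulted anywhere, which is what removes both the O(nα∆) storage and the on-the-fly vicinity computation.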
We prove the stretch bound for the first distance oracle; for larger stretch, the proof follows using simple modifications.

Theorem 3. For any two nodes u, v ∈ V at distance d, let w_uv be the weight of the heaviest edge along the shortest path between u and v. Then, the query algorithm returns a distance of at most 2d + w_uv.
Proof: For the case when d(u, v) < r_u + r_v − w_uv, the contrapositive of Lemma 2 gives B(u) ∩ B(v) ≠ ∅, and it is easy to show that the query algorithm returns the exact distance between u and v.
Consider the case when d(u, v) ≥ r_u + r_v − w_uv and, without loss of generality, assume that r_u ≤ r_v. Then, the condition implies that d(u, v) ≥ 2 · r_u − w_uv. In such a case, the distance returned by the query algorithm is d(u, ℓ(u)) + d(ℓ(u), v). By the triangle inequality, we have that d(u, ℓ(u)) + d(ℓ(u), v) ≤ 2 · d(u, ℓ(u)) + d(u, v) = 2 · r_u + d(u, v). Using the lower bound of 2 · r_u − w_uv on the distance between u and v, we get the desired bound of 2d(u, v) + w_uv on the returned distance.
Similarly, one can prove that for any pair of nodes u, v at distance d, the second distance oracle returns a distance of at most (4k − 1)d + 2k · w_uv, where w_uv is the weight of the heaviest edge along the shortest path between u and v.
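One way to see the bound for the second oracle is sketched below; this is a reconstruction from the definitions above, not the authors' written proof. Here δ(u, v) denotes the returned distance and δ′ the Thorup-Zwick estimate between landmarks (both symbols are introduced only for this sketch), and the last step uses r_u + r_v ≤ d + w_uv from Lemma 2 when the balls do not intersect.

```latex
\begin{align*}
\delta(u,v) &= r_u + \delta'(\ell(u),\ell(v)) + r_v \\
            &\le r_u + r_v + (2k-1)\, d(\ell(u),\ell(v)) \\
            &\le r_u + r_v + (2k-1)\,(r_u + d + r_v) \\
            &= 2k\,(r_u + r_v) + (2k-1)\, d \\
            &\le 2k\,(d + w_{uv}) + (2k-1)\, d \;=\; (4k-1)\, d + 2k\, w_{uv}.
\end{align*}
```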

Compact routing schemes
Work on compact routing has applied the traditional results from approximate distance oracles [15] to network routing problems [14] in order to route packets along short paths while using little memory at routers. These solutions have been proposed as centralized algorithms [14] and, more recently, as distributed protocols for wireless sensor networks [8], the Internet [11] and peer-to-peer networks [3]. In this section, we present compact routing schemes for our distance oracles; by exploiting graph sparsity, our schemes significantly improve the memory/stretch trade-off over previously known results. In particular, we discuss a surprisingly lightweight scheme that can be incorporated into distributed routing protocol implementations of the Thorup-Zwick (TZ) scheme, for instance [11], to get a distributed routing protocol for our distance oracles.
We primarily focus on designing compact routing schemes for stretch 2. Recall that our distance oracle for stretch 2 has size O(n∆α + n^2/α); our scheme distributes the state uniformly across all routers, requiring each router to store O(∆α + n/α) entries. Using α = √(n/∆), our scheme requires each router to store O(√(n∆)) entries, while routing along paths of stretch 2. For graphs with ∆ = o(n), this gives us the first scheme that routes along paths of stretch less than 3 and requires sublinear state at routers in the network. In fact, for real-world networks, that is, networks with ∆ = Θ(polylog(n)), our compact routing scheme requires the same amount of memory as [3,8,11,14] but routes along paths that have worst-case stretch bounded by 2. Note that any routing scheme with stretch less than 2 must require linear state at some node in the network [6], even for extremely sparse graphs; our scheme, hence, achieves the optimal stretch with non-trivial memory requirements at routers.
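The choice of α that balances the two terms of the per-router state can be checked directly; the function f below is introduced only for this check, with hidden constants set to 1.

```latex
\begin{align*}
f(\alpha) &= \Delta\,\alpha + \frac{n}{\alpha}, \qquad
f'(\alpha) = \Delta - \frac{n}{\alpha^{2}} = 0
\;\Longrightarrow\; \alpha = \sqrt{n/\Delta}, \\
f\!\left(\sqrt{n/\Delta}\right) &= \Delta\sqrt{n/\Delta} + \frac{n}{\sqrt{n/\Delta}}
= 2\sqrt{n\,\Delta}.
\end{align*}
```

In particular, the per-router state is sublinear in n whenever ∆ = o(n).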
In addition, by setting α = √n in the results from §8, we get a compact routing scheme that, for any source-destination pair at distance d, routes along paths of length at most 2d + 1 using O(√n) memory at each router, independent of the density of the graph.
TZ scheme and our distance oracle. Our distance oracle can be incorporated into the proposed distributed adaptations [3,8,11] of the TZ scheme with minimal changes. This is because the construction of our distance oracle is, in concept, similar to the TZ scheme: both schemes construct a set L of nodes, and each node v stores a corresponding nearest node ℓ(v) in L and certain nodes in its neighborhood. The first difference between our distance oracle and the TZ scheme is that the set L is sampled with probability proportional to node degree rather than uniformly at random. Second, our distance oracle differs from the TZ scheme in terms of the information stored: for any node v, while TZ only requires storing the ball B(v), our distance oracle stores Γ(v). Both modifications are easy changes to the distributed protocols of [3,8,11]; note that computing Γ(v) requires only the neighbors of nodes in B(v). Third, to route from the source u to the destination v, our distance oracle allows u to set up an initial connection to v by using the TZ algorithm for routing between u and v. This initial connection gives a path of stretch 3, via an essentially unmodified proof of [14,15]. The final task is to improve the stretch from 3 to 2.
Implementing ball-vicinity intersection. In order to improve the stretch from 3 to 2, our distance oracle requires the source and the destination to perform a ball-vicinity intersection (see Lemma 1). We show how this intersection can be implemented in practice with a surprisingly lightweight handshaking scheme, that is, an exchange of very few bytes between the source and the destination. Recall, from the discussion above, that the initial connection gives the source a path to the destination with stretch 3. The source can then send the list of nodes in its ball to the destination using this path. For the router-level map of the Internet measured by CAIDA [13], which consists of n = 192,244 routers and has average degree ∆ ≃ 0.4 log_2 n, this requires the source to transfer roughly 4 · √(n/∆) bytes, since IPv4 addresses are 4 bytes and balls have size √(n/∆). This amounts to approximately 661 bytes of data; on today's Internet, packets are generally allowed to be at least 1500 bytes long, so this would take just one packet.
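The byte count can be reproduced with a few lines of arithmetic. The sketch assumes that balls have size √(n/∆), which corresponds to the choice α = √(n/∆), and that log denotes the base-2 logarithm; the function name is hypothetical.

```python
import math

def ball_transfer_bytes(n, avg_degree, address_bytes=4):
    """Returns (ball size, bytes needed to send one address per ball node)."""
    ball_size = math.sqrt(n / avg_degree)  # assumption: |B(v)| is about sqrt(n/∆)
    return ball_size, address_bytes * ball_size

n = 192_244                      # routers in the CAIDA router-level map
avg_degree = 0.4 * math.log2(n)  # ∆ ≃ 0.4 log_2 n
ball_size, payload = ball_transfer_bytes(n, avg_degree)
```

With these inputs the ball has roughly 165 entries and the payload is roughly 660 bytes, comfortably below a 1500-byte packet.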
The destination can then perform a ball-vicinity intersection, which requires O(√(n/∆)) time asymptotically but, using the above numbers, requires fewer than 165 hash table lookups, which is fast in practice. Upon executing the ball-vicinity intersection, the destination informs the source whether the ball-vicinity intersection is an empty set or not. If they do intersect, it can inform the source of the node (or nodes) at which the ball-vicinity intersection occurs. This requires at most one packet, which can be routed from the destination to the source via a stretch-3 path. The source-destination pair, after the above handshaking scheme (that requires at most two packets), now has a route with stretch 2.
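The two sides of the handshake can be sketched as follows. This is an illustration rather than a protocol specification: the function names are hypothetical, and the reply is assumed to carry d(v, w) alongside each intersecting node w so that the source can pick the best meeting point.

```python
def destination_reply(ball_nodes, dest_vicinity):
    """Destination side: keep the received ball nodes that lie in Γ(v), with d(v, w).

    ball_nodes:    iterable of node identifiers sent by the source
    dest_vicinity: dict w -> d(v, w) for w in Γ(v)
    """
    return {w: dest_vicinity[w] for w in ball_nodes if w in dest_vicinity}

def source_choose(ball_dist, reply):
    """Source side: pick w minimising d(u, w) + d(v, w); None if there is no intersection.

    ball_dist: dict w -> d(u, w) for w in B(u)
    reply:     dict returned by destination_reply
    """
    if not reply:
        return None
    w = min(reply, key=lambda x: ball_dist[x] + reply[x])
    return w, ball_dist[w] + reply[w]
```

An empty reply tells the source to keep using the initial stretch-3 path.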
In practice, this is likely to be efficient even for relatively short-lived connections. For much larger networks, of course, the exchange of ball information would require more bandwidth and computation; but since a stretch-3 path is available immediately, the reduction to stretch 2 can be treated as an optimization for longer flows in order to amortize the overhead.
Probing and Shortcutting. The protocol for implementing ball-vicinity intersection discussed above does not exploit the optimization discussed in §6.5 and §7.4 for heuristically improving the stretch of the retrieved paths. We now discuss the implementation aspects of this optimization. Implementing the optimization in practice leads to a process that we call probing and shortcutting (P&S). P&S requires the source node to probe the nodes in its vicinity in order to improve the stretch. We argue that this can be achieved with an extremely low-overhead probing scheme. Once the source node finds a node in its vicinity that provides a better stretch, the source can conveniently switch the traffic to the shortcut path. We only discuss the probing mechanism, since shortcutting can be implemented easily in practice (note that the destination is oblivious to the shortcutting mechanism and hence, P&S does not require any handshaking mechanism).
For the probing mechanism, assume that the source opens an initial connection to a destination. With every 10th packet, the source can probe a node in its vicinity (the question of deciding an appropriate order for probing the nodes in the vicinity is discussed below), requesting the length of the path available from this node to the destination. These packets can be extremely small compared to the data packets, leading to an extremely small overhead in terms of bandwidth consumed (just a fraction 0.1 more packets, which are of negligible size compared to the data packets). Since the source-destination connections that account for most of the bandwidth sent on networks are very long [16], we believe it is reasonable to amortize the cost of the probing over the lifetime of the connection.
In terms of the order of probing, we consider two heuristics: farthest-first, in which the source first probes the nodes on the boundary of its vicinity; and closest-first, in which the source starts probing with the closest nodes (its neighbors). We show, through evaluations, that the former performs better than the latter.
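The two probing orders can be sketched by sorting the vicinity by distance from the source. Treating "boundary nodes" as the nodes at the largest distance is an assumption of this sketch, and both function names are hypothetical.

```python
def probe_order(vicinity_dist, farthest_first=True):
    """vicinity_dist: dict w -> d(u, w) for the nodes w that the source may probe.

    farthest_first=True  probes boundary nodes first;
    farthest_first=False probes the closest nodes (the neighbors) first.
    """
    return sorted(vicinity_dist, key=vicinity_dist.get, reverse=farthest_first)

def probes_sent(num_data_packets, probe_every=10):
    """Number of probes issued when one probe accompanies every 10th data packet."""
    return num_data_packets // probe_every
```

A connection of 1000 data packets would thus issue 100 probes, enough to cover a vicinity of up to 100 nodes once.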

Evaluation Results
In this section, we evaluate the performance of our stretch-2 and stretch-3 schemes on large-scale synthetic and realistic topologies. We first present our methodology, followed by a summary of the evaluation results, and conclude with a detailed discussion of the results.

Methodology
Schemes. We evaluate three schemes: the stretch-3 scheme of Thorup and Zwick (TZ) [14,15]; Reduced approximation ratio (REAR), the stretch-2 scheme from §6 with α = √n; and Reduced space (RES), the stretch-3 scheme (for k = 1) from §7 with α = √n. Furthermore, we evaluate the REAR and RES schemes with and without the P&S optimization discussed in earlier sections. For the TZ scheme, we sampled each node (for the set L) with probability log n/α. For REAR and RES, each node was sampled with probability log n/α × deg(v)/ log 2 n. All the constants in the big-O notation were set to 1. For our REAR and RES schemes, we implemented a modified scheme in which we perform vicinity-vicinity intersection rather than ball-vicinity intersection; this requires slightly higher query time (an extra log n factor for the networks on which we evaluated our schemes) but may lead to improved stretch values.
Simulator. We wrote a static simulator to simulate the above schemes. Hence, from the perspective of application to distributed compact routing protocols, the results presented in this section assume a static network topology and give post-convergence results only. As outlined in §9, a distributed implementation of our stretch-2 scheme is a straightforward extension of past work, but we leave a full dynamic evaluation to future work. Our static simulator allows us to evaluate the schemes at much larger scale.
Topologies. We present evaluation results for three topologies: (1) G(n, m) random graphs, i.e., n = 16,384 nodes with m uniform-random edges, with m set so that the average degree is 6; (2) geometric random graphs with n = 16,384 nodes and average degree 6; and (3) a 33,014-node AS-level map of the Internet (referred to as the Internet graph in this section) [13].
For G(n, m) graphs and the Internet graph, link weights are 1; for geometric random graphs, a link's weight is the Euclidean distance between the positions of its two nodes. For G(n, m) graphs and for geometric random graphs, we generated 10 different topologies with the same parameters, and our results are the average over evaluations of these topologies. For geometric random graphs, we sampled a set of "source" nodes and evaluated the performance of the schemes from these sources to all the destinations. We found that sampling 1/4 of the nodes as sources provided accurate results.
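For readers who wish to reproduce the synthetic setup, a G(n, m) topology with a target average degree can be generated as sketched below. The function name is hypothetical, and drawing distinct edges without self-loops is an assumption; the text only states that the m edges are uniform-random.

```python
import random

def gnm_edges(n, avg_degree, rng):
    """Returns m = n * avg_degree / 2 distinct undirected edges (a, b) with a < b."""
    m = n * avg_degree // 2
    edges = set()
    while len(edges) < m:
        a, b = rng.sample(range(n), 2)      # two distinct endpoints
        edges.add((min(a, b), max(a, b)))   # canonical order removes duplicates
    return edges
```

Calling `gnm_edges(16384, 6, random.Random(seed))` for 10 different seeds would give 10 unit-weight topologies with the parameters used above.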

Results and Discussions
Stretch comparison with the TZ scheme. Fig. 1 shows the performance of the three schemes for various graph topologies (TZ is the original TZ scheme; the TZ-d scheme is discussed below). The most notable result of this evaluation is that REAR allows retrieval of exact shortest paths for nearly all source-destination pairs: 98.94% in the G(n, m) graph, and 99.98% in the Internet graph. Though G(n, m) graphs and the Internet graph have highly different structures, these graphs have a common feature: for nearly all source-destination pairs, the two vicinities intersect, thus providing a shortest path. In the G(n, m) graph (in which 96.2% of source-destination pairs have intersecting vicinities), this occurs since, with high probability, the diameter of the graph is roughly at most twice the vicinity radius. In the Internet graph (in which 96.8% of source-destination pairs have intersecting vicinities), vicinity intersection likely occurs at the "core" networks of the Internet. Since the TZ scheme does not exploit the vicinity intersection, its performance is significantly worse than that of our schemes (only 34.4% of the source-destination pairs retrieved shortest paths).
The surprising difference between the performance of the two schemes may be due to the difference in how these schemes construct the landmark set L. We therefore evaluated a modified version of the TZ scheme that uses the same set L as used by our schemes (see TZ-d in Fig. 1). Although this improves the performance of the TZ scheme (74.2% of the source-destination pairs now retrieve shortest paths), it is still much worse than the REAR and RES schemes. We, hence, believe that the high performance of our schemes is indeed due to the vicinity intersection idea.
For geometric random graphs, REAR allows retrieval of shortest paths for only 70.7% of the source-destination pairs, in comparison to 42.9% for the TZ scheme; indeed, only 4.8% of the source-destination pairs have intersecting vicinities. However, REAR consistently performs better than the TZ scheme, which in turn performs better than RES. Finally, while the TZ scheme performs better than RES on average, the worst-case stretch for the TZ scheme is consistently worse than that of RES. We believe that this is due to the P&S optimization, which allows many source-destination pairs to retrieve shorter paths through shortcutting.
Stretch comparison of REAR and RES. The performance of REAR and RES for various graph topologies is compared in Fig. 2. We note that, as expected, REAR consistently performs better than RES, even without the P&S optimization. However, the more interesting observation is that the P&S optimization is much more effective in RES. In particular, we note that the tail of RES without the P&S optimization is significantly reduced when the optimization is used.
For G(n, m) graphs, the stretch for 99% of the source-destination pairs is less than 1.15 using REAR. For RES, this is almost 1.3 (optimized version) and 1.5 (unoptimized version). The case of geometric random graphs is rather interesting: first, we observe that not many source-destination pairs have intersecting vicinities, otherwise RES without the P&S optimization would not have achieved such a low fraction of source-destination pairs retrieving shortest paths (only around 11%). Despite this, REAR performs surprisingly well: almost 48% of the source-destination pairs retrieve shortest paths without the P&S optimization and almost 71% retrieve shortest paths with the P&S optimization.
Stretch versus Query Time. For G(n, m) graphs, Fig. 3 shows the variation of mean stretch, averaged over all source-destination pairs, with the number of queries for the REAR and RES schemes, for the farthest-first and closest-first heuristics discussed in §9. We see a clear trend of "diminishing returns", where a few initial queries significantly reduce the stretch compared to no queries, after which the improvement is minimal.
Based on the results, we conclude that, in general, the farthest-first heuristic performs better in terms of the stretch with smaller query time. For the same two heuristics for stretch versus query time, Fig. 4 shows the results for the REAR and RES schemes for the geometric random graph; we note that it is significantly better to start querying with the farthest nodes in the vicinity. Since the vicinities of most source-destination pairs intersect (and if they intersect, they do so at least at one of the farthest nodes), queries starting from the farthest nodes achieved an improved stretch (quickly!). In terms of stretch versus query time, the results for the Internet graph were very similar to those for G(n, m) graphs.

Conclusions
This paper presented data structures and query algorithms which significantly improve the space/stretch trade-off for distance oracles and compact routing schemes for the realistic case of sparse graphs. We also argued that the increased query time in our distance oracles is reasonable in practice. Allowing increased query time to improve the space/stretch trade-off brings up several interesting open problems:
• Can the query time of our schemes be reduced? In other words, can one design a distance oracle of size O(m + n^2/α) that returns stretch-2 paths in o(α∆) time? A more challenging problem is to design distance oracles that have size O(mα + n^2/α) and return stretch-2 paths in o(α) time. We believe that the latter may require significantly new techniques.
• We presented a distributed implementation of our stretch-2 compact routing scheme only for distance oracles that have an aggregate memory requirement of O(mα + n^2/α), but not for our linear-space distance oracles (both for stretch-2 and stretch-3 schemes). While it seems significantly more challenging, a distributed version of our linear-space distance oracles could have significant implications in practice: one could achieve stretch 3 with a constant amount of storage at nodes in the network.
• The most intriguing problem is to compute lower bounds for distance oracles that take Ω(log n) query time and return constant-stretch paths. The holy grail of the distance oracle problem for sparse graphs is whether one can design a data structure of size O(m polylog(n)) that yields constant-stretch paths in O(polylog(n)) time. This would be a very significant result.

Claim 2. The size of the distance oracle is O(n∆ + n^2/α), in expectation.
Proof: Storing the list of neighbors for each node requires space O(n∆). Note that E[|L|] = n/α, and hence, storing shortest distances from nodes in L to all nodes in the graph requires O(n^2/α) space in expectation. Storing ℓ(v) and r_v requires O(1) space for each node in V \ L. Hence, the total expected size is O(n∆ + n^2/α).
Let the distance returned by the query algorithm be δ(u, v) = d(u, w) + d(w, v), where either w = ℓ(u) or w = arg min_{x ∈ B(u) ∩ Γ(v)} {d(u, x) + d(v, x)}. Note that d(u, w) ≤ d(u, ℓ(u)) = r_u.

Figure 1: Complementary CDF of stretch in G(n, m) random graph (left), geometric random graph (middle) and Internet graph (right).

Figure 2: Complementary CDF of stretch (REAR and RES) in G(n, m) random graph (left), geometric random graph (middle) and Internet graph (right).

Figure 4: Mean stretch versus query time for REAR (left) and RES (right) for a 16,384-node geometric random graph with average degree 6.

Table 1: Upper bounds for distance oracles for general undirected graphs. The (α, β) in column 3 denotes multiplicative α and additive β approximation ratio. ∆ denotes the average degree of the graph. Distance oracles with additive stretch are for unweighted graphs.