On the independent set problem in random graphs

In this paper, we develop efficient exact and approximate algorithms for computing a maximum independent set in random graphs. In a random graph G, each pair of vertices is joined by an edge with probability p, where p is a constant between 0 and 1. We show that a maximum independent set in a random graph that contains n vertices can be computed in expected computation time $2^{O(\log_2^2 n)}$. In addition, we show that, with high probability, the parameterized independent set problem is fixed parameter tractable in random graphs and the maximum independent set in a random graph on n vertices can be approximated within a ratio of $2n/2^{\sqrt{\log_2 n}}$ in expected polynomial time.


Introduction
In computer science, many optimization problems can be reduced to the optimization of objectives that are formulated and described in a graph. The development of efficient exact or approximate algorithms for graph optimization problems thus constitutes an important part of the research in combinatorial optimization. However, a large number of graph optimization problems have been shown to be NP-hard [14], which suggests that algorithms solving these problems in polynomial time are unlikely to exist. A well-known example is the Maximum Independent Set problem. Given a graph G = (V, E), a vertex set I ⊆ V is an independent set if there is no edge between any two vertices in I. The goal of the Maximum Independent Set problem is to find an independent set of the largest size in a given graph G. If G contains n vertices in total, the problem can be trivially solved in time $2^{O(n)}$ by enumerating and checking all possible vertex subsets in the graph. Although intensive research has been performed to improve the computation time needed to find an optimal solution [13,19,23,25], an algorithm that needs subexponential time is not yet available for this problem. Recently, it has been suggested that this problem is unlikely to be solvable in subexponential time [4,5].
*Email: yingleisong@gmail.edu
Due to the difficulty in developing efficient algorithms that can find optimal solutions for these problems, a large number of algorithms have been developed to obtain approximate solutions in a significantly reduced amount of computation time [20]. These algorithms can often achieve a trade-off between the optimality of the solutions and the computation time needed to obtain them. For example, a simple algorithm that computes a maximal matching in a graph can approximate its minimum vertex cover within a ratio of 2. For some NP-hard problems, approximate solutions within a certain approximation ratio cannot be obtained in polynomial time unless NP = P. As an example, it has been shown that it is NP-hard to approximate the minimum vertex cover in a graph within a ratio of 1.362 [6]. Another well-known inapproximability result, regarding the Maximum Independent Set problem, is that it is NP-hard to approximate the maximum independent set in a graph within a ratio of $n^{1-\epsilon}$, where $0 < \epsilon < 1$ is a constant and n is the number of vertices in the graph [17]. This result implies that an approximate solution with a guaranteed constant approximation ratio cannot be obtained in polynomial time for the Maximum Independent Set problem unless NP = P. So far, the best known approximation ratio that has been achieved for this problem in general graphs is $O(n \log_2^2 \log_2 n / \log_2^3 n)$ [11]. For problems that cannot even be approximated within a good approximation ratio in polynomial time, such as the Maximum Independent Set problem, heuristics that can efficiently generate approximate solutions are often employed in practice [2,16,22]. However, solutions generated by heuristics are not guaranteed to be close to the optimal ones, and their applications are thus restricted to scenarios where the accuracy of solutions is not a crucial issue.
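The maximal-matching approximation for vertex cover mentioned above can be illustrated with a minimal sketch. The function names and the edge-list input format are assumptions made for illustration only. Both endpoints of every matched edge enter the cover, and any vertex cover must contain at least one endpoint of each matched edge, which gives the ratio of 2.

```python
def vertex_cover_from_maximal_matching(edges):
    """Return a vertex cover built from a greedily grown maximal matching.

    `edges` is an iterable of (u, v) pairs (an assumed input format).
    """
    cover = set()
    for u, v in edges:
        # The edge joins the matching only if both endpoints are still unmatched.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


def covers_all_edges(edges, cover):
    """Check that every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)
```

On a star graph the sketch selects the centre and one leaf, which is twice the optimum of one vertex.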
Parameterized computation is an alternative approach that may lead to practically efficient algorithms for some NP-hard problems. In practice, an instance of an NP-hard problem may contain one or a few parameters. Parameterized computation focuses on the development of exact algorithms that can efficiently solve the problem when these parameters are small positive integers. Specifically, if we denote these parameters by $p_1, p_2, \ldots, p_t$, the problem is fixed parameter tractable if there exists an algorithm that can solve the problem in time $O(f(p_1, p_2, \ldots, p_t)\, n^d)$, where n is the size of the problem, f is a function of the parameters, and d is a constant that does not depend on n or any of the parameters $p_1, p_2, \ldots, p_t$. A well-known example of fixed parameter tractable problems is Vertex Cover. In [8], it is shown that there exists an algorithm that can determine whether a graph contains a vertex cover of size k or not in time $O(2^k n)$, where n is the number of vertices in the graph. However, not all NP-hard problems are known to be fixed parameter tractable, and it has been shown that some of them are unlikely to be solved by efficient parameterized algorithms. Different parameterized complexity classes have been developed in parameterized complexity theory to reflect the parameterized intractability of these problems [8]. For example, the Independent Set problem is known to be complete for the complexity class W[1] [9]; it thus cannot be solved by an efficient parameterized algorithm unless W[1] collapses into the class of fixed parameter tractable problems. A comprehensive survey on parameterized algorithms and parameterized complexity theory can be found in [7].
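As an illustration of fixed parameter tractability, the following is a minimal sketch of the classical bounded search tree for Vertex Cover. It is not necessarily the algorithm of [8]; the function name and the edge-list representation are assumptions. The search tree has depth at most k and branching factor 2, so the sketch runs in time $O(2^k |E|)$.

```python
def has_vertex_cover(edges, k):
    """Decide whether the graph given by `edges` has a vertex cover of size <= k.

    `edges` is a list of (u, v) pairs (an assumed input format).
    """
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # edges remain but the budget is exhausted
    u, v = edges[0]
    # Any cover must contain u or v; try each choice and drop the edges it covers.
    without_u = [e for e in edges if u not in e]
    without_v = [e for e in edges if v not in e]
    return has_vertex_cover(without_u, k - 1) or has_vertex_cover(without_v, k - 1)
```

The exponential part of the running time depends only on the parameter k, which is the defining feature of the form $O(f(k)\, n^d)$.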
In this paper, we develop exact and approximate algorithms for the Maximum Independent Set problem where the underlying graph is a random graph generated based on the Erdős-Rényi model [10]. Such a random graph is generated by treating each pair of vertices independently and adding an edge to join them with a probability of p (0 < p < 1), where p is a constant. Recent research in molecular biology has shown that the protein side chain interaction network conforms remarkably well to random graphs generated by the Erdős-Rényi model [24]. Therefore, efficient algorithms for some NP-hard problems in random graphs, if they exist, may significantly improve the computational efficiency for some important optimization problems related to protein structure prediction.
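The random graph model described above can be sketched in a few lines. The adjacency-dictionary representation, the function names, and the `seed` parameter are illustrative assumptions rather than part of the paper.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return an Erdős-Rényi graph on vertices 0..n-1 as {vertex: set of neighbours}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Each unordered pair is considered once and joined independently with probability p.
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_count(adj):
    """Count edges; each edge appears in two neighbour sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2
```

For example, `edge_count(random_graph(200, 0.3, seed=1))` should be close to the expected value $p\,n(n-1)/2$.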
In [15,21], it has been shown that, with high probability, the maximum independent set in a random graph is of size $O(\log_2 n)$. However, this result does not directly lead to an algorithm that can compute the maximum independent set in a random graph in expected subexponential time. In [12], a polynomial time algorithm that can compute a maximum independent set in a sparse random graph with high probability is developed. However, the algorithm is based on a large independent set that is embedded in the graph and thus cannot be used for all graphs. We show that the maximum independent set in a random graph can be computed in expected computation time $2^{O(\log_2^2 n)}$, where n is the number of vertices in the graph. This result significantly improves the best known time complexity $O(2^{n/4})$ for finding a maximum independent set in general graphs [25].
In addition, we show that, with high probability, the parameterized independent set problem is fixed parameter tractable in random graphs. For approximate algorithms, we develop an algorithm that can achieve an approximation ratio of $2n/2^{\sqrt{\log_2 n}}$ in expected polynomial time, which is a significant improvement compared with the best known approximation ratio that can be achieved in general graphs [11].

Maximum independent set in random graphs
A random graph G(V, p), where 0 < p < 1, is a graph obtained by independently adding an edge between each pair of vertices in V with probability p. Given a vertex v ∈ V, the degree of v in G is the number of vertices that are connected to v by an edge in G. We use $\deg_G(v)$ to denote the degree of vertex v in graph G and $N_G(v)$ to denote the set of vertices that are connected to v by an edge in G. A vertex subset I ⊆ V is an independent set in G if there is no edge between any pair of vertices in I. The goal of the Maximum Independent Set problem is to find an independent set of the largest size in a given graph.
In [15,21], it is shown that, with high probability, the size of a maximum independent set in a random graph G(V, p) is $2\log_2 n/\log_2(1/(1-p))$, where n is the number of vertices in G. A straightforward algorithm that exhaustively enumerates all vertex subsets of size $2\log_2 n/\log_2(1/(1-p))$ can thus compute a maximum independent set in most random graphs in time $n^{O(\log_2 n)}$. However, to compute a maximum independent set in all random graphs, the algorithm must be able to cope with the cases where the graph contains an independent set of size larger than $O(\log_2 n)$. The algorithm needs time $2^{O(n)}$ to compute a maximum independent set in these cases. The best known upper bound on the probability for a random graph to have a maximum independent set larger than $O(\log_2 n)$ is $1/n^{O(1)}$ [15,21]; the expected time complexity of this enumeration based algorithm is thus $2^{O(n)}$.
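The quantities in this paragraph can be made concrete with a short sketch. It evaluates the typical size $2\log_2 n/\log_2(1/(1-p))$ and implements a plain exhaustive search that tries subsets from the largest size downwards. The function names and the adjacency-dictionary input are assumptions; the search is the worst-case $2^{O(n)}$ fallback, not the size-bounded enumeration.

```python
import math
from itertools import combinations


def typical_independence_number(n, p):
    """Evaluate 2*log2(n) / log2(1/(1-p)), the high-probability size quoted above."""
    return 2 * math.log2(n) / math.log2(1 / (1 - p))


def is_independent(adj, subset):
    """True if no two vertices of `subset` are adjacent in `adj`."""
    return all(v not in adj[u] for u, v in combinations(subset, 2))


def max_independent_set_by_enumeration(adj):
    """Exhaustive search over vertex subsets, largest sizes first."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_independent(adj, subset):
                return set(subset)
    return set()
```

For n = 1024 and p = 0.5 the formula evaluates to 20, so the size-bounded enumeration inspects subsets of about 20 vertices out of 1024.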
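We show that the maximum independent set in a random graph G = (V, p) can be computed in expected subexponential time.
Lemma 2.1 Let G = (V, p) be a random graph on n vertices and let $\epsilon$ be a constant with $0 < \epsilon < p$. With probability at least $1 - 2^{-\mu n^2}$, G contains a vertex whose degree is at least $(p - \epsilon)n$, where $\mu$ is a positive constant that only depends on $\epsilon$ and p.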
Proof If such a vertex does not exist, the number of edges n(E) in G is less than $(p - \epsilon)n^2/2$, since the degree of each vertex is less than $(p - \epsilon)n$. However, from the construction of graph G, the expected number of edges in G is
$$E[n(E)] = \frac{p\,n(n-1)}{2}.$$
From the Chernoff bound, we can bound the probability that $n(E) < (p - \epsilon)n^2/2$ by
$$\Pr\left[n(E) < (1-\delta)\,\frac{p\,n(n-1)}{2}\right] \le \exp\left(-\frac{\delta^2\, p\, n(n-1)}{4}\right),$$
where $\delta = (\epsilon n - p)/(p(n-1))$. For sufficiently large n, we have
$$\delta \ge \frac{\epsilon}{2p}, \qquad n(n-1) \ge \frac{n^2}{2}.$$
We can thus immediately obtain
$$\Pr\left[n(E) < \frac{(p-\epsilon)n^2}{2}\right] \le \exp\left(-\frac{\epsilon^2 n^2}{32p}\right) = 2^{-\epsilon^2 n^2/(32 p \ln 2)}.$$
We then let $\mu = \epsilon^2/(32p\ln 2)$ and conclude that, with probability at least $1 - 2^{-\mu n^2}$, G contains a vertex whose degree is at least $(p - \epsilon)n$.
The proof of Lemma 2.1 relies on the fact that p is a constant independent of n; the lemma does not hold if the value of p depends on n.
A random graph G = (V, p) on n vertices is good if it contains at least one vertex whose degree is at least $(p - \epsilon)n$. Given a random graph, the algorithm starts by finding a vertex v such that $\deg_G(v)$ is at least $(p - \epsilon)n$. If such a vertex does not exist, the algorithm enumerates all subsets of V and returns an independent set of the largest size. If v exists, the algorithm branches on the two possible cases of whether v is contained in the maximum independent set I or not. In particular, if $v \in I$, v and the vertices in $N_G(v)$ are deleted from G and the resulting graph is $G_1$; if $v \notin I$, v is deleted from G and the resulting graph is $G_2$. The algorithm is then recursively applied to both $G_1$ and $G_2$ to compute a maximum independent set in each of them. We use $I_1$ and $I_2$ to denote the maximum independent sets in $G_1$ and $G_2$ found by the algorithm, respectively. $I_2$ is returned as a maximum independent set in G if $|I_2| \ge |I_1| + 1$, and $I_1 \cup \{v\}$ is returned otherwise.
Theorem 2.1 A maximum independent set in a random graph G = (V, p) on n vertices can be computed in expected time $2^{O(\log_2^2 n)}$.
Proof We show that the algorithm described above terminates in expected time $2^{O(\log_2^2 n)}$.
In particular, the algorithm is recursive and, for each step of the recursion, we have the following recurrence for the computation time if the underlying graph is good and contains m vertices:
$$T(m) \le T(m-1) + T((1-p+\epsilon)m) + O(m^2),$$
where T(m) is the computation time needed by the algorithm on a graph with m vertices. The first term accounts for $G_2$, which contains m − 1 vertices, and the second for $G_1$, which contains at most $(1-p+\epsilon)m$ vertices since v has at least $(p-\epsilon)m$ neighbors. The term $O(m^2)$ is the computation time needed to find a vertex whose degree is at least $(p - \epsilon)m$, since the time needed to compute the degree of a vertex is O(m) and the algorithm may need to check m vertices to find such a vertex. If the underlying graph is not good, the algorithm exhaustively enumerates all vertex subsets in the graph and finds an independent set of the largest size. The computation time is $2^{O(m)}$.
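To get a feel for how such a branching recurrence grows, the following sketch unrolls it numerically. It assumes the concrete form $T(m) = T(m-1) + T(\lfloor r m \rfloor) + m^2$ with $r = 1 - p + \epsilon$, the base case T(0) = 1, and the $O(m^2)$ term replaced by exactly $m^2$. These choices, and the names `recurrence_values` and `shrink`, are illustrative assumptions and not values taken from the analysis.

```python
import math
from fractions import Fraction


def recurrence_values(max_m, shrink):
    """Tabulate T(0..max_m) for T(m) = T(m-1) + T(floor(shrink*m)) + m^2, T(0) = 1.

    `shrink` plays the role of 1 - p + eps and should be a Fraction in (0, 1)
    so that floor(shrink * m) is computed exactly.
    """
    values = [1] * (max_m + 1)
    for m in range(1, max_m + 1):
        smaller = math.floor(shrink * m)  # size bound of the branch that deletes v and N(v)
        values[m] = values[m - 1] + values[smaller] + m * m
    return values
```

A possible use is to compare `math.log2(values[m])` with `math.log2(m) ** 2` for increasing m, which shows whether the tabulated values are compatible with a bound of the form $2^{c\log_2^2 m}$ for the chosen `shrink`.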
We are now ready to establish the expected computation time for the algorithm. In particular, we use ET(m) to denote the expected computation time of the algorithm on a graph that contains m vertices. From Lemma 2.1, an underlying graph G on m vertices is good with a probability of at least $1 - 2^{-\mu m^2}$. We thus can immediately obtain the following recursion for ET(m):
$$ET(m) \le ET(m-1) + ET((1-p+\epsilon)m) + O(m^2) + 2^{-\mu m^2}\, 2^{O(m)} \le ET(m-1) + ET((1-p+\epsilon)m) + O(m^2),$$
where the second inequality is due to the fact that $2^{O(m)-\mu m^2}$ is bounded by a constant for all positive integers m. We then show that $ET(m) \le 2^{c\log_2^2 m}$, where c is a positive constant. We show this by induction. First, for a sufficiently large positive integer $m_0$ whose value will be specified later, we let $c_0 = \max_{1 \le t \le m_0} \{\log_2 ET(t)/\log_2^2 t\}$ and choose $c = \max\{c_0, 2/\log_2(1/(1-p+\epsilon)), 1\}$. It is not difficult to see that $ET(l) \le 2^{c\log_2^2 l}$ if $1 \le l \le m_0$. We then assume that this holds for all positive integers less than m. From the above recursion relation on ET(m), we can obtain a chain of inequalities, where B is a positive constant independent of c, p, and $\epsilon$, and s, q, l are some positive constants that depend on c, p, and $\epsilon$ only. The first inequality is obtained from the assumption for induction. The second one is due to the fact that
$$\log_2^2((1-p+\epsilon)m) = \log_2^2(1-p+\epsilon) + 2\log_2(1-p+\epsilon)\log_2 m + \log_2^2 m,$$
and we can let $l = 2c\log_2(1/(1-p+\epsilon))$ and $s = 2^{c\log_2^2(1-p+\epsilon)}$. The third inequality is established by a separate calculation.
From the fact that $c \ge 2/\log_2(1/(1-p+\epsilon))$, we have $l \ge 4$. We let 1 (p, ). $m_0$ can be determined as follows: It is not difficult to see that when c ≥ c and $m \ge m_0$, we have $s m^{-l} - \log_2 m/(24m) \le 0$. In addition, we can further verify that since c ≥ c, $m \ge 1/\sqrt{1-p+\epsilon}$, and $\log_2(1-p+\epsilon) \le 0$, we can immediately obtain the following thus holds the fourth inequality thus follows. From the principle of induction, the theorem has been proved.
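The branching algorithm analysed in this section can be sketched as follows. This is a minimal illustration under stated assumptions: graphs are adjacency dictionaries, the names `branching_mis`, `brute_force_mis`, and `remove_vertices` are hypothetical, and `p` and `eps` are supplied by the caller. The fallback enumeration is exponential, so the sketch is meant for small inputs only.

```python
from itertools import combinations


def brute_force_mis(adj):
    """Fallback: enumerate vertex subsets, largest first, and return an independent one."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return set(subset)
    return set()


def remove_vertices(adj, removed):
    """Return the subgraph induced by the vertices not in `removed`."""
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}


def branching_mis(adj, p, eps):
    """Maximum independent set by branching on a vertex of degree >= (p - eps) * m."""
    m = len(adj)
    if m == 0:
        return set()
    v = next((u for u in adj if len(adj[u]) >= (p - eps) * m), None)
    if v is None:
        return brute_force_mis(adj)           # the graph is not "good"
    g1 = remove_vertices(adj, adj[v] | {v})   # case v in I: delete v and N(v)
    g2 = remove_vertices(adj, {v})            # case v not in I: delete v only
    i1 = branching_mis(g1, p, eps)
    i2 = branching_mis(g2, p, eps)
    return i2 if len(i2) >= len(i1) + 1 else i1 | {v}
```

The sketch always returns a maximum independent set, because the two branches are exhaustive and the fallback is an exact search; only the running time depends on how often a high-degree vertex is found.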

Parameterized algorithm for independent set problem
The parameterized independent set problem is to decide whether a given graph G = (V, E) contains an independent set of size k or not. The problem is known to be W[1]-hard [7,8,9] and cannot be solved in time $n^{o(k)}$ in general graphs unless W[2] = FPT [4,5]. We show that if the underlying graph G is a random graph, the problem can be solved in expected time $2^{O(k^2)} + O(n^3)$, where n is the number of vertices in the graph. We need the following lemma to analyse the time complexity of the algorithm.
Lemma 3.1 Let G = (V, p) be a random graph on n vertices and let $\epsilon$ be a constant with $0 < \epsilon < 1 - p$. With probability at least $1 - 2^{-\mu n^2}$, G contains a vertex whose degree is at most $(p + \epsilon)n$, where $\mu$ is a positive constant that only depends on $\epsilon$ and p.
Proof The proof is similar to the proof of Lemma 2.1. If such a vertex does not exist, the degree of every vertex in G is greater than $(p + \epsilon)n$. The graph thus contains more than $(p + \epsilon)n^2/2$ edges. The expected number of edges in G is $pn(n - 1)/2$. We use n(E) to denote the number of edges in G. From the Chernoff bound, the probability that G contains at least $(p + \epsilon)n^2/2$ edges is at most $2^{-\mu n^2}$ with $\mu = \epsilon^2/(64p\ln 2)$, and the lemma immediately follows.
The proof of Lemma 3.1 relies on the fact that p is a constant independent of n; the lemma does not hold if the value of p depends on n.
We then consider the case where k ≤ L(n). We use the following procedure to generate an independent set I. We start with a vertex u of minimum degree in G, include u in I, and remove u and all its neighbors from G. We denote the resulting graph by $G_1$. The procedure is repeated until there are at most $n^{2/3}$ vertices left in the graph. We use $G_0 = G, G_1, G_2, G_3, \ldots, G_l$ to denote the intermediate graphs generated during this iterative procedure. It is not difficult to see that the vertices in I form an independent set in G.
We show that the above procedure can generate an independent set I of size at least L(n) with high probability. We use $n(G_i)$ to denote the number of vertices in graph $G_i$. From Lemma 3.1, the following holds with a probability of at least $1 - 2^{-\mu n^2(G_i)}$ for each i between 0 and l:
$$n(G_{i+1}) \ge (1 - p - \epsilon)\, n(G_i).$$
Since $n(G_i) > n^{2/3}$, the probability for this inequality to hold for all i between 0 and l is at least $1 - n2^{-\mu n^{4/3}}$. If this inequality holds for all i between 0 and l, we can immediately obtain
$$l \ge \log_{1/(1-p-\epsilon)} \frac{n}{n^{2/3}}.$$
I thus contains at least L(n) vertices. With a probability of at least $1 - n2^{-\mu n^{4/3}}$, the above iterative procedure generates an independent set of size L(n). Since k ≤ L(n), the algorithm returns 'yes' if I indeed contains L(n) independent vertices; otherwise, the algorithm simply enumerates all vertex subsets in G and checks whether one of them is an independent set of size at least k. Since the procedure for generating I needs $O(n^3)$ time, the expected computation time needed for this is at most
$$O(n^3) + n2^{-\mu n^{4/3}}\, 2^{O(n)} = O(n^3),$$
where the equality is due to the fact that the second term is bounded by a constant when n is sufficiently large. The algorithm thus needs an expected time of $2^{O(k^2)} + O(n^3)$, and the theorem has been proved.
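The minimum-degree procedure used in this section can be sketched as below. The function name, the adjacency-dictionary input, and the `stop_size` parameter are assumptions made for illustration; by default the loop stops once at most $n^{2/3}$ vertices remain, as in the text.

```python
def greedy_independent_set(adj, stop_size=None):
    """Repeatedly pick a minimum-degree vertex and delete it with its neighbours.

    Stops when at most `stop_size` vertices remain (default: n ** (2/3)).
    Returns the picked vertices in the order they were chosen.
    """
    if stop_size is None:
        stop_size = len(adj) ** (2 / 3)
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    picked = []
    while len(adj) > stop_size:
        u = min(adj, key=lambda x: len(adj[x]))      # minimum degree in the current graph
        picked.append(u)
        removed = adj[u] | {u}
        adj = {w: nbrs - removed for w, nbrs in adj.items() if w not in removed}
    return picked
```

Each round rebuilds the graph in $O(n^2)$ time and there are at most n rounds, which is in line with the $O(n^3)$ bound used in the analysis. Every picked vertex has had its neighbours deleted, so the picked vertices are pairwise non-adjacent.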

Approximate algorithm
As discussed in the introduction, the maximum independent set problem cannot be approximated within a ratio of $n^{1-\epsilon}$ in polynomial time unless P = NP, where $\epsilon$ is any positive constant. In [3], it is shown that the maximum independent set in a graph can be approximated within a ratio of $O(n/\log_2^2 n)$. In [11], the approximation ratio is improved to $O(n\log_2^2\log_2 n/\log_2^3 n)$. This result so far remains the best known approximation ratio achieved for this problem in general graphs. In [15,18,24], a polynomial time algorithm that can approximate the maximum independent set in a random graph within a constant ratio with high probability is developed and analysed. However, the approximation ratio of the algorithm is not guaranteed to be constant for all graphs. We show that the maximum independent set in a random graph can be approximated within a ratio of $2n/2^{\sqrt{\log_2 n}}$ in expected polynomial time, which is a significant improvement compared with the best known approximation ratio for this problem in general graphs.
Proof We use the following simple algorithm to compute an independent set in G. We let $k = \lfloor 2^{\sqrt{\log_2 n}} \rfloor$ and partition the vertices in G into l disjoint vertex subsets such that l − 1 of them contain k vertices each and the remaining one contains at most k vertices. We use $G_1, G_2, \ldots, G_l$ to denote the subgraphs induced by these vertex subsets. It is not difficult to see that $l \le n/k + 1$.
We then use the algorithm we have developed in Theorem 2.1 to compute a maximum independent set in each of G 1 , G 2 , . . . , G l and return the one that contains the largest number of vertices.
We first show that the algorithm returns an independent set in expected polynomial time. $G_1, G_2, \ldots, G_l$ are disjoint and the expected time needed to compute a maximum independent set in each of them is at most $2^{c\log_2^2 k}$, where c is some positive constant that only depends on p. Since $k \le 2^{\sqrt{\log_2 n}}$, the expected computation time needed to compute the maximum independent set in one subgraph is at most $2^{c\log_2 n} = n^c$. The algorithm thus returns an independent set in expected time $n^{c+1}$.
We then show that the algorithm can achieve an approximation ratio of $2n/2^{\sqrt{\log_2 n}}$. We use APX(G) to denote the size of the independent set returned by the algorithm and OPT(G) to denote the size of a maximum independent set in G. We assume that I is a maximum independent set in G. Since we have partitioned the graph G into l disjoint subgraphs $G_1, G_2, \ldots, G_l$, at least one of the l subgraphs contains at least OPT(G)/l vertices from I. These vertices form an independent set in the subgraph. Since the algorithm computes a maximum independent set in each subgraph and returns the one with the largest size, we immediately obtain a chain of inequalities starting from $APX(G) \ge OPT(G)/l$. The second inequality is due to the fact that $l \le n/k + 1$. The fourth inequality is due to the fact that $k \ge 2^{\sqrt{\log_2 n}} - 1$. The last inequality holds for sufficiently large n. The theorem thus has been proved.
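The partition-based approximation described in this section can be sketched as follows. The sketch takes the block size to be $\lfloor 2^{\sqrt{\log_2 n}} \rfloor$ and uses a plain exhaustive search as a stand-in for the exact algorithm of Theorem 2.1. Both choices, and all function names, are illustrative assumptions.

```python
import math
from itertools import combinations


def block_size(n):
    """Block size k = floor(2 ** sqrt(log2 n)), and at least 1."""
    return max(1, math.floor(2 ** math.sqrt(math.log2(n))))


def exact_mis(adj):
    """Stand-in exact solver: exhaustive search, largest subsets first."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return set(subset)
    return set()


def partition_approx_mis(adj):
    """Split the vertices into blocks of size k, solve each block exactly, keep the best."""
    vertices = list(adj)
    n = len(vertices)
    if n == 0:
        return set()
    k = block_size(n)
    best = set()
    for start in range(0, n, k):
        block = set(vertices[start:start + k])
        induced = {u: adj[u] & block for u in block}  # subgraph induced by the block
        candidate = exact_mis(induced)
        if len(candidate) > len(best):
            best = candidate
    return best
```

An independent set of an induced subgraph is also independent in G, so the returned set is always feasible. The number of blocks is at most n/k + 1, which is the quantity that drives the ratio in the analysis above.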

Conclusions
In this paper, we study the independent set problem in random graphs. We show that a maximum independent set in a random graph can be computed in expected subexponential time. We also show that the parameterized independent set problem is fixed parameter tractable with high probability for random graphs. Using techniques based on enumeration, we show that the largest common subgraph in two random graphs can be computed in expected subexponential time. Our work also shows that the maximum independent set in a random graph can be approximated within a ratio of $2n/2^{\sqrt{\log_2 n}}$ in expected polynomial time, which significantly improves on the best known approximation ratio for this problem in general graphs.
It remains unknown whether the maximum independent set in a random graph can be computed in expected polynomial time. One possible direction of future work is to study whether such an algorithm exists. A related open question is whether, if no such algorithm exists, the maximum independent set can be approximated within an improved ratio in expected polynomial time. Further investigations are needed to solve these problems.