On the Complexity of Conditional DAG Scheduling in Multiprocessor Systems

As parallel processing became ubiquitous in modern computing systems, parallel task models have been proposed to describe the structure of parallel applications. The workflow scheduling problem has been studied extensively over the past years, focusing on multiprocessor systems and distributed environments (e.g., grids, clusters). In workflow scheduling, applications are modeled as directed acyclic graphs (DAGs). DAGs have also been introduced in the real-time scheduling community to model the execution of multi-threaded programs on multi-core architectures. The DAG model assumes, in most cases, a fixed DAG structure capturing only straight-line code. Only recently have more general models been proposed. In particular, the conditional DAG model allows the presence of control structures such as conditional (if-then-else) constructs. While first algorithmic results have been presented for the conditional DAG model, the complexity of schedulability analysis remains wide open. We perform a thorough analysis of the worst-case makespan (latest completion time) of a conditional DAG task under list scheduling (a.k.a. fixed-priority scheduling). We show several hardness results concerning the complexity of the optimization problem on multiple processors, even if the conditional DAG has a well-nested structure. For general conditional DAG tasks, the problem is intractable even on a single processor. Complementing these negative results, we show that certain practice-relevant DAG structures remain tractable.


I. INTRODUCTION
As parallel processing became ubiquitous in modern computing systems, parallel task models have been proposed to describe the structure of parallel applications.
A popular representation is the DAG (directed acyclic graph) model; in this model a task is represented by a DAG G = (V, E), where V is a set of vertices and E a set of directed edges between these vertices. Each v ∈ V represents the execution of a sub-task (or job) and is characterized by an execution time. The edges represent dependencies between the jobs: if (v_1, v_2) ∈ E, then job v_1 must complete execution before job v_2 can begin execution.
The DAG model has been extensively used to represent cooperative tasks (workflows) which typically require computing power beyond the capability of a single machine: scientific workflows, multi-tier web service workflows, and big-data processing workflows such as MapReduce from Google and Dryad from Microsoft. Much research effort has been devoted to incorporating into the model specific aspects of the computing platform (multiprocessor systems or distributed environments like clusters and grids) and of the considered workflow (e.g., resource provisioning and mapping, communication costs, etc.). We refer to the survey papers [1], [2] and references therein for a thorough presentation.
The DAG model has also been used in the real-time systems community to model the execution of multi-threaded recurrent tasks to be executed on a multi-core architecture [3], [4].
It is well known that many variants of DAG scheduling are NP-complete even in simple cases. In fact, Ullman [5] showed that it is NP-complete to decide whether the makespan (latest completion time among all jobs) for scheduling a DAG is within a certain deadline, even if i) all vertices have unit execution times and an arbitrary number of processors is available, or ii) all vertices have execution times equal to one or two and only two processors are used.
A scheduling paradigm which is widely used in practice is called list scheduling. The basic idea is to assign priorities to jobs and obtain a list of jobs by sorting them according to their priorities; during execution, whenever a processor is idle, the available job with the highest priority is selected for processing. Graham [6] proved the following performance guarantee: for any priority order of jobs, list scheduling produces a schedule on m processors whose makespan is no greater than (2 − 1/m) times the minimum possible makespan. Most run-time scheduling algorithms that are used for scheduling DAGs use some variant of list scheduling.
While the DAG model captures the intra-parallelism of tasks, it does not capture the typical conditional nature of control-flow instructions, such as if-then-else statements. The presence of such conditional constructs within the code modeled by the task may mean that different activations of the task cause different parts of the code to be executed. The conditional DAG model generalizes the DAG model by allowing additional conditional nodes [7], [8]. We note that a workflow can also have conditional branches, as in BPEL [9]. A formal definition follows in Section II.
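To make the list scheduling rule concrete, the following is a minimal simulator sketch under our own assumptions (the dict-based job encoding and names are illustrative, not from the paper): jobs map names to execution times, edges are precedence pairs, and order lists jobs from highest to lowest priority.

```python
import heapq

def list_schedule(jobs, edges, order, m):
    """Simulate non-preemptive list scheduling of a DAG on m processors.

    jobs:  dict mapping job name -> execution time
    edges: iterable of (u, v) pairs meaning u must finish before v starts
    order: list of job names, highest priority first
    Returns the makespan of the resulting schedule.
    """
    preds = {j: set() for j in jobs}
    succs = {j: set() for j in jobs}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    prio = {j: i for i, j in enumerate(order)}
    ready = [j for j in jobs if not preds[j]]   # released jobs
    running = []                                # min-heap of (finish time, job)
    free, t, makespan = m, 0, 0
    while ready or running:
        # greedily start released jobs by priority while processors are free
        for j in sorted(ready, key=prio.get):
            if free == 0:
                break
            ready.remove(j)
            free -= 1
            heapq.heappush(running, (t + jobs[j], j))
        # jump to the next completion time and release successor jobs
        t, done = heapq.heappop(running)
        finished = [done]
        while running and running[0][0] == t:
            finished.append(heapq.heappop(running)[1])
        free += len(finished)
        makespan = max(makespan, t)
        for f in finished:
            for v in succs[f]:
                preds[v].discard(f)
                if not preds[v]:
                    ready.append(v)
    return makespan
```

Whatever priority order is supplied, the resulting makespan stays within Graham's (2 − 1/m) factor of the optimum, which is the guarantee used throughout the paper.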
In this paper we study the problem of computing the worst-case makespan for a conditional DAG under list scheduling with an arbitrary but fixed priority order (a.k.a. FP-scheduling) from a complexity-theoretic viewpoint. This is a fundamental question, e.g., when performing schedulability analysis.

Our results
We show that it is coNP-complete to decide whether FP-scheduling a given conditional DAG task on multiple processors can be done within a given deadline in each of the (possibly exponentially many) realizations of conditional vertices. This is true even if conditions are independent (see Section II for a formal definition). The coNP-completeness holds in both the non-preemptive (run-to-completion) and the preemptive migratory setting. For these results we give a general reduction framework based on a non-obvious relation between the problem of computing the maximum makespan for a conditional DAG and minimizing the maximum completion time (makespan) for a DAG. The framework allows us to derive several refined complexity results for special graph classes by using (known) NP-hardness results for classical makespan minimization.
It is known that the worst-case makespan for FP list scheduling a conditional DAG task with independent constructs on a single processor can be computed in polynomial time [8]. We show that it is crucial for this result that different conditional constructs are independent of each other, meaning that they are well nested and there is no dependency between them, such as a shared node. In particular, we show in Section III-D that computing the worst-case makespan for FP-scheduling a general conditional DAG task with shared nodes is coNP-complete even on a single processor.
If the conditional DAG (without shared nodes) has bounded width in each realization (precise definitions follow), then we can compute the worst-case makespan for non-preemptive FP-scheduling in pseudo-polynomial time via a dynamic program. We also show that this algorithm can be turned into an efficient algorithm, losing only marginally in performance, if a certain monotonicity property holds for the job completion times under FP-scheduling. We present a fully polynomial-time approximation scheme (FPTAS) in this case. We prove that the property does hold, e.g., for a bounded number of chains. This monotonicity result might be of independent interest to the scheduling community, as motivated, e.g., in [10]. For general DAGs the monotonicity property does not hold, as the classical Graham anomaly [6] shows.

Other related work.
None of the (earlier proposed) parallel-task models (fork/join, synchronous parallel, DAG) captures control flow information such as conditional executions. Some alternatives to the conditional DAG model have been considered. Fonseca et al. [11] propose the multi-DAG that represents a task as a collection of DAGs, each describing a conditional execution. Whenever a task is executed, exactly the sub-tasks of one of its DAGs need to be processed. The main issue with this model is the possibly exponential number of control flows.
Chakraborty et al. [12], [13] consider a more restricted variant of the conditional DAG model, which models a task as a two-terminal DAG in which, for each node, exactly one successor needs to be executed. Additionally, each edge specifies a delay for the start time of the successor. The authors provide complexity results as well as exact and approximate schedulability analyses for preemptive and non-preemptive scheduling.
Erlebach et al. [14] consider makespan minimization for AND/OR-Networks that allow constraints to specify that a job can be executed if at least one predecessor has been completed.
Federated scheduling is a scheduling policy that has been proposed for scheduling a set of recurrent tasks modeled by DAGs in a multiprocessor system; each task has a release time and a deadline. In this model, each high-demand conditional DAG (with high density) is assigned to a number of processors that are dedicated exclusively to it, whereas all remaining low-demand tasks are assigned to a pool of shared processors.
Baruah [15] considered federated scheduling for conditional recurring DAG tasks assuming constrained deadlines. Since each high-demand task is executed independently of the other tasks, the main challenge is to assign each such task to a minimum number of dedicated processors such that it can be completed within its deadline for each possible realization. In the case of constrained deadlines, our results directly imply complexity bounds on the problem of minimizing the number of processors necessary to schedule high-demand DAG tasks.
Overview. In Section II we define the task model and notation. In Section III we give our hardness results. For conditional DAG tasks with bounded width, we give a pseudo-polynomial algorithm in Section IV. In Section V we show how to turn this algorithm into a fully polynomial-time approximation scheme (FPTAS) if a certain monotonicity property holds.

The Conditional DAG Model
Let τ be a conditional parallel task (cp-task) that executes on m identical processors. The cp-task τ is characterized by a conditional directed acyclic graph G = (V, E, C) where V is a set of nodes, E ⊆ V × V is a set of directed edges (arcs) and C ⊆ V × V is a set of distinguished node pairs, the conditional pairs. Each node j ∈ V represents a sequential computation unit (sub-task, job) with an individual execution time p_j. Slightly abusing notation, we refer to jobs and nodes equivalently. The arcs describe the dependencies between sub-tasks as follows: if (v_1, v_2) ∈ E, then v_2 can only start processing after v_1 has completed. Job v_1 is called a predecessor job of v_2, and job v_2 is called a successor job of v_1. A distinguished pair (c_1, c_2) ∈ C of nodes is a conditional pair, which denotes the beginning and ending of a conditional construct such as an if-then-else statement. In sub-task c_1, a conditional expression is evaluated and, depending on the outcome, exactly one out of several possible subsequent successors must be chosen. In our figures, the conditional nodes are depicted by squares whereas all other nodes are circles; see Figure 1. Following the definition given in [7], [8] we define a conditional DAG formally as follows.
Definition 1 (Conditional DAG). A conditional DAG G = (V, E, C) consists of a set of vertices V, a set of directed edges E ⊆ V × V, and a set of conditional pairs C ⊆ V × V such that the following holds for each (c_1, c_2) ∈ C:
1) There are multiple outgoing edges from c_1 in E. Suppose that there are exactly k outgoing edges from c_1 to vertices s_1, s_2, ..., s_k for some k > 1. We call k the branching factor of (c_1, c_2). Then there are exactly k incoming edges into c_2 in E, from the vertices t_1, t_2, ..., t_k.
2) For each l ∈ {1, ..., k} let P_l be the set of all paths from s_l to t_l in G. We define G_l = (V_l, E_l) as the union of all paths from s_l to t_l, i.e., V_l = ∪_{p ∈ P_l} V(p) and E_l = ∪_{p ∈ P_l} E(p), where V(p) and E(p) denote the sets of vertices and edges on path p. We refer to each G_l with l ∈ {1, ..., k} as a conditional branch of (c_1, c_2).
3) It must hold that V_l ∩ V_{l'} = ∅ for all l, l' with l ≠ l'.
Additionally, with the exception of (c_1, s_l) and (t_l, c_2), there must be no edges in E into vertices in V_l from nodes not in V_l, or vice versa, for each l ∈ {1, 2, ..., k}.
For each pair (c_1, c_2) ∈ C we call c_1 and c_2 conditional vertices and refer to the subgraph of G beginning at c_1 and ending at c_2 as a conditional construct in G. Notice that in the above definition, condition 3) explicitly rules out any interaction between a node within a conditional branch and any other node outside this particular branch. The restriction to such well-nested structures is very natural when modeling the execution flow of a structured programming language [8]. We speak of a conditional DAG with shared nodes when relaxing restriction 3) and allowing interaction between different conditional branches. We consider such a generalized model only in Subsection III-D, where it demonstrates a drastic increase in complexity.
When executing a conditional DAG G = (V, E, C), at most one conditional branch per conditional pair is executed. For a conditional pair c = (c_1, c_2) ∈ C, no branch is executed if and only if the conditional construct of c is nested into a branch that is not executed. Thus, a job j is executed if one of the following conditions holds: (i) node j is not part of any conditional branch, i.e., j ∉ V_l for each branch G_l of any conditional pair (c_1, c_2), or (ii) the innermost branch G_l with j ∈ V_l is being executed.
Let J ⊆ V be a set of jobs obtained by fully executing the jobs of the conditional DAG G = (V, E, C), taking into account the outcomes of the conditional nodes. Let G_J = (V_J, E_J) with V_J = J denote the subgraph of G induced by J. We call G_J a realization of G, and say a vertex j ∈ V is active for J if j ∈ V_J holds. Let J denote the collection of all job sets J for which there is a realization with V_J = J.
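Since each conditional pair contributes one choice per branch, the collection J of realizable job sets can grow exponentially. The sketch below enumerates it for a toy, well-nested encoding of a cp-task as nested tuples (job/seq/cond); this encoding is our own illustration, not the paper's formal model.

```python
def realizations(block):
    """Return the list of active job sets over all realizations.

    block is one of:
      ("job", name)          -- a single job
      ("seq", b1, b2, ...)   -- sub-blocks that all execute
      ("cond", b1, b2, ...)  -- a conditional pair: exactly one branch executes
    """
    kind = block[0]
    if kind == "job":
        return [{block[1]}]
    if kind == "seq":
        results = [set()]
        for sub in block[1:]:
            results = [r | s for r in results for s in realizations(sub)]
        return results
    if kind == "cond":
        return [s for sub in block[1:] for s in realizations(sub)]
    raise ValueError(f"unknown block kind: {kind}")

# u runs, then either a alone, or b followed by a nested choice of c or d, then v
task = ("seq",
        ("job", "u"),
        ("cond",
         ("job", "a"),
         ("seq", ("job", "b"), ("cond", ("job", "c"), ("job", "d")))),
        ("job", "v"))
```

For this task the three realizations are {u, a, v}, {u, b, c, v} and {u, b, d, v}; a sequence of k two-branch conditionals would already yield 2^k realizations.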

Fixed-Priority List Scheduling
Let τ be a conditional DAG task with G = (V, E, C) and with execution times p_j for each j ∈ V, to be executed on m parallel identical processors. Let ≺ be a given fixed-priority order (FP-order) over V.
A non-preemptive fixed-priority list schedule (FP-schedule) is constructed as follows. At any point in time, when a processor is idle, the job with the highest priority according to ≺ among the available jobs starts execution and runs until completion. A job is available if all its predecessors have been completed. To avoid ambiguities caused by jobs with p_j = 0, we say a job is available if all its predecessors j' with p_{j'} = 0 have been started and all its predecessors j' with p_{j'} > 0 have been completed.
If we allow preemption (and migration), then upon the arrival of a higher-priority job, an executing lower-priority job is preempted. A preempted job may resume processing at any later point in time and on any processor at no extra cost. We assume that any overhead is included in p_j.
For each J ∈ J let S_J denote the FP-schedule induced by ≺ for the realization G_J. Let C_J denote the latest completion time of any job of G_J in S_J. This is the makespan for realization G_J of the cp-task τ. We may assume that there is just a single cp-task in our non-periodic task setting, since several cp-tasks can be merged into one by adding nodes with zero execution times.
Then, M(G, ≺) = max_{J ∈ J} C_J is the worst-case makespan of τ for list scheduling according to the FP-order ≺.
Definition 2 (Problem CDAG-MAX). Given a cp-task with a conditional DAG G, execution times p_j, a number m of parallel identical processors and an FP-order ≺, the worst-case makespan problem (CDAG-MAX) is to compute M(G, ≺).
Slightly abusing notation, we use CDAG-MAX also to refer to the following decision variant of this problem in the complexity analysis: for a given CDAG-MAX instance and a parameter D decide whether M (G, ≺) ≤ D.
We observe that M(G, ≺) can be approximated within a factor of 2 in polynomial time. To see this, consider the well-known Graham bounds [6] on the makespan of any FP-schedule (mentioned also in [8], [15]):

max{L_max, V_max / m} ≤ M(G, ≺) ≤ L_max + V_max / m,

where L_max denotes the length of the longest chain in the conditional DAG and V_max is the maximum total volume of execution time that has to be executed in a realization of the cp-task. Both lower bounds can be computed in polynomial time; how to compute V_max is shown in [8], [15]. Further, it holds that L_max + V_max / m ≤ 2 · max{L_max, V_max / m}.

Lemma 1. CDAG-MAX can be approximated within a factor of 2 in polynomial time, i.e., we can efficiently compute the value apx = L_max + V_max / m that satisfies M(G, ≺) ≤ apx ≤ 2 · M(G, ≺).
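When the well-nested structure is available as a series/parallel/conditional composition, both quantities admit a simple bottom-up computation; the nested-tuple encoding and block names below are our own illustration (the paper's own algorithms for V_max are those of [8], [15]).

```python
def lmax_vmax(block, p):
    """Return (L_max, V_max) for a well-nested cp-task given as nested tuples:
    ("job", name), ("seq", ...), ("par", ...) or ("cond", ...);
    p maps job names to execution times."""
    kind = block[0]
    if kind == "job":
        return p[block[1]], p[block[1]]
    subs = [lmax_vmax(b, p) for b in block[1:]]
    if kind == "seq":    # blocks execute one after another
        return sum(l for l, _ in subs), sum(v for _, v in subs)
    if kind == "par":    # blocks may execute concurrently
        return max(l for l, _ in subs), sum(v for _, v in subs)
    if kind == "cond":   # exactly one branch executes
        return max(l for l, _ in subs), max(v for _, v in subs)
    raise ValueError(f"unknown block kind: {kind}")

def apx_makespan(block, p, m):
    """Graham-style 2-approximation apx = L_max + V_max / m of M(G, <)."""
    l, v = lmax_vmax(block, p)
    return l + v / m
```

For example, a job u followed by a conditional choosing between one long job a (p_a = 5) and two parallel jobs b, c (p_b = p_c = 3) gives L_max = 6 and V_max = 7, hence apx = 9.5 on m = 2 processors.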

III. COMPLEXITY
In this section we show several NP-hardness and inapproximability results regarding non-preemptive and preemptive FP-scheduling of conditional DAG tasks. First, we establish a reduction framework via a makespan maximization problem and prove that, for non-preemptive FP-scheduling, CDAG-MAX is strongly coNP-hard and that approximating CDAG-MAX within a factor of 7/5 is NP-hard. The framework can also be used to derive further hardness results for various special graph classes. We furthermore investigate the problem for preemptive FP-scheduling and show that deciding CDAG-MAX remains strongly coNP-hard and that approximating CDAG-MAX within a factor of 6/5 is still NP-hard. Finally, we consider the conditional DAG model with shared nodes and show that CDAG-MAX is much harder in this case: it is coNP-hard already on a single processor. This is in contrast to CDAG-MAX for the conditional DAG model as studied in this paper, which is solvable in polynomial time on a single machine [8].
As we can show that the complement of CDAG-MAX is in NP, by using realizations G_J with C_J > D for an input parameter D as certificates, all our coNP-hardness results imply the coNP-completeness of the corresponding problems.

A. A Reduction Framework
We first introduce an approximation-preserving polynomial-time reduction from an auxiliary problem, the list scheduling makespan maximization problem (LS-MAX). This reduction gives us a framework to deduce hardness and inapproximability results for CDAG-MAX: by showing the NP-hardness of solving or approximating LS-MAX, we prove the corresponding hardness results for CDAG-MAX by exploiting the reduction.
Definition 3 (Problem LS-MAX). We are given a precedence constraint DAG G = (V, E), jobs with execution times p_j for each j ∈ V, m identical parallel processors and a deadline D. The task is to decide whether C_max > D, where C_max is the maximum makespan that can be achieved by any list scheduling order (i.e., any FP-order).

Theorem 1. There is an approximation-preserving polynomial-time reduction from LS-MAX to CDAG-MAX.
Proof. Consider an LS-MAX instance with DAG G = (V, E), jobs V = {1, ..., n}, execution times p_j for j ∈ V, and m processors. We construct an instance of CDAG-MAX on m' = m processors with a conditional DAG G' = (V', E', C'), execution times p'_j and an FP-order ≺ as follows: 1) For each job j ∈ V add a) n job copies v_j^1, ..., v_j^n, each with execution time p_j, and b) a conditional pair (c_1^j, c_2^j) with execution times zero that uses each v_j^l as a conditional branch; see Fig. 2a. 2) For each edge (j, j') ∈ E add the edge (c_2^j, c_1^{j'}) to E'. See Figure 2b for an illustration of the construction, which can be done in polynomial time. Let C_max be the maximum makespan of the given LS-MAX instance. We now show that M(G', ≺) = C_max. To do so, consider an arbitrary realization G_J. Figure 2c illustrates a realization for the example in Fig. 2b. Observe that in G_J exactly one job copy v_j^l is active for each job j ∈ V. Let v_j^l and v_{j'}^{l'} be active job copies in G_J; then, by construction, there is a path P from v_j^l to v_{j'}^{l'} whenever (j, j') ∈ E. Note that all vertices on P, apart from the endpoints, have an execution time of zero and that, according to the given order ≺, all vertices with execution time zero are processed before all other vertices. Therefore, the only function of P when scheduling G_J is that it enforces a precedence constraint between v_j^l and v_{j'}^{l'}. Thus, all precedence constraints between the original jobs in G are also present between the corresponding active job copies in G_J.
In addition to the active job copies and the paths that connect them, G_J only contains a unique predecessor (respectively successor) with an execution time of zero for each job copy v_j^l such that j has no predecessor (respectively successor) in G. As those jobs have an execution time of zero, they do not affect the schedule of G_J given ≺.
Furthermore, each job copy v_j^l has the same execution time as the original job j, and G as well as G_J are scheduled on the same number of processors.
By definition of CDAG-MAX, the active job copies in any realization G_J are scheduled using FP-scheduling with order ≺, achieving a makespan of C_J. In LS-MAX, any list scheduling order can be chosen. In particular, for each realization G_J there is an order ≺_LS that orders the jobs in G in the same way as ≺ orders the active job copies in G_J. That is, there is an order ≺_LS such that v_j^l ≺ v_{j'}^{l'} holds for the active job copies v_j^l and v_{j'}^{l'} of original jobs j, j' ∈ V if and only if j ≺_LS j'. Thus, ≺_LS achieves a makespan of C_J. As this holds for any realization, M(G', ≺) ≤ C_max follows.
Conversely, consider an arbitrary list scheduling order ≺_LS on the jobs in G that achieves a makespan of C. We show that there is a realization G_J such that ≺ orders the active job copies in G_J exactly as ≺_LS orders the original jobs in G. For each job j ∈ V let q_j denote the position of j in ≺_LS, i.e., j has the q_j-th highest priority in V. Then, there is a realization G_J such that v_j^{q_j} is the sole active job copy of j in G_J for each j ∈ V. Let j and j' with j ≠ j' be two arbitrary jobs in V with j ≺_LS j'; then q_j < q_{j'} holds by definition, and v_j^{q_j} and v_{j'}^{q_{j'}} are the active job copies in G_J. By construction, v_j^{q_j} ≺ v_{j'}^{q_{j'}}. Thus, ≺ orders the active job copies in G_J exactly as ≺_LS orders the original jobs in G and achieves a makespan of C. As this holds for each list scheduling order ≺_LS, C_max ≤ M(G', ≺) and thus C_max = M(G', ≺) follows.
Observe that the reduction from LS-MAX to CDAG-MAX in some way preserves the structure of the input precedence constraint graph G. Let G' = (V', E', C') be the constructed conditional DAG. Consider the graph G_C that uses each conditional pair c^i = (c_1^i, c_2^i) ∈ C' as a vertex and has an edge between c^i and c^j if and only if (c_2^i, c_1^j) ∈ E'. We can observe that G and G_C are isomorphic by definition of the reduction. As each realization G_J of G' contains a simple path in place of each conditional construct, we can observe the following. This lemma is useful to show the coNP-hardness of CDAG-MAX for special graph classes. If we show that LS-MAX is NP-hard for precedence constraint graphs that form a tree or a constant number of chains, Lemma 2 and the reduction of Theorem 1 imply the coNP-hardness of the corresponding CDAG-MAX variant.
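The gadget of Theorem 1 is straightforward to construct programmatically. The sketch below builds it for a given LS-MAX instance under our own naming conventions (the dummy zero-time predecessors and successors for source and sink copies, mentioned in the proof, are omitted for brevity).

```python
def build_cdag(jobs, edges, p):
    """Theorem 1 gadget (sketch): each job j becomes a conditional pair
    (c1_j, c2_j) with n branches, the l-th branch being a single copy v_j_l
    with execution time p[j]; each precedence edge (j, j') of the LS-MAX
    instance becomes the edge c2_j -> c1_j'. All c-nodes have time zero."""
    n = len(jobs)
    V, E, C, times = [], [], [], {}
    for j in jobs:
        c1, c2 = f"c1_{j}", f"c2_{j}"
        V += [c1, c2]
        C.append((c1, c2))
        times[c1] = times[c2] = 0
        for l in range(n):
            v = f"v_{j}_{l}"
            V.append(v)
            times[v] = p[j]
            E += [(c1, v), (v, c2)]   # branch l is the single copy v
    for j, j2 in edges:
        E.append((f"c2_{j}", f"c1_{j2}"))
    return V, E, C, times
```

In any realization exactly one copy per pair is active, and the zero-time c-nodes merely transport the original precedence constraints between active copies, as in the proof.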

B. Hardness and Inapproximability Results
We now exploit the previously introduced reduction framework to show hardness and inapproximability results for CDAG-MAX. First, we show the strong NP-hardness of LS-MAX which, in combination with the reduction of Theorem 1, implies the strong coNP-hardness of CDAG-MAX. Then, we consider special cases of LS-MAX and CDAG-MAX and use the reduction framework to derive further hardness results. Finally, we show that it is NP-hard to approximate LS-MAX within a factor of 7/5, which implies the same result for CDAG-MAX. For any list scheduling order, the makespan is at most ∑_{j ≠ l} p_j / m + p_l, where l is the job that finishes last, since no processor is idle at the moment when l is assigned. The term ∑_{j ≠ l} p_j / m + p_l is maximal for the constructed instance if n + 1 is the job that determines the makespan, since it is the largest job. Thus, ∑_{j=1}^{n} p_j / m + p_{n+1} is an upper bound on the maximum possible makespan for the constructed instance I'. As argued above, this upper bound can be reached iff jobs 1, ..., n can be scheduled such that each processor has load exactly ∑_{j=1}^{n} p_j / m. By similar arguments as in the proof of Theorem 2 and a particular hardness result for makespan minimization in [17], we can give further refined complexity results for particular graph classes. The full proof is omitted.

Theorem 4. LS-MAX is (a) strongly NP-hard even if the precedence constraint graph is a tree and (b) weakly NP-hard even if the precedence constraint graph consists of four chains to be processed on two processors.
The theorem, the reduction of Theorem 1 and Lemma 2 then directly imply the following hardness results for CDAG-MAX. Part (c) of the theorem can be shown by using the existing reduction, exploiting the hardness of LS-MAX without precedence constraints and adding dummy terminals.
Theorem 5. CDAG-MAX is (a) strongly coNP-complete even if each realization G_J of the conditional DAG G is a tree, (b) weakly coNP-complete even if each realization G_J of the conditional DAG G consists of four chains to be processed on two processors, and (c) strongly coNP-complete even if the conditional DAG is a two-terminal series-parallel graph.
Finally we give an inapproximability result for LS-MAX and consequently CDAG-MAX. Our reduction from CLIQUE is inspired by [18]; their reduction for makespan minimization with precedence constraints gives a hardness-of-approximation bound of 4/3 assuming unit execution times. We obtain a slightly better bound using non-unit execution times.
• t = 2: we execute the remaining nodes in E and all nodes in C using (|E| − h) + (|V| + h) = m processors.
• t = 3: we start execution of job x, which finishes at time 7.

Proof of Claim 2):
We first show that if there is no k-clique in G, node x starts execution at time t ≤ 1 for any FP-order. In fact, if x does not start execution at t = 0, then at time 0 we execute jobs of V and A. If all jobs in A are executed at t = 0, then we can process only k jobs in V . Since G has no k-clique, it follows that at t = 1 we can execute jobs in B and at most h − 1 jobs in E whose predecessors have been processed in the previous step. Therefore job x is processed at t = 1 independently of its position in the ordering.
If x does not start execution at t = 0 and not all jobs in A are executed at time 0, then we observe that at least m − |V | jobs of A have been processed. Therefore at time t = 1 there are at most |V | − k unprocessed jobs in A and possibly all jobs in E can be processed; therefore there are at least m − (|V | − k) − |E| = k free processors and job x is processed independently of its position in the ordering at time t = 1. We conclude that job x is completed by time t = 5.
We now show that all other jobs are completed by time t = 5 for all FP-orders. To prove this, we first show that at t = 2 all jobs in A and V are completed for all FP-orders.
If x and all jobs in A are started at t = 0, then at time 1 there are at most |V| − k + 1 unscheduled jobs in V and at most (k − 1)(k − 2)/2 = h − k + 1 jobs of E can be processed. This leaves enough free processors to schedule all remaining jobs in V and all jobs in B, as only one processor is busy processing x. It follows that all jobs in V, A and B are finished at t = 2.
If at time t = 0 job x is not started while all jobs in A are executed, then there are at most |V | − k unscheduled jobs in V at time 1. Since G has no k-clique, there are at most h − 1 jobs in E that can be processed. Even if all these jobs are processed alongside job x, we are left with (m − 1) − (h − 1) = m − h free processors to process jobs in B and remaining jobs in V . These sets consist of a total of at most (|E|+k −h)+(|V |−k) = m−h jobs. Therefore we conclude that all jobs in V , A and B are completed at time 2.
If not all jobs in A are executed at t = 0, we observe that at time 0 we execute at least m − 1 jobs in V ∪ A. Thus, at time 1 there are at most |V| − k + 1 unprocessed jobs in V and A. It follows that at time 1 we can execute jobs in V, A and a subset of jobs in E that has size at most |E|. Therefore, using m − 1 processors, we can schedule (m − 1) − |E| = |V| − 1 jobs in V ∪ A. We conclude that all jobs in A and V are completed at time t = 2 independently of the job ordering.
We now show that at time 5 all jobs in E , B and C are completed. First, assume that at t = 3 all nodes in B are processed. It follows that at t = 3 all predecessors of remaining unscheduled jobs in E and C are completed and that therefore the schedule of jobs in C and E is completed by time t = 5 for any FP-order. If at time t = 3 not all jobs in B are processed, we can schedule jobs in E and B but no jobs in C; it follows that at time t = 4 we complete the execution of jobs in E and B. Therefore at time t = 5 we complete the schedule by completing all jobs in C.
Theorem 6 and the reduction of Theorem 1 imply the following result.

C. Preemptive Scheduling
Theorem 2 states that LS-MAX is strongly NP-hard even if there are no precedence constraints. We reduce from LS-MAX without precedence constraints to the preemptive variant of CDAG-MAX, in which each realization G_J is scheduled using preemptive FP-scheduling based on ≺. We follow the reduction of Theorem 1. In the case without precedence constraints, no edge is introduced in Step 2) of the reduction. Thus, the constructed conditional DAG G' has no path between any pair of distinct job copies v_j^l and v_{j'}^{l'}. As there are no paths between job copies and all other jobs have an execution time of zero by definition, it follows that all active job copies are available at time zero in any realization G_J. Consider the preemptive variant of CDAG-MAX. The FP-schedule of G_J will never use preemption, because no higher-priority job will ever become available and interrupt a lower-priority job. Thus, the preemptive and non-preemptive schedules are equivalent and we conclude the following result.

Theorem 8. CDAG-MAX is strongly coNP-complete even under preemptive FP-scheduling.
By using a proof similar to the one of Theorem 6, we can give the following inapproximability result for LS-MAX with unit-size jobs.
Theorem 9. Approximating LS-MAX with a ratio less than 6/5 is NP-hard even with unit-size jobs.
The following theorem is implied by Theorem 9 and the reduction of Theorem 1. Note that the reduction introduces conditional nodes with execution times zero. The result for such unit- and zero-size jobs also holds for preemptive CDAG-MAX, as preemption will not occur.
Theorem 10. Approximating CDAG-MAX with a ratio better than 6/5 is NP-hard, even (a) when p_j = 0 for conditional jobs and p_j = 1 otherwise, and (b) under preemptive FP-scheduling.

D. Generalized Conditional DAG with Shared Nodes
In this section, we consider a more general variant of conditional DAGs that relaxes the well-nested structure and demonstrate a substantial increase in complexity. While CDAG-MAX for conditional DAGs without shared nodes can be solved in polynomial time on a single processor [8], we show that CDAG-MAX for conditional DAGs with shared nodes is strongly coNP-hard even on a single processor.
A conditional DAG with shared nodes G = (V, E, C) is defined analogously to Definition 1, with an adjusted third requirement for each (c_1, c_2) ∈ C that allows edges from conditional branches G_l of (c_1, c_2) to conditional branches G_{l'} of pairs (c'_1, c'_2) ∈ C with (c'_1, c'_2) ≠ (c_1, c_2).
When executing a conditional DAG with shared nodes, the execution of branches is defined as before, and we say that a job j ∈ V is executed if one of the following conditions holds:
• j is not part of any conditional branch, that is, j ∉ V_l for each branch G_l of any conditional pair (c_1, c_2), or
• at least one of the innermost conditional branches G_l with j ∈ V_l is being executed.
Realizations, FP-schedules and makespans are then defined as for conditional DAGs without shared nodes. Observe that the makespan C_J of a realization G_J on a single processor is just the sum of the execution times of all active jobs in G_J. Therefore, the FP-order has no influence on the makespan and we do not consider it subsequently.
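On a single processor, the worst case with shared nodes thus becomes a coverage-style maximization: a shared job contributes its execution time only once, no matter how many chosen branches contain it. The brute-force sketch below (our own toy encoding: each conditional is a list of branches, each branch simply the set of jobs on it) makes this explicit; it is exponential in the number of conditionals, in line with the hardness shown next.

```python
from itertools import product

def worst_case_makespan_1proc(conds, free_jobs, p):
    """Worst-case single-processor makespan of a cp-task with shared nodes.

    conds:     list of conditionals, each a list of branches (sets of jobs)
    free_jobs: jobs outside every conditional branch (always executed)
    p:         dict mapping job -> execution time
    """
    best = 0
    for choice in product(*conds):              # one branch per conditional
        active = set(free_jobs).union(*choice)  # a shared job counts once
        best = max(best, sum(p[j] for j in active))
    return best
```

In the test case, job x is shared between two conditionals; picking it in both branches pays its time only once, so the worst case instead avoids x entirely.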
We show a reduction from the strongly NP-hard problem 1in3-SAT [19]. In 1in3-SAT, we are given a set of propositional logic 3-clauses C = {C_1, ..., C_n} and a set of variables L = {λ_1, ..., λ_n} such that each clause contains only positive literals and each variable occurs in exactly three clauses. The question is whether there is a variable assignment that satisfies exactly one literal of each clause C_j.

Theorem 11. CDAG-MAX for conditional DAGs with shared nodes is strongly coNP-complete even on a single processor.
Proof. Let (C, L) be a given 1in3-SAT instance. We construct a conditional DAG G = (V, E, C) with execution times p_j for all j ∈ V and m = 1 as follows (see also Figure 3):
1) For each clause C_j ∈ C, add a node γ_j with an execution time of one.
2) For each λ_i ∈ L, add two nodes c_i^1 and c_i^2 with execution times of zero that form a conditional pair c_i = (c_i^1, c_i^2).
   a) For each branch l ∈ {1, 2} of c_i, add a source s_l^i and a sink t_l^i with execution times of zero.
   b) For each node γ_j with λ_i ∈ C_j, add edges from s_1^i to γ_j and from γ_j to t_1^i.
   c) Add vertices π_i and ρ_i with execution times of one, and edges from s_2^i to π_i and ρ_i and from π_i and ρ_i to t_2^i.

Fig. 3. Reduction of Theorem 11 for variables λ_i, λ_j that share a clause C_l.
Obviously, the reduction can be carried out in polynomial time. To prove correctness, we show the following two statements:
1) If the given 1in3-SAT instance has a feasible solution, then M(G, ≺) ≥ 7n/3.
2) If the given 1in3-SAT instance does not have a feasible solution, then M(G, ≺) < 7n/3.
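On a single processor, the makespan of the realization for an assignment α is just the total execution time of its active jobs: one unit per clause node γ_j whose clause contains a true variable, plus two units (π_i and ρ_i) per false variable. The following sketch brute-forces this count on a tiny hypothetical 1in3-SAT instance (three variables, three identical clauses, so every variable occurs in exactly three clauses) and confirms that the worst case hits the 7n/3 threshold exactly:

```python
from itertools import product

def makespan(clauses, n_vars, alpha):
    # Single-processor makespan of the realization for assignment alpha:
    # gamma_j (time 1) is active iff some variable of clause C_j is true,
    # and each false variable activates pi_i and rho_i (time 1 each).
    gammas = sum(1 for C in clauses if any(alpha[i] for i in C))
    pis_rhos = 2 * sum(1 for i in range(n_vars) if not alpha[i])
    return gammas + pis_rhos

# Hypothetical instance: variables 0, 1, 2; all three clauses are {0, 1, 2}.
clauses = [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
n = 3

worst = max(makespan(clauses, n, a) for a in product((0, 1), repeat=n))
# Setting exactly one variable true satisfies every clause exactly once,
# so the worst-case makespan reaches the threshold 7n/3 = 7.
assert worst == 7
```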
For each variable assignment α : L → {0, 1} we construct a realization G_J such that for each λ_i ∈ L the branch G_i^1 is part of G_J if α(λ_i) = 1 and the branch G_i^2 is part of G_J if α(λ_i) = 0, where G_i^1 and G_i^2 are the two conditional branches of c_i. In this way, a unique realization is constructed for each α. As the number of possible variable assignments and the number of realizations are both equal to 2^n, we conclude that there is a one-to-one correspondence between variable assignments and realizations of the constructed conditional DAG.
To prove the first statement, assume that a 1in3-SAT instance with a feasible solution is given. Then a satisfying variable assignment α : L → {0, 1} exists that satisfies each clause by exactly one literal. Let G_J be the realization constructed for α as described above. We show that C_J ≥ 7n/3 holds for G_J; then M(G, ≺) ≥ 7n/3 follows. For each λ_i ∈ L, the branch G_i^1 is part of G_J if and only if α(λ_i) = 1, by definition of G_J. Since α satisfies all clauses, for each clause C_j there is a literal λ_i ∈ C_j with α(λ_i) = 1 and thus G_i^1 ⊆ G_J; hence γ_j ∈ V_i^1 is active and executed in G_J. This contributes n time units to the makespan C_J.
Because α satisfies each clause by exactly one literal and each variable occurs only as a positive literal in exactly three clauses, it follows by the pigeonhole principle that at least 2n/3 variables λ_i ∈ L with α(λ_i) = 0 exist, and thus G_i^2 ⊆ G_J for each of them. Therefore, the vertices π_i and ρ_i are active for each such λ_i, which contributes 4n/3 time units to C_J. Adding up both parts yields a makespan of at least n + 4n/3 = 7n/3.

To prove the second statement, assume that the formula is not satisfiable and consider an arbitrary variable assignment α. Let k be the number of variables λ_i with α(λ_i) = 0 and let G_J be the realization corresponding to α as described above.
If k < 2n/3, then at most n nodes γ_j are active in G_J, and for the k variables λ_i with α(λ_i) = 0 the nodes π_i and ρ_i are active. Therefore, the makespan of G_J is at most n + 2k < n + 4n/3 = 7n/3. If k > 2n/3, then at most 3(n − k) nodes γ_j are active in G_J, as each variable occurs in at most three clauses; again, π_i and ρ_i are active for k variables λ_i. Therefore, the makespan of G_J is at most 2k + 3(n − k) = 3n − k < 3n − 2n/3 = 7n/3. To finish the proof, consider k = 2n/3. As α does not satisfy the given formula by assumption, at least two of the n − k positively assigned variables must occur in the same clause (otherwise, the 3(n − k) = n occurrences of true variables would satisfy each of the n clauses exactly once, a contradiction). This means that at least one node γ_j is not active in G_J, so the makespan of G_J is strictly less than n + 2k = n + 4n/3 = 7n/3. It follows that C_J < 7n/3 holds for each realization G_J that corresponds to an assignment α. Because each realization corresponds to an assignment α, C_J < 7n/3 holds for every realization G_J, and therefore M(G, ≺) < 7n/3 follows.

IV. PSEUDO-POLYNOMIAL TIME ALGORITHM FOR BOUNDED WIDTH

In this section, we consider conditional DAGs G with the property that each realization G_J represents a partial order of width bounded by a constant k. The width of a partial order is the maximum number of pairwise incomparable tasks, that is, the size of a maximum antichain. Slightly abusing notation, we say that the underlying graph has width bounded by k.
Theorem 12. CDAG-MAX can be solved exactly in pseudo-polynomial time if each realization G_J of the given conditional DAG G has width at most k.
Assume w.l.o.g. that G has a single source; otherwise, we simply add a dummy source job with execution time zero that precedes all original sources.
We present a dynamic program (DP) for solving CDAG-MAX. Each state of the DP describes a partial schedule in terms of an ideal as defined in [20]. An ideal I of a realization G_J is a subset of V_J such that whenever a job is in I, all of its predecessors are contained in I as well. We say that I is an ideal of G if I is an ideal of some realization G_J. Every partial schedule for a set of jobs in G_J must contain all jobs of the corresponding ideal of G to ensure that precedence constraints and conditions are respected. Our DP establishes reachability in a graph of ideals, where an ideal Ī is reachable from I if a feasible subschedule for I can be extended to a feasible schedule for Ī by adding the tasks in Ī \ I while respecting the FP-order.
An ideal I can be represented by its front tasks, i.e., the jobs j ∈ I without successors in I. According to Dilworth's Decomposition Theorem [21], an ideal I of a graph G of width bounded by k has at most k front tasks. Thus, the number of different ideals is bounded by n^k. A state of our DP is a tuple (I, P) where
• I ⊆ V is the set of front tasks of an ideal for some realization G_J of G such that there is a point in time t in the FP-schedule S_J at which every job in I is either being processed or available for processing, and
• for each j ∈ I, P_j ∈ N ∪ {−} either denotes the remaining execution time necessary to complete j or indicates that j has not been started yet (P_j = −).
We define a weighted, directed, and acyclic state graph H = (U, F, w) with one source and one sink such that U contains the states of the DP and F contains all feasible state transitions. We define H such that the length of a longest source-sink path in H corresponds to the worst-case makespan of G. To construct H, consider the initial state u_0 = ({s}, −) where s is the source of G. The state u_0 is part of H, and we inductively define the rest of H.
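To make the ideal representation concrete, the following sketch (using a hypothetical diamond-shaped realization of width 2) enumerates all ideals by brute force and confirms the Dilworth bound of at most k = 2 front tasks per ideal:

```python
from itertools import combinations

# Hypothetical diamond DAG: 0 -> {1, 2} -> 3; its width is 2.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
pred = {v: [u for u in succ if v in succ[u]] for v in succ}

def is_ideal(I):
    # An ideal is downward closed: predecessors of members are members.
    return all(p in I for v in I for p in pred[v])

def front(I):
    # Front tasks: jobs in I with no successor inside I.
    return {v for v in I if not any(s in I for s in succ[v])}

ideals = [set(c) for r in range(len(succ) + 1)
          for c in combinations(succ, r) if is_ideal(set(c))]

assert len(ideals) == 6      # {}, {0}, {0,1}, {0,2}, {0,1,2}, {0,1,2,3}
assert all(len(front(I)) <= 2 for I in ideals)   # width 2 => <= 2 front tasks
```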
Consider a state u = (I, P) ∈ U. We define B as the set of jobs that are started next according to the FP-order. Let m_u denote the number of jobs that are being processed in state u (jobs j ∈ I with P_j ≠ −). Then m′ = m − m_u is the number of free processors in u, and B is the set of the (up to) m′ highest-priority jobs j ∈ I with P_j = −. If B contains jobs with execution time zero, redefine B to contain only such jobs. This distinction is necessary, as the start of jobs with execution time zero might cause other jobs to become available and thus change the set of the m′ available jobs with the highest priority. Then, p_r = min({p_j | j ∈ B} ∪ {P_j | j ∈ I ∧ P_j ≠ −}) is the time that passes until the next job finishes, and C = {j ∈ B | p_j = p_r} ∪ {j ∈ I | P_j = p_r} contains the jobs that finish next.
We define all states u′ = (I′, P′) and transitions f = (u, u′) with w(f) = p_r to be part of H if the following holds:
• I′ = (I \ C) ∪ S contains all jobs j ∈ I that have not been completed (j ∉ C) together with the set S of jobs that become available. Here, S contains exactly one successor for each j ∈ C that is the start of a conditional pair, and all successors of other jobs in C for which all predecessors have finished.
• P′_j = P_j for all jobs that were neither being processed nor started in u (P′_j = − for all j ∈ I \ B with P_j = −). The jobs that become available are not being processed, so P′_j = − for all j ∈ S. For all jobs that have been processed or started in u but have not finished, the remaining execution time is decreased by p_r; that is, P′_j = P_j − p_r for all j ∈ I with P_j > p_r, and P′_j = p_j − p_r for all j ∈ B with p_j > p_r.
Note that a state u can have multiple successors in H, as multiple job sets S can become available due to conditional constructs. Additionally, observe that we can decide whether a predecessor j′ of a completed job j ∈ C has finished: j′ has finished if either j′ ∈ C holds or if neither j′ nor any predecessor of j′ is an element of I \ C.
In summary, the state graph H contains the start state u_0, all states and transitions that can be reached from u_0, and the state d = (∅, −) as the sink of H. A straightforward induction on the construction implies the following.

Lemma 3. The length of a longest u_0-d-path in H equals the worst-case makespan M(G, ≺).

By Lemma 3, we can compute the worst-case makespan M(G, ≺) for a given cp-task as follows: construct the state graph H and find a longest path from the start state u_0 to the sink d. The corresponding worst-case realization G_J and the corresponding FP-schedule can be found by backtracking.
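Since H is acyclic, the longest u_0-d-path can be computed by a memoized depth-first search. A minimal sketch on a hypothetical toy state graph, with edge weights playing the role of the transition weights p_r:

```python
from functools import lru_cache

# Hypothetical state graph H: adjacency lists of (successor, weight) pairs,
# with source 's' and sink 'd'.
H = {'s': [('a', 2), ('b', 3)], 'a': [('d', 4)], 'b': [('d', 1)], 'd': []}

@lru_cache(maxsize=None)
def longest(u):
    # Length of a longest path from state u to the sink 'd'; H is acyclic,
    # so the recursion terminates and each state is solved once.
    if u == 'd':
        return 0
    return max(w + longest(v) for v, w in H[u])

assert longest('s') == 6   # the path s -> a -> d has weight 2 + 4 = 6
```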
Observe that each state is of polynomial size and the successor states of a given state u can be computed in polynomial time. It remains to show that the number of states in H is pseudo-polynomial in the input size.
As argued before, width k of G implies that there are O(n^k) different sets for I. Moreover, for each I there are at most p_max possible values per entry of P, where p_max = max_{j∈V} p_j. Thus, there are O(n^k · p_max^k) states, which is pseudo-polynomial as k is constant. Hence, a dynamic program can compute a longest u_0-d-path in H in time O(n^{2k} · p_max^{2k}) and, by Lemma 3, solves CDAG-MAX exactly. This proves Theorem 12.

V. FPTAS UNDER MONOTONICITY
We present a fully polynomial-time approximation scheme (FPTAS) for CDAG-MAX for a certain class of monotone conditional DAGs, which we define below.
Definition 4. A family of algorithms {A_ε} is called an FPTAS if, for every input I and every ε > 0, algorithm A_ε finds a solution of value within a factor of 1 + ε (resp. 1 − ε) of the optimal solution for I, and the running time of A_ε is polynomial in the encoding length of I and in 1/ε.
Roughly speaking, a scheduling algorithm is monotone if increasing the execution times does not decrease the makespan.

Definition 5.
A scheduling algorithm is monotone if, for any pair of scheduling instances I and I′ that differ only in one job j with p_j < p′_j, the respective makespans C_I and C_{I′} satisfy (a) C_I ≤ C_{I′}, and (b) C_{I′} ≤ C_I + (p′_j − p_j).

In general, FP-scheduling for conditional DAGs is not monotone, even if each realization is of bounded width; see the Graham anomalies [6].
We consider conditional DAGs G such that each realization G_J has bounded width and the FP-schedule on G_J is monotone. Conditional DAGs whose realizations each consist of a constant number of chains belong to this class. The proof of the following theorem is omitted.
Theorem 13. Let I be a scheduling instance whose precedence constraint graph G is a set of disjoint chains. Then each FP-schedule of I is monotone.
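As an illustration of Theorem 13 (not the omitted proof), the following sketch simulates non-preemptive FP (list) scheduling of disjoint chains on m processors and checks on one hypothetical instance that enlarging a single execution time does not decrease the makespan. Assigning one priority per chain is a simplification made here for brevity:

```python
import heapq

def fp_makespan(chains, prio, m):
    # Event-driven list scheduling: whenever a processor is free, start the
    # next job of the highest-priority ready chain (lower value = higher).
    nxt = [0] * len(chains)            # index of next unscheduled job per chain
    ready = set(range(len(chains)))    # chains whose next job is available
    running = []                       # min-heap of (finish_time, chain)
    t = 0
    while ready or running:
        while ready and len(running) < m:
            c = min(ready, key=lambda c: prio[c])
            ready.discard(c)
            heapq.heappush(running, (t + chains[c][nxt[c]], c))
        t, c = heapq.heappop(running)  # advance time to the next completion
        nxt[c] += 1
        if nxt[c] < len(chains[c]):    # the chain successor is released
            ready.add(c)
    return t

base = fp_makespan([[2, 1], [3], [1, 1, 1]], prio=[0, 1, 2], m=2)
# Enlarging one job (3 -> 4) must not shrink the makespan (property (a)).
bumped = fp_makespan([[2, 1], [4], [1, 1, 1]], prio=[0, 1, 2], m=2)
assert bumped >= base
```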
We now design an FPTAS for CDAG-MAX. Let I = (G, p, m, ≺) be an instance with G = (V, E, C) and |V| = n. For fixed ε > 0, set μ = ε · p_max/n, where p_max = max_{j∈V} p_j is the maximum execution time in I. We define algorithm A_ε as follows:
1) Let Î = (G, p̂, m, ≺) be the rounded CDAG-MAX instance with p̂_j = ⌈p_j/μ⌉ for each j ∈ V.
2) Solve the rounded instance Î using the dynamic program (DP) from Section IV. Let G_Ĵ be the realization that corresponds to the longest u_0-d-path determined by the DP.
3) Return μ · Ĉ_Ĵ, where Ĉ_Ĵ is the makespan of G_Ĵ in the rounded instance Î.
This follows the rather standard method of rounding the values to reduce the state space and thus the running time of the pseudo-polynomial time DP. It remains to prove that the solution quality does not deteriorate too much.
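The rounding in Step 1) can be illustrated numerically. The execution times below are illustrative; the point is that the rounded values are bounded by ⌈n/ε⌉, so the DP state space becomes polynomial in n and 1/ε, while unscaling never underestimates an execution time:

```python
import math

def rounded_instance(p, eps):
    # mu = eps * p_max / n; rounded execution times p_hat_j = ceil(p_j / mu).
    n, p_max = len(p), max(p)
    mu = eps * p_max / n
    return mu, [math.ceil(pj / mu) for pj in p]

p = [7, 12, 30, 5]                     # illustrative execution times
mu, p_hat = rounded_instance(p, eps=0.5)

assert max(p_hat) <= math.ceil(len(p) / 0.5)           # p_hat_max <= ceil(n/eps)
assert all(mu * ph >= pj for ph, pj in zip(p_hat, p))  # mu * p_hat_j >= p_j
```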
Theorem 14. Algorithm {A_ε} is an FPTAS for CDAG-MAX for conditional DAGs such that each realization has width bounded by a constant and all FP-schedules are monotone.
Proof. We first show that the runtime of each A_ε is polynomial in the input size and in 1/ε. The runtime of A_ε is dominated by the time the DP takes to solve the rounded instance Î, which is O(n^{2k} · p̂_max^{2k}). By definition of Î, we have p̂_max = ⌈p_max/μ⌉ = ⌈p_max · n/(ε · p_max)⌉ = ⌈n/ε⌉. Thus, the overall runtime is O(n^{4k}/ε^{2k}), which is polynomial in the input size and in 1/ε. It remains to show that algorithm A_ε computes, for any given instance I, a solution of value A_ε(I) that satisfies M(G, ≺) ≤ A_ε(I) ≤ (1 + ε) · M(G, ≺).
Consider an instance I, ε > 0, and A_ε. Let J* be a realization with C_{J*} = M(G, ≺) on instance I and let Ĵ be the realization computed by A_ε, i.e., by the DP on the rounded instance Î in Step 2). For a realization J, let C_J and Ĉ_J denote the makespans of J in instance I and in its rounded variant Î, respectively. The value for CDAG-MAX computed by A_ε is A_ε(I) = μ · Ĉ_Ĵ.
First, we observe that for every realization J:

C_J ≤ μ · Ĉ_J.    (1)

This observation is crucial and relies on the monotonicity property; in general, Inequality (1) may not hold for FP-scheduling [6]. To see (1), consider an instance I′ obtained from I by scaling all execution times down by the factor μ. For any realization G_J, the makespan C_J in the original instance I and the makespan C′_J in the scaled instance I′ satisfy C_J = μ · C′_J. As the FP-schedule of G_J is monotone by assumption and the execution times in Î equal those in I′ rounded up to the next integer values, the makespan of the FP-schedule of Î satisfies C′_J ≤ Ĉ_J (cf. Definition 5(a)), which implies (1). Algorithm A_ε computes in Step 2) an exact solution for the rounded instance Î, i.e., Ĉ_Ĵ is the worst-case makespan for Î, which is not less than the makespan Ĉ_{J*} of J*. Using Inequality (1), we conclude

M(G, ≺) = C_{J*} ≤ μ · Ĉ_{J*} ≤ μ · Ĉ_Ĵ = A_ε(I).

It remains to show that A_ε(I) ≤ (1 + ε) · M(G, ≺). Similarly to the arguments above, we can show the following inequality for every realization J by repeatedly applying monotonicity property (b) of Definition 5:

μ · Ĉ_J < C_J + n · μ.    (2)

Combining (2) with the definition of μ, we conclude

A_ε(I) = μ · Ĉ_Ĵ < C_Ĵ + n · μ = C_Ĵ + ε · p_max ≤ (1 + ε) · M(G, ≺),

which completes the proof.
As mentioned above, we can show that conditional DAGs whose realizations each consist of a constant number of chains satisfy the monotonicity property of Theorem 13. Together with Theorem 14, this implies the existence of an FPTAS, which is best possible given the weak coNP-hardness (Theorem 5).

Corollary 1. Algorithm {A_ε} is an FPTAS for CDAG-MAX for conditional DAGs G such that each realization of G is a constant number of disjoint chains.

VI. CONCLUSION
In this work, we largely resolve the complexity status of the CDAG-MAX problem for conditional DAGs, obtaining refined results depending on the graph structure. While we showed that CDAG-MAX is (weakly) coNP-hard for m = 2 processors if each realization of the given conditional DAG consists of k = 4 chains, the complexity for m = 2 and k = 3 remains open. As LS-MAX for three chains on two processors is in P, we cannot hope to settle this open question using our reduction framework.
Our reduction from LS-MAX to CDAG-MAX relies on the ability to assign priorities arbitrarily. In particular, the reduction does not work if we require all jobs of the same conditional construct to have the same priority. We leave the complexity of this special case open. It is an interesting and relevant case, as assigning priorities at the thread level is a common approach in practice in multi-threading scenarios.