Reinforcement Learning Based Latency Minimization in Secure NOMA-MEC Systems With Hybrid SIC

In this paper, physical layer security (PLS) in a non-orthogonal multiple access (NOMA)-based mobile edge computing (MEC) system is investigated, where hybrid successive interference cancellation (SIC) decoding is considered. Specifically, users intend to complete confidential tasks with the help of the MEC server, while an eavesdropper attempts to intercept the offloaded tasks. By jointly designing computational resource allocation, task assignment, and power allocation, a latency minimization problem is formulated. Based on the interactions between local computing time and MEC processing time, the closed-form solutions of computational resource allocation and task assignment are derived. After that, a strategy selection mechanism is established to select offloading strategies based on the corresponding conditions. Moreover, according to the analysis of hybrid SIC decoding, the conditions of different decoding orders in secure NOMA networks are derived. Furthermore, a reinforcement learning based algorithm is proposed to solve the power allocation problems for NOMA and OMA offloading strategies. This work is extended to a multi-user scenario, in which a matching-based algorithm is proposed to solve the formulated sub-channel assignment problem. Simulation results indicate that: i) the proposed solution can significantly reduce the latency and provide dynamic strategy selection for various scenarios; ii) the NOMA offloading strategy with hybrid SIC decoding can outperform other strategies in the considered system.

In this context, mobile edge computing (MEC) has emerged as a promising technology to provide real-time computational services [2]. Specifically, by equipping the base station (BS) with high-performance central processing units (CPUs), the computation-intensive and latency-critical tasks of mobile users can be partially or fully offloaded to the MEC server for processing [3]. As a result, sophisticated applications become executable on smart devices with limited computational capacity and power [4]. Moreover, in order to achieve high-speed and low-latency task offloading, the non-orthogonal multiple access (NOMA) transmission scheme is extensively employed in MEC systems [5], [6]. With NOMA schemes, multiple users are able to offload tasks to the MEC server in the same time and frequency band [7]. By adopting successive interference cancellation (SIC) techniques, part of the co-channel interference can be decoded and removed at the BS based on the channel state information (CSI) or quality of service (QoS) [8], [9]. In previous research, the advantages of adopting NOMA schemes in MEC systems have been demonstrated in comparison with conventional orthogonal multiple access (OMA) schemes [10], [11].

A. Related Works
In existing works on NOMA-MEC systems, user clustering and resource allocation have been extensively studied in order to minimize the latency, energy consumption, or their combination [12]- [18]. However, due to the broadcast nature of the multiple access networks, the offloaded tasks have a high risk of being intercepted by external eavesdroppers [19]. Therefore, the research on physical layer security (PLS) in NOMA-based MEC systems has emerged recently [20]- [26]. In [20], by introducing an energy weight for each user, a weighted energy consumption minimization problem was investigated in a two-user NOMA-MEC system. In order to further improve the aforementioned system, a secrecy outage probability minimization problem was considered. The energy consumption minimization problem was also studied in [21], where a group of wireless devices was considered as the jammer to help the eavesdropped edge computing device. For efficiently solving the formulated problem, a three-layered algorithm was proposed with the help of vertical decomposition. The authors in [22] focused on maximizing the minimum anti-eavesdropping ability of the proposed NOMA-MEC system. It was indicated that users tend to compute all tasks locally, and the offloading strategy is selected only if the energy for computing is insufficient. By jointly designing task assignment and power allocation, a latency minimization problem was studied in [23], where a bisection searching based algorithm was developed to solve the simplified problem. A novel multi-user MEC system with one unmanned aerial vehicle (UAV) server and one UAV jammer was proposed in [24], where both time division multiple access (TDMA) and NOMA transmission schemes were utilized. By maximizing the minimum secure computing capacity in those two schemes, the superiority of NOMA schemes compared to TDMA schemes was proved. 
By defining the energy efficiency as the ratio of the sum secure rate to the sum power consumption, an energy efficiency maximization problem with imperfect and perfect CSI was studied in [25] and [26], respectively. Specifically, in order to mitigate the impact of the eavesdropper, in [25], a full-duplex MEC server was introduced in a multi-subcarrier scenario to generate artificial noise, and hence, sub-channel allocation and power control for the MEC server were taken into consideration. In [26], the formulated problem was divided into two subproblems, and the closed-form solutions of local CPU-cycle frequency scheduling and the transmit power allocation were provided.

B. Motivation and Contribution
Although the optimization of NOMA-based MEC systems in PLS scenarios has been extensively investigated, in-depth research and analysis in this field are still limited. First, since the size of offloaded tasks is optimized, different strategies may be used to complete the given tasks. For example, some users may not be able, or may not need, to offload tasks to the MEC server. In this case, it is necessary to establish a dynamic strategy selection mechanism based on task assignment and power allocation. Second, in secure NOMA-MEC systems, the signals are decoded at both the BS and the eavesdropper, and hence, the SIC decoding order plays an important role in determining system performance. In existing research, the SIC decoding order is considered to be fixed [21], [24]-[26], or a strong assumption that eavesdroppers can cancel all interference is adopted [20], [22], [23]. For the sake of efficiency and practicality, a more flexible SIC decoding order should be employed at the BS, and an uncontrollable SIC decoding order should be assumed at the eavesdropper. Third, even though the computational capacity of the MEC server is larger than that of the user, it is still limited, and therefore the MEC computing time should be taken into consideration. For the multi-user scenario, computational resource allocation for processing offloaded tasks is required.
Against this background, this paper aims to find the possible task offloading strategies and derive the corresponding conditions. Moreover, it is considered that the SIC decoding order can be dynamically switched at the BS, and randomly selected at the eavesdropper. At the MEC server, computational resource allocation is included to further improve the proposed system. The main contributions of this paper are listed as follows:
• A novel secure NOMA-MEC system is proposed, where an eavesdropper intends to overhear the offloaded tasks. The users can partially offload tasks to the MEC server via the NOMA or OMA schemes, or completely compute all confidential tasks locally. At the BS and the eavesdropper, the hybrid and random SIC decoding orders are utilized, respectively. A latency minimization problem is formulated by jointly designing power allocation, task assignment, and computational resource allocation.
• The computational resource allocation and task assignment problems are solved by investigating the interactions between local computing time and MEC processing time, where the closed-form solutions are derived and expressed in terms of the secrecy rate. The offloading strategies and the corresponding conditions for switching among those strategies are derived to improve the performance of task processing. Furthermore, the conditions for achieving different SIC decoding orders are analyzed.
• The formulated power allocation problem is divided into three sub-problems according to the possible strategies, including NOMA offloading, OMA offloading, and local computing. With the help of reinforcement learning, a deep deterministic policy gradient (DDPG)-based algorithm is developed, which can optimize power allocation coefficients for the NOMA and OMA offloading strategies. The complexity of the proposed DDPG-based algorithm is also presented.
• In order to extend the work to a multi-user scenario, sub-channel assignment is studied, where users are paired and assigned to different sub-channels. The formulated sub-channel assignment problem is considered as a two-to-one matching, and then a matching-based algorithm is proposed, in which the DDPG-based power allocation algorithm can be iteratively performed. The property of the matching-based algorithm is also analyzed.
The effectiveness of the proposed solution is verified by simulations, where the appropriate strategies are selected based on various situations. Moreover, it is demonstrated that for both two-user and multi-user scenarios, hybrid SIC decoding can outperform fixed SIC decoding in the considered NOMA-MEC system. Furthermore, the correctness of the provided insights is verified.

II. SYSTEM MODEL
Consider an MEC system with one BS, one eavesdropper, and two users, where the MEC server is deployed at the BS and all nodes are equipped with a single antenna. In order to complete the confidential tasks, users tend to partially offload tasks to the trusted MEC server by utilizing NOMA transmission schemes, and compute the rest of the tasks locally in the meantime. At the MEC server, the available computational resource, i.e., CPU cycles, is dynamically allocated to process the offloaded tasks. The notations used in this paper are listed in Table I.

A. Signal Model
Based on the uplink NOMA scheme, users occupy the same time and frequency resources to offload tasks to the MEC server. At the BS and the eavesdropper, the received signals can be respectively expressed as and It is assumed that perfect channel state information (CSI) of all nodes is available at the BS, and the channel gain is constant during offloading. In order to decode the received signals, the SIC technique is utilized at the eavesdropper and the BS [27]. At the eavesdropper, the eavesdropping process can be considered as an uplink NOMA system, whose sum rate is not affected by SIC decoding orders. Therefore, in this paper, an arbitrary decoding order is adopted at the eavesdropper, since it does not change the size of intercepted tasks. The users are labeled according to the SIC decoding order at the eavesdropper. Without loss of generality, the user whose signal is decoded first is denoted by user 1, and the user whose signal is decoded afterwards is denoted by user 2. Therefore, at the eavesdropper, the data rates of user 1 and user 2 can be respectively expressed as follows: and where $|h_{i,e}|^2 = P_t |\hat{h}_{i,e}|^2 \sigma^{-2}$ is the normalized channel gain. At the BS, hybrid SIC decoding is employed, where the decoding orders can be further optimized in order to improve users' individual performance [28], [29]. Since there are two users in the proposed system, two different decoding orders should be considered, as follows.
1) SIC Decoding Order 1: If user 1's signal is decoded first at the BS, the data rate of user 1 is given by where $|h_i|^2 = P_t |\hat{h}_i|^2 \sigma^{-2}$ is the normalized channel gain. After removing the interference caused by user 1, the data rate of user 2 can be presented as follows:
2) SIC Decoding Order 2: If user 2's signal is decoded before user 1's, the following data rates can be respectively obtained at the BS: and
Note that the individual secrecy rate of any user should be non-negative; otherwise, this user would stop offloading tasks to the MEC server. Therefore, the secrecy rate of any user $i$ can be expressed as If the secrecy rate of any user is zero, the local computing strategy is employed by that user to process all of its tasks. In this case, the other user is allowed to offload tasks to the MEC server by utilizing OMA transmission schemes, where the data rate of the OMA user can be calculated by removing the interference caused by the local computing user.
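The two decoding orders and the resulting secrecy rates can be illustrated with a minimal Python sketch. It assumes the standard uplink NOMA rate form log2(1 + SINR) with unit bandwidth and unit normalized noise, and a secrecy rate obtained by clipping the rate difference at zero; these are assumptions consistent with the description above, not expressions taken from the paper. The function name `secrecy_rates` and its argument names are hypothetical.

```python
import math


def secrecy_rates(p1, p2, h1, h2, h1e, h2e, order):
    """Return (R_1s, R_2s) for a two-user uplink NOMA pair.

    h1, h2   : normalized channel gains |h_i|^2 at the BS
    h1e, h2e : normalized channel gains |h_{i,e}|^2 at the eavesdropper
    order    : 1 if the BS decodes user 1 first, 2 if it decodes user 2 first
    Assumption: rates are log2(1 + SINR) with unit bandwidth.
    """
    # Eavesdropper: user 1 is decoded first and sees user 2 as interference.
    r1e = math.log2(1 + p1 * h1e / (p2 * h2e + 1))
    r2e = math.log2(1 + p2 * h2e)

    if order == 1:
        # BS decodes user 1 first, then user 2 without interference.
        r1 = math.log2(1 + p1 * h1 / (p2 * h2 + 1))
        r2 = math.log2(1 + p2 * h2)
    elif order == 2:
        # BS decodes user 2 first, then user 1 without interference.
        r2 = math.log2(1 + p2 * h2 / (p1 * h1 + 1))
        r1 = math.log2(1 + p1 * h1)
    else:
        raise ValueError("order must be 1 or 2")

    # A user with a non-positive rate difference stops offloading.
    return max(r1 - r1e, 0.0), max(r2 - r2e, 0.0)
```

Under this sketch, a user whose returned secrecy rate is zero would fall back to local computing, as described above.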

B. MEC Model
Suppose that $\beta_i D_i$ bits of tasks are offloaded from user $i$ to the MEC server. The required offloading time is given by In this paper, the pure NOMA transmission scheme is employed, which implies that the offloading time of all users should be the same.¹ That is, if both users decide to offload tasks to the MEC server, the following condition should be satisfied: This condition can be achieved by adjusting task assignment coefficients $\beta_i$ and power allocation coefficients $p_i$. On the other hand, if any user decides to compute all tasks locally, the offloading time of this user is zero. Therefore, the offloading time of any user $i$ can be presented as follows: At the MEC server, the computational resource is allocated to process the offloaded tasks. For user $i$'s offloaded tasks, the computing time is given by After offloading $\beta_i D_i$ bits of tasks to the MEC server, the remaining tasks, i.e., $(1 - \beta_i) D_i$, are processed by the user. The local computing time of user $i$ is given by
¹ It is revealed in [30] that, with sufficient energy, lower latency can be achieved by the pure NOMA scheme than by the hybrid NOMA scheme. In this paper, in order to minimize the latency, the pure NOMA scheme is performed, which can be implemented based on power allocation and task assignment.

III. PROBLEM FORMULATION
In order to explore the security issue and improve the performance, latency minimization is investigated in the proposed NOMA-MEC system. Since users can compute and offload tasks simultaneously, while the MEC server can only compute tasks after offloading, the latency of any user $i$ can be expressed as By jointly designing power allocation, task assignment, and computational resource allocation, the latency minimization problem is formulated as follows: where $\mathbf{p}$, $\boldsymbol{\beta}$ and $\boldsymbol{\tau}$ are the collections of all power allocation coefficients, task assignment coefficients, and computational resource allocation coefficients, respectively. In the formulated problem, the objective function is the maximum time consumption of both users, including the local computing time $T^{\mathrm{com}}_i$ and the MEC processing time $T^{\mathrm{off}}_i + T^{\mathrm{com}}_{0,i}$. In constraints (16a) and (16b), the ranges of users' power allocation coefficients and task assignment coefficients are defined. Constraints (16c) and (16d) state the condition of the allocated computational resource at the MEC server for computing each user's offloaded tasks. Constraint (16e) indicates that the offloading time of any user is zero with the local computing strategy, or equals the other user's offloading time with the NOMA offloading strategy.
The formulated problem is non-convex and difficult to transform into a convex problem. However, this problem can be solved by analyzing the interactions between all terms in the objective function. Specifically, there is a trade-off between users' MEC computing times, i.e., $T^{\mathrm{com}}_{0,1}$ and $T^{\mathrm{com}}_{0,2}$, which is decided by the computational resource allocation coefficients. Moreover, the balance between each user's local computing time $T^{\mathrm{com}}_i$ and MEC processing time $T^{\mathrm{off}}_i + T^{\mathrm{com}}_{0,i}$ is determined by the task assignment coefficient. Based on different strategies, including NOMA offloading, OMA offloading and local computing, the formulated problem can be divided into three sub-problems and solved separately.
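The structure of the objective, i.e., local computing running in parallel with offloading followed by MEC computing, can be sketched in Python as follows. The linear timing model (bits divided by rate, and bits times cycles-per-bit divided by CPU frequency) and the parameter names `c`, `f_loc` and `F_mec` are assumptions introduced for illustration only; the exact expressions are those of the paper's timing equations.

```python
def user_latency(beta, D, R_s, tau, c, f_loc, F_mec):
    """Latency of one user under an assumed linear timing model.

    beta  : task assignment coefficient (fraction of bits offloaded)
    D     : task size in bits
    R_s   : secrecy rate in bits per second
    tau   : share of the MEC computational resource given to this user
    c     : CPU cycles needed per bit (hypothetical parameter)
    f_loc : local CPU frequency, F_mec : MEC CPU frequency (hypothetical)
    """
    t_loc = (1 - beta) * D * c / f_loc          # local computing time
    if beta == 0 or R_s <= 0:
        return t_loc                            # local computing strategy
    t_off = beta * D / R_s                      # offloading time
    t_mec = beta * D * c / (tau * F_mec)        # MEC computing time
    # Local computing runs in parallel; MEC computing starts after offloading.
    return max(t_loc, t_off + t_mec)


def system_latency(users):
    """System latency is the maximum latency over all users."""
    return max(user_latency(**u) for u in users)
```

With this form, raising `beta` shortens the local term and lengthens the MEC term, which is the trade-off exploited in the next section.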

IV. COMPUTATIONAL RESOURCE ALLOCATION AND TASK ASSIGNMENT
In this section, by analyzing the aforementioned interactions, the closed-form expressions of computational resource allocation coefficients and task assignment coefficients are derived, and the formulated problem is transformed according to different strategies.

A. Optimal Computational Resource Allocation Coefficients
In the formulated latency minimization problem, the computational resource allocation coefficients only involve the MEC computing time. That is, if power allocation coefficients and task assignment coefficients are fixed, the computational resource allocation strategy can be obtained by balancing the MEC computing time of both users.
If both users offload tasks to the MEC server by utilizing NOMA schemes, i.e., $\beta_i > 0$ and $R_{i,s} > 0$, $\forall i \in \{1, 2\}$, the users' offloading time is the same and equals $T^{\mathrm{off}}$, as shown in (11). In this case, the original objective function (16) can be transformed as With the given secrecy rate and task assignment coefficients, the local computing time $T^{\mathrm{com}}_i$ and offloading time $T^{\mathrm{off}}$ can be regarded as constants and removed.² Based on (13), the following problem can be obtained: That is, by minimizing the MEC computing time, the optimal computational resource allocation coefficients can be obtained. It is indicated by (18) that any user's MEC computing time decreases monotonically as its computational resource allocation coefficient increases. Therefore, in problem (18), all available computational resources at the MEC server should be utilized, i.e., $\tau_1 + \tau_2 = 1$. On the other hand, there is a trade-off between users' computing times at the MEC server. Specifically, an increase in one user's computational resource allocation coefficient leads to an increase in the other user's MEC computing time. According to [12], $T^{\mathrm{com}}_{0,1} = T^{\mathrm{com}}_{0,2}$ is satisfied by the optimal computational resource allocation coefficients. As a result, the following condition can be obtained: Hence, the optimal computational resource allocation coefficients can be expressed as Note that the derived optimal computational resource allocation coefficients always satisfy constraints (16c) and (16d) with any task assignment coefficients. Moreover, with the OMA offloading strategy in which one user does not offload tasks to the MEC server, i.e., $\beta_i = 0$, $\exists i \in \{1, 2\}$, the derived optimal solution in (20) still holds. In this case, all available computational resources at the MEC server are allocated to compute the offloaded tasks of the OMA user. Furthermore, with the local computing strategy in which both users complete all tasks locally, there are no offloaded tasks at the MEC server, and hence, the computational resource allocation coefficients of both users are zero. Therefore, the optimal computational resource allocation coefficients with all strategies can be presented as follows:
² Note that if the local computing time of any user, i.e., $T^{\mathrm{com}}_i$, is significantly greater than other terms, the computational resource allocation strategy will not affect the latency of the proposed system. However, in this case, a computational resource allocation strategy is still required.
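The balancing argument above can be illustrated with a short Python sketch. It assumes that each user's MEC computing time is proportional to its offloaded bits divided by its resource share, with the same cycles-per-bit requirement for both users; under that assumption, equal MEC computing times with $\tau_1 + \tau_2 = 1$ give shares proportional to the offloaded loads. The function name `mec_resource_split` is hypothetical, and the paper's closed form may contain additional terms.

```python
def mec_resource_split(beta1, D1, beta2, D2):
    """Split the MEC resource so that both MEC computing times are equal.

    Assumes T_mec_i is proportional to beta_i * D_i / tau_i for both users.
    Returns (tau_1, tau_2).
    """
    load1, load2 = beta1 * D1, beta2 * D2
    total = load1 + load2
    if total == 0:
        # Local computing strategy: nothing is offloaded.
        return 0.0, 0.0
    # NOMA offloading: proportional split. OMA offloading: one load is
    # zero, so the offloading user receives the whole resource.
    return load1 / total, load2 / total
```

The same expression covers the three strategies: both loads positive (NOMA offloading), one load zero (OMA offloading), and both loads zero (local computing).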

B. Optimal Task Assignment Coefficients
In this subsection, with the derived optimal computational resource allocation coefficients, the optimal task assignment coefficients can be obtained according to the fixed secrecy rate. At this stage, the terms in the objective function (16) can be expressed as functions of task assignment coefficients, as shown below: where $f(\beta_i)$ and $g(\beta_i)$ are the local computing time and the MEC processing time (including the offloading time and MEC computing time), respectively. For any user $i$, there is a trade-off between $f(\beta_i)$ and $g(\beta_i)$, which is determined by task assignment coefficient $\beta_i$. For example, if $\beta_i$ grows, the local computing time $f(\beta_i)$ decreases, while the MEC processing time $g(\beta_i)$ increases. Therefore, in order to minimize the latency, the following condition should be satisfied: if user $i$ can offload tasks to the MEC server, i.e., $R_{i,s} > 0$. Moreover, if the NOMA offloading strategy is adopted, i.e., $R_{i,s} > 0$, $\forall i \in \{1, 2\}$, the following condition can be obtained from (11): As a result, with the fixed secrecy rate and the optimal computational resource allocation coefficients, the optimal task assignment coefficients can be obtained.³
Proposition 1: In the proposed NOMA-MEC system, if $R_{i,s} > 0$, $\exists i \in \{1, 2\}$, the optimal task assignment coefficients of problem (16) can be expressed as Proof: Refer to Appendix A.
Since the above proposition is derived based on (26), constraint (16e) always holds with the optimal task assignment coefficients. That is, with any given secrecy rate, the task assignment coefficients can be dynamically adjusted to satisfy constraint (16e). Moreover, it is worth mentioning that in the local computing strategy, all tasks are computed by users, and hence, the optimal task assignment coefficients can be expressed as
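The balancing condition between local computing time and MEC processing time can be made concrete for a single offloading user. The sketch below is not the expression of Proposition 1, which treats both users jointly; it only solves $f(\beta) = g(\beta)$ under an assumed linear timing model with a fixed resource share `tau`. The names `c`, `f_loc` and `F_mec` are hypothetical parameters (cycles per bit, local CPU frequency, MEC CPU frequency).

```python
def balanced_beta(R_s, c, f_loc, F_mec, tau=1.0):
    """Task assignment coefficient that equalizes local and MEC time.

    Assumed model (per bit of task):
      local time per bit kept     : c / f_loc
      MEC time per bit offloaded  : 1 / R_s + c / (tau * F_mec)
    Solving (1 - beta) * a = beta * b gives beta = a / (a + b).
    """
    if R_s <= 0:
        return 0.0                      # no secure offloading possible
    a = c / f_loc                       # cost of keeping one bit local
    b = 1.0 / R_s + c / (tau * F_mec)   # cost of offloading one bit
    return a / (a + b)
```

In this simplified model the task size cancels out, so the balanced coefficient depends only on the secrecy rate and the two computing speeds; a higher secrecy rate pushes the coefficient up, i.e., more bits are offloaded.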

C. Strategy Selection
At this stage, the closed-form solutions of task assignment and computational resource allocation coefficients are derived, and expressed by the secrecy rate. Based on whether the secrecy rate is zero, three power allocation problems can be obtained with different offloading strategies.
1) NOMA Offloading Strategy: In this case, both users offload tasks to the MEC server, i.e., $R_{i,s} > 0$, $\forall i \in \{1, 2\}$. By substituting the derived task assignment coefficients (27) into (22), the power allocation problem can be presented as where In the NOMA offloading strategy, the offloading time of both users is the same. By substituting the derived task assignment coefficients (27) into (11), the condition of the optimal secrecy rate can be obtained. Remark 1: In the proposed NOMA-MEC system, if $R_{i,s} > 0$, $\forall i \in \{1, 2\}$, the optimal secrecy rate satisfies the following condition: where Note that (34) is the condition of the theoretical optimal secrecy rate. If this equation is satisfied, the following condition can be achieved: and the global optimal solution is obtained, which can achieve the minimum latency of the proposed NOMA-MEC system. However, due to the existence of interference and different SIC decoding orders, (34) is difficult to satisfy.
2) OMA Offloading Strategy: In this case, one of the users tends to compute all tasks locally since the positive secrecy rate cannot be achieved. However, this condition is considered at the offloading phase. As shown in footnotes 2 and 3, there exists another condition for adopting the OMA offloading strategy, which is considered at the computing phase.
Proposition 2: In the proposed NOMA-MEC system, the OMA offloading strategy will be selected if the following condition is satisfied: where users $i$ and $i'$ are the OMA offloading user and local computing user, respectively. Proof: Refer to Appendix B.
The above proposition is due to the fact that the computing time to process all tasks at user $i'$ is less than the computing time to simultaneously process user $i$'s tasks at user $i$ and the MEC server, even though all available computational resources at the MEC server are allocated for user $i$'s tasks. This situation will occur if the size of user $i$'s tasks is significantly larger than that of user $i'$'s, and/or the CPUs equipped at user $i'$ are more powerful compared to user $i$ and the MEC server. In this case, the OMA offloading strategy is selected and the latency is determined by the OMA offloading user $i$. Based on these conditions, the OMA offloading strategy is adopted, and the following problem is formulated: where and In the OMA offloading strategy, the MEC processing time of the local computing user $i'$ is zero, i.e., $g_{i'} = 0$, and hence, it is removed from the objective function. Moreover, constraint (16e) is removed since it is always satisfied in this case.

3) Local Computing Strategy:
In this case, a positive secrecy rate cannot be achieved by either user, i.e., $R_{i,s} = 0$, $\forall i \in \{1, 2\}$, and hence, the power allocation coefficients can be presented as follows: Therefore, the latency of the proposed NOMA-MEC system is decided by the local computing time of both users, as shown below:
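The offloading-phase part of the strategy selection described in this subsection can be summarized by a small Python sketch that maps the two secrecy rates to one of the three strategies. It deliberately omits the computing-phase condition of Proposition 2, which also triggers OMA offloading; the function name and the string labels are hypothetical.

```python
def select_strategy(r1s, r2s):
    """Pick an offloading strategy from the users' secrecy rates.

    Only the offloading-phase condition is modeled here; the
    computing-phase condition of Proposition 2 is not included.
    """
    if r1s > 0 and r2s > 0:
        return "NOMA"    # both users can offload securely
    if r1s > 0 or r2s > 0:
        return "OMA"     # one user offloads, the other computes locally
    return "LOCAL"       # neither user has a positive secrecy rate
```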

D. Analysis of SIC Decoding Order
In this subsection, the NOMA offloading strategy is analyzed, where the different SIC decoding orders are investigated. From (10), it is indicated that an increasing secrecy rate can reduce the offloading time, or allow the users to offload more tasks to the more efficient MEC server with the same offloading time. Therefore, in the NOMA offloading strategy, under the condition in Remark 1, both users tend to maximize the secrecy rate. However, due to the interaction of users' power allocation coefficients, the optimal power allocation coefficients cannot be explicitly derived. In this context, this subsection only compares SIC decoding orders.
1) SIC Decoding Order 1: If user 1's signal is decoded before user 2's, the secrecy rate can be presented as and With this decoding order, the condition for performing NOMA transmission schemes can be obtained as follows.
Proposition 3: In the proposed NOMA-MEC system, the NOMA transmission scheme can be utilized with SIC Decoding Order 1 if the following condition is satisfied: Proof: Refer to Appendix C.
It is described by (96) that the feasible region of user 2's power allocation coefficient is replaced by if $|h_1|^2 > |h_{1,e}|^2$ and $|h_1|^2 |h_{2,e}|^2 < |h_2|^2 |h_{1,e}|^2$ hold. By eliminating normalization, the terms in (47) can be expressed as
It can be found that the above term decreases monotonically as the transmit power increases, and hence, the feasible region of user 2's power allocation coefficient $p_2$, i.e., (47), shrinks. The following insight can be obtained:
Remark 2: In the proposed NOMA-MEC system, by adopting SIC Decoding Order 1, the upper bound of user 2's power allocation coefficient decreases with the transmit power if $|h_1|^2 |h_{2,e}|^2 < |h_2|^2 |h_{1,e}|^2$ holds.
2) SIC Decoding Order 2: If the SIC decoding order at the BS is swapped, the signal of user 2 is decoded first, and the expressions of the secrecy rate can be shown as and In this case, the condition for adopting SIC Decoding Order 2 can be presented as follows.
Proposition 4: In the proposed NOMA-MEC system, NOMA transmission schemes with SIC Decoding Order 2 can be employed if the following condition is satisfied: Proof: Refer to Appendix D.
From (104), the feasible region of user 1's power allocation coefficient is shown as follows: For the unnormalized channel condition, the term in the above function is given by It is indicated that with increasing transmit power, the feasible region of $p_1$ tends to shrink. The following conclusion can be drawn.

V. DDPG-BASED POWER ALLOCATION
In this section, power allocation with both NOMA and OMA offloading strategies is investigated. As mentioned above, the formulated power allocation problems (29) and (38) are non-convex and difficult to solve. To tackle this issue, the DDPG scheme is adopted to minimize the latency, where the power allocation solution is obtained based on the DDPG decision-making strategy. The DDPG framework and the training algorithm are discussed in the sequel.
A reinforcement learning problem can be described by a 4-tuple $(s_t, a_t, r_t, s_{t+1})$ at any time step $t$, where $s_t$ denotes the state, $a_t$ denotes the action, $r_t$ denotes the immediate reward of action $a_t$ in state $s_t$, and $s_{t+1}$ denotes the state at the next step [31]. Specifically, in the proposed DDPG-based algorithm, the following elements are defined:
• State Space $\mathcal{S}$: The state space $\mathcal{S}$ is the collection of all states, i.e., $s_t \in \mathcal{S}$, $\forall t$, where any state at step $t$ can be expressed as Note that all channel conditions are available at the BS, and the channel gains differ among time steps.
• Action Space $\mathcal{A}$: The action space $\mathcal{A}$ is the collection of all actions, where any action at step $t$ contains the corresponding power allocation coefficients, i.e., It is worth mentioning that the designed DDPG-based algorithm is capable of outputting continuous actions, and hence, the action space $\mathcal{A}$ is also continuous.
• Reward Function: After choosing action $a_t$ for any given state $s_t$, the immediate reward at step $t$ is defined as follows: The term $-1$ is included in the reward function to minimize the objective function of (29). That is, if the DDPG network takes an action that increases the reward, the latency will be decreased. Moreover, the reward of the previous round is obtained at the beginning of any step.
The DDPG network is designed to find an optimal strategy which can maximize the discounted long term reward $R_t$, defined as where $\gamma \in [0, 1]$ is the discount factor which determines the balance of current and future rewards. If $\gamma$ is small, the network will focus on maximizing the current reward. When $\gamma$ increases, the network tends to choose the action which can maximize the future reward. In this paper, the actions between steps are independent because the power allocation constraint is only set for each step. Therefore, a relatively small $\gamma$ is chosen in order to focus on the current reward.
As shown in Fig. 1, an experience memory is included in the designed DDPG network in order to avoid inefficient learning caused by the highly correlated input data. Moreover, the DDPG network adopts the actor-critic architecture, which contains an actor network and a critic network. Meanwhile, the DDPG network includes an additional neural network for both the actor network and the critic network, namely target networks.
For actor-critic schemes, the actor network takes the state $s_t$ as the input, and then outputs the instant action $a_t$ to the MEC-NOMA network based on the weight $\omega$ and a stochastic noise, i.e., where $N_0$ is the exploration noise, which can balance the exploration of new actions and the exploitation of previous actions. As a result, the neural network can be prevented from getting stuck on locally optimal decisions. It is worth pointing out that each term in $a_t$ is limited to $[0, 1]$, and hence, the value of the noise-added action will be clipped if the result is beyond the desired range. Moreover, the decision of the actor network, i.e., $\pi(s_t; \omega)$, is also outputted to the critic network for evaluation. The critic network receives the decision, and then outputs the estimated Q-value $Q(s_t, \pi(s_t; \omega); \theta)$ with the weight $\theta$, where the Q-value describes the expected long term reward. Based on the policy gradient theorem [32], the objective of the actor network is to maximize the long term discounted reward $J(\omega) = Q(s, \pi(s; \omega); \theta)$, and hence, the actor network updates the weight $\omega$ according to the gradient: $\nabla_\omega J(\omega) = \nabla_\omega \pi(s; \omega) \nabla_a Q(s, a; \theta)$.
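The noise-then-clip step described above can be sketched in Python as follows. Zero-mean Gaussian noise with standard deviation `sigma` is an assumption made here for illustration; the text only states that a stochastic exploration noise is added and that the result is clipped to $[0, 1]$. The function name `noisy_action` is hypothetical.

```python
import random


def noisy_action(policy_output, sigma=0.1, rng=random):
    """Add exploration noise to the actor output and clip to [0, 1].

    policy_output : list of power allocation coefficients from the actor
    sigma         : noise standard deviation (assumed Gaussian noise)
    """
    return [min(1.0, max(0.0, a + rng.gauss(0.0, sigma)))
            for a in policy_output]
```

A larger `sigma` favors exploration of new power allocations, while a smaller value keeps actions close to the current policy output.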
In terms of the target networks, each is essentially a duplicate of the original network with a slower update frequency. As a result, more stable labels can be provided for the actor and critic networks. In the target networks, the soft update policy is adopted with the following weights: where $0 < \rho \ll 1$ is the updating parameter. Moreover, in the target critic network, the target Q-value is estimated based on the experience tuple $(s_i, a_i, r_i, s_{i+1})$, as shown below: where $i$ is the random sample index. Hence, the loss function of the critic network can be written as follows: Here, the critic network is trained by minimizing the loss function. According to the above settings, a DDPG-based power allocation algorithm is presented in Algorithm 1.
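The target-network mechanics can be sketched with plain Python lists standing in for network weights. The update rule, the bootstrapped target, and the mean-squared-error loss below follow the standard DDPG formulation, which is assumed here to correspond to the soft update, target Q-value, and critic loss described above; all function names are hypothetical.

```python
def soft_update(target_weights, online_weights, rho):
    """Move each target weight a small step rho toward the online weight."""
    return [rho * w + (1.0 - rho) * w_t
            for w_t, w in zip(target_weights, online_weights)]


def td_target(reward, q_next_target, gamma):
    """Target Q-value: immediate reward plus discounted target-critic value.

    q_next_target is the target critic's estimate for the next state and
    the target actor's action in that state.
    """
    return reward + gamma * q_next_target


def critic_loss(targets, q_values):
    """Mean-squared error between target Q-values and critic estimates."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)
```

With a small `rho`, the target weights change slowly, which is what keeps the labels used to train the critic stable.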
In terms of the DDPG framework, each neural network contains one hidden layer with 100 neurons. Moreover, both the critic network and the target critic network adopt the Rectified Linear Unit (ReLU) as the activation function, while the activation function for the actor network is Sigmoid, which outputs the action value $a_t \in [0, 1]$ [33]. Based on the proposed DDPG-based power allocation algorithm, the formulated problems in (29) and (38) can be solved, where the strategies can be dynamically switched according to different conditions. In order to implement the hybrid SIC decoding order, the performance of different decoding orders is compared in the NOMA offloading strategy, and the best decoding order which can achieve the minimum latency is selected.

Algorithm 1 DDPG-Based Power Allocation Algorithm
…
Input state $s_t$ into actor network and obtain action $a_t = \pi(s_t; \omega) + N$.
9: Observe $r_t \leftarrow -1 \times \max\{f_1(t), f_2(t), g_1(t), g_2(t)\}$, and the next state $s_{t+1}$.
10: Store experience tuple $(s_t, a_t, r_t, s_{t+1})$ into the memory $R$.
11: if memory counter $> |R|$ then
12: Remove previous experiences from the beginning.
13: end if
14: Randomly sample a mini-batch of experience tuples $(s_t, a_t, r_t, s_{t+1})$ with batch size and input DNNs.
15: Update the weights of actor and critic networks based on (59) and (63).
16: Update target network weights $\omega^-$ and $\theta^-$ according to (60) and (61).
17: end for
18: end for
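The bookkeeping in the training loop, i.e., the reward, the bounded experience memory, the mini-batch sampling, and the final comparison of decoding orders, can be sketched in Python as follows. This is an illustrative sketch and not the authors' implementation; the class and function names are hypothetical, and `collections.deque` with `maxlen` is used as one convenient way to drop the oldest experiences once the memory is full.

```python
import random
from collections import deque


def reward(f1, f2, g1, g2):
    """Reward is the negative of the largest time term (the latency)."""
    return -max(f1, f2, g1, g2)


class ReplayMemory:
    """Fixed-size experience memory that discards the oldest entries."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


def best_decoding_order(latency_by_order):
    """Hybrid SIC: keep the decoding order with the smallest latency."""
    return min(latency_by_order, key=latency_by_order.get)
```

For example, `best_decoding_order({1: 0.8, 2: 0.6})` selects decoding order 2 because it gives the lower latency.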
According to [34], [35], the computational complexity of the proposed DDPG-based power allocation algorithm can be expressed as where N mb is the size of the mini-batch, L is the number of layers, and N i is the number of neurons in the i-th layer. Specifically, for all 4 neural networks, the computational complexity of each step is 4N mb Due to the fact that there are N ep episodes and each episode includes N ts steps, the total computational complexity is given by
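To give a feel for how the quantities named above combine, the following sketch counts multiply-accumulate operations. It assumes, as is common for fully connected networks, that the per-sample cost of one network is proportional to the sum of products of adjacent layer widths; this is an assumption and not necessarily the exact expression used in [34], [35]. The input and output sizes, the mini-batch size, and the use of identical layer sizes for all four networks are hypothetical simplifications; the 100 hidden neurons, 400 steps, and 300 episodes follow the settings stated in this paper.

```python
def network_cost(layer_sizes):
    """Sum of N_i * N_{i+1} over adjacent layers (assumed per-sample cost)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def total_cost(layer_sizes, n_mb, n_ts, n_ep, n_networks=4):
    """Cost over all networks, mini-batch samples, steps, and episodes."""
    return n_networks * n_mb * network_cost(layer_sizes) * n_ts * n_ep


layers = [4, 100, 2]  # hypothetical input/output sizes, 100 hidden neurons
per_net = network_cost(layers)
total = total_cost(layers, n_mb=32, n_ts=400, n_ep=300)
```

The total grows linearly in the mini-batch size, the number of steps, and the number of episodes, which matches the multiplicative structure described in the text.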

VI. MATCHING-BASED SUB-CHANNEL ASSIGNMENT
In order to extend the work to a multi-user scenario, a sub-channel assignment problem is considered in this section. Specifically, $N$ users are paired and assigned to $K$ sub-channels, where $N = 2K$. The sets of users and sub-channels are denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$ and $\mathcal{K} = \{1, 2, \ldots, K\}$, respectively. For the multi-user situation, the subscripts of variables are changed accordingly.

A. Sub-Channel Assignment Problem Formulation
By considering the maximum value of all users' time consumptions as the latency of the system, the formulated sub-channel assignment problem is given by where $X$ is a matrix of all sub-channel assignment indicators $x_{k,n}$. Constraint (64a) indicates the two possible situations of each user, where $x_{k,n} = 1$ means that user $n$ is assigned to sub-channel $k$; otherwise, $x_{k,n} = 0$. Constraints (64b) and (64c) show that each sub-channel is occupied by two users and that each user is assigned to one sub-channel, respectively.
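The three constraints described above can be checked mechanically for a candidate assignment matrix. The sketch below is only illustrative: the function names are hypothetical, the matrix is stored as a list of rows indexed by sub-channel, and the per-user time consumptions are toy numbers.

```python
def is_feasible(X):
    """X[k][n] = 1 if user n is assigned to sub-channel k, else 0."""
    num_channels = len(X)
    num_users = len(X[0])
    binary = all(x in (0, 1) for row in X for x in row)          # binary indicators
    two_per_channel = all(sum(row) == 2 for row in X)            # two users per sub-channel
    one_per_user = all(
        sum(X[k][n] for k in range(num_channels)) == 1           # one sub-channel per user
        for n in range(num_users)
    )
    return binary and two_per_channel and one_per_user


def system_latency(user_times):
    """The system latency is the largest time consumption among all users."""
    return max(user_times)


good = [[1, 1, 0, 0], [0, 0, 1, 1]]   # N = 4 users, K = 2 sub-channels
bad = [[1, 1, 1, 0], [0, 0, 0, 1]]    # violates the two-users-per-sub-channel rule
latency = system_latency([0.12, 0.15, 0.11, 0.14])
```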

B. Matching-Based Sub-Channel Assignment Algorithm
Due to the fact that there exists a binary constraint, matching is utilized to solve the formulated sub-channel assignment problem. By treating users and sub-channels as two disjoint sets of players, a two-to-one matching can be defined as follows: Definition 1: Given two disjoint sets $\mathcal{N}$ and $\mathcal{K}$, a two-to-one matching $\psi$ denotes the mapping from $\mathcal{N}$ to $\mathcal{K}$, which satisfies It is assumed that all players are selfish. Therefore, whether any user $n$ prefers to be assigned to sub-channel $k$ rather than $k'$ depends only on its utility, as follows: where user $n$'s utility with sub-channel $k$ is given by Moreover, the utility of any sub-channel $k$, i.e., $U_k(\psi)$, is also decided by the above function.
In the proposed matching-based algorithm, if any user intends to use any sub-channel, it needs to exchange with one of the users assigned to that sub-channel. The swap matching $\psi_m^n = \{\psi \setminus \{(k, m), (k', n)\} \cup \{(k, n), (k', m)\}\}$, with $m \in \psi(k)$, $n \in \psi(k')$, $m \in \psi_m^n(k')$, and $n \in \psi_m^n(k)$, denotes that the sub-channels of users $m$ and $n$ are exchanged, so that the matching is transformed from $\psi$ to $\psi_m^n$. This exchange operation indicates that $(m, n)$ is a swap-blocking pair, which is defined as follows: Definition 2: A swap-blocking pair $(m, n)$ can be confirmed if and only if the following conditions are satisfied. Based on the definition of swap-blocking pairs, a matching-based sub-channel assignment algorithm is proposed in Algorithm 2. In the algorithm, the DDPG-based power allocation algorithm is implemented at each iteration in order to calculate the utility and find the swap-blocking pair. That is, in the multi-user case, Algorithm 1 and Algorithm 2 are iteratively performed to minimize the latency.

Algorithm 2 Sub-Channel Assignment Algorithm
Step 1: Initialization phase 1) Randomly match all users and sub-channels.
2) Convergence: The definition of swap-blocking pairs indicates that each swap operation reduces the utility (latency) of at least one of the players involved and does not increase the utility of any of them. With a finite number of users and sub-channels, the number of swap-blocking pairs is finite. As a result, the proposed sub-channel assignment algorithm is guaranteed to converge to a stable matching.
3) Stability: The stability of the proposed algorithm follows the definition of two-sided exchange-stable (2ES) matching, i.e., a matching in which no swap-blocking pair exists. In Algorithm 2, the condition for finalization is that no swap-blocking pair can be obtained in a complete cycle. Therefore, the final matching obtained from the proposed sub-channel assignment algorithm is always 2ES.
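The swap operations and the termination condition discussed above can be sketched as a simple loop. This is a simplified illustration under stated assumptions, not the authors' Algorithm 2: the utilities are collapsed to one latency value per sub-channel, a swap is accepted when neither involved sub-channel becomes worse and at least one becomes strictly better, and the DDPG-based power allocation is replaced by a hypothetical toy function `toy_latency` built from made-up channel gains.

```python
from itertools import combinations


def swap_matching(channels, latency):
    """channels: list of 2-user lists; latency(k, pair) -> float (hypothetical)."""
    channels = [list(pair) for pair in channels]
    improved = True
    while improved:  # stop after a complete cycle without a swap-blocking pair
        improved = False
        for k, kp in combinations(range(len(channels)), 2):
            for i in range(2):
                for j in range(2):
                    old = (latency(k, channels[k]), latency(kp, channels[kp]))
                    new_k, new_kp = channels[k][:], channels[kp][:]
                    # Exchange user i of sub-channel k with user j of sub-channel kp.
                    new_k[i], new_kp[j] = channels[kp][j], channels[k][i]
                    new = (latency(k, new_k), latency(kp, new_kp))
                    no_worse = new[0] <= old[0] and new[1] <= old[1]
                    better = new[0] < old[0] or new[1] < old[1]
                    if no_worse and better:
                        channels[k], channels[kp] = new_k, new_kp
                        improved = True
    return channels


# Made-up gains: GAIN[k][n] is the gain of user n on sub-channel k.
GAIN = [[4.0, 1.0, 2.0, 1.0], [1.0, 4.0, 1.0, 2.0]]


def toy_latency(k, pair):
    """Toy stand-in for the per-sub-channel latency after power allocation."""
    return max(1.0 / GAIN[k][n] for n in pair)


result = swap_matching([[0, 1], [2, 3]], toy_latency)
final_latency = max(toy_latency(k, pair) for k, pair in enumerate(result))
```

Every accepted swap strictly lowers the sum of the two affected latencies, and the number of matchings is finite, so the loop terminates; the returned matching admits no further accepted swap, which mirrors the 2ES property under these simplified utilities.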

VII. SIMULATION RESULTS
In this section, the simulation results are presented to demonstrate the effectiveness of the proposed NOMA-MEC system. It is considered that the BS is located at the centre of a disc with radius $r$, and the users are randomly distributed within the disc. The distances between the eavesdropper and the users are fixed as $D_{1,e}$ and $D_{2,e}$. In the DDPG-based algorithm, each point on the x-axis includes 300 episodes, each episode includes 400 steps, and the neural networks are randomly initialized at each point of the x-axis. In order to evaluate the performance of the proposed DDPG-based algorithm, the full local computing scheme is included as the benchmark, where all tasks are computed at the users. The simulation parameters are shown in Table II.
The convergence of the proposed power allocation algorithm is examined in Fig. 2, which shows that the DDPG-based algorithm is able to converge to a stable structure within around 30 episodes. In particular, during power allocation, the secrecy rates of both users increase, and the MEC processing times $g_1$ and $g_2$ increase accordingly. On the other hand, the local computing time decreases since more tasks are offloaded to the MEC server. As a result, the latency of the investigated system is reduced. It can thus be concluded that there is a trade-off between local computing time and MEC processing time. Moreover, it can be inferred that the optimal solution is obtained when the values of all four terms are the same, which is around 0.11 s. Therefore, the proposed DDPG-based algorithm can achieve about 90% of the global optimal solution.
In Fig. 3, the impact of transmit power on various aspects is demonstrated. As the transmit power increases, the secrecy rates of the users increase, and the task assignment coefficients $\beta_i$ increase accordingly. In this case, the time consumption for both local computing and MEC processing is reduced, and hence, the latency of the considered system is significantly reduced compared to the full local computing scheme. Moreover, it can be observed that hybrid SIC decoding outperforms fixed SIC decoding in terms of latency. Comparing the two fixed SIC decoding orders, one can see that the latency with SIC Decoding Order 2 is slightly lower than that with SIC Decoding Order 1. This is due to the fact that the condition of SIC Decoding Order 2 is more readily achieved, as presented in Proposition 3 and Proposition 4. Furthermore, it is worth pointing out that the power allocation coefficients of user 2 and user 1 are respectively reduced in SIC Decoding Order 1 and SIC Decoding Order 2, which confirms the insights in Remark 2 and Remark 3.
The performance of the considered system in the multi-user scenario is shown in Fig. 4, where a scheme with random assignment is selected as the benchmark. It can be found that the matching-based sub-channel assignment algorithm can significantly reduce the latency. This is due to the fact that, based on sub-channel assignment, each user can select not only a better channel to the BS, but also a worse channel to the eavesdropper. The matching-based sub-channel assignment is dynamically combined with the DDPG-based power allocation scheme, and a better performance can thus be achieved. Furthermore, it is worth pointing out that hybrid SIC decoding achieves the best performance compared to fixed SIC decoding in the multi-user scenario.
The probability of the offloading strategies in Section IV-C is shown in Fig. 5(a), where both hybrid and fixed SIC decoding schemes are included. With the increasing distance between user 1 and the eavesdropper, the probability of adopting the NOMA offloading strategy increases, while the probability of adopting the OMA offloading strategy and the local computing strategy decreases. Due to the fact that the channel condition $|h_{1,e}|^2$ is significant when the distance between user 1 and the eavesdropper is less than 1000 m, the probability of adopting the NOMA offloading strategy in SIC Decoding Order 2 is greater than that in SIC Decoding Order 1. Moreover, as shown in Fig. 5(b), SIC Decoding Order 2 plays a dominant role when the distance is less than 1000 m. These results confirm the conditions for adopting different SIC decoding orders, i.e., Proposition 3 and Proposition 4. Therefore, it can be claimed that SIC Decoding Order 2 has advantages in practical systems. When the distance between user 1 and the eavesdropper is greater than 1250 m, there is no significant difference in strategy selection, and the probabilities of these two SIC decoding orders are comparable.
In Fig. 6, the scenario in which the distances between the eavesdropper and both users are simultaneously increased is studied. Due to the deterioration of the channel conditions between the users and the eavesdropper, the secrecy rate is improved, and the size of the offloaded tasks is increased accordingly. Hence, although the MEC processing time is slightly increased, the local computing time is significantly reduced. As a result, the latency of the considered system with both hybrid and fixed SIC decoding is reduced. It is worth highlighting that the latency gap between SIC Decoding Order 1 and SIC Decoding Order 2 decreases as the distance between the users and the eavesdropper increases. This is due to the fact that, compared to SIC Decoding Order 1, it is easier for SIC Decoding Order 2 to achieve the NOMA offloading strategy when the channel conditions are relatively poor. Moreover, the results show that the improvement caused by hybrid SIC decoding is more prominent when the channel conditions are good.
In Fig. 7, the computational capacity of both users simultaneously increases from 500 MHz to 2500 MHz. The results indicate that the latency of all schemes is reduced. In terms of the local computing scheme, the increase of computational capacity directly reduces the computing time at the users. In the considered NOMA-MEC system, with the increasing computational capacity, more tasks can be allocated for local computing, as shown by the decreasing task assignment coefficients. For this reason, the MEC processing time is reduced, although the secrecy rate remains at the same level (between 2.4 Mbit/s and 2.7 Mbit/s). Since most of the tasks are completed locally with the increasing computational capacity, the gap between the proposed solution and the full local computing scheme becomes small. However, hybrid SIC decoding still achieves the best performance, and SIC Decoding Order 2 outperforms SIC Decoding Order 1.
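The user placement in this setup can be sketched as follows, assuming that "randomly distributed" means uniformly distributed over the area of the disc, which the text does not state explicitly. The radius value, the number of users, and the function name are hypothetical, since the actual parameters are those listed in Table II.

```python
import math
import random


def sample_users_in_disc(num_users, radius, rng):
    """Draw user positions uniformly over a disc centred at the BS (assumed)."""
    users = []
    for _ in range(num_users):
        # The square root makes the density uniform over the disc area.
        d = radius * math.sqrt(rng.random())
        angle = 2.0 * math.pi * rng.random()
        users.append((d * math.cos(angle), d * math.sin(angle)))
    return users


rng = random.Random(1)
users = sample_users_in_disc(4, radius=500.0, rng=rng)  # hypothetical radius in metres
distances = [math.hypot(x, y) for x, y in users]        # user-to-BS distances
```

Drawing the distance directly as `radius * rng.random()` would instead cluster users near the BS, which is why the square root is applied.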

VIII. CONCLUSION
This paper investigated a secure NOMA-based MEC system with hybrid SIC decoding, where users simultaneously offload confidential tasks to the MEC server in the presence of an eavesdropper. In order to minimize the latency of the considered system, power allocation, task assignment, and computational resource allocation were jointly studied. It was revealed that there is a trade-off between local computing time and MEC processing time, and hence, the closed-form solutions of computational resource allocation and task assignment were derived. By obtaining the conditions of different strategies, a strategy selection mechanism was established, and the corresponding power allocation problems were formulated. Based on reinforcement learning, a DDPG-based power allocation algorithm was proposed, which can dynamically select the appropriate strategy and provide a near-optimal power allocation solution. Moreover, by comparing the conditions of different decoding orders, it was indicated that the NOMA offloading strategy is easier to implement if the decoding order at the BS is the opposite of that at the eavesdropper. The considered system was also investigated in a multi-user scenario, where a matching-based sub-channel assignment algorithm was developed in conjunction with the DDPG-based algorithm. The performance of the proposed scheme was demonstrated, and the validity of the provided insights was verified by the simulation results. It is worth mentioning that designing the system for specific applications and simulating it with real-world datasets is a promising research direction.

APPENDIX A PROOF OF PROPOSITION 1
With the NOMA offloading strategy, both users offload tasks to the MEC server, i.e., $\beta_i > 0, \forall i \in \{1, 2\}$. By substituting (21) into (13), the MEC computing time of any user $i$ can be presented as The MEC processing time can be expressed as From (25), the following condition can be obtained: Hence, the optimal task assignment coefficients can be presented as By substituting (26), the optimal task assignment coefficient of user 1 can be presented as follows: Similarly, the expression of user 2's optimal task assignment coefficient is given by Note that constraint (16b) should be satisfied by the derived optimal task assignment coefficients. Taking user 1's task assignment coefficient as an example and including (16b), the following inequalities can be obtained: Due to the fact that the above condition always holds with any non-negative secrecy rate, constraint (16b) is satisfied by the derived task assignment coefficients.
In the OMA offloading strategy, one of the users computes all tasks locally, and hence, the task assignment coefficient and secrecy rate of this user are zero. For the OMA offloading user, denoted by user $i$, the optimal computational resource allocation coefficient can be obtained from (21) as $\tau_i^* = 1$. From (25), the following equation can be obtained: The optimal task assignment coefficient of the OMA offloading user can be presented as It can be shown that the same solution can be obtained from (71) and (72) by setting the secrecy rate of the local computing user to zero. Hence, the obtained solutions in (71) and (72) can also be utilized for the OMA offloading strategy. Moreover, the obtained solution in (75) always satisfies constraint (16b) with any non-negative secrecy rate of the OMA user. The proof of this proposition is completed.

APPENDIX B PROOF OF PROPOSITION 2
This proposition can be proved by assuming that the NOMA offloading strategy can be selected, that is, the condition $R_{i,s} > 0, \forall i \in \{1, 2\}$ can be achieved. In this case, the secrecy data of both users should satisfy the condition in Remark 1. From (34), the following expression can be obtained: By substituting the above equation into (30), the local computing time of user 1 can be transformed as follows: The derivative of $F_1$ can be presented as follows: It is indicated that the monotonicity of user 1's local computing time $F_1$ is decided by the term $A_1$. Specifically, if $A_1 > 0$ holds, $F_1$ is monotonically decreasing with the increasing secrecy rate; otherwise, $F_1$ is monotonically increasing. Therefore, in the case that $A_1 \leq 0$, in order to minimize the latency, the secrecy rate of user 1 tends to be zero. This condition can be transformed as follows: Similarly, based on (34) and (31), the local computing time of user 2 can be expressed as The derivative of the above function is The monotonicity of user 2's local computing time can thus be revealed. Particularly, if $A_2 \leq 0$, user 2's secrecy rate tends to be the minimum, and hence, user 2 will compute all tasks locally. The following condition can be obtained: It shows that the OMA offloading strategy will be adopted if (79) or (82) is satisfied. Moreover, it is worth mentioning that conditions (79) and (82) cannot be satisfied simultaneously, and therefore, this case will not lead to the local computing strategy. This proposition is proved.