A robust MPC approach to the design of behavioural treatments

The objective of this paper is to provide some initial results on the application of control tools to the problem treatment design. Human behavior and reaction to treatment is complex and dependent on many unmeasurable external stimuli. Therefore, to the best of our knowledge, it cannot be described by simple models. Hence, one of the main messages in this paper is that, to design a treatment (controller) one cannot rely on exact models. More precisely, to be able to design effective treatments, we propose to use “simple” uncertain affine models whose response “covers” the most probable subject responses. So, we propose a simple model that contains two different types of uncertainties: one aimed at uncertainty of the dynamics and another aimed at approximating external perturbations that patients face in their daily life. With this model at hand, we design a robust model predictive controller, where one relies on the special structure of the uncertainty to develop efficient optimization algorithms.


I. INTRODUCTION
Although, the use of control algorithms in treatment context is not new there is still limited use of the concept of feedback in the design of treatments especially in the area of behavioral sciences. In this paper, we present a preliminary attempt to apply some well-known concepts from robust controller design to the development of efficient and robust intervention for behavioural treatments.
In the design of treatments in a behavioral context, one usually starts with data from several patients collected in a study. From this, one determines an approximate behavioral model to be used in the design of the control law. The first one is both variability in behavior and the way patients respond to treatment in study data leads to high inaccuracy of the models obtained. Moreover, in a practical setting a patient's behavior is always affected by external perturbations from events in daily life. All this leads to a large amount of uncertainty in the model and, hence, the need for feedback and, especially, robustness. Therefore, given the fact that one can increase the robustness of the overall treatment by using a control approach, the interest in using algorithms from control theory in a treatment context is growing. Moreover, the increasing use of portable devices enables the collection of information and delivery of the treatment in a timely manner and the possibility of automating treatments. To address some of the specific challenges one encounters in the development of treatments where all the treatments in our paper are "time-varying" treatments (in the sense that is used in behavioral sciences), in this paper we propose specific way to model uncertainty by considering independently uncertainty in patient behavior and sparse external perturbations. The first one is modelled as Gaussian white noise and the second one is modelled as a sparse bounded perturbations whose concrete description is given in the next section.
With this model in hand, we then propose to design controllers based on a model predictive controller (MPC) approach. It is shown that, assuming that the cost function is quadratic, then the resulting robust optimization problem can be reformulated as an optimization problem that can be efficiently solved.
Throughout this paper, we use the problem of designing treatment for smoking cessation as our running example. As a final comment in this introduction, one should note that the approach provided here is just a first step. There is still a lot of room for improvement and the objective of this paper is to introduce the problem to the control community and discuss a few of the challenges one is faced and the choices that one can make when meeting them.

A. Previous Work
Adaptive interventions (treatments based on feedback) are sequences of treatments that are adapted and re-adapted to individual circumstances and behaviors in order to achieve and maintain health behavior change. These are interventions that may be provided many times i.e., tens, hundreds, or even thousands of treatment occasions during the entire treatment period, for example via smart phone delivery, as intensive interventions. So, scientists in social or behavioral science have been working on finding effective ways to use portable devices such as smart phones to change health behavior or maintain healthy behavior. Consequently, adaptive intervention is a treatment sequence that adapts the individuals circumstances [1], [4] (in our jargon, feedback control) is being developed by social and behavioral scientists for different areas such as criminal justice [2], mental health [3], depression [5], hypertension [6], substance abuse [7] and Alzheimer's disease [8].
Further, engineering concepts such as dynamical modeling have been applied to the modeling and controlling of behavior [11]. Examples of studies presenting some special issues that arise in the application of such approaches to behavioral research appear in [9], [10]. These studies employ both time-invariant and time-varying models, depending on the specific problem. Moreover, some of the work in the literature also uses controller design methods to design adaptive intervention for special problems in behavioral science [12], [13]. Most of these use model predictive controller (MPC) approach.
In this paper, we also use MPC framework to address the control problem at hand. In the formulation of the control problem one takes into account all the different combinations of treatments available. Moreover, treatment constraints are included in the set-up of the control design problem. The resulting robust optimization problem is then solved by relying on results from robust optimization [16], [17] and by exploiting the specific structure of the uncertainty.

B. The Sequel
The paper is organized as follows: In section II we introduce the type of model that is going to be used to approximate patient behavior. Moreover, we briefly discuss a possible way to identify the model's parameters from patient study data. In Section III, we introduce an MPC formulation of the robust control problem to be solved and provide several results aimed at efficient numerical implementation. In Section IV we present a simulation of the application of the proposed approach to a simulated smoking cessation treatment is motivated by [21], [22]. Finally, in Section V we present some concluding remarks.

C. Notation
If A ∈ R m×n is a matrix then A ≥ 0 denotes positive semi definite matrix. If x ∈ R n is vector, then dim(x) is the dimension of vector x, x 0 is the number of the non zero element of vector x and x p denotes the p norm of x. x k will denote the k th element of vector x. Finally, y denotes the largest integer less than or equal to y

A. Model
As mentioned in the previous section, the data collected from human subject studies usually involves several patients with many gaps in the information collected from each patient. Moreover, behavior of the patient might change with time. Therefore, it is very unlikely that one can develop a complex model of behavior and, in our opinion, one should use a simple model that approximates patient behavior and, at the same time, highlights the fact that there is a substantial amount of uncertainty in it. Hence, the model used in the preliminary studies presented in this paper is of the form where x(k) ∈ R nx is the state, T (k) ∈ T is the control input (i.e., treatment) where T is a set of all available treatments. In many cases, this set has just a finite number of elements, w(k) ∈ R are unmeasurable sparse exogenous perturbations and encompasses uncertainty in the model. We now elaborate on the structure of the model above.
One should first note that the model above is an uncertain affine model. An affine model is used since it is often the case that the equilibrium point of the system to be controlled is not the origin. As for the uncertainty, it is divided in two terms to emphasize two very different types of uncertainty. Let K be the length of the window to be used in the MPC algorithm. We assume that is bounded in the 2-norm; i.e.
where (k : k + K − 1) denotes the vector containing all the noises from time k to time k+K −1. The 2-norm is chosen because the model is obtained by using data from several patients and differences in patients are typically modelled as Gaussian random variables. This allows us to define a set of high probability for these variables.
The external perturbation w(k) is motivated by a different kind of uncertainty in human behavior. The way patients behave is influenced by external "sparse" events that temporarily affect his/her response to treatment. Hence, one needs an uncertainty that is sparse, bounded in magnitude and with limited cumulative effect. A possible model for this, which is the one that is used in this paper, is the following: Again consider an "MPC control window" of length K. Then, using the same notation as above, w(k : where α is the bound on the magnitude of the perturbation, γ is the bound on cumulative effect and ς enforces the sparsity constraint. It should be noted that the set W is a union of convex polytopes and this fact is later explored when solving the robust optimization problem resulting from the MPC formulation of the control problem.

B. Identifying Model Parameters
As mentioned above, to identify the parameters of the model, one uses study data. However, before that and in collaboration with behavioral scientists, one should establish a model that is compatible with a priori knowledge on patients behavior. Note that, again, we do not believe that very "elaborate" models will be useful since one is approximating complex behavior with many unknown unmeasurable variables. Hence, a possible model for identification is a set of affine difference equations that can be easily converted into the model mentioned in the beginning of this section.
In the smoking cessation example that serves as the running example in this paper, one has three different continuous variables that one can estimate: smoking urge, denoted su, negative affect, na, which is a single scale indicating an adverse mood state, and self-efficacy, se, which represents an individual's belief in their ability to abstain from smoking. This type of model is commonly used in social science and it is called structural equation model (SEMs) in (3) [20]. Affine structural equation models are very popular in behavioral sciences and are often used to describe the relationship among the variables of interest. Hence we propose a model of the form where the signs of several of these coefficients are assumed to be known in advance. For example, as mentioned above, na should increase su and se should decrease it. Again, in this model we have that represents uncertainty in the model and w which represents the sparse unmeasurable external perturbations.
To estimate the coefficients from study data, we start by noting that we can take the model above and represent the relations in between the variables as Y ∈ R m is a vector, H ∈ R m×n is a matrix which organize the data in hand and is a vector containing all the noise vectors There are several ways to estimate the coefficient vector β.
In the examples provided in this paper we use the socalled Scaled Lasso algorithm [19]. This algorithm provides a way to determine an estimate of the parameters that balances the size of the noise and the sparsity of the exogenous perturbation w in the model.
Hence, following the above mentioned Scaled Lasso approach, we determine the coefficients of the model by solving the following optimization problem

A. Statement of Problem
The method for controller design used in this paper is the one based on model predictive control; e.g., see [18]. Here, we take the usual approach in robust MPC where one estimates the present value of the state variable and determines the value of the control variables over the horizon that minimize a given objective function. The first of these control signals is applied and the process is then repeated. The control algorithm is summarized in flowchart (1).
In this paper we consider objective functions of the form k+K η=k+1 (y(η) − θ) 2 although more complex convex quadratic functions can be addressed by the framework presented here. More precisely, subject to the system dynamics described by equation (1). The set T is the set of allowable control signals.

Back to Smoking Cessation Example:
In the case of the smoking cessation example, we have representing the fact that one can either apply or not apply treatment at each time and that the number of treatments applied in the last K + m instants is constrained to be less than or equal to T total . This constraint in the number of treatments is aimed at addressing the problem of treatment burden [23], which is the problem of decreasing effectiveness of treatment as the number of applied treatments increases.
Going back to the general problem, the optimization problem (7) can be rewritten as where M (w, T ) and v(w, T ) are linear functions of w and T whose coefficients depend on the model of the system and the objective function. The form of such functions is given in Appendix A.
We now present a problem equivalent to problem (9) that is suitable for implementation.
where W ext are the extremes of the polytopes whose union is W.
Idea of the proof: The theorem follows from the results in [16], [17] and the fact that all the members of a polytope of matrices are positive semidefinite if and only if all extremes of the polytope are positive semidefinite.
We now characterize the set of extremes W ext .
Theorem 2: Consider the set W defined in (2), where α, γ and ς define the bounds on the ∞ norm, 1 and sparsity respectively. Let n nz = dim{w}/ς, Then • If γ < n nz α < n nz γ then W ext is the set of all signed permutations of where n + = γ α and h = γ − n + α.
Proof: See Appendix B.
In next section we present a simulated application of the algorithm using a model that mimics data that might have been collected in smoking cessation studies such as [21], [22].

IV. SIMULATION RESULTS
In order to test the performance of the approach presented in this paper, we emulated in simulation the process that one would follow in a real application. We started by developing a model for smoking urge that loosely emulates the behavior seen in studies such as the ones in [21], [22]. To do this we used a model of the form (3) with order 2 since there is a balance between size of the perturbation and order of the model. The model coefficients obtained are and the description of the uncertainty is, for a MPC window of K = 8, the following: w has 25% of nonzero terms with an w 1 ≤ 5 and w ∞ ≤ 2.5 . As for the Gaussian uncertainty, we have su ∼ N (0, 1.3), na ∼ N (0, 0.78), se ∼ N (0, 0.55). The sampling period here is 8 hours and hence, data is collected three times per day The studies mentioned above do not contain treatment. Hence, we augmented the model with a nonlinear effect of treatment which is known as treatment burden in social science [23]. More precisely, the following terms were introduced and where sig(x) = 1 1 + e −(x−17) and This model for the effect of treatment was designed to emulate the case where the treatment has a positive impact but its effectiveness decreases when the treatment is applied too frequently.
For this highly uncertain model, we generated 500 different trajectories of the system, with random initial conditions and random realizations of the uncertainty. This was aimed at approximately simulating the behavior of 500 patients in a study where the treatment provided was random. Note that this is a "decent" surrogate of a study since the model was designed to approximate real data, except for the influence of treatment. The system identification algorithm described in Section II-B was then applied and a model to be used by the robust MPC algorithm was obtained.
We then applied the controller proposed in this paper to the nonlinear model mentioned above with, again, randomly generated uncertainty/noise. In other words, we simulated the application of our robust MPC algorithm to the simulation of a patient. The parameters used where the following • MPC window size: K = 8. In other words, one either applies or does not apply treatment at each time point, and there is a constraint of a maximum of approximately one treatment per 3 sample times, and full treatment is defined as T (k) = 1 for all k.
• Objective function to be minimized where the target value 2 was chosen so that one has a significant decreases in the mean smoking value without aiming at a value that is impossible to achieve with the limited actuation capabilities one has in this system • Bound on the 2-norm of the Gaussian noise: ρ = 30 The optimization problem to be solved here is a mixed integer convex problem. There are many ways to solve it, but in the simulations that are performed here, we relaxed the requirement T (k) ∈ {0, 1} to T (k) ∈ [0, 1] and the control applied: T applied (k) = round(T (k)) Simulation was run for 150 time instants corresponding to a real time interval of 50 days. We observed that our control law systematically improved smoking urge. We show here a typical example of the results obtained.   Table I shows the performance of adaptive intensive intervention design in terms of objective function, average values of smoking urge and total amount of treatment that the adaptive intervention applied. As one can see, there is a significant improvement in smoking urge not only in terms of average but also in the fact that we have a consistent decrease in craving. Figure 2 shows the smoking urge under both full treatment and the adaptive intervention. In red we depict the respond to the adaptive intensive intervention and in blue to the full treatment. Additionally, green stars indicate when treatment is provided from adaptive intensive intervention. In Figure 3, we plot the exogenous sparse disturbance that was applied to the patient. As expected, the algorithm "chooses" carefully when to apply treatment. It is mainly applied when external perturbations lead to a significant increasing trend in smoking urge.

V. CONCLUSIONS
Control engineering methods can be used to design adaptive behavioral interventions while reducing treatment burden of participants. In social science, there is no systematic way to design adaptive intervention for each individuals so this developed method works well for individually. In this work, treatments can be adapted and re-adapted in response to the individuals progress over a long period of time. These methods hold promise for maintaining desired behavior in situations where controlling the behavior is challenging due to complex dynamics. In terms of future work, effort is being put in developing more efficient numerical algorithms for solving the optimization problem (10). This will allow for the consideration of a larger receding horizon and, hence, better performance.

A. Computing Closed Form of MPC Objective Function
General solution of discrete time affine system at time k is : Define following matrix: . . . 12) y and N ∈ R K then the following matrix M i : where i = 1 . . . 6 and M ∈ R K×K is constructed. So su: Then, we can define v(T, w) and matrix M (T, w) from equation (14)  The first two cases are immediate since they correspond to the cases where either the 1 or the ∞ are the only binding constraints. Hence, in this proof, we concentrate on the third case.
Note that, given the symmetry of the problem, we can concentrate on the subset of the elements of W that are positive and satisfy w(i + 1) ≤ w(i) for all i.
All other extremes will be obtained from this one by permutations and sign changes of the entries. Recall that in the case we are considering γ < n nz α < n nz γ. and the elements of the vector w are in nonincreasing order. Therefore, the extreme of the set is given by "pushing" as many of the first few elements to their maximum value. Given that w ∞ ≤ α and w 1 y ≤ γ, one can only have the first n + = γ α elements of the vector w equal to α. To reach the extreme, and recalling that the entries of the vector w are in nondecreasing order, one has to have the (n + +1)-th element of the vector at its maximum value. Given the fact that w 1 ≤ γ and the first N + terms are equal to α w(n + + 1) ≤ γ − n + α.
Hence, for vectors whose entries are in decreasing order, the extreme of the set is attained by a vector of the form [ α · · · α n+ h 0 · · · 0 K−n+−1 ] and the proof is complete.