Dangerous tangents: an application of Γ-convergence to the control of dynamical systems

Inspired by the classical riot model proposed by Granovetter in 1978, we consider a parametric stochastic dynamical system that describes the collective behavior of a large population of interacting agents. By controlling a parameter, a policy maker seeks to minimize her own disutility, which in turn depends on the steady state of the system. We show that this economically sensible optimization is ill-posed and illustrate a novel way to tackle this practical and formal issue. Our approach is based on the Γ-convergence of a sequence of mean-regularized instances of the original problem. The corresponding minimum points converge toward a unique value that intuitively is the solution of the original ill-posed problem. Notably, to the best of our knowledge, this is one of the first applications of Γ-convergence in economics.


Introduction
Understanding and controlling collective behavior is a challenging, subtle, and potentially very useful endeavor [see Granovetter (1978)]. The mere attempt to know agents' preferences, motives, and norms is fraught with practical and conceptual difficulties: Data are sometimes scarce or difficult to obtain, and people may have blurred incentives and may, whether consciously or not, hide or misrepresent the drivers of their behavior. Moreover, even when a reasonable model for individual actions is agreed upon, the results can be puzzling. A classic example is Thomas Schelling's segregation model, in which slight homophilic preferences for neighbors of the same race lead, in equilibrium, to massive residential segregation (Schelling 1971). These collective models are known to produce very diverse and intriguing outcomes (simple vs. complex attractors, bifurcations, stable polarized vs. non-polarized equilibria, ...). For this reason, a policy maker (or any external mastermind) could be interested in controlling the behavior of the population by setting some parameters at a certain optimal level.
We consider a problem that is characterized by two components: a dynamical system describing the evolution of some state variable that represents, in aggregate terms, agents' collective behavior; and a social planner who ex-ante sets the parameters of such dynamics to minimize her disutility, which in turn depends on the steady state of the system. Interestingly, in some cases, the problem turns out to be ill-defined in that the objective function is bounded from below, but does not admit a minimum. However, we show that by defining a proper sequence of auxiliary stochastic problems, it is possible to formally prove a convergence result whose limit is well-defined and helps shed light on the original problem.
To fix ideas and keep the formalism to a minimum for the moment, let us consider a dynamical system in which the future state r(t + 1) depends on the current state r(t) through the (iterated) application of a map F, as well as on a parameter σ. Let us now assume that for a given initial condition r(0) = r_0, a limit state ρ(σ) is reached. 1 The social planner attempts to solve an optimization problem of the form

min_{σ} f(σ) = c(σ) − ρ(σ), where ρ(σ) = lim_{t→∞} r(t) and r(t + 1) = F(r(t); σ), r(0) = r_0.   (1)

The fact that the social planner's payoff depends on the long-run outcome of the system (i.e., on ρ(σ)) seems reasonable for many applications. Consider, for example, the case where r(t) represents the market share of a durable good and σ is a parameter related to the quality of that good. In this respect, ρ(σ) can be interpreted as the equilibrium market share associated with a specific quality strategy implemented by the company. In the payoff, the component c(σ) plays the role of the (potential) cost of implementing this strategy and, mathematically speaking, completes the definition of a specific parametric dynamical system indexed by σ.
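To make the structure of (1) concrete, here is a minimal computational sketch. The map F, the cost c, and all function names are illustrative placeholders (the specific Gaussian F of the riot model is introduced only in Sect. 2); the toy linear map in the usage example is purely for demonstration.

```python
# Hypothetical sketch of problem (1): iterate r(t+1) = F(r(t); sigma) from r0
# until (numerical) convergence, then evaluate f(sigma) = c(sigma) - rho(sigma).

def limit_state(F, sigma, r0=0.0, tol=1e-10, max_iter=100_000):
    """Iterate the map r -> F(r, sigma) until a fixed point is numerically reached."""
    r = r0
    for _ in range(max_iter):
        r_next = F(r, sigma)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r  # best available approximation if convergence is slow

def planner_disutility(F, c, sigma, r0=0.0):
    """f(sigma) = c(sigma) - rho(sigma), the objective of problem (1)."""
    return c(sigma) - limit_state(F, sigma, r0)

# Toy usage with a linear contraction F(r; sigma) = sigma*r + 0.1 (sigma < 1),
# whose fixed point is 0.1 / (1 - sigma):
print(limit_state(lambda r, s: s * r + 0.1, 0.5))  # approximately 0.2
```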
Examples of modeling approaches of individual decisions giving rise, at the aggregate level, to dynamics resembling (1) can be found in Blume and Durlauf (2003) and Barucci and Tolotti (2012a, b). Models of supply and demand emerging in financial markets, where a similar structure manifests itself, are presented in Gordon et al. (2013). The diffusion of innovations and adoption models of durable goods as pioneered in Bass (1969) and, more recently, in Peres et al. (2010) share a microfoundation that is similar in spirit to the one presented here. In addition, epidemiological models of SI (Susceptible Infected) type exhibit dynamics that resemble the formalism proposed in (1). Finally, as already mentioned, a celebrated model in this context goes back to Granovetter (1978); in that study, a mastermind manipulates the mood of crowds to trigger riots that eventually involve a large proportion of the population. For the sake of clarity and simplicity, the latter riot model will hereafter be used to motivate our treatment and exemplify our results.
The previous situations all share a similar conceptual structure: contingent on the value of the parameter, some equilibrium is likely to endogenously appear as the final outcome of the dynamics, and an external agent is interested in shaping or controlling this outcome. While the intuition appears to be quite natural, the explicit setup of this "optimization problem over a dynamical system" is not very frequent, and we explore some of the intrinsic difficulties that may arise when defining such problems.
In fact, we show that problems formalized as in (1) can be ill-posed in that the objective function is discontinuous and has no minimum, admitting only an infimum. Essentially, this is due to a saddle-node bifurcation characterizing the dynamical system: One of the fixed points of the equation r = F(r; σ) disappears when σ increases beyond some critical value σ_c. 2 Note that ρ(σ) is one such fixed point (the smallest one in our setup, in fact) and that the presence of a bifurcation makes its value jump abruptly at the bifurcation value. Moreover, as already noted in Granovetter (1978), the bifurcation value σ_c is intuitively the value that the mastermind is looking for. Roughly speaking, the system is purposely steered to reach a bifurcation point (because this is advantageous).
However, this value is not a minimum point for f, but only a minimizer. (From now on, we say "minimum point" when the minimum is attained and "minimizer" when it is not.) As a consequence, this optimization problem is intrinsically ill-posed. Summarizing, despite their natural appeal, problems such as (1) can be fraught with technical difficulties that impede formal and numerical treatments.
Technically speaking, one of the main goals of this paper is to overcome this issue by examining the problem from a different angle. Instead of looking at the steady states of the one-dimensional (deterministic) dynamical system as expressed in (1), we introduce a regularized stochastic version of it, in which the number of agents in the population is large but finite. Despite being stochastic, this problem is now well-defined and admits a unique minimum point σ*_N for all N, where N is the size of the population. Finally, we are able to provide a formal limit for N going to infinity and show that the sequence of minimum points, (σ*_N)_{N≥1}, converges exactly to σ_c, the value that is "expected" to be a solution of the ill-posed original problem. Interestingly, it turns out that the aforementioned convergence holds in a Γ-sense: We will formally prove that the sequence of objective functions f_N of the auxiliary N-dimensional problems Γ-converges to a well-defined f_∞ (see (13)). This Γ-limit turns out to be equal to the original objective function f almost everywhere, except at the optimal level σ_c. 3 This finding sheds light on one rigorous way to deal with the ill-posedness of the original problem suggested in Granovetter's work. More recently, singularities similar to the one described above have also been detected in Nadal et al. (2005) and Gordon et al. (2013). In these works, the authors model a monopolist in charge of setting the optimal price to foster demand and maximize profits. With language and notation often borrowed from physics, these papers contain a model that has similarities with ours. They also contain discussions of the "epistemic uncertainty" inherent in selecting a price to maximize the monopolist's profits. Uncertainty due to the presence/disappearance of multiple equilibria is acknowledged by sentences such as "[the optimal solution] lies very near the critical price value at which such high demand no more exists" (Gordon et al. 2013). The singularity recognized (albeit not "solved") in their model may be formally tackled through the use of a Γ-convergent sequence of problems whose limit turns out to be identical to the original singular model almost everywhere.
In terms of methodology, to the best of our knowledge, papers in the field of mathematical economics and the social sciences that employ Γ-convergence are quite rare. Ghisi and Gobbino (2005) describe a variational problem arising from a generalization of the well-known monopolist's problem introduced in Rochet and Choné (1998). In this model, the monopolist proposes a set of products and looks for the optimal price list, which minimizes costs and therefore maximizes profit. This leads to a minimum problem for the functionals H (the "pessimistic cost expectation") and G (the "optimistic cost expectation"), which are in turn defined through two nested variational problems. The authors prove that the minimum of G exists using an approximating sequence that Γ-converges to G, and that such a minimum coincides with the infimum of H. An economic model of monopoly has also been studied in a general setting by Monteiro and Page (1998) and under convexity assumptions by Carlier (2001). In the latter, the author studies a principal-agent model with adverse selection and characterizes incentive-compatible contracts in terms of an envelope property called h-convexity. Using this characterization, the principal's problem is written as a non-standard variational problem (with an h-convexity constraint) for which the existence of a solution is proved. Monteiro and Page (1998) provide similar general existence results for the principal-agent problem with adverse selection. However, their approach differs from the one in Carlier (2001). In particular, they consider budget constraints that force prices to remain in a given compact set, and their results rely on a nonessentiality assumption (i.e., nonessentiality of some goods relative to others), which is not required in Carlier (2001).
In this paper, we show how Γ-convergence can be used to deal with a natural but ill-posed problem faced by the decision maker. Indeed, the economic interest of our approach stems from the observation that it is the decision maker herself who rationally pushes the system to configurations where some equilibria disappear. This fact also poses severe difficulties when seeking numerical solutions, especially if higher-dimensional systems are under investigation.
This paper is organized as follows. Section 2 describes in detail the optimization problem as stated in (1). In Sect. 3, we provide a stochastic version of the same problem, where the number of actors in the economy is now finite. This stochastic approach, while seeming more cumbersome on the surface, provides a double benefit: Besides allowing us to take advantage of probabilistic tools, it naturally leads the modeler to simulation and numerical methods. We will see that this finite-population approach is crucial to analytically setting the proper convergence scheme. Section 4 is devoted to the analysis of numerical simulations, and conclusions are drawn in Sect. 5. Appendix A contains the technical proofs, and Appendices B and C summarize the main concepts of Γ-convergence and continuity employed in the paper, respectively.

The deterministic dynamic model
Inspired by Granovetter (1978), we first consider an (infinite) population of actors facing the decision of whether to take part in a riot. The agents are heterogeneous; they are aware of the current proportion of people involved in the riot, and they decide whether or not to join based on a personal (random) activation threshold: If the proportion of active agents is above an agent's threshold, she joins the crowd; otherwise she does not. Technically speaking, the random thresholds are independent copies of a random variable with distribution F, which, from now on, is assumed to be Gaussian with fixed mean μ = 0.25 and standard deviation σ ∈ [σ_min, σ_max], with σ_min > 0 and σ_max large enough but finite. 4 Recall that σ will be set by the social planner. In this respect, by choosing σ, she fixes a specific Gaussian distribution F and, therefore, a specific behavioral trait of the underlying population of potential rioters. Practically, this can be interpreted as a deliberate modification of the volatility of the crowd in order to ignite a riot. We define r(t) as the proportion of agents taking part in the riot at time t, where t ∈ N, and fix r(0) = 0. 5 People with a threshold below r(t) decide to join the riot at time t + 1; moreover, once they have joined the riot, they cannot withdraw from it. Therefore, r(t + 1) accounts for the people who were already involved in the riot at the previous time plus the proportion of newcomers. Put in other terms, this is exactly F(r(t); σ), i.e., the probability that the threshold is below r(t). Hence, we obtain the recursive equation of Granovetter (1978):

r(t + 1) = F(r(t); σ).   (2)

We denote by ρ(σ) the smallest solution 6 of the fixed-point equation

r = F(r; σ).   (3)

In the case that F admits a unimodal density function (as in our example based on Gaussian thresholds), it is possible to characterize such an equilibrium.
In fact, it is well known that under this minimal assumption there are at most three solutions to (3), and the number of such solutions depends on the value of σ. Moreover, when there are three equilibria, the intermediate one is always unstable, whereas the two extreme ones are (locally) stable. As already stated, we are interested in the case where the initial condition is zero; therefore, the dynamical system always converges to the smallest solution of (3). This family of dynamical systems exhibits what is called a saddle-node bifurcation. 7 Keeping μ = 0.25, the black line in Fig. 1 shows three fixed points of r = F(r; σ), occurring when σ = 0.1. The intermediate fixed point, located at about 0.15, is unstable; hence, any initial condition below it leads to the lower fixed point, and for all such initial conditions the limiting fraction of the population involved in the riot would be about 0. Suppose now that the social planner has the power to increase σ (at a cost): This has the effect of sliding the graph of F upward, up to the point where the two smaller fixed points merge as F becomes tangent to the bisector line. (This case is depicted by the dashed line in Fig. 1, obtained when σ = 0.122 ≈ σ_c, where σ_c denotes the exact tangency value.) Now, even starting from an infinitesimal initial proportion of citizens involved in the riot, the limit state would increase to about 6.2%. Even more importantly, a further increase of σ beyond σ_c triggers a saddle-node bifurcation, and the limit state abruptly jumps to 1; see the dot-dashed curve in the picture, relative to σ = 0.15.
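The jump described above is easy to reproduce numerically. The following sketch (standard-library Python only; the Gaussian CDF is written via math.erf, so no external packages are assumed) iterates the Granovetter recursion (2) from r(0) = 0 for the parameter values quoted in the text, μ = 0.25 and σ ∈ {0.1, 0.15}:

```python
import math

MU = 0.25  # mean activation threshold, as in the text

def F(r, sigma, mu=MU):
    """Gaussian threshold distribution evaluated at r (CDF via math.erf)."""
    return 0.5 * (1.0 + math.erf((r - mu) / (sigma * math.sqrt(2.0))))

def rho(sigma, r0=0.0, tol=1e-12, max_iter=1_000_000):
    """Limit state reached by iterating r(t+1) = F(r(t); sigma) from r(0) = 0,
    i.e., the smallest fixed point of (3) when it exists."""
    r = r0
    for _ in range(max_iter):
        r_next = F(r, sigma)
        if abs(r_next - r) < tol:
            break
        r = r_next
    return r

# Below the saddle-node value the riot dies out; above it, almost everyone joins.
print(rho(0.10))  # close to 0
print(rho(0.15))  # close to 1
```

The two calls bracket the bifurcation: iterating the same map with a slightly larger σ moves the limit state from near 0 to near 1, which is precisely the discontinuity of ρ(σ) discussed in the text.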
As strongly pointed out in Granovetter (1978), finite populations drawn from the same F, when σ ≈ σ_c, may lead to very different limit states under iteration: "There is no obvious sociological way to explain why a slight perturbation of the normal distribution around the critical standard deviation should have a wholly discontinuous, striking qualitative effect. This example shows again how two crowds whose average preferences are nearly identical could generate entirely different results." 8 The insight that minor perturbations in individual features can produce large aggregate effects was probably part of the zeitgeist of the seventies, as similar ideas appear in Schelling (1971) concerning racial segregation. Likewise, Allen and Sanglier (1979), examining dynamic models of urban growth, observed that "small perturbations of density, perhaps of random origin, are amplified by the interactions between the elements of the system and lead to a qualitative change in the macroscopic structure of the spatial distribution." 9 We consider these insights to be of continued relevance, despite the passage of four decades.
To make this argument formal, we deviate from the original problem and introduce an external agent acting as a social planner. Suppose that a rabble-rouser is keen on minimizing the cost needed to fuel massive participation in the uprising. Owing to the cost of stirring up people, the policy maker faces the following decision problem: "How can I optimally set the mood of the population in order to obtain a large-scale riot?" Considering a unitary cost k > 0 per unit of standard deviation, her optimization problem can be written as

min_{σ∈[σ_min, σ_max]} f(σ) = kσ − ρ(σ).   (P1)

Note that a linear cost kσ has been chosen for tractability reasons, and that if k is too large, the interest in the problem is lost, since the objective function attains a proper minimum at some point σ ≤ σ_c, possibly at σ = σ_min. A similar argument applies to the expected value μ of the distribution function F. 10 Figure 2 depicts the effect of the saddle-node bifurcation on the target function f(σ) = kσ − ρ(σ), for k = 0.5.

Fig. 2 The graph of kσ − ρ(σ) as a function of σ, for k = 0.5. The discontinuity occurs at σ_c ≈ 0.122

Clearly, as σ increases beyond σ_c, the disappearance of two fixed points generates a jump in the limit state reached through iteration of F. As pointed out in Strogatz (2015), the term "saddle-node" is not entirely consolidated in the dynamical systems literature; for instance, such a bifurcation is sometimes, quite imaginatively, called a "blue sky bifurcation" to stress that, reversing the direction of the change in σ, a new equilibrium can be created "out of the blue." Although no closed-form solution for this problem exists, the objective function satisfies the following lemma. 11

Lemma 2.1 Define σ_c as the value of σ that makes the graph of the map x ↦ F(x; σ) tangent to the bisector line, and define r̃_c as the largest solution of r = F(r; σ_c). Suppose finally that k < k_th := (r̃_c − ρ(σ_c))/σ_c and μ < 1/2. Then the function

f(σ) = kσ − ρ(σ),

where ρ(σ) is the smallest solution to (3), has the following properties: f is bounded from below; f is continuous on [σ_min, σ_max] \ {σ_c} and discontinuous at σ_c; and f admits no minimum, with inf f = kσ_c − r̃_c.

In other words, f has no minimum, and the inf is reached, for μ = 0.25, at σ_c ≈ 0.122. On the other hand, σ_c is exactly the value at which the curve F(x; σ_c) is tangent to the bisector line (see Fig. 1). Algebraically, σ_c is the unique value of the parameter such that the following system is solvable for some r ∈ [0, 1]:

F(r; σ) = r,
∂F(r; σ)/∂r = 1.

As a matter of fact, the problem is clearly ill-defined: No solution exists, even though the "optimal point" has a very clear geometrical interpretation in terms of the distribution function of the random thresholds.
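The tangency value can be located numerically even without a closed form. One possible sketch, under the Gaussian specification used in the text, bisects on σ the minimal gap min_r [F(r; σ) − r] over the lower branch: the gap is negative below σ_c (three fixed points) and positive above it (the lower fixed points have disappeared). The bracketing interval and grid resolution are ad hoc choices for illustration.

```python
import math

MU = 0.25  # mean of the Gaussian thresholds, as in the text

def F(r, sigma, mu=MU):
    """Gaussian CDF at r, written with math.erf (no SciPy assumed)."""
    return 0.5 * (1.0 + math.erf((r - mu) / (sigma * math.sqrt(2.0))))

def min_gap(sigma, lo=0.0, hi=0.5, steps=2000):
    """Minimal value of F(r; sigma) - r on a grid over the lower branch.
    Negative: F crosses the bisector; positive: no lower fixed point."""
    return min(F(lo + (hi - lo) * k / steps, sigma) - (lo + (hi - lo) * k / steps)
               for k in range(steps + 1))

def critical_sigma(s_lo=0.05, s_hi=0.2, tol=1e-6):
    """Bisect on sigma: the minimal gap changes sign at the saddle-node value."""
    while s_hi - s_lo > tol:
        mid = 0.5 * (s_lo + s_hi)
        if min_gap(mid) < 0.0:   # still three fixed points: below the bifurcation
            s_lo = mid
        else:                    # lower fixed points gone: above the bifurcation
            s_hi = mid
    return 0.5 * (s_lo + s_hi)

print(critical_sigma())  # approximately 0.122, matching the text
```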
To deal with this issue, we propose an auxiliary optimization problem in the next section. In particular, we let the population of actors be finite, of size N. We will see that this makes the problem stochastic but, on the other hand, well-posed for any finite N. Finally, we will show that the sequence of objective functions of such finite-population problems converges in a Γ-sense to a function f_∞, which is equal to the original objective function f almost everywhere. However, in contrast to f, f_∞ admits a unique minimum; therefore, the auxiliary problem remains well-defined as N → ∞.

The (stochastic) problem with N agents
Let us consider N actors faced with the following decision problem: "to participate or not in a riot." The state variable is thus binary. We define y_i(t) ∈ {0, 1} for i = 1, . . . , N, where y_i(t) = 1 means that agent i is participating in the riot at time t. Once part of the riot, a rioter cannot withdraw. Moreover, we assume that y_i(0) = 0 for all i. This choice reflects the fact that, as in the original infinite-population problem, we analyze the social phenomenon from its outset: No agent is initially rioting, so that r_N(0) = 0. The decision being one under collective behavior, the reward for participating in the riot depends on the present number (or better, the proportion) of people involved. This quantity is given by 12

r_N(t) = (1/N) Σ_{i=1,...,N} y_i(t).

Any actor decides to join the riot as soon as the quantity r_N is large enough. We model the random thresholds as N independent copies of a random variable X with absolutely continuous distribution function F:

X_i ∼ F, i = 1, . . . , N, independent.   (6)

The rule is straightforward: Agent i joins at time t + 1 whenever X_i ≤ r_N(t). Summing up and normalizing over N,

r_N(t + 1) = (1/N) Σ_{i=1,...,N} 1{X_i ≤ r_N(t)}.   (7)

The right-hand side of Eq. (7) is nothing but F_N(r_N(t); σ), the N-dimensional empirical distribution of the random thresholds. We thus obtain the following recursive equation characterizing the N-dimensional system:

r_N(t + 1) = F_N(r_N(t); σ).   (8)

We now have a population of N agents that evolves according to (8). In addition, the social planner aims to control the behavior of the population by setting the optimal σ so as to minimize a measurable function of the stochastic process r_N. More specifically, the policy maker considers the steady states of these dynamics as the natural outcome of interest.

12 We could also consider the version of r_N that excludes agent i's own contribution y_i. When N becomes large (infinite), this contribution is negligible, and thus the two problems have exactly the same limiting behavior.
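A single realization of the finite-population recursion (8) can be simulated directly. The following sketch (standard-library Python; all parameter values are illustrative) samples N Gaussian thresholds and iterates the empirical CDF from r_N(0) = 0 until a random fixed point is reached; since r_N is nondecreasing and takes values in {0, 1/N, ..., 1}, the loop terminates in at most N + 1 steps.

```python
import bisect
import random

MU = 0.25  # mean activation threshold, as in the text

def simulate_steady_state(N, sigma, mu=MU, seed=None):
    """One realization of the random steady state R_N of recursion (8):
    sample N Gaussian thresholds, then iterate r_N(t+1) = F_N(r_N(t); sigma)
    from r_N(0) = 0 until no newcomers appear."""
    rng = random.Random(seed)
    thresholds = sorted(rng.gauss(mu, sigma) for _ in range(N))
    r = 0.0
    while True:
        # Empirical CDF of the thresholds evaluated at the current state.
        r_next = bisect.bisect_right(thresholds, r) / N
        if r_next == r:   # no newcomers: a (random) fixed point of (8)
            return r
        r = r_next

# Above the bifurcation the realized riot is massive; far below, it dies out.
print(simulate_steady_state(10_000, 0.15, seed=1))  # close to 1
print(simulate_steady_state(10_000, 0.08, seed=1))  # close to 0
```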
Should the social planner look at the finite (stochastic) population, or rather invoke theoretical asymptotic results and rely on a simpler deterministic limit dynamics? Technically speaking, this translates into the order in which the limits N → ∞ and t → ∞ are taken. To deal with a deterministic system, we first take the N-limit; the computation of steady states is then straightforward, but, as just seen above, the optimization problem is ill-posed. Conversely, if we first perform the time limit, the observable steady states are stochastic, and the modeler relies on statistical tools to remove the randomness inherent in the finite-population system. In the remainder of the section, we formalize the two approaches, highlighting how different their outcomes are.

Weak convergence of stochastic processes
The first approach relies on the classical asymptotic theory developed in Ethier and Kurtz (2009). The convergence of the stochastic process r_N to a well-posed limiting process r, when the number of actors N tends to infinity, is ensured by the following result. 13

Proposition 3.1 Assume the recursion given in (8), with r_N(0) = 0 for all N. When N → ∞, the process (r_N(t))_{t≥0} converges weakly to (r(t))_{t≥0}, characterized by the asymptotic recursion

r(t + 1) = F(r(t); σ),   (9)

where F is the distribution of the X_i, i = 1, . . . , N, as defined in (6).
Since, by assumption, r(0) = 0, the dynamical system necessarily converges to ρ(σ), i.e., the smallest among the (possibly multiple) solutions of

r = F(r; σ),   (10)

obtained by taking the limit t → ∞ in (9). In particular, as discussed above, the social planner's minimization problem has no solution.
Example 3.2 (The Granovetter setting) Assume that X ∼ N(μ, σ), where μ is fixed at the level 0.25. Regarding the standard deviation, there exists a critical level σ_c such that (3) has only one solution for σ > σ_c. For σ < σ_c, there are three solutions, and for σ = σ_c, the function r ↦ F(r; σ) is tangent to the bisector and there are exactly two solutions.

Γ-convergence of regularized operators
The second approach is based on the stochastic version of the dynamical system given in (8). In contrast to what was done previously, we first take the time limit of the dynamics. As a matter of fact, the steady state is now a random variable, measurable w.r.t. the N-dimensional sample X_1, . . . , X_N. Let the random variable R_N represent the (random) steady state of the dynamics in (8). Formally,

R_N = lim_{t→∞} r_N(t).   (11)

To properly define an optimization problem for any finite N, we rely on the expected value of R_N as the observable to be investigated by the social planner:

ρ_N(σ) = E_σ[R_N].   (12)

It turns out that the auxiliary problem based on this new observable is well-posed. Furthermore, it is possible to characterize the limit of the sequence (ρ_N(σ))_N in terms of ρ(σ), the steady state related to the original, ill-posed optimization problem. These claims are stated in the following theorem.

Theorem 3.3 Fix N and consider ρ_N as defined in (12). Then the optimization problem

min_{σ∈[σ_min, σ_max]} f_N(σ) = kσ − ρ_N(σ)   (P2)

is well-posed and admits a minimum point σ*_N. Moreover, for N → ∞, the sequence (f_N)_N Γ-converges to

f_∞(σ) = kσ − ρ(σ) for σ ≠ σ_c, and f_∞(σ_c) = kσ_c − r̃_c.   (13)

The limit of f_N is therefore taken in a Γ-sense, and f_∞ is exactly the Γ-limit of the sequence of real functions (f_N)_N. By contrast, it can be proven that the convergence does not hold in the classical almost-sure or uniform sense. Note, finally, that f_∞ is nothing but the lower-semicontinuous envelope of f (the original objective function of problem (P1)); the two functions differ from each other only at σ_c, which turns out to be exactly the minimum point of the problem in which f_∞ is used as the (auxiliary) objective function.
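For completeness, recall the standard definition of Γ-convergence for real functions on the compact interval [σ_min, σ_max] (the notion invoked in Theorem 3.3 and reviewed in Appendix B): f_N Γ-converges to f_∞ if and only if, for every σ,

```latex
\begin{aligned}
&\text{(liminf inequality)} &&\forall\, \sigma_N \to \sigma: \quad
  \liminf_{N\to\infty} f_N(\sigma_N) \;\ge\; f_\infty(\sigma),\\
&\text{(recovery sequence)} &&\exists\, \sigma_N \to \sigma: \quad
  \limsup_{N\to\infty} f_N(\sigma_N) \;\le\; f_\infty(\sigma).
\end{aligned}
```

A Γ-limit is automatically lower semicontinuous, which is consistent with f_∞ being the lower-semicontinuous envelope of f; moreover, on a compact interval, Γ-convergence implies convergence of minima and of minimum points, which is exactly how σ*_N → σ_c is obtained.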
A summary of the different convergence schemes is visualized in Fig. 3. The starting point is the top-left stochastic dynamical system expressed by (8). In the first approach, described in Sect. 3.1, we first move to the right by taking an N-limit, in the sense of the convergence of stochastic processes; we then move down (time limit), obtaining ρ(σ). Moving into the social planner's domain, we come across the ill-posed problem (P1), depicted in the bottom part of the right column. Conversely, by starting at the top-left corner and implementing the second approach described in Sect. 3.2, we first move down along the left column (time limit) and then, introducing the expectation operator, arrive at the bottom-left point, where σ*_N is defined as the minimum point of f_N for any fixed N. Finally, since f_N converges in a Γ-sense to f_∞, the sequence σ*_N converges to σ_c, the minimum point of f_∞. In the next section, we collect some numerical findings and give a sense of the previous asymptotic result by solving the problem for increasing values of N. Moreover, we show that the numerical results are notably affected by the accuracy with which an estimator of the expectation of the random variable R_N is computed.

Numerical findings
In this section, we numerically analyze the riot problem where, for the sake of concreteness, we set k = 0.5. We simulate M stochastic instances of the optimization problem (P2) in Theorem 3.3 to obtain M pairs of minimum points and minima (mσ*_N, mf*_N), m = 1, . . . , M, for a given N. We then increase N to illustrate, visually and numerically, the Γ-convergence stated in Theorem 3.3.
The key insight of the previous section was that convergence can be achieved by taking the expectation, ρ_N(σ), of the random steady state R_N defined in (11). Since a closed-form computation of ρ_N(σ) appears to be impossible, we approximate it with a sample mean over S independently generated steady states. Since the sample mean converges to the correct average for S → ∞, the higher the number S of simulated steady states, the closer our approximation will be to ρ_N(σ) = E_σ[R_N]. Hence, it is important to realize that the numerical method to solve the social planner's optimization problem relies on an additional parameter, S, due to the need to replace the mathematical expectation with a sample mean. Equivalently, whenever an evaluation of the objective function in (P2) is needed, a set of N thresholds is independently resampled (over and over) S times in order to compute f_N(σ). Figure 4 shows some representative graphs of the objective functions of the optimization problem (P2). The top panel shows, for fixed S = 100, the functions f_N(σ) relative to N = 100, 1000, and 10,000. It can be seen that larger values of N lead to smoother behavior and, more importantly, produce more accurate approximations of the function f_∞ defined in Theorem 3.3.
The bottom panel illustrates the role of S, for given N = 1000: when S = 10, admittedly a case in which the true expectation is poorly estimated, the function f_N(σ) is noisy and shows jumps that would make any minimization effort difficult or impossible. In contrast, when S = 100 or S = 1000, the graphs are smoother and almost overlapping; this suggests that such numbers of sample observations are appropriate to approximate σ*_N reasonably well. Practically speaking, some balance must be struck: if N is to be increased to explore asymptotics, S should be limited to avoid an excessive computational burden.

Fig. 4 The objective function f_N(σ). Top panel: for fixed S = 100, the cases N = 100, 1000, 10,000 are shown. Bottom panel: for fixed N = 1000, the cases S = 10, 100, 1000 are displayed
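The resampling procedure underlying these graphs can be sketched as follows. This is a hedged reconstruction in Python rather than the paper's original routine (the paper's numerics use R): the function names are illustrative, and the steady-state simulator is built directly on the recursion (8) with Gaussian thresholds, μ = 0.25 and k = 0.5 as in the text.

```python
import bisect
import random

MU, K = 0.25, 0.5  # threshold mean and unit cost, as in the text

def steady_state(N, sigma, rng):
    """One realization of R_N: iterate the empirical recursion (8) from 0."""
    thresholds = sorted(rng.gauss(MU, sigma) for _ in range(N))
    r = 0.0
    while True:
        r_next = bisect.bisect_right(thresholds, r) / N
        if r_next == r:   # no newcomers: random fixed point reached
            return r
        r = r_next

def f_N_hat(sigma, N, S, seed=None):
    """Sample-mean approximation of f_N(sigma) = k*sigma - E[R_N]:
    resample the N thresholds S times and average the steady states."""
    rng = random.Random(seed)
    rho_hat = sum(steady_state(N, sigma, rng) for _ in range(S)) / S
    return K * sigma - rho_hat
```

Each evaluation of the objective therefore resamples N·S individual thresholds, which is why the computational burden grows with both parameters, as noted below for the cluster runs.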
Based on the above argument, we set S = 100 and solve M = 100 independent instances of the optimization problem, for increasing values of N = 100, 500, 1000, 5000, 10,000, 50,000, 100,000. Figure 5 shows the boxplots of mσ*_N, m = 1, . . . , M, in the various cases. The boxes depict interquartile ranges (IQR); the whiskers extend up to 1.5 IQR to provide evidence of outliers (circles), if any; the horizontal line is the median value. While for small values of N the true minimum point σ_c ≈ 0.122, referred to in Theorem 3.3, is overestimated and there are some outlying results, as N increases, most of the minimum points of the (stochastic) problems lie quite close to the correct result; clearly, the boxplots support the convergence rigorously proven in the previous section. In other words, if N is too small, the minimum points obtained by numerical optimization span a wide interval, roughly [0.17, 0.21]. As N increases, the numerical results move increasingly close to σ_c and are no longer dispersed. The plotted data were obtained using the routine optimize in R, R Core Team (2018); the computations took about 18 min on a 51-core 2.2 GHz Linux cluster with the library parallel. It may be worth noting that, say, the boxplot for N = 100,000 required sampling on the order of M · S · N = 100 · 100 · 100,000 = 10^9 individual thresholds. 16

16 As optimize typically evaluates the objective function about 20 times, the number of sampled thresholds actually exceeds 10^10.

Not surprisingly, the value of S also has a conspicuous impact on the accuracy of the M numerically computed solutions. Figure 6 depicts, for fixed N = 1000, the boxplots of the minima mf*_N, m = 1, . . . , M, for different values of S: for S = 1 the computed results are essentially unreliable and spread over a large interval. Note that the boxplot relative to S = 1 well represents the ill-posedness depicted in Fig. 3: Unless the mean operator is used, moving down along the right side of the diagram by increasing N does not remove the discontinuity of f(σ). As a consequence, numerically computed minima are most often very far from kσ_c − r̃_c. In contrast, increasing S, i.e., taking the mean on the left side of the diagram and then minimizing the objective function f_N, with f_N → f_∞, produces a sequence of pairs (σ*_N, kσ*_N − ρ_N(σ*_N)) that converges to (σ_c, kσ_c − r̃_c).
Improving the accuracy of the sample mean by using a larger S reduces the likelihood of error and visually confirms that S in the range 100-1000 is numerically satisfactory. Going into greater depth, the figure again portrays the importance of the smoothing effect provided by the averaging operator: Problems can be solved more accurately, the ill-posedness of the original formulation is removed, and tractability is obtained in the form of Γ-convergence.

Conclusions
We have studied a class of "optimization problems over a dynamical system" and shown that care is needed to deal with ill-posedness. This was motivated by the classical riot example discussed in Granovetter (1978): intuitively, at some critical value of one parameter, the resulting endogenous equilibrium can abruptly jump due to small and natural sampling variability in the activation thresholds of agents. More formally, in the presence of a saddle-node bifurcation of the large-scale deterministic dynamics, some equilibria disappear, and the objective of the policy maker turns out to be discontinuous, breaking conventional optimization approaches. Because of the tangency condition, the problem cannot be eliminated by merely increasing the number N of agents, nor by taking the limit N → ∞ and solving the deterministic version of the problem.
For this reason, we considered a sequence of finite-population problems, where the steady states of the dynamical system are random variables, and we proved that the objective functions of such problems are now continuous thanks to the regularizing effect of the expectation operator. Moreover, the sequence of the minimum points converges to the critical value of the parameter, i.e., the value where the bifurcation happens, and the sequence of minima converges to the infimum of the (ill-posed) original problem. Technically speaking, this convergence applies in a Γ-sense, and we believe that our work may be one of the very few practical applications of this methodology to economically relevant problems. Specifically, the sequence of the objective functions f N Γ-converges toward f ∞ , which is proven to be the lower-semicontinuous envelope of the original objective function f . Interestingly, f ∞ and f only differ at the value σ c , which is the limit of the sequence of minimum points, and in turn the point where the social planner (and we, as rational and conscious external observers) expected a minimum to exist in the ill-posed original problem. In this respect, our approach endows the social planner with a mathematically precise argument to identify the unique optimal policy. From a theoretical point of view, the use of the expectation operator appears to be crucial in the definition of a sequence of problems that converges to a regularized right-continuous objective function, whose minimum is now well-defined and coincides with the infimum of the original problem.
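The regularizing effect of the expectation can be illustrated with a toy example (not the paper's model; the step location c = 0.5 and the noise scale are arbitrary choices): averaging a discontinuous objective over Gaussian noise yields a continuous, indeed smooth, function of the control.

```python
import math

def step(s, c=0.5):
    # a discontinuous "objective" piece: jumps from 0 to 1 at the critical value c
    return 1.0 if s > c else 0.0

def smoothed_step(s, c=0.5, noise=0.05):
    # expectation of the step under a Gaussian perturbation eps ~ N(0, noise^2):
    # E[step(s + eps)] = P(s + eps > c) = 1 - Phi((c - s)/noise),
    # which is continuous in s even though step() is not
    return 1.0 - 0.5 * (1.0 + math.erf((c - s) / (noise * math.sqrt(2.0))))
```

At s = c the deterministic objective jumps from 0 to 1, while the expectation passes continuously through 1/2; shrinking the noise recovers the step pointwise away from c, in the same spirit as the regularized objectives f N.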
Our numerical results also make clear that the use of an expectation is not simply an astute device to obtain some usable approximate result but, more fundamentally, lays the basis for the Γ-convergence (in the number of agents N ) to the theoretical model. This, together with a proper Law of Large Numbers in S, the sample size of the estimator of the expectation, also ensures the convergence of the numerical simulations.
We have demonstrated, in this specific case, that sample sizes of the order of one hundred to one thousand provide enough smoothing to obtain reasonably accurate numerical results (with the help of a multi-core processor and parallel computations).
We believe that our example shows the need to incorporate proper sampling schemes into similar problems, or more generally whenever a numerical treatment is the only viable option due to the lack of closed-form expressions for the steady states of the dynamical system.

A Proofs
All proofs are given in this appendix.

Proof of Lemma 2.1
We first state and prove a technical result on the bifurcation theory of a dynamical system.

Lemma A.1 (Saddle-node bifurcation) Consider the dynamical system
x t+1 = F(x t ; σ ), (14)
where x ∈ [0, 1] and F(· ; σ ) is a continuous distribution function with standard deviation σ , admitting a unimodal density function. The set of steady states of (14) is given by the solutions to
x = F(x; σ ). (15)
There exists a threshold level σ c such that: for σ > σ c , Eq. (15) has a unique solution; for σ = σ c , it has exactly two solutions; for σ < σ c , it has three solutions x l < x m < x h , and if x 0 < x m , then x t converges to x l . In the opposite case, if x 0 > x m , then x t converges to x h .

Proof Note that F(0; σ ) > 0 and F(1; σ ) < 1 for any σ . Therefore, at least one solution x to (15) exists. Since F is S-shaped, it is convex for small x and concave for large x; a maximum of three solutions to (15) can therefore appear. The number of solutions to (15) depends on σ . This is an example of a saddle-node bifurcation. 17 By definition, σ c identifies the unique situation in which F is tangent to the bisector line at some point. In this case, exactly two different solutions to (15) exist. As soon as we take a value σ > σ c , such a tangency point disappears. In the opposite case, for σ < σ c , there are three intersections. We check the stability of the steady states by looking at the linearized version of the system (14). In this way, it is not difficult to see that if the equilibrium x̄ is unique, then F ′ (x̄; σ ) < 1, so it is linearly stable. In case of three equilibria, F ′ (x m ; σ ) > 1, whereas F ′ (x l ; σ ) < 1 and F ′ (x h ; σ ) < 1.
Returning to the proof of Lemma 2.1, we are exactly in this situation, since F is Gaussian. The graph of the distribution function F intersects the bisector either three times, twice, or once. A visual representation of the three different cases is reported in Fig. 1. When σ = σ c , we are in the tangency situation: the graph of F intersects the bisector line at one point x l , where the two curves are tangent, and at a second point x h , where they are not. If the expected value μ of the distribution is such that μ < 1/2, then it is easy to see that x l < 1/2 < x h . Consider finally that we take r 0 = 0, so that the dynamical system always converges to x l , that is, the smallest among the possible solutions to (15). Let us call this equilibrium ρ(σ ).
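The one/two/three-intersection picture can be verified numerically for a Gaussian F. The mean μ = 0.25 is an illustrative assumption (with this choice the tangency occurs near σ ≈ 0.122, consistent with the σ c reported in the numerical section):

```python
import math

def F(x, sigma, mu=0.25):
    # Gaussian threshold distribution function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def count_crossings(sigma, mu=0.25, n=100_001):
    """Count sign changes of F(x; sigma) - x on a fine grid over [0, 1]:
    each sign change brackets one intersection with the bisector."""
    count, prev = 0, None
    for i in range(n):
        x = i / (n - 1)
        cur = F(x, sigma, mu) - x > 0.0
        if prev is not None and cur != prev:
            count += 1
        prev = cur
    return count
```

For σ below the tangency level the graph of F cuts the bisector three times; above it, only once.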
The continuity of f (σ ) on [σ min , σ c ] and on (σ c , σ max ] immediately follows from the continuity of the map σ → ρ(σ ) on the same intervals. The proof of this latter property, however, is not trivial at all and is postponed to the dedicated Appendix C.
Summarizing, we have proved that: (i) f is left-continuous with a discontinuity at σ c ; and (ii) f is bounded from below and admits a finite infimum, which is not a minimum.

Proof of Theorem 3.3
We first state and prove five technical lemmas related to R N as defined in (11) and to ρ N (σ ) as defined in (12).

Lemma A.2 For each N , R N is a measurable and bounded function of the finite sample X̃ .

Proof Here F̃(x̃; σ ) = ∏ N i=1 F(x i ; σ ) and x̃ = (x 1 , . . . , x N ) is a realization of the sample X̃ . The integrand R N (x̃) is a measurable and bounded function of the sample. As a consequence, the integral is well-defined; moreover, it is continuous in σ as soon as F is continuous in σ .

Lemma A.3
For any σ ≠ σ c and ε > 0, |R N − ρ(σ )| < ε for N large enough, with probability one.

Proof Fix σ ≠ σ c . We show separately that, for N large enough and with probability one, R N < ρ(σ ) + ε and R N > ρ(σ ) − ε for any ε > 0. We start with the former inequality. 18 To this end, we consider an alternative and equivalent definition of R N . We show that there exists ε 0 > 0 such that for all ε > ε 0 , there exists N such that F N (x; σ ) < x for x = ρ(σ ) + ε. 19 This latter inequality states exactly that R N < ρ(σ ) + ε. By way of contradiction, suppose that there exists ε > 0 such that for all N the opposite holds, where the latter inequality comes from the fact that F(ρ(σ ); σ ) = ρ(σ ). This is a contradiction, since F is continuous in its first argument and sup x |F N (x; σ ) − F(x; σ )| → 0 by virtue of the classical Glivenko-Cantelli Theorem.
To prove that R N > ρ(σ ) − ε, we use a similar argument. We show that there exists ε 0 > 0 such that for all ε > ε 0 , there exists N such that F N (x; σ ) > x for x = ρ(σ ) − ε. By way of contradiction, suppose that there exists ε > 0 such that for all N , F N (x; σ ) ≤ x. Finally, note that for all x ≤ ρ(σ ) − ε, there exists N̄ such that F N̄ (x; σ ) > x for sure. Suppose there exists x̄ < ρ(σ ) − ε such that F N (x̄; σ ) ≤ x̄ for all N . Since F(x̄; σ ) > x̄, again we find a contradiction with the Glivenko-Cantelli Theorem. Therefore, R N > ρ(σ ) − ε for sure. Note now that the sequence of random variables R N is uniformly bounded; hence, by Lebesgue's dominated convergence theorem, E σ [|R N − ρ(σ )|] → 0, and we obtain convergence in mean.
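A numerical companion to the Glivenko-Cantelli argument, under the illustrative Gaussian specification with μ = 0.25 (an assumption, as are the helper names): the smallest fixed point R N of the empirical CDF approaches the smallest solution ρ(σ ) of F(x; σ ) = x as N grows.

```python
import math
import numpy as np

def Phi(z):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rho(sigma, mu=0.25, grid=200_001):
    # smallest solution of F(x; sigma) = x on [0, 1], found as the first
    # grid point where F(x; sigma) <= x (F(0; sigma) > 0 guarantees a crossing)
    for i in range(grid):
        x = i / (grid - 1)
        if Phi((x - mu) / sigma) <= x:
            return x
    return 1.0

def R_N(thresholds):
    # smallest fixed point of the empirical CDF, reached by iterating from 0;
    # the iterates are nondecreasing and live on the grid {0, 1/N, ..., 1}
    thr = np.sort(thresholds)
    N, x = len(thr), 0.0
    while True:
        x_new = np.searchsorted(thr, x, side="right") / N
        if x_new == x:
            return x
        x = x_new
```

By Glivenko-Cantelli, the empirical CDF converges uniformly to F, so R N lands within any ε-band around ρ(σ ) for N large, as the lemma asserts.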

Lemma A.4 The sequence of derivatives (ρ ′ N (σ )) N exists and is uniformly bounded
Proof We use the expression for ρ N as in (17) and differentiate it w.r.t. σ . To this end, note that the random variable R N does not explicitly depend on σ ; we recall, moreover, that A simple calculation gives: Therefore, Concerning the integral, since Returning to (18), and noting that all the expressions on the r.h.s. are positive, we have Since ρ N (σ ) ≤ 1 and R 2 N ≤ 1 for all N , where K is a suitable constant, possibly depending on σ min and σ max but independent of σ .

Lemma A.5 For every δ > 0, the sequence (ρ N ) N converges uniformly to ρ on [σ min , σ c − δ] ∪ [σ c + δ, σ max ].

Proof We use the fact that if a sequence of continuous real-valued functions (F n ) n converges in L p for some p ∈ [1, +∞) on a closed and finite interval to a limit function F and, in addition, the sequence of derivatives (F ′ n ) n exists and is uniformly bounded, then the sequence also converges to F uniformly. 21 We apply this result to the sequence (ρ N ) N . Convergence in L 1 on the disjoint intervals follows from Lemma A.3. The fact that the sequence of derivatives (ρ ′ N ) N is uniformly bounded on the entire domain [σ min , σ max ] has been proved in Lemma A.4.
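The analytic fact used in this proof — L1 convergence plus a uniform bound on derivatives upgrades to uniform convergence — can be seen on a toy pair of sequences (illustrative functions, not the paper's ρ N): f n (x) = x + sin(nx)/n satisfies both hypotheses, while a spike of height 1 and width 2/n converges in L1 but not uniformly, precisely because its derivatives blow up.

```python
import math

GRID = 20_001  # grid on [0, 1] for approximating the two norms

def sup_dist(f, g):
    return max(abs(f(i / (GRID - 1)) - g(i / (GRID - 1))) for i in range(GRID))

def l1_dist(f, g):
    return sum(abs(f(i / (GRID - 1)) - g(i / (GRID - 1))) for i in range(GRID)) / GRID

ident = lambda x: x
zero = lambda x: 0.0

# f_n(x) = x + sin(n x)/n: L1-close to the identity AND derivatives
# f_n'(x) = 1 + cos(n x) uniformly bounded  ->  uniform convergence
f_n = lambda n: (lambda x: x + math.sin(n * x) / n)

# spike_n(x) = max(0, 1 - n|x - 1/2|): L1-close to zero, but derivatives
# of size n are unbounded  ->  no uniform convergence (sup distance stays 1)
spike_n = lambda n: (lambda x: max(0.0, 1.0 - n * abs(x - 0.5)))
```

The first sequence has sup distance at most 1/n from its limit; the second keeps sup distance 1 forever despite a vanishing L1 distance, which is exactly why the derivative bound of Lemma A.4 is needed.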

Lemma A.6
For all ε > 0 and for N large enough, ρ N (σ c ) < r c + ε.

Proof Take σ = σ c + δ, δ > 0. By the monotonicity of ρ N (Lemma A.4 shows that ρ ′ N ≥ 0), ρ N (σ c ) ≤ ρ N (σ ). Moreover, ρ N (σ ) < ρ(σ ) + ε/2 for N large enough follows from the fact that ρ N → ρ for any σ ≠ σ c , and, finally, ρ(σ ) < r c + ε/2 for δ small enough follows from the fact that lim σ →σ + c ρ(σ ) = r c . Combining the three inequalities yields the claim.

Returning to the statement of Theorem 3.3, to prove the well-posedness of problem (P2), simply note that, according to Lemma A.2, f N is bounded and, finally, lim σ →σ max f N (σ ) = M, with M > 0 large enough. The objective function is therefore continuous and bounded from below; hence, it admits a minimum and a minimum point σ * N . For the second part of the statement, we use the tool of Γ-convergence which, under suitable conditions, implies convergence of minimum values and minimizers, which in our case are also minimum points.
It is easy to see that the sequence of objective functions ( f N ) N defined on R + is equi-mildly coercive. 22 Moreover, given the function f (σ ) = kσ − ρ(σ ), we consider its lower-semicontinuous envelope sc f (see Definition B.4); that is, for every σ ∈ [σ min , σ max ],
sc f (σ ) = f (σ ) for σ ≠ σ c , and sc f (σ c ) = kσ c − r c , (19)
with r c as in (4). As seen in Lemma A.5, ρ N (σ ) converges uniformly to ρ(σ ) on [σ min , σ c − δ] ∪ [σ c + δ, σ max ] for every δ > 0 (see footnote 20). It is now immediate that the same holds for f N and f . As a consequence, f N converges uniformly on compact subsets of the set U := (σ min , σ c ) ∪ (σ c , σ max ). Then, by applying Proposition B.5, we obtain that Γ-lim N f N (σ ) = sc f (σ ) for every σ ∈ U . It remains to study the value of Γ-lim N f N (σ ) for the values of σ at the frontier of U , namely, σ ∈ {σ min , σ c , σ max }. We now show that at σ = σ c , Γ-lim N f N (σ c ) = sc f (σ c ). Given that the Γ-limit of a sequence of functions, if it exists, is necessarily lower-semicontinuous [see, e.g., Proposition 1.28 in Braides (2002)] and is unique, sc f (σ c ), as in (19), is a good candidate to be the Γ-limit value we seek. In the following, we prove that sc f (σ c ) satisfies both conditions (26) and (28) of the definition of the Γ-limit (see Definition B.2); hence Γ-lim N f N (σ c ) exists and is unique.

1) (liminf inequality). By way of contradiction, suppose that there exists a sequence (σ N ) N converging to σ c such that lim inf N f N (σ N ) < sc f (σ c ). On the other hand, since for every ℓ the function f ℓ is continuous, f ℓ (σ c ) ≤ lim inf j f ℓ (σ j ) for every sequence (σ j ) j converging to σ c . Such a condition holds, in particular, for the sequence (σ N ) N identified above, letting ℓ be large; therefore, we can write (20). Applying Definition B.1 to the right-hand side of (20), we get (21). We can now take ℓ growing as N , and by (21), where the last equality is precisely Definition B.1, the inequality (20) yields that, for N large enough, f N (σ c ) < sc f (σ c ). Substituting the definitions of sc f (σ c ) and f N (σ c ) into this inequality, it follows that kσ c − ρ N (σ c ) < kσ c − r c or, equivalently, r c < ρ N (σ c ).
This latter inequality contradicts Lemma A.6; hence the liminf inequality is satisfied.

2) (existence of a recovery sequence). We choose as a converging sequence σ N = σ c + 1/N and show that lim N f N (σ N ) = sc f (σ c ), which is (22). For (22) to be fulfilled, it is sufficient to prove that r c = lim N ρ N (σ N ). This follows from the uniform convergence of ρ N to ρ in (σ c , σ max ] (inferred from the arguments following Eq. (19)) and from the fact that lim N ρ(σ N ) = r c . Hence, the second requirement is also satisfied.
Concerning Γ-lim N f N (σ ) for σ ∈ {σ min , σ max }, we can proceed as in the case σ = σ c . Indeed, by using both the regularity of ρ N and ρ and the convergence of ρ N to ρ, we get that Γ-lim N f N (σ ) = sc f (σ ) for σ ∈ {σ min , σ max }.
In conclusion, we have ensured that Γ-lim N f N (σ ) = sc f (σ ) for every σ ∈ [σ min , σ max ]. Renaming sc f (σ ) in (19) as f ∞ (σ ), and relying on Theorem B.8, we get in our case
min f ∞ = lim N min f N . (23)
Moreover, since all functions f N admit a minimum point σ * N (which exists by virtue of Lemma A.2), then, up to subsequences, σ * N converges to a minimum point of f ∞ . According to (19), the only minimum point of f ∞ is σ c . Hence, it follows that σ * N → σ c and, accordingly, by (23), f N (σ * N ) → f ∞ (σ c ) = kσ c − r c .

B Some basics of Γ-convergence
In this section, we introduce some abstract notions and results on Γ-convergence. We start by recalling the concepts of lower and upper limits and of lower-semicontinuous functions in order to introduce the definition of the Γ-limit. We also define the lower-semicontinuous envelope of a function and provide an example of computation of the Γ-limit, noting how it can differ from the pointwise limit. Finally, we show that, under suitable conditions, Γ-convergence implies convergence of minimum values and minimizers. From now on, unless otherwise specified, X will be a metric space equipped with the metric d.
Definition B.1 Let f : X → R. We define the lower limit (liminf for short) of f at x as
lim inf y→x f (y) = sup δ>0 inf { f (y) : d(y, x) < δ }.

Below, we report an example that highlights the different roles of the limsup and liminf inequalities. It is also useful to visualize, in the simple case of a sequence of real functions, the difference between the classical pointwise (or uniform) limit and the Γ-limit.

Example B.6
Let f j (t), t ∈ R, be a sequence of real functions. The sequence f j converges pointwise (and hence also Γ-converges) to 0 on R \ {0}, while the optimal sequence at t = 0 is t j = −1/j, for which f j (t j ) = −1; hence the Γ-limit at 0 equals −1 and differs from the pointwise limit.
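A concrete choice of f j consistent with the example's description (an assumption, since the formula is not restated here) is the tent function f j (t) = − max(0, 1 − | jt + 1|), which vanishes at t = 0 and, for every fixed t ≠ 0, as j grows, yet dips to −1 at t j = −1/j:

```python
# assumed tent-shaped sequence: depth -1, centered at t = -1/j
def f(j, t):
    return -max(0.0, 1.0 - abs(j * t + 1.0))

def local_inf(j, radius, grid=4001):
    # inf of f_j over a neighborhood [-radius, radius] of t = 0 (grid approximation);
    # as j grows this stays near -1, which is the Gamma-limit value at 0
    return min(f(j, -radius + 2.0 * radius * i / (grid - 1)) for i in range(grid))
```

The pointwise limit at every t is 0, but any neighborhood of 0 eventually contains the dip at −1/j, so the Γ-limit at 0 is −1.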

Definition B.7 (Coerciveness conditions)
Theorem B.8 Let (X , d) be a metric space, let ( f j ) be a sequence of equi-mildly coercive functions on X , and let f ∞ = Γ-lim j f j ; then the minimum of f ∞ over X exists and min X f ∞ = lim j inf X f j . Moreover, if all functions f j admit a minimizer x * j , then, up to subsequences, x * j converges to a minimum point of f ∞ .

C Continuity of σ → ρ(σ )
Define G(x, σ ) = F(x; σ ) − x, where F has been introduced in Sect. 2. Then, for any σ ∈ [σ min , σ max ], we can define D σ = {x ∈ X : G(x, σ ) ≤ 0} ⊆ X . With these new notations, ρ(σ ) as defined in Lemma 2.1 can be rephrased equivalently as ρ(σ ) = min x∈D σ x, because F(0; σ ) > 0 for every σ ∈ [σ min , σ max ]. Note that the relation σ → D σ is formally a correspondence, mapping each σ ∈ [σ min , σ max ] into a subset of X . We now introduce the definition of upper(lower)-semicontinuity for a correspondence, as in Sundaram (1996), and prove a technical lemma.

Definition C.1 A correspondence Φ : [σ min , σ max ] → P(X ), where P(X ) denotes the power set of X , is said to be: i) upper-semicontinuous (usc) at σ if, for all open sets V such that Φ(σ ) ⊂ V , there exists an open set U containing σ such that σ ′ ∈ U implies Φ(σ ′ ) ⊂ V . We say that Φ is usc on S ⊆ [σ min , σ max ] if it is usc at each σ ∈ S; ii) lower-semicontinuous (lsc) at σ if, for all open sets V such that V ∩ Φ(σ ) ≠ ∅, there exists an open set U containing σ such that σ ′ ∈ U implies V ∩ Φ(σ ′ ) ≠ ∅.
We say that the correspondence is lsc on S ⊆ [σ min , σ max ] if it is lsc at each σ ∈ S.

Lemma C.2
The correspondence σ → D σ is compact-valued; moreover, it is both upper- and lower-semicontinuous on the intervals [σ min , σ c ] and (σ c , σ max ]. Therefore, on the same intervals, it is continuous.

Proof Compactness is easy to see, since D σ is a closed and bounded subset of X ; closedness is due to the fact that D σ is the preimage of a closed set through a continuous function. We now prove upper-semicontinuity on (σ min , σ c ). To this end, fix σ ∈ (σ min , σ c ) and take any open set V ⊆ R containing D σ . Now define U = (σ − δ, σ + δ) ⊂ (σ min , σ c ), for δ > 0, and consider σ ′ such that σ ′ ∈ U . By way of contradiction, suppose that D σ ′ ⊄ V ; put differently, D σ ′ ∩ V c ≠ ∅. Then there exists x ∈ X such that x ∈ V c and G(x, σ ′ ) ≤ 0. Since G is not constant and D σ ′ is not a singleton, we can assume G(x, σ ′ ) < 0. On the other hand, x ∈ V c implies x ∉ V , hence x ∉ D σ . Therefore, G(x, σ ) > 0. As a consequence, we can find ε small enough such that |G(x, σ ′ ) − G(x, σ )| > ε; this latter inequality contradicts the continuity of G in σ , since, by assumption, σ ′ ∈ U = (σ − δ, σ + δ) with δ arbitrarily small. To prove usc at σ = σ min , we use the same argument, where now U = [σ min , σ min + δ), δ > 0. Similarly, for σ = σ c , we can take U = (σ c − δ, σ c ], δ > 0. The usc on the open interval (σ c , σ max ) is proved using the same argument, as well as the usc at σ = σ max , considering U = (σ max − δ, σ max ], δ > 0.
Concerning lsc at σ = σ c , by Lemma A.1, we know that there exist exactly two solutions to the equation G(x, σ c ) = 0; the smallest, x l , is such that x l < μ, whereas the largest one is x h > μ. Moreover, for all σ ∈ (σ c − δ, σ c ), G(x l , σ ) < 0 and therefore x l ∈ D σ , so that V ∩ D σ ≠ ∅ for any V such that x l ∈ V . Therefore, for such σ , the definition of lsc is satisfied. Finally, lsc on (σ c , σ max ) is proved using the same argument as for the open interval (σ min , σ c ), while lsc at σ = σ max is obtained by considering U = (σ max − δ, σ max ].
To provide evidence that the correspondence σ → D σ is not continuous at σ c (from the right), we show that for σ = σ c the lsc fails when considering sets of the form U = (σ c , σ c + δ). As stated, in the case σ = σ c , there exist two solutions to the equation G(x, σ c ) = 0 such that x l < μ < x h . Consider now V such that x l ∈ V but V ∩ [μ, 1] = ∅. In this way, V ∩ D σ c ≠ ∅. Now define U = (σ c , σ c + δ), for δ > 0, and take any σ ∈ U . In this case, V ∩ D σ = ∅. 23 This contradicts lower-semicontinuity.
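The discontinuity of σ → D σ at σ c translates directly into a jump of ρ(σ ) = min x∈D σ x, which can be checked numerically under the illustrative Gaussian specification with μ = 0.25 (for which the tangency occurs near σ c ≈ 0.122):

```python
import math

def Phi(z):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def min_D(sigma, mu=0.25, grid=200_001):
    """rho(sigma) = min D_sigma: the first x in [0, 1] with
    G(x, sigma) = F(x; sigma) - x <= 0 (grid approximation)."""
    for i in range(grid):
        x = i / (grid - 1)
        if Phi((x - mu) / sigma) - x <= 0.0:
            return x
    return 1.0
```

Just below σ c the minimum of D σ sits on the low branch near x l ; just above σ c the low branch of D σ disappears and the minimum jumps to the high solution, which is the failure of lower-semicontinuity exhibited above.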

Proposition C.3
The map σ → ρ(σ ) is continuous on [σ min , σ c ] and on (σ c , σ max ]. Proposition C.3 is now a straightforward corollary of Theorem 9.14 of Sundaram (1996). The assumptions of that theorem are that the target function is continuous and that the constraint, defined through the correspondence σ → D σ , is compact-valued and continuous. Our target function is clearly continuous, and the assumptions on σ → D σ are ensured by Lemma C.2. For convenience, we report below the statement of Theorem 9.14 taken from Sundaram (1996), adapting it to our notation. Note that f (x, σ ) and f * (σ ), as in the statement of the theorem, correspond, respectively, to the identity function x → x in the first component and to our map ρ(σ ). Theorem C.4 (Theorem 9.14, Sundaram) Let f : X × [σ min , σ max ] → R be a continuous function and let σ → D σ be a compact-valued, continuous correspondence. Let f * : [σ min , σ max ] → R be defined by f * (σ ) = min x∈D σ f (x, σ ). Then f * is a continuous function on [σ min , σ max ].