Interaction times change evolutionary outcomes: Two-player matrix games

Two of the most influential models of evolutionary game theory are the Hawk-Dove and Prisoner's dilemma models. The Hawk-Dove model explains the evolution of aggressiveness, predicting that individuals should be aggressive when the cost of fighting is lower than its benefit. As the cost of aggressiveness increases and outweighs the benefits, aggressiveness in the population should decrease. Similarly, the Prisoner's dilemma models the evolution of cooperation. It predicts that individuals should never cooperate even though cooperation leads to a higher collective fitness than defection. The question is then under what conditions cooperation evolves. These classic matrix games, which are based on pair-wise interactions between two opponents with player payoffs given in matrix form, do not consider the effect that conflict duration has on payoffs. However, interactions between different strategies often take different amounts of time. In this article, we develop a new approach to an old idea: that opportunity costs incurred while engaged in an interaction affect individual fitness. When applied to the Hawk-Dove and Prisoner's dilemma games, our theory, which incorporates general interaction times, leads to qualitatively different predictions. In particular, not all individuals will behave as Hawks when the fighting cost is lower than the benefit, and cooperation can evolve in the Prisoner's dilemma.


Introduction
Most classic evolutionary games (e.g., the Hawk-Dove game (Maynard Smith, 1974) or the Prisoner's Dilemma (Poundstone, 1992)) assume an infinite population where individuals play pairwise games. The outcome of these games is described by a payoff matrix from which one can calculate the Nash equilibrium (NE) or an Evolutionarily Stable Strategy (ESS). The standard assumptions for these games neglect two important components. First, they neglect the opportunity cost of time lost while an individual is engaged in an interaction with its opponent. For example, in the case of the Hawk-Dove game this is the duration of the fight; in the case of the Prisoner's dilemma this is the time two individuals cooperate. One of the earliest articles developing evolutionary game theory (Maynard Smith and Price, 1973) does consider opportunity cost in a Hawk-Dove type game. There, additional payoffs are added to individuals who are engaged in shorter interactions. As we will see, and as pointed out in the Discussion, their approach is different from ours.
Second, if we assume that the population is finite, time of the interaction changes the number of individuals that are available to play the game. Thus, to develop more realistic models of evolutionary game theory, one needs to consider changes in numbers of interacting pairs as a function of duration of interactions. So, we need to consider those individuals that are currently engaged in an interaction and those that are free to form new pairs to play the game. This introduces a complex feedback where duration of interactions influences the numbers of interacting pairs which, in turn, influences the game's NE or ESS.
In this article, we extend matrix game theory by explicitly considering the duration of conflicts between opponents. In analogy with the payoff matrix, we define the interaction time matrix that describes the duration of a conflict between any two elementary strategies. Animal fitness for matrix games is defined as the mean payoff an individual gets per interaction. Once the duration of interactions is considered, this leads to two possible fitness definitions. First, we define fitness as the mean payoff per unit of time. Second, we define fitness as the mean payoff per mean duration of the interaction. This latter concept of fitness is similar to the one used in optimal foraging theory (Charnov, 1976; Stephens and Krebs, 1986). We develop the theory of time-constrained evolutionary games based on symmetric two-player games (i.e., matrix games) with two pure strategies and illustrate our results by applying them to the Hawk-Dove game and the Prisoner's dilemma.
The Hawk-Dove game models the evolution of aggressiveness. Animals are known to solve their conflicts in complex ways that may or may not include various display behaviors before the real fight (Clutton-Brock and Albon, 1979; Sinervo and Lively, 1996). For example, Clutton-Brock and Albon (1979) observed the course of contests between male red deer. Out of 50 cases, only 14 ended in a fight that can be potentially lethal to one or both opponents. Two types of display behavior were observed: roar contest and parallel walk. No display behavior was observed in 10 cases, 1 of which resulted in a fight. A roar contest only, which led to one stag withdrawing, was observed in 16 cases; a parallel walk only was observed in 7 cases (5 of which resulted in a fight); and a roar contest followed by a parallel walk was observed in 17 cases (8 of which resulted in a fight). This shows that individuals vary in their aggressiveness. There are aggressive individuals that fight immediately without display, as well as individuals that are aggressive but display before fighting. Therefore, we will consider interactions between two Hawks that can be longer (in the case of display) or shorter (without any display) than the other interactions (i.e., between two Doves, or between a Hawk and a Dove).
http://dx.doi.org/10.1016/j.jtbi.2017.01.010
We also consider the repeated Prisoner's dilemma where the expected number of rounds that two individuals interact depends on their strategy choices. For instance, if we assume that individuals can opt out of an interaction that is not beneficial, interactions between two cooperators will last longer (see the opting out game developed by Zhang et al., 2016). We will thus focus on duration of the interaction between two cooperators while we assume that all other interactions take the same amount of time. In particular, we ask here how long the interactions between two cooperators need to last for cooperation to evolve in this model.
We show that explicit consideration of interaction times in the above two games leads to qualitatively new predictions for their evolutionary outcomes. In particular, our approach leads to a different view on the evolution of aggressiveness and cooperation than provided by the classic Hawk-Dove and Prisoner's Dilemma matrix games respectively.

Two-strategy game with symmetric interaction times
Consider a symmetric matrix game with two pure strategies $e_1$ and $e_2$ and payoffs described by the payoff matrix
\[
\begin{array}{c|cc} & e_1 & e_2 \\ \hline e_1 & \pi_{11} & \pi_{12} \\ e_2 & \pi_{21} & \pi_{22} \end{array} \tag{1}
\]
That is, $\pi_{ij}$ is the expected payoff obtained by $e_i$ in a pair-wise interaction with $e_j$. Contrary to standard game models, we assume that all interactions take some time. These times are given by the interaction time matrix
\[
\begin{array}{c|cc} & e_1 & e_2 \\ \hline e_1 & \tau_{11} & \tau_{12} \\ e_2 & \tau_{21} & \tau_{22} \end{array} \tag{2}
\]
In what follows, we assume that interaction times are positive and that the interaction time matrix is symmetric, i.e., $\tau_{12} = \tau_{21}$. The payoff matrix (1) then provides the payoffs per interaction, with each interaction taking the time specified in (2).
To "solve" the game given by matrices (1) and (2), we need to describe the process of players' pairing as well as how individual fitness is related to the payoff received. In this article, we assume that all singles immediately and randomly pair, so all individuals are paired. The numbers of pairs are denoted by $n_{11}$, $n_{12}$, and $n_{22}$, where the subindices denote the strategies of the two paired individuals. In particular, $n_{12}$ is the number of pairs where one individual (irrespective of whether it is the first or the second) plays strategy $e_1$ and the other individual plays strategy $e_2$. The overall (fixed) number of individuals is then $N = 2(n_{11} + n_{12} + n_{22})$.
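Pairs of type $n_{ij}$ split up following a Poisson process with mean duration $\tau_{ij}$, i.e., in a unit of time, the number of pairs that disband is $n_{ij}/\tau_{ij}$. So, per unit of time there will be $2n_{11}/\tau_{11} + n_{12}/\tau_{12}$ individuals playing strategy $e_1$ and $2n_{22}/\tau_{22} + n_{12}/\tau_{12}$ individuals playing strategy $e_2$ that will immediately form new pairs. The total number of individuals forming new pairs is $2(n_{11}/\tau_{11} + n_{12}/\tau_{12} + n_{22}/\tau_{22})$. Under random pairing, the proportion of newly formed $n_{11}$ pairs among all newly formed pairs is the square of the proportion of $e_1$ strategists among the freed individuals. To obtain the number of newly formed $n_{11}$ pairs we multiply this proportion by the number of all newly formed pairs, $n_{11}/\tau_{11} + n_{12}/\tau_{12} + n_{22}/\tau_{22}$. Similar considerations for $n_{12}$ and $n_{22}$ pairs lead to the following pair dynamics
\[
\begin{aligned}
\frac{dn_{11}}{dt} &= -\frac{n_{11}}{\tau_{11}} + \frac{\left(\frac{2n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}}\right)^2}{4\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)},\\
\frac{dn_{12}}{dt} &= -\frac{n_{12}}{\tau_{12}} + \frac{\left(\frac{2n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}}\right)\left(\frac{2n_{22}}{\tau_{22}} + \frac{n_{12}}{\tau_{12}}\right)}{2\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)},\\
\frac{dn_{22}}{dt} &= -\frac{n_{22}}{\tau_{22}} + \frac{\left(\frac{2n_{22}}{\tau_{22}} + \frac{n_{12}}{\tau_{12}}\right)^2}{4\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)}.
\end{aligned} \tag{3}
\]
We observe that at the equilibrium the right-hand sides of (3) vanish, i.e.,
\[
\begin{aligned}
\frac{n_{11}}{\tau_{11}} &= \frac{\left(\frac{2n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}}\right)^2}{4\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)},\\
\frac{n_{12}}{\tau_{12}} &= \frac{\left(\frac{2n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}}\right)\left(\frac{2n_{22}}{\tau_{22}} + \frac{n_{12}}{\tau_{12}}\right)}{2\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)},\\
\frac{n_{22}}{\tau_{22}} &= \frac{\left(\frac{2n_{22}}{\tau_{22}} + \frac{n_{12}}{\tau_{12}}\right)^2}{4\left(\frac{n_{11}}{\tau_{11}} + \frac{n_{12}}{\tau_{12}} + \frac{n_{22}}{\tau_{22}}\right)}.
\end{aligned} \tag{4}
\]
We remark that because $2(n_{11} + n_{12} + n_{22}) = N$, Eqs. (4) are dependent, and to calculate the equilibrium we need to know the number of $e_1$ (or $e_2$) strategists in the population. Let $n_1 = 2n_{11} + n_{12}$ be the number of $e_1$ strategists (so that $n_2 = N - n_1 = n_{12} + 2n_{22}$). Assuming that $\tau_{12}^2 \neq \tau_{11}\tau_{22}$, Eqs. (4) have two solutions for $n_{ij}$ in terms of $n_1$ for $0 \le n_1 \le N$. However, one solution is never feasible in the sense that some coordinates are negative. The solution that has all coordinates non-negative is
\[
n_{12} = \frac{\tau_{12}\left(\sqrt{\tau_{12}^2 N^2 + 4(\tau_{11}\tau_{22} - \tau_{12}^2)\, n_1 (N - n_1)} - \tau_{12} N\right)}{2(\tau_{11}\tau_{22} - \tau_{12}^2)}, \qquad n_{11} = \frac{n_1 - n_{12}}{2}, \qquad n_{22} = \frac{N - n_1 - n_{12}}{2}. \tag{5}
\]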
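The pair dynamics and their equilibrium can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `pair_equilibrium` and `simulate_pairs`, the forward-Euler step, and all parameter values are assumptions chosen for illustration. The closed form uses a rationalized expression for $n_{12}$ that is algebraically equivalent to solving the equilibrium conditions together with $n_1 = 2n_{11} + n_{12}$, and it also covers the degenerate case $\tau_{12}^2 = \tau_{11}\tau_{22}$.

```python
import math

def pair_equilibrium(n1, N, t11, t12, t22):
    """Feasible equilibrium (n11, n12, n22) when n1 of N individuals play e1.

    Assumption: rationalized form of the positive root of
    (t11*t22 - t12**2) * y**2 + t12**2 * N * y - t12**2 * n1 * n2 = 0.
    """
    n2 = N - n1
    T = t11 * t22 - t12 ** 2
    root = math.sqrt((t12 * N) ** 2 + 4.0 * T * n1 * n2)
    n12 = 2.0 * t12 * n1 * n2 / (t12 * N + root)
    return (n1 - n12) / 2.0, n12, (n2 - n12) / 2.0

def simulate_pairs(n11, n12, n22, t11, t12, t22, dt=0.01, steps=20000):
    """Forward-Euler integration of the pair dynamics (illustrative step size)."""
    for _ in range(steps):
        a = 2.0 * n11 / t11 + n12 / t12        # freed e1 individuals per unit time
        b = 2.0 * n22 / t22 + n12 / t12        # freed e2 individuals per unit time
        s = n11 / t11 + n12 / t12 + n22 / t22  # newly formed pairs per unit time
        d11 = -n11 / t11 + a * a / (4.0 * s)
        d12 = -n12 / t12 + a * b / (2.0 * s)
        d22 = -n22 / t22 + b * b / (4.0 * s)
        n11, n12, n22 = n11 + dt * d11, n12 + dt * d12, n22 + dt * d22
    return n11, n12, n22

if __name__ == "__main__":
    # 60 e1 strategists and 40 e2 strategists, initially paired only with their own type
    print(simulate_pairs(30.0, 0.0, 20.0, 3.0, 1.0, 1.0))
    print(pair_equilibrium(60.0, 100.0, 3.0, 1.0, 1.0))
```

In this sketch the simulated pair numbers approach the closed-form values, and the equilibrium satisfies the Hardy-Weinberg-like relation $n_{12}^2/\tau_{12}^2 = 4 n_{11} n_{22}/(\tau_{11}\tau_{22})$, which follows from setting the right-hand sides of the dynamics to zero.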

Payoffs, fitness and evolutionary outcome
In what follows, we consider two methods to define individual fitness in terms of expected payoff. Both methods assume that payoffs are given out only when a pair disbands. The first approach assumes that fitnesses are calculated as the expected payoff per unit of interaction time while the second approach assumes that fitnesses are calculated as the expected payoff per expected interaction time. The corresponding time-constrained matrix game is then the two-strategy game with payoffs given by these two fitness functions evaluated at the equilibrium (5). Notice that, unlike the classic matrix games, these fitnesses depend nonlinearly on the proportions of the two pure strategies through the number of pairs.

Fitness is calculated as expected payoff per unit of time
An individual playing strategy $e_1$ is in an $n_{11}$ pair with probability $2n_{11}/n_1$ and in an $n_{12}$ pair with probability $n_{12}/n_1$, and similarly for $e_2$. With payoff $\pi_{ij}$ received after an expected time $\tau_{ij}$, the fitnesses are
\[
\Pi_1 = \frac{2n_{11}}{n_1}\frac{\pi_{11}}{\tau_{11}} + \frac{n_{12}}{n_1}\frac{\pi_{12}}{\tau_{12}}, \qquad \Pi_2 = \frac{n_{12}}{n_2}\frac{\pi_{21}}{\tau_{12}} + \frac{2n_{22}}{n_2}\frac{\pi_{22}}{\tau_{22}}. \tag{6}
\]
First, we look for pure strategy solutions to this time-constrained game. For example, suppose all individuals play strategy $e_2$. Then $n_{11} = n_{12} = 0$ and $\Pi_2 = \pi_{22}/\tau_{22}$. We ask when strategy $e_1$ can invade. Strategy $e_1$ can invade provided its invasion fitness is higher than the fitness of strategy $e_2$ when alone. As the number of $n_{11}$ pairs tends to 0 much faster (convergence is of order $n_1^2$) than $n_{12}$ (of order $n_1$) when $n_1$ tends to 0, we obtain the invasion fitness $\Pi_1 = \pi_{12}/\tau_{12}$. Thus, $e_2$ is a strict NE when $\pi_{22}/\tau_{22} > \pi_{12}/\tau_{12}$; if $\pi_{22}/\tau_{22} < \pi_{12}/\tau_{12}$, then $e_2$ is not a solution since it is not a NE.

Second, we seek a mixed equilibrium consisting of phenotypes $e_1$ and $e_2$ in a polymorphic population. (The monomorphic case is discussed in Section 4.) In the polymorphic case, where $n_i$ individuals play strategy $e_i$ ($i = 1, 2$, $n_1 + n_2 = N$), the equilibrium must satisfy $\Pi_1 = \Pi_2$ so that neither phenotype can increase its payoff by switching its strategy. Such an equilibrium corresponds to a mixed strategy NE. Together with (4), the condition $\Pi_1 = \Pi_2$ yields up to two population-distributional equilibria corresponding to a mixed NE, denoted $p_{1\pm} = n_1/N$ (with $p_{1-} \le p_{1+}$) and given in closed form by Eq. (7). We can analyze the stability of the two equilibria $p_{1\pm}$. There are four possible cases classified according to the (in)stability of the pure strategies.
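Case 1. Strategy $e_1$ is stable and $e_2$ is unstable ($\pi_{11}/\tau_{11} > \pi_{21}/\tau_{12}$ and $\pi_{12}/\tau_{12} > \pi_{22}/\tau_{22}$). Then either there is no mixed equilibrium, i.e., both $p_{1+}$ and $p_{1-}$ are outside the interval [0, 1] (Fig. 2A), or both $p_{1+}$ and $p_{1-}$ are in the interval [0, 1] and then the smaller one is stable (Fig. 2B).
Case 2. Both pure strategies are unstable ($\pi_{11}/\tau_{11} < \pi_{21}/\tau_{12}$ and $\pi_{12}/\tau_{12} > \pi_{22}/\tau_{22}$). In this case, exactly one of $p_{1+}$ and $p_{1-}$ is in the interval [0, 1] and it is stable (Fig. 2C).
Case 3. Both pure strategies are stable ($\pi_{11}/\tau_{11} > \pi_{21}/\tau_{12}$ and $\pi_{12}/\tau_{12} < \pi_{22}/\tau_{22}$). In this case, one of $p_{1+}$ and $p_{1-}$ is in the interval [0, 1] and it is unstable (Fig. 2D). There are two stable pure strategy equilibria.
Case 4. Strategy $e_1$ is unstable and $e_2$ is stable ($\pi_{11}/\tau_{11} < \pi_{21}/\tau_{12}$ and $\pi_{12}/\tau_{12} < \pi_{22}/\tau_{22}$). Then either there is no mixed equilibrium in the interval [0, 1] (Fig. 2E), or both $p_{1+}$ and $p_{1-}$ are in the interval [0, 1] and then the larger one is stable (Fig. 2F).
We now illustrate these general concepts with two important special examples.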
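The four cases can be explored numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes fitness to be the expected payoff per unit of time, evaluated at the pair equilibrium for a fraction $p$ of $e_1$ strategists (with $N = 1$), locates interior equilibria as sign changes of $\Pi_1 - \Pi_2$ on a grid, and labels a root stable when $\Pi_1 - \Pi_2$ changes from positive to negative. All function names and the grid size are hypothetical, and a root lying exactly on a grid point would be missed.

```python
import math

def pair_fractions(p, t11, t12, t22):
    """Pair equilibrium (n11, n12, n22) for N = 1 and a fraction p of e1 strategists."""
    T = t11 * t22 - t12 ** 2
    q = p * (1.0 - p)
    n12 = 2.0 * t12 * q / (t12 + math.sqrt(t12 ** 2 + 4.0 * T * q))
    return (p - n12) / 2.0, n12, (1.0 - p - n12) / 2.0

def fitness_gap(p, pay, tau):
    """Pi_1 - Pi_2 when fitness is the expected payoff per unit of time."""
    (p11, p12), (p21, p22) = pay
    (t11, t12), (_, t22) = tau
    n11, n12, n22 = pair_fractions(p, t11, t12, t22)
    f1 = (2.0 * n11 * p11 / t11 + n12 * p12 / t12) / p
    f2 = (n12 * p21 / t12 + 2.0 * n22 * p22 / t22) / (1.0 - p)
    return f1 - f2

def pure_strategy_case(pay, tau):
    """Case number 1-4 from the (in)stability of the two pure strategies."""
    (p11, p12), (p21, p22) = pay
    (t11, t12), (_, t22) = tau
    e1_stable = p11 / t11 > p21 / t12
    e2_stable = p22 / t22 > p12 / t12
    return {(True, False): 1, (False, False): 2,
            (True, True): 3, (False, True): 4}[(e1_stable, e2_stable)]

def interior_equilibria(pay, tau, grid=999):
    """List of (p, is_stable) for sign changes of the fitness gap inside (0, 1)."""
    found = []
    ps = [(i + 1) / (grid + 1) for i in range(grid)]
    for lo, hi in zip(ps, ps[1:]):
        g_lo, g_hi = fitness_gap(lo, pay, tau), fitness_gap(hi, pay, tau)
        if g_lo * g_hi < 0:
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if fitness_gap(mid, pay, tau) * g_lo > 0:
                    lo = mid
                else:
                    hi = mid
            found.append((0.5 * (lo + hi), g_lo > 0))
    return found

if __name__ == "__main__":
    # Hawk-Dove payoffs with V = 2, C = 1 and a long Hawk-Hawk interaction (illustrative)
    pay, tau = ((1.0, 4.0), (0.0, 2.0)), ((5.0, 1.0), (1.0, 1.0))
    print(pure_strategy_case(pay, tau), interior_equilibria(pay, tau))
```

For the illustrative parameters in the usage example, the sketch reports Case 1 with two interior equilibria, the smaller of which is flagged as stable.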

Hawk-Dove game
Here we apply the above result to the Hawk-Dove game ($e_1 = H$, $e_2 = D$) with the payoff matrix
\[
\begin{array}{c|cc} & H & D \\ \hline H & V - C & 2V \\ D & 0 & V \end{array} \tag{8}
\]
where the value of the resource is $2V$, the individual cost of fighting is $C$, and the interaction time matrix is
\[
\begin{array}{c|cc} & H & D \\ \hline H & \tau_{11} & \tau \\ D & \tau & \tau \end{array} \tag{9}
\]
This model assumes that all interactions except those between two Hawks take the same time $\tau$. The interaction time $\tau_{11}$ between two Hawks can be either longer or shorter than $\tau$. A larger $\tau_{11}$ models Hawks that display (for the common time $\tau$) before they fight. A smaller $\tau_{11}$ means that Hawks do not display before fighting. Both these situations have been observed and reported in the literature (e.g., Clutton-Brock and Albon, 1979). We will consider two cases that depend on the parameters $C$ and $V$.
Case A ($C < V$). Since $\pi_{11}/\tau_{11} = (V - C)/\tau_{11} > 0 = \pi_{21}/\tau$ and $\pi_{12}/\tau = 2V/\tau > V/\tau = \pi_{22}/\tau$, we are in Case 1, and from (7) we get that for $\tau_{11}$ long enough there are two interior equilibria $p_{1\pm}$ (Eq. (10)) that are between 0 and 1. These equilibria are shown in Fig. 3A. The arrows show the directions in which fitness increases. Thus, we observe bistability where the all Hawk equilibrium is always locally stable and, provided the interaction time $\tau_{11}$ is long enough, the interior equilibrium $p_{1-}$ is also locally stable. We observe that the region of local stability for the all Hawk equilibrium decreases as the interaction time between two Hawks increases.
Case B ($C > V$). Now assume that the cost of fighting is high compared to the value of the resource. Since $\pi_{11}/\tau_{11} = (V - C)/\tau_{11} < 0 = \pi_{21}/\tau$, the all Hawk population is not a NE. In fact, this shows that we are in Case 2 (Fig. 2C), that there is only one NE and that it is a mixed stable equilibrium. From (7) we get that the equilibrium between 0 and 1 is $p_{1-}$ from (10). The dependence of this equilibrium on the interaction time between two Hawks is shown in Fig. 3B. Fig. 3B also shows that the equilibrium frequency of Hawks is at its maximum value of $V/C$ when we are in the classic case where all interaction times are the same (i.e., $p_1(\tau_{11})$ as a function of $\tau_{11}$ has a maximum at $\tau_{11} = \tau$). In fact, this equilibrium frequency first increases with $\tau_{11}$ until it reaches $V/C$ at $\tau_{11} = \tau$. When $\tau_{11}$ is short, Hawk-Hawk pairs will disband fast and these Hawks will quickly pair with another Hawk, which decreases their fitness, or with a Dove, which increases their fitness. As can be shown, the balance between these two effects leads to most Hawks being involved in Hawk-Dove contests when $\tau_{11}$ is short, as the frequency of Hawk-Hawk contests, $p_{11}$, is close to 0.

Fig. 2. Equilibria (7) for two-strategy matrix games with interaction times. Solid circles at the endpoints (i.e., at $p_1 = 0$ or 1) are strict NE and so ESS whereas empty circles are unstable. Interior circles are NE that are either stable and local ESS (solid) or unstable NE (empty). The four cases in Section 3 correspond to panels A and B (Case 1), panel C (Case 2), panel D (Case 3) and panels E and F (Case 4).

V. Křivan, R. Cressman
Journal of Theoretical Biology 416 (2017) 199-207

When $\tau_{11} = \tau$, the frequency of Hawks $p_1$ in the population is $V/C$, i.e., we recover the standard result of the Hawk-Dove model. As the interaction time increases further, the proportion of Hawks in the population decreases because Hawks are losing too much time in their fights. For large $\tau_{11}$, $p_{1-}$ tends to 0 as seen in Fig. 3B.
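The dependence of the Hawk frequency on $\tau_{11}$ can be sketched in closed form. The quadratic used below was derived for this illustration by substituting the pair equilibrium into $\Pi_1 = \Pi_2$ with fitness taken as payoff per unit of time; it is an assumption of this sketch and may differ in form from the authors' expression. With $K = \tau(V - C) - V\tau_{11}$, the candidate equilibria are the roots of $K^2 p^2 - K(K - 2\tau V)p + \tau V(2V\tau_{11} - \tau(V - C)) = 0$. The function name `hawk_frequencies` is hypothetical.

```python
import math

def hawk_frequencies(V, C, t11, tau=1.0):
    """Roots in [0, 1] of the (assumed) quadratic for the Hawk frequency, sorted."""
    K = tau * (V - C) - V * t11
    if K == 0.0:
        return []  # the quadratic degenerates to a nonzero constant: no root
    disc = K * K - 4.0 * tau * V * V * (t11 - tau)
    if disc < 0.0:
        return []  # no real root, hence no interior equilibrium
    r = math.sqrt(disc)
    roots = sorted((K - 2.0 * tau * V + sign * r) / (2.0 * K) for sign in (-1.0, 1.0))
    return [p for p in roots if 0.0 <= p <= 1.0]

if __name__ == "__main__":
    # Costly fights (C > V): a single interior equilibrium
    for t11 in (0.25, 1.0, 3.0, 10.0):
        print("C > V, tau11 =", t11, hawk_frequencies(1.0, 2.0, t11))
    # Cheap fights (C < V): interior equilibria appear only for long Hawk-Hawk fights
    for t11 in (3.0, 5.0):
        print("C < V, tau11 =", t11, hawk_frequencies(2.0, 1.0, t11))
```

With equal interaction times the sketch returns the classic value $V/C$, and for $C < V$ it returns two interior roots only once $\tau_{11}$ is sufficiently long, in line with the bistability described for Case A.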

Repeated games (Prisoner's dilemma)
Interaction times also play an important role in repeated two-player games, where it is typically assumed that there is a fixed probability $\rho$ that there will be a next round of the game. This probability $\rho$ is not under the players' control. That is, the expected number of rounds is $1/(1 - \rho)$. Assume that each player uses the same single-round pure strategy $e_i$ for the entire interaction with its current partner, that the expected number of rounds of the interaction between $e_i$ and $e_j$ is $\tau_{ij}$, and that payoffs from each round are cumulative (i.e., the expected payoff per interaction for strategy $e_i$ against $e_j$ is $\tau_{ij}\pi_{ij}$, where $\pi_{ij}$ is the payoff in the single-shot game). With random pair formation among free individuals between rounds, the corresponding discrete-time process for the numbers $n_{ij}$ of pairs $e_i e_j$ at round $t$ has the same equilibrium (4) as the continuous-time process (3) of Section 2. The solution to the time-constrained game is then given by applying the general theory developed above to the adjusted payoff per interaction matrix
\[
\begin{array}{c|cc} & e_1 & e_2 \\ \hline e_1 & \tau_{11}\pi_{11} & \tau_{12}\pi_{12} \\ e_2 & \tau_{21}\pi_{21} & \tau_{22}\pi_{22} \end{array} \tag{11}
\]
with interaction time matrix (2).
Consider the repeated Prisoner's dilemma game (PD) where the payoffs of cooperators (C) and defectors (D) for a single round are given by the simplified version of the PD game (Pacheco et al., 2006); namely,
\[
\begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array} \tag{12}
\]
where $b$ is the benefit the cooperator provides to its partner at a cost $c$ to himself. Since it is assumed that $b > c > 0$, any player prefers to play against C rather than against D. Thus, if each player can decide whether to continue his interaction to the next round, he should play only one round against D and as many as possible (i.e., continue until the interaction ends after an expected number of rounds $1/(1 - \rho)$) against C. That is, $\tau_{12} = \tau_{22} = 1$ and $\tau_{11} = 1/(1 - \rho) > 1$. This models what is known as the opting out game (Zhang et al., 2016).
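In fact, we will consider a more general model with symmetric interaction time matrix (2) and look for interior equilibria where cooperators survive with defectors. For the adjusted payoffs (11), the payoff per unit of time in a pair equals the single-round payoff $\pi_{ij}$, and from (7) we get
\[
p_{1\pm} = \frac{1}{2} \pm \sqrt{\frac{1}{4} - \frac{\tau_{12}^2\, b c}{(\tau_{11}\tau_{22} - \tau_{12}^2)(b - c)^2}}. \tag{13}
\]
Both $p_{1-}$ and $p_{1+}$ exist and are between 0 and 1 for
\[
\frac{\tau_{11}\tau_{22}}{\tau_{12}^2} > \left(\frac{b + c}{b - c}\right)^2. \tag{14}
\]
Furthermore, in this case, equilibrium $p_{1+}$ is stable (Fig. 2F). Thus, if the time (i.e., the number of rounds) two cooperators continue to interact is large enough, stable coexistence of cooperators and defectors is possible (Fig. 4A). In particular, if interactions between two defectors and within a defector-cooperator pair last for the same time $\tau$, cooperation evolves provided
\[
\tau_{11} > \tau \left(\frac{b + c}{b - c}\right)^2.
\]
This result, which cannot occur in the classic repeated PD game, was also found by Zhang et al. (2016) and related there to the results of game experiments in which players were allowed to opt out.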
On the other hand, if defectors stay together for a shorter time than the common time the other interactions last (i.e., $\tau_{11} = \tau_{12} = \tau_{21} > \tau_{22}$), inequality (14) does not hold and there is no interior equilibrium. In this case, defection is the only ESS of the game.
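The coexistence condition for the repeated Prisoner's dilemma can be illustrated with a short sketch. It rests on a derivation made for this illustration, under the assumption that fitness is payoff per unit of time so that a pair earns the single-round payoff per round: equating the fitnesses of cooperators and defectors at the pair equilibrium gives $(\tau_{11}\tau_{22} - \tau_{12}^2)(b - c)^2\, p(1 - p) = \tau_{12}^2\, b c$. The function names are hypothetical and the parameter values are illustrative only.

```python
import math

def cooperator_frequencies(b, c, t11, t12, t22):
    """Interior equilibria [p_minus, p_plus] of the cooperator frequency, or []."""
    T = t11 * t22 - t12 ** 2
    if T <= 0.0:
        return []  # cooperators do not stay together long enough
    q = t12 ** 2 * b * c / (T * (b - c) ** 2)  # required value of p * (1 - p)
    if q > 0.25:
        return []
    r = math.sqrt(0.25 - q)
    return [0.5 - r, 0.5 + r]

def fitness_gap(p, b, c, t11, t12, t22):
    """Cooperator minus defector fitness (per unit of time) at the pair equilibrium."""
    T = t11 * t22 - t12 ** 2
    q = p * (1.0 - p)
    n12 = 2.0 * t12 * q / (t12 + math.sqrt(t12 ** 2 + 4.0 * T * q))  # N = 1
    coop = (1.0 - n12 / p) * (b - c) + (n12 / p) * (-c)
    defect = (n12 / (1.0 - p)) * b
    return coop - defect

if __name__ == "__main__":
    b, c = 3.0, 1.0  # threshold ((b + c) / (b - c))**2 equals 4 here
    for t11 in (3.0, 5.0, 20.0):
        print(t11, cooperator_frequencies(b, c, t11, 1.0, 1.0))
```

For $b = 3$ and $c = 1$ the sketch returns no interior equilibrium at $\tau_{11} = 3$ and the pair $0.25$, $0.75$ at $\tau_{11} = 5$; the fitness gap is positive between the two roots, which is consistent with the larger root being the stable one.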

Fitness is calculated as expected payoff per expected time
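This fitness is calculated as the average payoff an individual of a given phenotype obtains per expected time of the interaction. For example, let us consider an individual playing strategy $e_1$. From the decision tree in Fig. 1, the average payoff this individual gets is $\frac{2n_{11}}{n_1}\pi_{11} + \frac{n_{12}}{n_1}\pi_{12}$ and the average time of its interaction is $\frac{2n_{11}}{n_1}\tau_{11} + \frac{n_{12}}{n_1}\tau_{12}$. To obtain an individual's fitness we divide the average payoff by the average time, which yields
\[
\Pi_1 = \frac{2n_{11}\pi_{11} + n_{12}\pi_{12}}{2n_{11}\tau_{11} + n_{12}\tau_{12}} \quad \text{and} \quad \Pi_2 = \frac{n_{12}\pi_{21} + 2n_{22}\pi_{22}}{n_{12}\tau_{12} + 2n_{22}\tau_{22}}. \tag{15}
\]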
Once again, strategy $e_i$ will be a strict NE provided $\pi_{ii}/\tau_{ii} > \pi_{ji}/\tau_{ji}$ ($i, j = 1, 2$, $i \neq j$). That is, the (in)stability of the pure strategies is given by the four cases of Section 3.1. The interior population-distributional equilibrium can be calculated analytically using, e.g., the Solve function of Mathematica 11, which provides up to two solutions $p_1$ in [0, 1]. Qualitatively, their stability is again given by Fig. 2 with the same four cases as in Section 3.1. These expressions are too complex for analysis, but they simplify for the Hawk-Dove and the Prisoner's dilemma games.

Hawk-Dove game
The analogues of the interior equilibria (10) for the Hawk-Dove game when fitness is given by (15) are again denoted $p_{1\pm}$ (Eq. (16)). Qualitatively, the outcome follows the two cases of Section 3.1.1, where the fitness function is given as the payoff per unit of interaction time. For $C < V$ (Case A), Doves cannot invade the all Hawk population, so that all Hawk is a stable NE. Moreover, provided the interaction time between two Hawks is long enough, there are again two interior equilibria with the smaller one being stable (Fig. 3C).
For $C > V$ (Case B), solution $p_{1+}$ is outside the interval [0, 1] and the only stable solution is $p_{1-}$. Fig. 3D shows the dependence of $p_{1-}$ on the fighting time $\tau_{11}$. We observe that as $\tau_{11}$ tends to 0, $p_{1-}$ tends to $(V + C)/(2C)$, which is a higher equilibrium frequency of Hawks than in the standard model (i.e., when all interaction times are equal). The equilibrium frequency of Hawks is now a decreasing function for all $\tau_{11} > 0$. In particular, in contrast to Case B of Section 3.1.1, it can be shown that the frequency of Hawk-Hawk pairs no longer approaches 0 as $\tau_{11}$ decreases. Also, as the interaction time between two Hawks increases beyond $\tau_{11} = \tau$, the proportion of Hawks in the population decreases much faster when compared to Case B of Section 3.1.1 (Fig. 3, panel D compared to panel B).
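The behavior described for Case B can be reproduced numerically. The sketch below is illustrative only: it assumes fitness is the expected payoff per expected interaction time, evaluates it at the pair equilibrium for $N = 1$, and finds the single interior equilibrium for $C > V$ by bisection, using the fact that Hawks do better than Doves when rare and worse when common. Function names, tolerances and parameter values are assumptions.

```python
import math

def pair_fractions(p, t11, t12, t22):
    """Pair equilibrium (n11, n12, n22) for N = 1 and a fraction p of Hawks."""
    T = t11 * t22 - t12 ** 2
    q = p * (1.0 - p)
    n12 = 2.0 * t12 * q / (t12 + math.sqrt(t12 ** 2 + 4.0 * T * q))
    return (p - n12) / 2.0, n12, (1.0 - p - n12) / 2.0

def gap_expected_time(p, V, C, t11, tau):
    """Hawk minus Dove fitness, payoff per expected interaction time."""
    n11, n12, n22 = pair_fractions(p, t11, tau, tau)
    hawk = (2.0 * n11 * (V - C) + n12 * 2.0 * V) / (2.0 * n11 * t11 + n12 * tau)
    dove = (2.0 * n22 * V) / (n12 * tau + 2.0 * n22 * tau)
    return hawk - dove

def hawk_frequency_expected_time(V, C, t11, tau=1.0, eps=1e-6):
    """Interior equilibrium for C > V, located by bisection on (eps, 1 - eps)."""
    lo, hi = eps, 1.0 - eps
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if gap_expected_time(mid, V, C, t11, tau) > 0.0:
            lo = mid  # Hawks still do better: equilibrium lies at a higher frequency
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    V, C = 1.0, 2.0
    for t11 in (1e-4, 0.5, 1.0, 2.0, 5.0):
        print(t11, hawk_frequency_expected_time(V, C, t11))
```

For $V = 1$ and $C = 2$ the computed frequency is close to $(V + C)/(2C) = 0.75$ for very short Hawk-Hawk interactions, equals $V/C = 0.5$ when all times are equal, and keeps decreasing as $\tau_{11}$ grows.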

Repeated Prisoner's dilemma
For payoffs (12), the formula for $p_{1\pm}$ when interaction times are arbitrary is more complex under fitnesses (15) than under (13) and so is omitted here. However, it can be shown that $p_{1\pm}$ are both between 0 and 1 if and only if
\[
\frac{\tau_{11}\tau_{22}}{\tau_{12}^2} > \frac{b + c}{b - c}. \tag{17}
\]
As we are in Case 4 of Section 3, the larger equilibrium $p_{1+}$ is then stable (Fig. 2F).

(Figure caption fragment: fitness is given by (15) and the local ESSs correspond to the solid circles in Fig. 2.)
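A condition of this kind can be examined with a short sketch. The derivation behind it was made for this illustration and is an assumption, not the authors' omitted formula: with fitness taken as payoff per expected time and $\theta = \tau_{12}^2/(\tau_{11}\tau_{22})$, equating cooperator and defector fitness at the pair equilibrium reduces to $c\theta^2 s^2 - [b(1 - \theta^2) - c(1 + \theta^2)]s + c = 0$, where $s$ is the ratio of time spent in cooperator-cooperator pairs to time spent in mixed pairs. The function name is hypothetical.

```python
import math

def cooperator_frequencies_expected_time(b, c, t11, t12, t22):
    """Interior equilibria of the cooperator frequency under payoff per expected time."""
    theta = t12 ** 2 / (t11 * t22)
    if theta >= 1.0:
        return []
    B = b * (1.0 - theta ** 2) - c * (1.0 + theta ** 2)
    disc = B * B - 4.0 * c * c * theta ** 2
    if B <= 0.0 or disc < 0.0:
        return []
    freqs = []
    for sign in (-1.0, 1.0):
        s = (B + sign * math.sqrt(disc)) / (2.0 * c * theta ** 2)
        t = 1.0 / (theta ** 2 * s)   # time in defector pairs relative to mixed pairs
        n1 = s * t12 / t11 + 1.0     # cooperators per mixed pair
        n2 = t * t12 / t22 + 1.0     # defectors per mixed pair
        freqs.append(n1 / (n1 + n2))
    return sorted(freqs)

if __name__ == "__main__":
    b, c = 3.0, 1.0  # (b + c) / (b - c) equals 2 here
    for t11 in (1.5, 3.0, 4.0):
        print(t11, cooperator_frequencies_expected_time(b, c, t11, 1.0, 1.0))
```

In this sketch, with $b = 3$, $c = 1$ and $\tau_{12} = \tau_{22} = 1$, interior equilibria appear once $\tau_{11}$ exceeds $(b + c)/(b - c) = 2$.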

Discussion
We developed a new approach to the theory of two-player symmetric evolutionary games with two strategies that explicitly considers the duration of interactions between players. When applied to the Hawk-Dove and Prisoner's dilemma games, this theory makes new evolutionary predictions. In particular, it shows that in the Hawk-Dove game non-aggressiveness can evolve even when the cost of fighting is low, provided interactions between two Hawks take a long enough time. Similarly, for the Prisoner's dilemma, when the interaction time between two cooperators is long enough, cooperation can evolve. These novel predictions offer a different way of thinking about the evolution of aggressiveness and cooperation.
The theory developed in this article is based on symmetric two-player games (i.e., matrix games) with two pure strategies and symmetric interaction times, i.e., the interaction time of a couple where the first individual plays strategy 1 and the second individual plays strategy 2 is the same as the interaction time for a couple where the first individual plays strategy 2 and the second plays strategy 1. In this article, we assume that pairing between individuals is random and instantaneous, so all individuals are paired. This assumption simplifies bookkeeping and leads to analytic results (most of the calculations were done in Mathematica 11). Fitness is gained upon pair disbanding. For classic matrix games, fitness is defined as the average payoff an individual receives in an infinite population of players. There are two complications that must be dealt with once interaction times are explicitly considered. First, one needs to define fitness anew. In this article, we consider two fitness functions: one assumes that fitness is measured instantaneously (i.e., per unit of time); the other, motivated by optimal foraging theory (Charnov, 1976; Stephens and Krebs, 1986; Křivan, 1996), assumes that fitness is measured as the average payoff an individual obtains from a random interaction divided by the average time spent in a random interaction. Second, as interaction times are considered explicitly, one needs to keep track of the number of all couples. In this article, we describe these dynamics by differential equations assuming that pair disbanding is described by a Poisson process. In principle, this means that pairing is asynchronous in time.

2 Here we ignore the non-generic case where $\Pi_1 = \Pi_2$ at the pure strategy.
In the classical Hawk-Dove model (Maynard Smith and Price, 1973), the prominent example used to model and explain the evolution of aggressiveness, fights are assumed to be time consuming, but this is not captured by the model, where all interactions take the same time. However, there are many documented examples where interactions between two individuals take different times that depend on the individuals' phenotypes. In particular, Clutton-Brock and Albon (1979) (see also Maynard Smith (1974)) provide an example of contests between male red deer. In those contests, some individuals display while others do not, which changes the time individuals interact. Similarly, Sinervo and Lively (1996) observe three phenotypes of side-blotched lizard with different territorial behaviors. While orange-throated males are aggressive and often fight without any display, blue-throated males spend a lot of time challenging and displaying before a possible fight. It is thus clear that these phenotypes spend different times in their interactions, which has an effect on their fitness. Indeed, in this article, we show that varying the time two Hawks interact crucially influences the evolutionary outcome for the Hawk-Dove model. In particular, Fig. 3 shows that when the cost of fighting is smaller than the reward of winning a fight, and the fighting time is long enough, there are two locally stable equilibria. The first equilibrium, at which all individuals play the Hawk strategy, corresponds to the classic model with all interaction times equal. However, the other, mixed equilibrium corresponds to the case where both Hawks and Doves coexist in the population. As the time of a fight increases, the region of attraction of this interior equilibrium increases, so it is more likely to occur. This result provides a new explanation of why nonaggressive behavior occurs among individuals even when the cost of fighting is small.
As mentioned in the Introduction, Maynard Smith and Price (1973) also incorporate opportunity cost into a Hawk-Dove type game. Specifically, in their computer simulations of multi-round interactions between pairs of individuals, these individuals receive a payoff from the interaction as well as an additional payoff that decreases as the number of rounds increases. Thus, in contrast to our model, the opportunity cost in their model is independent of the strategy that the individual uses in future interactions. Moreover, in our terminology, their fitness is payoff per interaction and so does not take account of the interaction time. Despite these differences from our approach, it is noteworthy that they also find that the population does not consist entirely of Hawks when the probability of serious injury in a fight is low.
In the case of the repeated Prisoner's dilemma, we show that, provided cooperators stay together for enough rounds of the game while the other possible pairs disband quickly, cooperation does evolve (Fig. 4). These assumptions are quite realistic, especially if players can choose whether to continue the game to the next round with the same opponent, since it is always better to play against a cooperator than a defector in the Prisoner's dilemma game. Our model thus provides a different mechanism from others (Nowak, 2006) that lead to the evolution of cooperation. On the other hand, our mechanism is similar to models based on direct reciprocity that require the probability of the next encounter between two cooperators to be higher than the cost-to-benefit ratio. This probability condition is often satisfied through non-random pair formation processes in a well-mixed population (Taylor and Nowak, 2006) or in a structured population where individuals interact with neighbors in a graph (Pacheco et al., 2006). In our model with random pair formation, provided the interaction time between two cooperators is long enough when compared to the common interaction time between other pairs, cooperation evolves. We also analyzed the case where all interactions except those between defectors take the same time. These two situations are substantially different, because while the first case assumes that both individuals must be willing to stay paired, the second case assumes that a pair will continue their interaction unless both want to disband. The evolutionary outcomes are substantially different as well. While in the first case we showed that long enough cooperation times lead to cooperative behavior in the population, in the second case this is not so.