The formation of efficient and inefficient social convention driven by conformity bias

Social conventions govern our social behavior in many ways, from left- or right-hand traffic to ways of greeting. We sometimes find that inefficient social conventions, such as bullying in a class, are formed, in which almost all of the people in the group are at a disadvantage. If such conventions can be disadvantageous for everyone in the group, why are they formed and maintained? A conformity bias, the behavioral tendency to take the action that the majority of the group takes, may be one of the key ingredients of this phenomenon. In this study, we investigated the impact of the conformity bias on the formation of social conventions with a multi-agent simulation. Analysing stationary states of the dynamics of the model, we found that the conformity bias can drive the formation of both efficient and inefficient social conventions, depending on the strength of the bias.


I. INTRODUCTION
Social conventions govern our daily behavior in many ways. For example, when we drive a car in Japan, we keep to the left-hand side of the road because others do so. If we drove on the right-hand side, we would soon have an accident, so we never do so. Such explicit or implicit rules governing our social behavior are called social conventions [1] [2].
Sometimes inefficient social conventions are formed, in which people in a group cannot escape the situation even though they suffer a disadvantage from it [3] [4]. Bullying can be considered one example of such a convention. When bullying occurs in a class, the bullied student of course suffers a disadvantage. In addition, students who are complicit in the bullying may also suffer if they take part unwillingly, for example because the bully forces them to join. Even for the bully, leading the bullying carries a potential disadvantage: punishment from a teacher.
By this argument, a class or group of students in which bullying occurs is inefficient compared with a group without bullying, where no one suffers a disadvantage. If such conventions can be disadvantageous for everyone in the group, why are they formed and maintained?
Recent studies show that a conformity bias can be one of the key ingredients in the formation of inefficient social conventions [2] [3] [5]. The conformity bias is the behavioral tendency to take the action that the majority of the group takes. This bias can promote the formation of a social convention through positive feedback on the number of people choosing a certain action: the more people take an action, the more the number of people taking it grows. This positive feedback, however, promotes the formation of efficient and inefficient conventions alike. Thus, we need a more detailed understanding of the role the conformity bias plays in the formation of inefficient conventions.
Furthermore, the conformity bias is known to vary with age; previous studies showed that younger people display a higher level of conformity bias than older people [6] [7]. This result implies that the conformity bias may play a significant role in the formation of inefficient social conventions that are characteristic of younger people, such as bullying.
In this study, we investigated the impact of the conformity bias on the formation of social conventions with a multi-agent simulation. We hypothesized that the conformity bias of each agent can promote the formation of inefficient social conventions. To test this hypothesis, we constructed a game-theoretic model that has a reward function with positive externality and introduced the conformity bias as a characteristic of each agent. Analysing stationary states of the dynamics of the model, we found that the conformity bias can drive the formation of both efficient and inefficient social conventions, depending on the strength of the bias.

A. Reward function
In our model, the agents play a repeated game defined as follows. In each stage game, each agent chooses one of two options $j \in \{X, Y\}$ and gains a reward according to the reward function defined below. We assume a positive externality in the reward function: the more agents choose the same option as agent $i$, the higher the reward agent $i$ receives. Let $N$ be the number of agents and $n_j(t)$ the number of agents who choose option $j$ at time $t$, where $n_X(t) + n_Y(t) = N$. We define the frequency of agents in the population choosing option $j$ at time $t$ as $f_j(t) := n_j(t)/N$. Based on these, we define the reward functions as linear functions of the frequency at time $t$:
$$ r_X(t) = X_{\max} f_X(t), \tag{1} $$
$$ r_Y(t) = Y_{\max} f_Y(t), \tag{2} $$
where $X_{\max}$ and $Y_{\max}$ are positive and represent the maximum values of the rewards for options X and Y, respectively (see Fig. 1).
To allow for the formation of both efficient and inefficient social conventions, we assume $X_{\max} > Y_{\max}$; X is the relatively "good" option, which yields a higher reward than Y when all the agents choose the same option, i.e., $f_X(t) = 0$ or $f_X(t) = 1$. We define the relative "badness" of option Y as
$$ \theta := Y_{\max} / X_{\max}, \tag{3} $$
so that a smaller $\theta$ corresponds to a worse option Y. In this study, we set $X_{\max} = 1$, so the range of $\theta$ is $\theta \in (0, 1)$. On the stage game defined by $N$, $j \in \{X, Y\}$, and the reward function, the agents play an infinitely repeated game. As a result of playing this repeated game, the dynamics are expected to converge to a stationary state. Which frequency $f_X$ and which dynamics emerge depends on the initial conditions and the parameters of the model (explained later). In the present study, we call these stationary states social conventions.
Assuming $X_{\max} > Y_{\max}$, the combination of choices in which all agents choose X is the unique Pareto-efficient combination (see Fig. 1). All other combinations are inefficient in the sense that at least one agent receives a lower reward than in the Pareto-efficient combination. Therefore, the more agents choose Y in an emerged social convention, the more inefficient that convention is. In the present study, we use the terms efficiency and inefficiency in this sense.
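The reward structure described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the rewards to be linear with zero intercept, r_X = X_max f_X and r_Y = Y_max f_Y, with Y_max = θ X_max, which is one reading of the verbal description. The function name `rewards` and its parameter names are hypothetical and do not come from the original work.

```python
def rewards(f_x: float, theta: float = 0.5, x_max: float = 1.0) -> tuple[float, float]:
    """Return the unbiased rewards (r_X, r_Y) for a frequency f_x of X-choosers.

    Assumption (not confirmed by the source): rewards are linear in the
    frequency with zero intercept, and Y_max = theta * x_max.
    """
    if not 0.0 <= f_x <= 1.0:
        raise ValueError("f_x must lie in [0, 1]")
    y_max = theta * x_max
    f_y = 1.0 - f_x  # n_X + n_Y = N, so the two frequencies sum to 1
    return x_max * f_x, y_max * f_y


if __name__ == "__main__":
    # Everyone on X: each agent receives X_max.
    print(rewards(1.0, theta=0.5))
    # Everyone on Y: each agent receives Y_max < X_max.
    print(rewards(0.0, theta=0.5))
```

Under these assumptions, the all-X state gives every agent the largest reward available anywhere in the game, which is why it is the unique Pareto-efficient combination.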

B. Biased reward
We introduce a conformity bias into our model under the following assumptions. First, we assume that the agents' evaluation of $r_j(t)$ is positively biased by how many other agents choose option $j$: the more agents choose option $j$, the more highly they value it. Second, we assume that this biased evaluation is proportional to the frequency $f_j(t)$ at time $t$, with its strength given by a conformity bias coefficient $b$. Third, we assume that the biased reward is the sum of the biased evaluation and the unbiased reward for option $j$. Under these assumptions, the biased reward $r'_j(t)$ is defined as
$$ r'_X(t) = r_X(t) + b f_X(t), \tag{4} $$
$$ r'_Y(t) = r_Y(t) + b f_Y(t). \tag{5} $$
Here, $b \in [0, \infty)$ is the conformity bias coefficient of each agent, which represents the extent to which each agent's choice is biased by the choices of the others in the population. If $b > 0$, the agents evaluate the reward gained by taking option $j$ as higher than the unbiased reward, Eqs. (1) and (2), by an amount that depends on the frequency of the option. Assuming $X_{\max} = 1$ and substituting Eqs. (1) to (3) into Eqs. (4) and (5) yields
$$ r'_X(t) = (1 + b) f_X(t), \tag{6} $$
$$ r'_Y(t) = (\theta + b) f_Y(t). \tag{7} $$
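As a rough illustration of how the bias term changes the agents' evaluation, the sketch below adds b f_j to an unbiased reward. It assumes the unbiased rewards are r_X = X_max f_X and r_Y = θ X_max f_Y, which is an interpretation of the description rather than a confirmed formula. The name `biased_rewards` is hypothetical.

```python
def biased_rewards(f_x: float, b: float, theta: float = 0.5,
                   x_max: float = 1.0) -> tuple[float, float]:
    """Return the biased rewards (r'_X, r'_Y) at frequency f_x.

    Assumptions: linear unbiased rewards with Y_max = theta * x_max, and a
    bias term b * f_j added to the unbiased reward of option j.
    """
    f_y = 1.0 - f_x
    r_x = x_max * f_x           # unbiased reward for X (assumed form)
    r_y = theta * x_max * f_y   # unbiased reward for Y (assumed form)
    return r_x + b * f_x, r_y + b * f_y


if __name__ == "__main__":
    # With a Y majority (f_X = 0.2), compare the gap between the two
    # options without bias and with b = 1.
    for b in (0.0, 1.0):
        rx, ry = biased_rewards(0.2, b, theta=0.5)
        print(f"b={b}: r'_X={rx:.2f}, r'_Y={ry:.2f}, gap={ry - rx:.2f}")
```

In this sketch the bias widens the evaluated gap in favor of whichever option is currently in the majority, which is the positive feedback the text describes.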

C. Reinforcement learning model
In our model, we assume that each agent tries to maximize his/her expected reward through an adaptive mechanism, for which we employ a reinforcement learning model. In this model, an agent assigns a value to each option using a weighted sum of the rewards gained by taking that option, and chooses an option with a probability conditioned on this weighted sum. The weighted sum of rewards is referred to as a value function and is updated according to the following rule:
$$ Q_j(t+1) = Q_j(t) + \alpha \left( r'_j(t) - Q_j(t) \right). \tag{8} $$
Here, $Q_j(t)$ is the value function for option $j$ at time $t$, and $r'_j(t) \in \{r'_X(t), r'_Y(t)\}$ is the biased reward at time $t$. For an unchosen option, $Q_j(t)$ is not updated. The learning rate $\alpha \in [0, 1]$ specifies the extent to which the value function is updated.
As mentioned above, we assume that the agents' decision making is stochastic. The probability of choosing option $j \in \{X, Y\}$ given the value functions is defined as
$$ P_j(t) = \frac{\exp(\beta Q_j(t))}{\exp(\beta Q_X(t)) + \exp(\beta Q_Y(t))}. \tag{9} $$
Here, $\beta \in [0, \infty)$ is called the exploration rate and controls the randomness of each agent's choice. If $\beta = 0$, the choice is uniformly random. As $\beta \to \infty$, the choice becomes completely deterministic.
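The learning rule and the stochastic choice can be sketched as follows. This assumes the standard incremental update Q ← Q + α(r' − Q) and a two-option softmax choice rule, both of which match the verbal description (uniform choice at β = 0, deterministic choice as β grows) but are not quoted from the original. The function names `update_q` and `prob_choose_x` are hypothetical.

```python
import math


def update_q(q: float, biased_reward: float, alpha: float) -> float:
    """Move the value of the chosen option toward the biased reward.

    Assumed form: Q(t+1) = Q(t) + alpha * (r' - Q(t)). The value of the
    unchosen option is simply left untouched by the caller.
    """
    return q + alpha * (biased_reward - q)


def prob_choose_x(q_x: float, q_y: float, beta: float) -> float:
    """Probability of choosing X under a two-option softmax (assumed form).

    exp(beta*Q_X) / (exp(beta*Q_X) + exp(beta*Q_Y)) equals a logistic
    function of beta * (Q_X - Q_Y); the branches avoid overflow.
    """
    z = beta * (q_x - q_y)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(prob_choose_x(1.0, 0.0, beta=0.0))  # no dependence on the values
    print(prob_choose_x(1.0, 0.0, beta=3.0))  # favors the higher-valued option
    print(update_q(0.0, 1.0, alpha=0.5))      # halfway toward the reward
```

Writing the softmax as a logistic function of the value difference is a design choice for numerical stability only; it does not change the choice probabilities.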

D. Time-averaged frequency
In the present study, we are interested in which of the efficient and inefficient social conventions is formed for a given parameter set of $\alpha$, $b$, and $\theta$. As an observable, we introduce the time-averaged frequency
$$ \bar{f}_X := \frac{1}{T} \sum_{t=1}^{T} f_X(t), \tag{10} $$
where $T$ is the number of time steps over which the time average is computed. By calculating $\bar{f}_X$ for each parameter set, we can evaluate how often each social convention is formed under that parameter set. If $\bar{f}_X$ is close to 1 for a certain parameter set, the efficient social convention is likely to be formed, and vice versa. In this study, we set $T = 1.0 \times 10^3$ and used 50 different initial conditions for each parameter set. The number of agents was $N = 100$. The exploration rate $\beta$ was fixed to 3.0, and $Q_j(0) = 0$ for all the agents.
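Putting the pieces together, one run of the simulation and its time-averaged frequency might look like the sketch below. Several details are assumptions and are not stated in the text: all agents choose simultaneously in each step, rewards take the form (1 + b) f_X and (θ + b) f_Y, the choice rule is a two-option softmax, and different "initial conditions" are modeled as different random seeds. The names `simulate` and `mean_over_runs` are hypothetical.

```python
import math
import random


def _sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(b: float, theta: float, alpha: float, beta: float = 3.0,
             n_agents: int = 100, t_max: int = 1000, seed: int = 0) -> float:
    """Return the time-averaged frequency of option X for one run."""
    rng = random.Random(seed)
    q_x = [0.0] * n_agents  # Q_X(0) = 0 for every agent
    q_y = [0.0] * n_agents  # Q_Y(0) = 0 for every agent
    total = 0.0
    for _ in range(t_max):
        # Every agent draws its choice from the (assumed) softmax rule.
        chose_x = [rng.random() < _sigmoid(beta * (q_x[i] - q_y[i]))
                   for i in range(n_agents)]
        f_x = sum(chose_x) / n_agents
        # Biased rewards under the assumed linear form with X_max = 1.
        r_x = (1.0 + b) * f_x
        r_y = (theta + b) * (1.0 - f_x)
        # Only the value of the chosen option is updated.
        for i, picked_x in enumerate(chose_x):
            if picked_x:
                q_x[i] += alpha * (r_x - q_x[i])
            else:
                q_y[i] += alpha * (r_y - q_y[i])
        total += f_x
    return total / t_max


def mean_over_runs(b: float, theta: float, alpha: float, n_runs: int = 50,
                   **kwargs) -> float:
    """Average the time-averaged frequency over runs with different seeds."""
    runs = [simulate(b, theta, alpha, seed=s, **kwargs) for s in range(n_runs)]
    return sum(runs) / n_runs


if __name__ == "__main__":
    # Small illustrative sweep over b; alpha = 0.1 is an arbitrary choice.
    for b in (0.0, 1.0, 5.0):
        print(b, mean_over_runs(b, theta=0.5, alpha=0.1, n_runs=5, t_max=200))
```

Because the update order, the seeding scheme, and the reward form are all assumed here, this sketch should not be expected to reproduce the figures of the study quantitatively.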

III. RESULTS
Fig. 2 shows the results of our simulation. When the conformity bias coefficient $b$ is large enough, the inefficient social convention is formed over a broad region of the parameter space. In particular, around $b = 5.0$ in the case of $\theta = 0.5$, the inefficient social convention is formed for over 30% of initial conditions (Fig. 2, upper). Furthermore, the inefficient social convention can be formed even if option Y is significantly inefficient, i.e., $\theta = 0.1$ (Fig. 2, bottom).
Focusing on the parameter regime where $b$ is close to 0, we found another interesting behavior. When $b$ is close to 0, $\bar{f}_X$ is less than 1. As $b$ increases from 0, $\bar{f}_X$ increases monotonically, reaches 1 at around $b = 1.0$, and then begins to decrease. In other words, there is a peak around $b = 1.0$ at which $\bar{f}_X$ is nearly equal to 1.0. This means that the conformity bias facilitates the formation of the efficient social convention under any initial condition, provided the bias is sufficiently weak. This behavior is observed for both $\theta = 0.1$ and $\theta = 0.5$. These results show that the conformity bias can bring about opposite outcomes depending on its strength.

IV. CONCLUSION
In this study, we investigated the impact of the conformity bias on the formation of social conventions with a multi-agent simulation. The model we constructed combined a reward function with positive externality and a conformity bias for each agent. We showed that the conformity bias can drive the formation of both efficient and inefficient social conventions, depending on the strength of the bias.
We assumed that the conformity bias is homogeneous across agents in the present study. In general, however, the bias varies from agent to agent, and the dynamics and outcomes that emerge from interactions among such agents can be more complex than in the homogeneous case. Analysing such situations is left for future work, and the results will be reported elsewhere.