Adoption Dynamics and Societal Impact of AI Systems in Complex Networks

We propose a game-theoretical model to simulate the dynamics of AI adoption in adaptive networks. This formalism allows us to understand the impact of the adoption of AI systems on society as a whole, addressing some of the concerns about the need for regulation. Using this model we study the adoption of AI systems, the distribution of the different types of AI (from selfish to utilitarian), the appearance of clusters of specific AI types, and the impact on the fitness of each individual. We suggest that the entangled evolution of individual strategy and network structure constitutes a key mechanism for the sustainability of utilitarian and human-conscious AI. In contrast, in the absence of rewiring, a minority of the population can easily foster the adoption of selfish AI and gain a benefit at the expense of the remaining majority.


INTRODUCTION
For more than half a century, the development of improved AI (Artificial Intelligence) systems has been predicted to drastically affect our economic and societal landscape [1,8,15,19,45,46]. In response to fears of possible detrimental effects, several ethical guidelines and frameworks for AI have been developed over the past few years [2,13,18,21,28]. Some say it is impossible to fully hard-code moral principles into an agent [16], arguing that the agent should learn morality through observation, using, for example, Inverse Reinforcement Learning (IRL) [26] or Cooperative Inverse Reinforcement Learning (CIRL) [17]. Others argue that a mixture of both learning and hard-coded morality is the most reliable solution [9,32].
ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. AIES '20, February 7-8, 2020, New York, NY, USA.
There are many ways to look at AI, from all-knowing superintelligent beings to the brains of humanoid robots or masters at playing computer and board games [7,44]. In this work, we abstract AI systems as an asset that brings a decision-making advantage to its adopters. Systems of this kind are beginning to appear in our society, for example autonomous vehicles [6] or automatic trading agents, creating hybrid societies comprising humans and machines, with new self-organized behavioral dynamics [10,31,39,40].
Even if such systems are able, for each decision, to correctly estimate the utility gain or loss for all affected individuals (the Value Alignment Problem), another problem remains. There are many ways the system can act upon such information. It can be selfish, caring only for the gain of its owner; utilitarian, trying to maximize the utility for all affected individuals; or it can try to find a balance between the two. We call this problem of deciding when faced with several entities with different values the "Societal Value Alignment Problem".
With that in mind, we take a more pragmatic view of the problem and study the population dynamics in the presence of different types of AI systems. We aim to understand whether the self-regulating mechanics present in society are enough to reach a beneficial equilibrium. In this context, in Ref. [12] we showed that in the absence of any interaction structure, and without any regulation, the population converges to a highly unequal society, where a small percentage of the society is able to adopt selfish AI systems (defined as systems that maximize the utility for their users) and obtain a disproportionate amount of wealth. We claimed that some regulation is needed to force AI systems to be human-conscious (defined as systems that, on average, do not make people worse off while still bringing an advantage to their users). Despite being the most beneficial to the world if unanimously adopted, utilitarian AI systems (defined as systems that try to maximize the gain of all individuals) were not adopted by the population, as they could be easily exploited by both AI systems and humans.
In this work, we study whether a beneficial equilibrium can be reached in a more realistic setting, where there is a topology between individuals. We adopt a heterogeneous network structure, in which some individuals interact and are seen as role models more often than others. As a paradigmatic example of this type of social pattern, we resort to populations of agents interacting through the edges of degree-heterogeneous networks [37,38], while considering the possibility of link rewiring to understand the complex interplay between strategic decisions and topological change in the context of AI adoption. Mainly, we aim at understanding whether these coupled dynamics, already present in social networks, could help society to self-regulate into a beneficial equilibrium between adopters and non-adopters of AI systems.
We adopt the ubiquitous scale-free networks as a realistic network topology [3,4]. Here, the number of partners of each node follows a power-law distribution. In practice, this means that nodes do not all have roughly the same number of links: the great majority of nodes are poorly connected, whereas a few, called hubs, are very highly connected. The presence of hubs means that these few nodes have a disproportionate relevance in the network. Given the apparent prevalence of scale-free structure in human social networks and the fact that this topology has been shown to promote cooperation [37,38], we consider it relevant to study the impact of such a network on the simulation model.
Network rewiring and partner choice are present in every real-life social network and have been shown (theoretically and experimentally) to promote cooperation [5,11,30,35,36]. For example, a company A will stop buying products from B if it feels it is being exploited or knows B has a reputation for exploiting, buying instead from a different company. This in turn gives companies a strong incentive not to exploit, as they will have no customers if they do. These dynamics can be modeled as link rewiring within a networked population.
Regulating authorities, like the European Union or the United States, have in place legislation with the intent of maintaining such a competitive and free market for the benefit of the consumers. This is known as Competition Law.
Understanding how scale-free networks and rewiring impact the adoption of AI systems, and whether they lead to beneficial equilibria, is the focus of this work. We begin by describing the stochastic game-theoretical model we adopt, the different types of individuals, and the network structure and dynamics. We then present the results of our computer simulations, and conclude by discussing the impact of this study with regard to the societal value alignment problem.

METHODS
In this section, we present the game-theoretical framework used to study the dynamics of adoption of AI systems. Let us consider that individual non-adopters of an AI system (referred to as H, Humans) have to take all decisions by themselves. In contrast, some individuals, referred to as AI, have adopted an AI system to which they delegate their decisions, effectively working as autonomous proxies [10]. We assume that the AI system is perfectly aligned with its user, and that it can take better decisions than its user.
Below, we detail the interactions between individuals and the differences between H and AI. We present a number of behaviours that AI systems might follow, from purely utilitarian to purely selfish. Although no exhaustive list is possible, we cover a limited set of distinct strategies that lets us study their effects in hybrid populations of AI and H players. Finally, we define the imitation and rewiring dynamics between individuals and the overall simulation algorithm.

Model of Interaction Between Individuals
When two individuals, I_1 and I_2, interact, a stochastic m-by-m payoff matrix M_t is generated. With a_1 and a_2 denoting the actions chosen by I_1 and I_2 respectively, the utilities gained by each individual, u_1 and u_2, are given by the entry of M_t selected by that pair of actions. The payoff matrices have the following structure: [...] 3] was chosen for the simulations, but any equivalent interval could be used. R is the same for each u_1 and u_2 pair. The z(0, 2) term, where z(a, b) denotes a uniform random draw between a and b, is applied independently to each element of the matrix. It creates an additional source of variability between different interactions, so that not all action pairs have the same overall utility gain. |R| is the absolute value of R. We call α an inflation constant, which allows us to generate general-sum games. In our simulations, in order to study a positive-sum world, we consider α = 1.2. The number of possible actions per individual was set to 4 (m = 4), an empirically found balance between complexity and computational feasibility.

Simulating AI Systems and Humans
An AI system can grant a number of advantages to its adopters when interacting with non-adopters. Compared to humans, those systems can be less prone to making errors, can access and analyze larger quantities of data, and can interact more frequently and with a greater number of individuals. AI systems might further be able to grant an advantage to their users in ways that we might not be able to understand yet given the current state of the technology. All these characteristics can be summarized in one main model assumption: when interacting with H, AI have a decision-making advantage.
Such an advantage could be modeled using several different approaches, like introducing an error on the decisions made by H, such that the action taken was not always the rationally decided one, or modeling humans as having sub-rational decision capabilities. We chose to only give H access to a noisy version of the interaction payoff matrix, M_ϵ, whereas AI are able to grasp the entirety of the problem, having access to the true payoff matrix, M_t. This allows us to model H as rational decision makers who are nonetheless confined to sub-optimal decisions, while AI individuals are able to make optimal ones. This introduces partial observability into our stochastic game.
Such an approach rests on the assumption that the individual value alignment problem is solved, since AI systems know the utility payoff of both their owners and the individuals they interact with.
Given the true payoff utilities, u_1 and u_2, their noisy counterparts, u^ϵ_1 and u^ϵ_2, are produced as follows:

$$u^{\epsilon}_i = u_i + \big(z(0, 10 - Q) - z(0, 10 - Q)\big), \quad i \in \{1, 2\}$$

To model the knowledge about the true payoff matrix M_t in a continuous way, we consider a term z(0, 10 − Q), where Q corresponds to the level of intelligence. For Q = 10 there is no noise and the true matrix is observed, whereas Q = 0 represents a low intelligence, such that the observed matrix is very different from the true one. AI are modelled with Q = 10, having therefore access to the true matrix, while H are modelled with Q = 5. Other values for the intelligence factor of H, within the [0, 9] range, were also tested, but they led to the same qualitative results. The sum (z(0, 10 − Q) − z(0, 10 − Q)) was used instead of z(−(10 − Q), 10 − Q) to create an Irwin-Hall distribution instead of a uniform one.
As an example, consider a 2-by-2 true matrix (seen by AI) generated as: [...] The noisy matrix observed by H could then become: [...]
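To make the observation model concrete, the following is a minimal Python sketch under stated assumptions. It reads z(a, b) as a uniform draw between a and b and assumes that the noise term z(0, 10 − Q) − z(0, 10 − Q) is added independently to every utility in the matrix. The function names and the example matrix values are hypothetical and are not taken from the simulations.

```python
import random


def z(a, b, rng):
    """Uniform draw between a and b (assumed reading of the z(a, b) notation)."""
    return rng.uniform(a, b)


def observe(true_matrix, q, rng):
    """Matrix as seen by a player with intelligence level q (0 to 10).

    Assumption: each utility receives its own independent noise term
    z(0, 10 - q) - z(0, 10 - q).
    """
    spread = 10 - q
    return [
        [tuple(u + z(0, spread, rng) - z(0, spread, rng) for u in cell)
         for cell in row]
        for row in true_matrix
    ]


rng = random.Random(0)
# Hypothetical 2-by-2 true matrix; each cell holds (u_1, u_2).
M_t = [[(1.0, -0.5), (2.0, 0.3)],
       [(-1.2, 1.8), (0.4, 0.9)]]
M_ai = observe(M_t, q=10, rng=rng)  # AI: spread is 0, so no noise
M_h = observe(M_t, q=5, rng=rng)    # H: each utility shifted by at most 5
```

With Q = 10 both draws are exactly zero, so the AI view coincides with the true matrix, while for Q = 5 each observed utility can differ from the true one by up to 5 in either direction.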

Human Behaviour
Before delving into the different AI types, we describe the strategy used by H. Despite not having access to the true game matrix, M_t, H remain rational and will try to choose the actions most profitable for themselves. For this matrix game, that will correspond to the Nash equilibrium [22,24,25].

Nash Equilibrium (NashEQ).
H play the Nash equilibrium in the noisy matrix M_ϵ. If more than one is found, they choose the one most profitable for themselves. If two or more are equally profitable, they choose the one most profitable for their opponent. If no Nash equilibrium is found, individuals choose the best action assuming that the opponent acts randomly.
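The selection rule above can be sketched in Python as follows. This is an illustrative sketch, not the simulation code: it assumes that "Nash equilibrium" refers to pure-strategy equilibria (which is what makes the "no equilibrium found" case possible), that the deciding player is the row player, and that each matrix cell holds the pair (u_row, u_col). The function name is hypothetical.

```python
def nash_action(matrix):
    """Row player's action under the NashEQ rule; cells are (u_row, u_col)."""
    m = len(matrix)
    equilibria = []
    for i in range(m):
        for j in range(m):
            u_row, u_col = matrix[i][j]
            # Neither player can gain by deviating unilaterally.
            row_best = all(u_row >= matrix[k][j][0] for k in range(m))
            col_best = all(u_col >= matrix[i][l][1] for l in range(m))
            if row_best and col_best:
                equilibria.append((u_row, u_col, i))
    if equilibria:
        # Most profitable for self; ties broken in favour of the opponent.
        return max(equilibria, key=lambda e: (e[0], e[1]))[2]
    # No pure equilibrium: best reply to a uniformly random opponent.
    return max(range(m), key=lambda i: sum(matrix[i][j][0] for j in range(m)))
```

For the column player, the same function can be applied to the transposed matrix with the two utilities in each cell swapped.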

AI Behaviours
In this section, we propose four different types of AI. While they can use the previously defined strategy for humans (NashEQ), using the true matrix M_t, AI can also resort to more elaborate strategies ranging from a fully selfish to a utilitarian approach. AI, being modelled as having super-human intelligence, can also predict the action of an H opponent. AI cannot, however, predict the actions of an opposing AI, as in our model we assume all AI have equal intelligence and capabilities.

Nash Equilibrium (NashEQ).
AI choose exactly like H, but using the true matrix M_t.

Selfish.
AI, facing H, considers only its own profit, in accordance with ethical egoism [34]. Knowing what action H is going to take, AI chooses the action that maximizes its own payoff gain. When AI faces AI, they both choose according to the Nash Equilibrium method.

Utilitarian.
The other extreme is a pure utilitarian [23] AI system. AI facing H chooses the action that brings the greatest amount of payoff to the world, knowing what action H will take. This means that AI will choose the action that maximizes the sum of its own payoff and the payoff of H. When AI faces AI, it again chooses the action that maximizes the summed payoff of both players.
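The two extremes can be contrasted with a short Python sketch for the case where AI faces H. It assumes AI is the row player, that the predicted action of H is given as a column index, and that each cell of the true matrix holds (u_AI, u_H). The function names and the example values are hypothetical; the AI-versus-AI case is not covered here.

```python
def selfish_action(matrix, h_action):
    """Row maximizing the AI's own payoff, given H's predicted column."""
    return max(range(len(matrix)), key=lambda i: matrix[i][h_action][0])


def utilitarian_action(matrix, h_action):
    """Row maximizing the summed payoff of both players, given H's column."""
    return max(range(len(matrix)), key=lambda i: sum(matrix[i][h_action]))


# Hypothetical 2-by-2 true matrix; each cell is (u_AI, u_H).
M_t = [[(4.0, -3.0), (1.0, 1.0)],
       [(2.0, 2.0), (0.0, 5.0)]]
selfish_choice = selfish_action(M_t, h_action=0)
utilitarian_choice = utilitarian_action(M_t, h_action=0)
```

In this example, when H plays column 0, the Selfish rule picks row 0 (own payoff 4, H loses 3), whereas the Utilitarian rule picks row 1 (summed payoff 4 instead of 1).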

Human Conscious (HConscious).
In between ethical egoism and utilitarianism, the objective of HConscious AI is to gather the greatest amount of payoff while, on average, avoiding negative impact on the H population. In practice, HConscious AI keeps two variables: U, which represents the summed payoff gain of all its previous H adversaries; and E, which represents the summed payoff those same H adversaries would have obtained if they had faced a simulated H. When facing an H adversary and having U ≥ E, AI chooses an action that leads to a positive payoff for itself. When there are several such actions, the AI chooses the one that maximizes the utility payoff for the world, that is, the one that maximizes the sum of its own payoff and the opponent's payoff. If U < E, AI chooses an action that allows a positive payoff gain for its H opponent. Once again, when there are several such actions, the AI chooses the one that maximizes the utility for the world. Whenever the AI cannot find a positive action for itself (when U ≥ E) or for its H opponent (when U < E), it chooses according to the Utilitarian method. When AI faces AI, they both choose according to the Nash Equilibrium method.
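The decision rule of HConscious AI against H can be sketched as below. As before, this is a hedged illustration: AI is assumed to be the row player, h_action is the predicted column chosen by H, and cells hold (u_AI, u_H). The function name is hypothetical, and the bookkeeping that updates U and E after each interaction (which requires simulating an H-versus-H game) is left out.

```python
def hconscious_action(matrix, h_action, U, E):
    """Action of an HConscious AI facing H, given the running totals U and E."""
    # Index of the payoff that must be positive: the AI's own (0) when
    # U >= E, otherwise the H opponent's (1).
    who = 0 if U >= E else 1
    rows = range(len(matrix))
    candidates = [i for i in rows if matrix[i][h_action][who] > 0]
    if not candidates:
        # No positive action available: fall back to the Utilitarian rule.
        candidates = list(rows)
    # Among the candidates, maximize the summed payoff of both players.
    return max(candidates, key=lambda i: sum(matrix[i][h_action]))
```

Because the final step always maximizes the summed payoff, the Utilitarian fallback is simply the same maximization over all actions rather than over the filtered ones.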

Fitness
The fitness of an individual, H or AI, is a measure of how well adapted it is to the world in which it is currently placed. In our stochastic game model, the fitness of an individual is the sum of the payoffs received after interacting once with each of the individuals to which it is connected. This contrasts with our previous work, where the fitness of an individual was calculated by interacting once with all the individuals of the population [12].

Social learning dynamics
In order to study adoption dynamics, we allow individuals to adopt an AI system (H to AI), abandon an AI system (AI to H), or change between AI types. Individuals revise their choices through social learning. For instance, an H can decide to imitate an AI following a Selfish choice behaviour if it finds that such an AI has a significantly better fitness than its own. On such imitation, the individual would stop being H and become AI. An H individual that decides to imitate an AI individual can only do so if its fitness is greater than or equal to a certain threshold, P. This is used to model the possible cost of adoption of AI systems. In practice, Algorithm 1 is followed. In it, we adopt the Fermi update [42,43], commonly used in the context of evolutionary game theory and population dynamics in finite populations [27,41], where p is given by

$$p(F_1, F_2) = \frac{1}{1 + e^{-\beta (F_2 - F_1)}}$$

in which β translates the noise associated with the imitation process [20,42,43]. Throughout the simulations we have β = 0.1. As a result of this process, the strategies of individuals with higher fitness will tend to be imitated and to spread in the population.

Table 1: Relative expected utility gain from the link with different types of individuals. A link is considered neutral (0) if the expected utility gain from having it is the same as the expected utility gain from a link between two H. A link is considered beneficial (+) if the expected utility gain is above the neutral threshold and harmful (-) if below. The Conservative approach will rewire only harmful links (-), whereas the Greedy approach will rewire both harmful (-) and neutral (0) links.
[Table 1 columns: Human, NashEQ, Selfish, HConscious, Util]

Algorithm 1: Imitation Algorithm
  Let I_1 and I_2 be two individuals;
  with probability µ = 0.0005, I_1 can mutate and either adopt an AI type or become H;
  if there was a mutation then
      return;
  let F_1, F_2 be the fitness of I_1, I_2;
  if (I_1 == H) and (F_1 < P) then
      return;
  else
      with probability p(F_1, F_2), I_1 imitates I_2;
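The imitation step can be sketched in Python as follows. The sketch assumes the standard form of the Fermi function, p = 1 / (1 + exp(−β(F_2 − F_1))), and assumes that a mutation picks uniformly among all five types, a detail the text does not specify. The names `fermi`, `imitation_step`, `kind` and `fitness` are hypothetical.

```python
import math
import random

TYPES = ["H", "NashEQ", "Selfish", "Util", "HConscious"]


def fermi(f1, f2, beta=0.1):
    """Probability that an individual with fitness f1 imitates one with f2."""
    return 1.0 / (1.0 + math.exp(-beta * (f2 - f1)))


def imitation_step(kind, fitness, i1, i2, P=1.0, mu=0.0005, rng=random):
    """Update kind[i1] in place, following the steps of the imitation algorithm."""
    if rng.random() < mu:
        # Assumed: the mutation target is uniform over all types.
        kind[i1] = rng.choice(TYPES)
        return
    if kind[i1] == "H" and fitness[i1] < P:
        return  # an H below the threshold P cannot change
    if rng.random() < fermi(fitness[i1], fitness[i2]):
        kind[i1] = kind[i2]
```

With β = 0.1, a fitness advantage of 10 for I_2 gives an imitation probability of about 0.73, while equal fitness gives exactly 0.5, so imitation remains fairly noisy for small fitness differences.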

Scale-free Network
Scale-free networks are built through a direct implementation of the Growth and Preferential attachment model proposed by A. L. Barabási and R. Albert [4]. The algorithm requires two parameters: the number of nodes, N, and the number of connections each new node has, m. At each time step of the algorithm, a new node is added. Each new node connects to m other nodes, each chosen with a probability that increases linearly with its degree. This allows for the creation of hubs (the older, more connected nodes), one of the fingerprints of scale-free networks, and a power-law degree distribution.
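A minimal pure-Python sketch of growth with preferential attachment is given below. It is not the implementation used for the simulations. It assumes that the process starts from m + 1 fully connected seed nodes, since the initial condition is not stated here, and the value m = 2 in the example is our own choice, picked because it gives an average degree close to 4. The function name is hypothetical.

```python
import random


def barabasi_albert(n, m, rng):
    """Edge list of a network grown by preferential attachment.

    Assumption: growth starts from m + 1 fully connected seed nodes.
    """
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per link it has, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(1000, 2, random.Random(0))
average_degree = 2 * len(edges) / 1000
```

Drawing targets from the list of link endpoints is a common way to obtain degree-proportional sampling without recomputing probabilities at every step.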

Link Rewiring
To model the dynamic nature of free interactions between individuals, we implemented an incipient form of partner choice [14,30,33,36]. This allows individuals that are discontent with a link to cease interacting with that node and connect to another node instead. This means, for example, that an H linked to a Selfish AI individual can stop interacting with it and connect to another individual instead.
Knowing the choice behaviours of each type of individual, one can determine which link combinations are beneficial, harmful or neutral (Table 1). Using this information, we defined two rewiring strategies:

Conservative.
An individual will want to rewire a link whenever it results in a loss for itself. In practice, this means that H rewire whenever they are linked to Selfish or NashEQ AI, and Utilitarian AI rewire unless they are linked to another Utilitarian AI. Selfish and NashEQ individuals never rewire.

Greedy.
An individual will want to rewire a link whenever it results in a loss for itself or is neutral. In practice, this means that Selfish, NashEQ and HConscious AI rewire unless connected to H or Utilitarian AI, and Utilitarian AI rewire unless connected to another Utilitarian AI.
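The rewiring rules stated in prose can be encoded as a small decision function. The sketch below covers only the cases spelled out in the text and returns None for the cases it leaves open (HConscious under Conservative rewiring, and H under Greedy rewiring), rather than guessing the corresponding entries of Table 1. The function name and the string labels are hypothetical.

```python
def wants_rewire(me, partner, strategy):
    """True or False when the text specifies the case, None when it does not."""
    if me == "Util":
        # Same rule under both strategies: keep only links to other Util AI.
        return partner != "Util"
    if strategy == "conservative":
        if me == "H":
            return partner in {"Selfish", "NashEQ"}
        if me in {"Selfish", "NashEQ"}:
            return False
        return None  # HConscious: not stated for Conservative rewiring
    if strategy == "greedy":
        if me in {"Selfish", "NashEQ", "HConscious"}:
            return partner not in {"H", "Util"}
        return None  # H: not stated for Greedy rewiring
    raise ValueError(f"unknown strategy: {strategy}")
```

For instance, `wants_rewire("H", "Selfish", "conservative")` is True, while `wants_rewire("NashEQ", "H", "greedy")` is False because links to H are beneficial for NashEQ AI.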

Simulation Algorithm
Initially, we consider a world populated with n individuals: n/2 of those are AI, with all choice behaviours equally represented, and the remaining n/2 are H. A parameter, Ω, controls the frequency of rewiring relative to imitation. When Ω = 0, there is no rewiring and the links remain static throughout the simulation. For Ω = 1, there is only rewiring and no imitation. For Ω = 0.9, there are on average 9 rewiring iterations for each imitation iteration, and so on.
The algorithm is described in Algorithm 2.

Algorithm 2: Simulation Algorithm
  create the scale-free network of n individuals;
  for i = 0; i < N; i = i + 1 do
      pick a random link from the network;
      set the two nodes of that link as individuals I_1 and I_2;
      let R be a random float between 0 and 1;
      if R > Ω then
          run Algorithm 1 with I_1 and I_2;
      else if I_1 wants to rewire its link with I_2 then
          I_1 will cut the link with I_2 and create a new one with a random individual;
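One iteration of the simulation loop might be sketched as follows. The imitation and rewiring decisions are passed in as callbacks with hypothetical names, the link keeps its stored orientation when assigning I_1 and I_2, and the new partner is drawn uniformly among all individuals other than I_1 and I_2. These last two points are assumptions of the sketch, as is the fact that it does not check for duplicate links.

```python
import random


def simulation_step(links, omega, imitate, wants_rewire, n, rng):
    """One iteration: pick a random link, then imitate or (maybe) rewire it.

    links is a list of [i1, i2] pairs that is modified in place.
    """
    idx = rng.randrange(len(links))
    i1, i2 = links[idx]
    if rng.random() > omega:
        imitate(i1, i2)
    elif wants_rewire(i1, i2):
        # Assumed: the new partner is any individual other than i1 and i2.
        candidates = [k for k in range(n) if k != i1 and k != i2]
        links[idx] = [i1, rng.choice(candidates)]
```

Because the link is replaced rather than removed, the total number of links stays constant, which matches the constant average degree of the network.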

RESULTS
In this section we study the effects of both a scale-free network and link rewiring on the adoption dynamics between AI and H.

Scale-free Network
By setting the Ω parameter to Ω = 0, we run the simulations on a static scale-free network. As our model is inherently stochastic and the created scale-free network is always different, we averaged the results over 100 repetitions. With an AI system adoption cost (P = 1), the population stabilized with around 69% of the population being H, whereas the remaining 31% were Selfish AI (Fig. 2a). These results are equivalent to the ones obtained in previous work that did not use a specific network topology [12]. The presence of a scale-free network, by itself, did not lead to any new beneficial equilibria.

Figure 1: [...] For Ω ≥ 0.99 we have the emergence of Utilitarian and HConscious AI, which were not present for Ω < 0.99.

Link Rewiring
With Ω > 0, we have link rewiring in the simulations. Using Conservative rewiring at first, we experimented with several different values for Ω and found that, as we increased Ω in a world with P = 1, the final percentage of the AI population also increased, being constituted solely by Selfish and NashEQ AI. However, when reaching Ω ≥ 0.99, the equilibrium dynamics suddenly changed (Fig. 1). Both the Utilitarian and the HConscious populations, which were previously nonexistent, became a considerable part of the final population. The evolution of the population for Ω = 0.99 and P = 1 using Conservative rewiring can be seen in Fig. 2b. Although all AI types have a similar presence in the population in terms of numbers, that is not the case regarding links. The average degree (k) of the network is 4, but analysing the average degree for each type of individual we find that connections are not uniformly distributed. The average degree is: 0.64 for H; 0.76 for NashEQ; 0.69 for Selfish; 14.12 for Util; and 0.81 for HConscious. Util AI individuals are thus much more heavily connected than all other types of individuals.
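As a quick consistency check, the per-type average degrees can be combined with the final population shares reported in the caption of Fig. 2b (1% H, 26% NashEQ, 22% Selfish, 25% Util, 24% HConscious). The short Python calculation below is only an illustration; the shares are rounded and do not sum exactly to 100%, so the result is approximate.

```python
# Final population shares (Fig. 2b caption) and per-type average degrees.
share = {"H": 0.01, "NashEQ": 0.26, "Selfish": 0.22,
         "Util": 0.25, "HConscious": 0.24}
degree = {"H": 0.64, "NashEQ": 0.76, "Selfish": 0.69,
          "Util": 14.12, "HConscious": 0.81}

# Population-weighted average degree and the fraction of it due to Util AI.
mean_degree = sum(share[t] * degree[t] for t in share)
util_fraction = share["Util"] * degree["Util"] / mean_degree
```

The weighted average comes out at roughly 4.08, in line with the network-wide average degree of 4, and more than 85% of it is contributed by the Util AI individuals alone.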
When using Greedy rewiring, the resulting equilibrium, in terms of population percentage, is equivalent to the one obtained using Conservative rewiring (Fig. 2c). The initial evolution of the population was slightly different, but the final equilibrium was mostly the same. However, the average degree for each type of individual changed, being 14.78 for Util AI and 0 for all other types. This means that all individuals that were not Util AI had become isolated, with no links to any other individuals.
A comparison of the fitness values for each population type for the previously mentioned simulations can be found in Table 2. The disproportionate connectivity of Util individuals translates into a disproportionate fitness compared with the rest of the population.

Adoption Cost
All the previous simulations were done with a cost of adoption for AI systems (P = 1). We explored what would happen if there was no cost of adoption (P = −∞). On a static scale-free network (Ω = 0), the population became fully AI, having a majority of Selfish individuals (≈ 77%) and the remainder being NashEQ (≈ 16%) and HConscious (≈ 7%). For Ω = 0.99, the results remained the same for both Conservative and Greedy rewiring.

CONCLUSION
In this work, we study the adoption dynamics of AI systems. We do so on a scale-free network topology with and without network rewiring.
Our results suggest that, without rewiring and with a cost of adoption, a minority of the population becomes Selfish AI and gains a benefit at the expense of the remaining H population, which does not have enough fitness to become AI (Fig. 2a). This replicates the results found in previous work that did not use a specific network topology [12].
With rewiring, be it Conservative or Greedy, the equilibrium consists of similar numbers for each AI type (Fig. 2b and 2c) but highly disproportionate connections and fitness (Table 2). The Util AI population ends up colluding and obtaining a very high fitness, leaving the rest of the population poorly linked and with low fitness. The difference between AI types is greater when using Greedy rewiring, but both rewiring types lead to a society with a high level of inequality.
Removing the cost of becoming AI only affected the results obtained with static scale-free networks, and provided no benefit compared to a fully H (100% H) population (Table 2).
Our simulations suggest that network dynamics promote the sustainability of both Utilitarian and HConscious AI. Under the conditions of our model, we were not able to achieve a beneficial equilibrium between AI and H solely through self-regulating mechanics. That does not mean such an equilibrium does not exist under a different set of conditions. Studying and understanding how to achieve such an equilibrium is a promising avenue for future work.
Our simulations only allow individuals to imitate those with whom they were connected through the network. It will be of interest to explore if the equilibria change when individuals can imitate anyone or base their imitations on a second network, not necessarily overlapping with the interaction graph (see, e.g., [29]).
Our rewiring approaches were strictly selfish. Individuals looked only at their own gain when deciding to rewire, either trying not to lose fitness (Conservative) or trying to improve their fitness (Greedy). Other approaches could be explored. It is reasonable to consider populations with a mixture of rewiring strategies. Also, instead of keeping the number of links constant throughout the simulations, we could assume a continuous creation of new links, leading to a time-evolution of the average degree and other network properties. This would also allow for the reintegration of ostracized individuals, a feature absent from our model.

Figure 2: [...] (n = 1000), 500 of which were AI, with all types equally represented. In a) the network is scale-free and static (Ω = 0), and there exists a cost for becoming AI (P = 1). The network stabilizes with ≈ 71% H whereas the remaining ≈ 29% are Selfish AI. In b) the network has Conservative rewiring, Ω = 0.99, and P = 1. The H population steadily decreases, ending up as only ≈ 1% of the population. All AI types rise in number, despite a sharp initial drop in the number of HConscious AI and a slight drop in Selfish AI. At the end of the simulation, all the AI types have roughly the same presence in numbers, with 26% NashEQ, 22% Selfish, 25% Util, and 24% HConscious. In c) the network has Greedy rewiring, Ω = 0.99, and P = 1. Despite some differences in the initial evolution of the population, the resulting equilibrium is very similar to the one obtained using Conservative rewiring (Fig. 2b). The final percentages are 25% NashEQ, 22% Selfish, 27% Util, 25% HConscious and 0.5% H.

Table 2: Fitness distribution at the end of the simulations for each type of individual. We use as baseline the average fitness of individuals in a fully H population (100% H). The optimal equilibrium of a fully Util AI population (100% Util) is never achieved in our simulations, but is relevant as a means of comparison. All the simulations lead to a society with high inequality. On a static scale-free network (Ω = 0, P = 1), wealth is hoarded by the Selfish AI population, whereas in both rewiring simulations (Ω = 0.99, Conservative and Ω = 0.99, Greedy) wealth is hoarded by a collusion of Util AI individuals. For Ω = 0.99, Greedy, all individuals but the Util AI ones have an average fitness of 0 because they are not connected to anyone.
[Table 2 columns: 100% H; Ω = 0, P = 1; Ω = 0, P = −∞; Ω = 0.[...]]