Aplib: Tactical Agents for Testing Computer Games

Abstract. Modern interactive software, such as computer games, employs complex user interfaces. Although these user interfaces make the games attractive and powerful, they unfortunately also make them extremely difficult to test. Not only do we have to deal with their functional complexity, but the fine grained interactivity of their user interfaces also blows up their interaction space, so that traditional automated testing techniques have trouble handling it. An agent-based testing approach offers an alternative solution: agents' goal-driven planning, adaptivity, and reasoning ability can provide an extra edge towards effective navigation in a complex interaction space. This paper presents aplib, a Java library for programming intelligent test agents, featuring novel tactical programming as an abstract way to exert control over agents' underlying reasoning-based behavior. This type of control is suitable for programming testing tasks. Aplib is implemented in such a way as to provide the fluency of a Domain Specific Language (DSL). Its embedded DSL approach also means that aplib programmers get all the advantages that Java programmers get: rich language features and a whole array of development tools.


Introduction
With the advance of technology, computer games have become increasingly interactive and complex. Modern computer games improve realism and user experience by allowing users fine grained control and interactions. A downside of this development is that it becomes increasingly difficult to test computer games. For example, to test that a computer game maintains the correctness invariant of a certain family of states, the tester first needs to operate the game to bring it to at least one such state. This often requires a long series of fine grained interactions with the game. Only then can the tester check whether the said invariant holds in that state. Such a test is hard, error-prone, and fragile to automate. Consequently, many game developers still resort to expensive manual play testing. Considering that the game industry is worth over 100 billion USD, speeding up testing by effectively automating manual testing tasks is a need that cannot be ignored. (This work is funded by the EU ICT-2018-3 H2020 Programme, grant nr. 856716.)
As indicated above, a common and expensive manual testing task is to bring the game under test to a certain state of interest (a goal state), either because we want to check whether that state is correct, or because we need to perform a specific action in this state that is required for the given test scenario. In principle this task is a search problem, for which solutions exist. However, in the context of computer games the problem is challenging. A game often employs randomness, and it often consists of many entities that interact with each other and with the user. Some interactions might be cooperative while others can be adversarial. These and other factors lead to a vast and fine grained interaction space which is hard to deal with for existing automated testing techniques such as search-based [27,19], model-based [14,41], or symbolic [4,40] testing. The key to handling such a space, we believe, is an approach that enables the programming of domain reasoning to express which parts of the interaction space of a particular game are relevant to consider, and likewise what kinds of plans (for reaching given goal states) are needed. This allows the underlying test engine to focus its search on the parts of the interaction and plan spaces that semantically matter. We propose to base such a solution on a multi-agent approach, since autonomous distributed planning and reasoning-based interactions with environments are already first-class features there.
Contribution. This paper presents aplib, a Java library for programming intelligent agents suitable for carrying out complex testing tasks. They can be used in conjunction with Java testing frameworks such as JUnit, e.g. to collect and manage test verdicts. Figure 1 shows a 3D game we use as a pilot where aplib was used to automate testing (we will also use it later as a running example). Aplib features BDI (Belief-Desire-Intention [22]) agents and adds a novel layer of tactical programming that provides an abstract way to exert control over agents' behavior. Declarative reasoning rules express when actions are allowed to execute. Although in theory just using reasoning is enough to find a solution (a plan that would solve the given goal state) given infinite time, such an approach is not likely to be performant enough. For testing this matters, as no developers would want to wait for hours for their test to complete. The tactical layer allows developers to program an imperative control structure over the underlying reasoning-based behavior, giving them greater control over the search process. So-called tactics can be defined to enable agents to strategically choose and prioritize their short term actions and plans, whereas longer term strategies are expressed as so-called goal structures, specifying how a goal can be realized by choosing, prioritizing, sequencing, or repeating a set of subgoals.
While the concept of a hierarchical goal is not new, e.g. it can be solved by Hierarchical Task Networks (HTN) and Behavior Trees (BT), or can be encoded directly as BDI reasoning rules [9], aplib allows it to be expressed in terms of imperative programming idioms such as SEQ and REPEAT, which are more intuitive for programming control. The underlying reasoning-based behavior remains declarative. Our tactical programming approach is more similar to tactical programming in interactive theorem proving, used by proof engineers to script proof search [38,11,21]. The use of this style in BDI agents, and for solving testing problems, is as far as we know new.
As opposed to dedicated agent programming languages [35,39], aplib offers a Domain Specific Language (DSL) embedded in Java. This means that aplib programmers program in Java, but they get a set of APIs that gives the fluent appearance of a DSL. In principle, having a native programming language for writing tests is a huge benefit, but only if the language is rich enough and has enough tool and community support. Otherwise it is a risk that most companies will be unwilling to take. Using an embedded DSL, on the other hand, means that programmers have direct access to all the benefits of the host language, in this case Java: its expressiveness (OO, λ-expressions, etc.), static typing, rich libraries, and wealth of development tools.
Paper structure. Section 2 first introduces the concept of testing tasks; these are the tasks that we want to automate. Section 3 explains the basic concepts of aplib agents and shows examples of how to create an agent with aplib and how to write some simple actions. Section 4 introduces the concept of goal structures, to express complex test scenarios, and our basic constructs for tactical programming. The section also explains aplib's 'deliberation cycle', which necessarily deviates from BDI's standard due to its tactical programming. Large scale case studies are still future work. However, Section 5 will briefly discuss our experience so far. Section 6 discusses related work, and finally Section 7 concludes and mentions some future work.

Testing Task
This section introduces what we mean by a 'testing task', and what 'automating' it means. The typical testing task that we will consider has the form:

φ ⇒ ψ     (1)

where φ is a state predicate characterizing a situation and ψ is a state predicate that is expected to hold on all instances of the situation φ (that is, on all states satisfying φ). We call ψ an invariant, which is the term used by Ernst et al. [15] to refer to a predicate that is expected to hold at a certain control location in a program, e.g. when a program enters its loop, or when it exits; φ would then be a predicate that characterizes the control location of interest. This concept generalizes the well known pre- and postconditions. E.g. if φ captures the exit of a method m, the invariant ψ then describes m's postcondition.
Since game testing typically has to be done in the so-called blackbox setup [3] where we abstract away from the source code (because it would otherwise be too complex), and hence also away from concepts such as programs' control location, we further generalize Ernst et al. by allowing φ to describe a family of game states that are semantically meaningful for human users; we call this a situation. For example φ could characterize the situation where a certain interactable game element, e.g. a switch, is visible, and ψ could then express the expectation that the switch should be in its 'off' state.
Since φ can potentially describe a very large, even infinite, set, the specification φ ⇒ ψ is tested by sampling a finite number of states, and then checking whether the invariant ψ holds in these states. Obviously such tests are only relevant when applied on sample states that satisfy the situation φ. Getting the game into a relevant state for testing φ ⇒ ψ is a non-trivial task for a computer. Since a game typically starts in specific initial states, it first needs to be played to move it to any specific other state. Consequently, when we want to automate the testing of φ ⇒ ψ, the hard part is typically not in checking its invariant part, but in finding relevant states to test the implication.
Playing a game can be seen as the execution of a sequence of actions, e.g. moving up or down, interacting with some in-game entity, etc. The set of available actions might differ from state to state. We will call a sequence of actions a plan. A solution is a plan that, when executed, drives the game under test to a state relevant for φ ⇒ ψ. In manual testing, a human is employed to search for such a solution. There are tools that can be used to record a script that executes the plan and replays it whenever we need to re-test the corresponding situation. A major challenge with script-based test automation, however, is the manual effort required to maintain the scripts when they break [2]. If the game designers introduce even a small change in the game layout (e.g. an in-game door is moved to a different position), which happens very often during development, a recorded script will typically break. Moreover, games are non-deterministic due to all sorts of random behavior (e.g. random moves by computer controlled enemies, or randomness due to timing effects). This makes such automation scripts for games even more fragile.
By 'automated testing' of φ ⇒ ψ we mean replacing this human effort by letting an agent search for solutions. This is a search problem: the space of possible plans is searched to find at least one that solves φ. We can define the robustness of an automated test as how well it copes with the non-determinism of the system under test. Since agents are typically reactive to their environment, agent-based test automation can thus be expected to be robust; this will be discussed later in Section 4.2.
Testing tasks can be generalized to test 'scenarios' of the form:

φ 0 ; ... ; φ k−1 ⇒ ψ     (2)

Each φ i is a state predicate describing a situation. The sequence φ 0 ; ... ; φ k−1 describes a scenario where an execution of the game under test passes through states satisfying each φ i , in the same chronological order as the sequence. In the state where φ k−1 is satisfied, the invariant ψ is expected to hold. For example, if developers employ UML Use Cases, these can be converted to the above form: each flow in a use case can be translated to a scenario, and its postcondition to ψ. Testing a scenario is not fundamentally harder than testing a situation, since the next situation φ i+1 in the scenario defines the same kind of search problem as we had in situation testing, where φ i describes the starting states for the search.

Aplib Agency
This section introduces our agent programming framework aplib and shows how to use it to automate testing tasks.

Preliminary: Java functions. Since Java 8, functions can be conveniently formulated using so-called λ-expressions. E.g. the Java expression x → x+1 constructs a nameless function that takes one parameter, x, and returns the value of x+1. Unlike in a pure functional language like Haskell, Java functions can be either pure (having no side effects) or impure/effectful. An effectful function of type C→D takes an object u:C and returns some object v:D, and may also alter the state of u.
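To make the distinction concrete, the following self-contained sketch (independent of aplib; the class and names are ours, for illustration only) shows a pure λ-expression next to an effectful one that mutates its argument:

```java
import java.util.function.Function;

public class LambdaDemo {
    // A mutable object, used to demonstrate an effectful function.
    static class Counter { int n = 0; }

    // Pure function: x -> x + 1 has no side effects.
    static final Function<Integer, Integer> inc = x -> x + 1;

    // Effectful function of type Counter -> Integer: it returns a value
    // but also alters the state of its argument.
    static final Function<Counter, Integer> tick = c -> { c.n++; return c.n; };

    public static void main(String[] args) {
        System.out.println(inc.apply(41)); // 42
        Counter c = new Counter();
        tick.apply(c);
        tick.apply(c);
        System.out.println(c.n);           // 2: the counter was mutated
    }
}
```

Both are ordinary `java.util.function.Function` values, so both can be stored in variables and passed around as parameters.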
Importantly, a λ-expression can be passed as a parameter. Since a λ-expression, being a function, defines behavior, passing it as a parameter to a method or object essentially allows us to inject new behavior into the method/object. This allows us to extend the behavior of an agent without having to introduce a subclass. While the latter is the traditional OO way to extend behavior, it would clutter the code base if we plan to create, e.g., many variations of the same agent. Our use of λ-expressions to inject behavior is essentially a generalization of the well-known Strategy Design Pattern [20].

Figure 2 illustrates the typical way aplib agents are deployed. As is common with software agents, aplib agents are intended to be used in conjunction with an external environment (in our case, the game under test), which is assumed to run independently. In aplib, however, the term 'Environment' refers to the interface between the agents and the game. Aplib agents do not directly access or control the game. Having the Environment in between keeps aplib neutral with respect to the technology used by the game under test. The Environment is responsible for providing relevant information about the game state to the agents. Each agent typically controls an in-game entity (usually a player character). It controls the entity by sending commands to the Environment, which then translates them into in-game actions by the entity.

Agent, Belief, and Goal
Multiple agents can be deployed if the game is multi-player. In such a setup, agents may want to work together. A group of agents that wish to collaborate can register to a 'communication node' (see Fig. 2). This enables them to send messages to each other (singlecast, broadcast, or role-based multicast).
BDI with goal structure. As is typical in BDI (Belief-Desire-Intention) agency, an aplib agent has a concept of belief, desire, and intent. An agent's state reflects its belief. It contains information on the current state of the game under test. Such information is a 'belief' because it may not be entirely factual. E.g. the game may only be willing to pass current information on in-game entities in the close vicinity of the agent, so the agent's information on far away entities might over time become obsolete. The agent can be given a goal structure, defining its desire. Unlike the flat goal-based structures used e.g. in 2APL [9] and GOAL [23], in this paper we employ a richly structured goal structure, with different nodes expressing different ways a goal could be achieved through its subgoals; more on this will be discussed in Section 4. Abstractly, an aplib agent is a tuple A = (s, E, Π, β), where s is an object representing A's state and E is its environment.
Π is a goal structure, e.g. it can be a set of goals that have to be achieved sequentially. Each goal is a pair, which we will denote (g, T), where g is the goal itself and T is a 'tactic' intended to achieve g. In BDI terms, T reflects intention. When the agent decides to work on a goal (g, T), it will commit to it: it will apply T repeatedly over multiple execution cycles until g is achieved, or the agent has used up its 'budget' for g.
The β in the tuple represents the agent's computing budget. Budget is used to control how long the agent should persist on pursuing its current goal. Executing a tactic consumes some budget. So, this is only possible if β>0. Consequently, a goal will automatically fail when β reaches 0. Budget plays an important role when dealing with a goal structure with multiple goals as the agent will have to decide how to divide the budget over different goals. This will be discussed later in Section 4.
Example. Figure 3 shows a scene in a game called Lab Recruits. Imagine that we want to test that the door (white circled) works, i.e. that it can be opened. Two buttons (red circled) are present in the room. In a correct implementation, the door can be opened by activating the button closest to the door. A player (yellow circled) can activate a button by moving close to it and interacting with it. Suppose the door is identified by door 1 and its corresponding button by button 1 . The testing task above can then be specified as:

isActive(button 1 ) ⇒ isOpen(door 1 )     (3)

Fig. 3. A setup where we have to test that a closet door (circled white) can be opened.

Figure 4 shows how we create a test agent named Smith to perform the aforementioned testing task. First, lines 1-3 show the relevant part of the environment the agent will use to interface with the Lab Recruits game; it shows the primitive commands available to the agent. The method interact(i, j) will cause an in-game character with id i (this would be the character controlled by the agent) to interact with another in-game entity with id j (e.g. a button). The method also returns a new 'Observation', containing information on the new state of game entities in the visible range of i. The method moveToward(i, p, q) will cause the character i to move towards a position q, given that p is i's current position. Simply teleporting to q is not allowed in most games. Instead, the method only moves i some small distance towards q (so, it may take multiple update cycles for i to actually reach q). The method also returns a new observation.
Line 5 creates an empty agent. Lines 11-13 configure it: line 11 attaches a fresh state to the agent; then, assuming labrecruitsEnv is an instance of LabRecruitsEnv (defined in lines 1-3), line 12 hooks this environment to the agent. Line 13 assigns a goal named Π to the agent. The goal is defined in lines 6-10, stating that the desired situation the agent should establish is one where the in-game button 1 is active (line 7). Line 9 associates a tactic named activateButton 1 Tac with this goal, which the agent will use to achieve it. Line 10 lifts the defined goal to become a goal structure. More precisely, line 6 creates a 'test-goal'. An ordinary goal, created using a constructor named goal rather than testgoal, simply formulates desired states to be in. A test-goal additionally specifies an invariant (line 8). It formulates a testing task as discussed in Section 2. E.g. lines 7 and 8 formulate the testing task in (3). When the goal part is achieved, the invariant is tested on the current agent state. If this returns true, the test passes; otherwise it fails. Its automation is provided by the tactic activateButton 1 Tac, which should specify some strategy to go towards the button and activate it.

Action (Elementary Tactic)
A tactic is made of 'actions', composed hierarchically to define a goal-achieving strategy. Such composition will be discussed in Section 4.1. In the simplest case, though, a tactic is made of just a single action. An action is an effectful and guarded function over the agent's state. The example below shows the syntax for defining an action.

 6  var Π = testgoal("g", Smith)
 7        . toSolve(s → isActive(s.getEntity("button1")))
 8        . invariant(s → isOpen(s.getEntity("door1")))
 9        . tactic(activateButton1Tac)
10        . lift() ;
11  Smith.withState(new AgentState())
12        . withEnvironment(labrecruitsEnv)
13        . setGoal(Π)
14        . budget(200)

Fig. 4. Creating an agent named Smith to test the Lab Recruits game. The code is in Java, since aplib is a DSL embedded in Java. The notation x→e in line 7 is a Java lambda expression defining a function, in this case the predicate defining the goal.
This statement defines an action with "id" as its id, and binds the action to the Java variable α. The f is a function defining the behavior that will be invoked when the action α is executed. This function is effectful and may change the agent state. The q is a pure function, called the 'guard' of the action, specifying when the action is eligible for execution. Notice that the pair f, q can be seen as expressing a reasoning rule q → f.
The guard q can be a predicate or a query. More precisely, let Σ be the type of the agent state and R the type of query results. We allow q to be a function of type Σ→R. Whereas a predicate would inspect a state s:Σ and simply return true or false, a query inspects s to see whether it contains some object r satisfying a certain property. E.g. q might check whether s contains a closed door. If such a door can be found, q returns it, else it returns null. This gives more information than a simple true or false.
More precisely, the action α is executable on a state s if it is both control- and guard-enabled on s. For now we can ignore control-enabledness. The action is guard-enabled on s when q(s) returns some non-null r. The behavior function f has the type Σ→R→V for some type V. When the action α is executed on s, it invokes f(s)(r). The result v = f(s)(r), if it is not null, will then be checked to see whether it achieves the agent's current goal.

For example, Figure 5 shows an action (its first line reads var approachButton 1 = action("approachButton1") ...) that can help agent Smith from Fig. 4. In the game Lab Recruits, to interact with a button a player character needs to stand close to the button. Although in Fig. 3 the character seems to stand close to button 1 , it is not close enough. The action in Fig. 5, when invoked, will move the character closer to the button (but will not interact with it yet). It may take several invocations to move the character close enough to the button. The action's guard specifies that the action is only enabled if button 1 exists in the agent's belief (line 9), and furthermore its distance to the agent is ≥0.01 unit (line 10). The behavior part of the action, line 3, will then move the agent some small distance towards the button. Line 4 will incorporate the returned new observation (of the game state) into the agent's state.
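The guard-as-query semantics above can be modelled in a few lines. The following is a self-contained toy sketch, not aplib's actual API: the guard is a function returning null when the action is not enabled, and the behavior consumes the query result r, as in f(s)(r):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model (not aplib's API) of a guarded action: the guard is a query
// State -> R that returns null when the action is not enabled, and the
// behavior f(s)(r) consumes the query result r.
public class GuardedAction<S, R, V> {
    final Function<S, R> guard;
    final BiFunction<S, R, V> behavior;

    GuardedAction(Function<S, R> guard, BiFunction<S, R, V> behavior) {
        this.guard = guard;
        this.behavior = behavior;
    }

    boolean enabledOn(S s) { return guard.apply(s) != null; }

    // Execute only when guard-enabled; returns null otherwise.
    V executeOn(S s) {
        R r = guard.apply(s);
        return r == null ? null : behavior.apply(s, r);
    }

    public static void main(String[] args) {
        // State: doors with open/closed status (true = open).
        Map<String, Boolean> state =
            new HashMap<>(Map.of("door1", false, "door2", true));
        // Guard: query for some closed door; returns its name, or null.
        GuardedAction<Map<String, Boolean>, String, String> openADoor =
            new GuardedAction<>(
                s -> s.entrySet().stream().filter(e -> !e.getValue())
                      .map(Map.Entry::getKey).findFirst().orElse(null),
                (s, door) -> { s.put(door, true); return door; });
        System.out.println(openADoor.executeOn(state)); // door1
        System.out.println(openADoor.enabledOn(state)); // false: all doors open
    }
}
```

Note how the query result (the door found) flows into the behavior, which is the extra information a query gives over a plain true/false predicate.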
Reasoning. Most agent reasoning is carried out by actions' guards, since they are the ones that inspect the agent's state to decide which actions are executable. The reader may notice that the guard in the example in Fig. 5 is imperatively formulated, which is to be expected since aplib's host language, Java, is an imperative programming language. However, aplib also has a Prolog backend (using tuprolog [12]) to facilitate a declarative style of state query. Figure 6 shows an example. To use Prolog-style queries, the agent's state needs to extend a class called StateWithProlog. It then inherits an instance of a tuprolog engine to which we can add facts and inference rules, and over which we can then pose queries. Imagine a level in Lab Recruits where we have multiple doors and buttons. Some buttons may crank multiple doors when toggled. Suppose a test agent wants to get to a state where two doors, door 1 and door 2 , are open. The example shows the definition of an action named "open doors 1and2" that will do this. Note that after opening one of these doors, the agent should be careful when trying to open the second. It needs to find a button that indeed opens the second door, but without closing the first one again. The reasoning needed to handle this is formulated as a Prolog rule called openDoors, defined in lines 3-7. With the help of this rule, the guard for the action "open doors 1and2" can now be formulated as a Prolog query openDoors(B, door 1 , door 2 ), which in aplib is expressed as in line 18. The predicate is true if door 1 is closed and B is a button connected to it (so, toggling the button would crank the door). Else, if door 1 is open, B should be connected to door 2 , but not to door 1 (so, toggling it will not close door 1 again). So, assuming a solution exists, invoking the action above multiple times will first open door 1 , unless it is already open, and then door 2 .
Notice that the guard is declarative, as it only characterizes the properties that a right button should have; it does not spell out how we should iterate over all the buttons in the agent's belief to check it.

Structured Goals and Tactics
A goal can be very hard for an agent to achieve/solve directly. For example, imagine a level in the game Lab Recruits, similar to Fig. 1, where we have to test some feature F located in some specific room. Let isInteracted F be the goal expressing that the agent is at F and manages to interact with it (and hence test it). To achieve this, the agent will first need to reach the room where F is. To access this room, a door D needs to be opened first. The door can be closed, in which case the agent first needs to find a specific button B that opens it. If the agent does not know all these steps, then directly solving isInteracted F will be very difficult.
We can help the agent by providing intermediate goals that it needs to solve first. We can formulate this as a 'goal structure' such as the one below:

SEQ(FIRSTof(isOpen D , SEQ(isActivated B , isOpen D )), isInteracted F )

where isOpen D and isActivated B are intermediate goals. SEQ and FIRSTof are examples of so-called goal combinators, explained below.
In aplib a composite goal is called a goal structure. It is a tree with goals as the leaves and goal-combinators as the nodes. The goals at the leaves are ordinary goals or test-goals, and hence each has a tactic associated with it. The combinators do not have their own tactics. Instead, they are used to provide high level control over the order or importance of the underlying goals. The available combinators are as follows; let G and G 1 , ..., G n be goal structures:

- If (g, T) is a goal g with tactic T associated to it, g.lift() turns it into a goal structure consisting of the goal as its only leaf. T is implicitly attached to this leaf.
- SEQ(G 1 , ..., G n ) is a goal structure that is achieved by achieving all the subgoals G 1 , ..., G n , in that order. This is useful when G n is hard to achieve; G 1 , ..., G n−1 then act as helpful intermediate goals to guide the agent. Goal structures of this form also naturally express test scenarios as in (2).
- H = FIRSTof(G 1 , ..., G n ) is a goal structure. When given H to achieve, the agent will first try to achieve G 1 . If this fails, it tries G 2 , and so on, until there is one goal G i that is achieved. If none is achieved, H is considered failed.
- H = REPEAT G is a goal structure. When given H to achieve, the agent will pursue G. If after some time G fails, e.g. because it runs out of budget, it will be tried again. Fresh budget will be allocated for G, taken from what remains of the agent's total budget. This is iterated until G is achieved, or until H's budget runs out.
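The success/failure semantics of these combinators can be captured in a small self-contained sketch (a toy model, not aplib's API; here each subgoal is abstracted to a single attempt that either succeeds or fails):

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model (not aplib's API) of the goal-combinator semantics.
public class GoalCombinators {
    // SEQ: succeeds iff all subgoals succeed, attempted left to right.
    static boolean seq(List<BooleanSupplier> gs) {
        for (BooleanSupplier g : gs) if (!g.getAsBoolean()) return false;
        return true;
    }

    // FIRSTof: succeeds as soon as one subgoal succeeds, in order.
    static boolean firstOf(List<BooleanSupplier> gs) {
        for (BooleanSupplier g : gs) if (g.getAsBoolean()) return true;
        return false;
    }

    // REPEAT: retries a goal until it succeeds or the budget runs out.
    static boolean repeat(BooleanSupplier g, int budget) {
        for (int i = 0; i < budget; i++) if (g.getAsBoolean()) return true;
        return false;
    }

    public static void main(String[] args) {
        int[] tries = {0};
        BooleanSupplier flaky = () -> ++tries[0] >= 3; // succeeds on 3rd attempt
        System.out.println(repeat(flaky, 5)); // true
    }
}
```

The real combinators additionally thread the agent's state and budget through each attempt, but the control flow is the one modelled above.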
Dynamic Subgoals. Rather than providing a whole goal structure to an agent up front, sometimes it might be better to let the agent dynamically introduce or cancel subgoals. For example, imagine an agent A which is initially given the goal structure Π = SEQ(isOpen D , inRoom R ). As the agent works on the first subgoal, isOpen D , imagine that it discovers that the door D is closed, and hence the subgoal cannot be reached before another subgoal is solved first (namely, activating the button that opens the door).
Rather than pre-programming how to handle this in Π, we can let the tactic of isOpen D make this decision instead. Since a tactic has access to the agent's state, it can inspect this state. Based on what it discovers, it may then decide to insert a new subgoal, let's call it isActivated B , that will cause the agent to first find the button B and activate it in order to open D. The agent can do this by invoking addBefore(isActivated B ), which will then change Π to:

SEQ(REPEAT(SEQ(isActivated B , isOpen D )), inRoom R )

The REPEAT construct will cause the agent to move back to isActivated B upon failing isOpen D . The sequence SEQ(isActivated B , isOpen D ) will then be repeatedly attempted until it succeeds. The number of attempts can be controlled by assigning budget to the REPEAT construct (budgeting will be discussed below).
Budgeting. Since a goal structure can introduce multiple goals, they will be competing for the agent's attention. By default, aplib agents use the blind commitment policy [29] where an agent will commit to its current goal until it is achieved. However, it is possible to exert finer control on the agent's commitment through a simple but powerful budgeting mechanism.
When the agent is created, we can give it a starting computing budget β 0 (else it is assumed to be ∞). Let Π be the agent's root goal structure. For each sub-structure G in Π we can specify G.bmax: the maximum budget G will get. Else, the agent conservatively assumes G.bmax = ∞. By specifying bmax we control how much the agent should commit to a particular goal structure. This limit can be specified at the goal level (the leaves of Π), if the programmer wants to micro-manage the agent's commitment, or higher in the hierarchy of Π to control it strategically.

Fig. 7. The tactic for agent Smith in Fig. 4, composed from three other tactics. The tactic "activateButton1" is an action (the code is not shown) to activate button 1 if it is close enough to the agent. Otherwise, the action approachButton 1 (defined in Fig. 5) will move the agent towards the button. This only works if the button is visible to the agent (see the action's guard). Else, FIRSTof falls back to the last tactic, which will explore the area around the agent, searching for the button.
Once it runs, the agent will only work on a single goal at a time. The goal g it works on is called the current goal. Every ancestor of the current g is also current. For every goal structure G, let β G denote the remaining budget for G. At the beginning, β Π = β 0 . When a goal or goal structure G in Π becomes current, budget is allocated to it as follows. When G becomes current, its parent either becomes current as well, or it is already current (e.g. the root Π is always current). Ancestors H that are already current keep their β H unchanged. Then, the budget for G is allocated by setting β G to min(G.bmax, β parent(G) ), after recursively determining β parent(G) . This budgeting scheme is safe: the budget of a goal structure never exceeds that of its parent.
When working on a goal g, any work the agent does will consume some budget, say δ. This will be deducted from β g and from the budget of its ancestors. If β g becomes ≤0, the agent aborts g. It must then find another goal from Π.
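The allocation rule β G = min(G.bmax, β parent(G) ) can be sketched as follows (a minimal self-contained illustration, not aplib's implementation; the concrete numbers are ours):

```java
// Toy sketch (not aplib's implementation) of the top-down budget
// allocation rule: beta_G = min(G.bmax, beta_parent). Because each node
// is capped by its parent, a child's budget never exceeds the parent's.
public class Budgeting {
    static double allocate(double bmax, double betaParent) {
        return Math.min(bmax, betaParent);
    }

    public static void main(String[] args) {
        double beta0 = 200;                                         // agent's starting budget
        double betaPi = allocate(Double.POSITIVE_INFINITY, beta0);  // root: no bmax given, gets 200
        double betaSeq = allocate(150, betaPi);                     // SEQ node with bmax = 150
        double betaGoal = allocate(300, betaSeq);                   // leaf asks 300, capped at 150
        System.out.println(betaGoal); // 150.0
    }
}
```

The safety property quoted above follows directly from the min: allocate(bmax, betaParent) ≤ betaParent for any bmax.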

Tactic
Rather than using a single action, aplib provides a more powerful means to achieve a goal, namely a tactic. A tactic is a hierarchical composition of actions. The methods used to compose them are also called combinators. Figure 7 shows an example of a tactic, composed with a combinator called FIRSTof. Structurally, a tactic is a tree with actions as leaves and tactic-combinators as nodes. The actions are the ones that do the actual work. Furthermore, recall that the actions also have their own guards, controlling their enabledness. The combinators are used to exert higher level control over the actions, e.g. sequencing or choosing between them. This higher level control supersedes guard-level control. The following tactic combinators are provided; let T 1 , ..., T n be tactics:

1. If α is an action, T = α.lift() is a tactic. Executing this tactic on an agent state s means executing α on s, which is only possible if α is enabled on s (i.e. if its guard returns a non-null value when queried on s).
2. T = SEQ(T 1 , ..., T n ) is a tactic. When invoked, T will execute the whole sequence T 1 , ..., T n .
3. T = ANYof(T 1 , ..., T n ) is a tactic that randomly chooses one of the enabled T i 's and executes it. A SEQ tactic is enabled if its first sub-tactic is enabled. For the other combinators, a tactic is enabled if one of its sub-tactics is enabled.
4. T = FIRSTof(T 1 , ..., T n ) is a tactic. It is used to express priority over a set of tactics when more than one of them could be enabled. When invoked, T will invoke the first enabled T i from the sequence T 1 , ..., T n .
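How enabledness propagates up a tactic tree can be captured in a short self-contained sketch (a toy model, not aplib's API): a SEQ is enabled iff its first sub-tactic is, while ANYof and FIRSTof are enabled iff some sub-tactic is.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Toy model (not aplib's API) of tactic guard-enabledness propagation.
public class TacticEnabled {
    interface Tactic<S> { boolean enabled(S s); }

    // An action is enabled iff its guard holds on the state.
    static <S> Tactic<S> action(Predicate<S> guard) { return guard::test; }

    // SEQ is enabled iff its first sub-tactic is enabled.
    @SafeVarargs
    static <S> Tactic<S> seq(Tactic<S>... ts) { return s -> ts[0].enabled(s); }

    // ANYof is enabled iff some sub-tactic is enabled.
    @SafeVarargs
    static <S> Tactic<S> anyOf(Tactic<S>... ts) {
        return s -> Arrays.stream(ts).anyMatch(t -> t.enabled(s));
    }

    // FIRSTof has the same enabledness condition as ANYof; the two differ
    // only in WHICH enabled sub-tactic gets selected for execution.
    @SafeVarargs
    static <S> Tactic<S> firstOf(Tactic<S>... ts) { return anyOf(ts); }

    public static void main(String[] args) {
        Tactic<Integer> pos = action(x -> x > 0);
        Tactic<Integer> neg = action(x -> x < 0);
        System.out.println(seq(pos, neg).enabled(1));   // true: first sub-tactic enabled
        System.out.println(anyOf(neg, pos).enabled(1)); // true: some sub-tactic enabled
    }
}
```

Execution (which enabled sub-tactic actually runs, and the sequencing inside SEQ across cycles) is deliberately left out here; it is the subject of the deliberation cycle below in the text.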
Consider a goal g with an associated tactic T. When this goal becomes current, recall that the agent will then repeatedly execute T until g is achieved (or until its budget is exhausted). Aplib agents execute their tactics in cycles. In BDI agency these are called deliberation cycles [30,10,36]: in each cycle, an agent senses its environment, reasons about which action to do, and then performs this action. To make itself responsive to changes in the environment, an agent only executes one action per cycle. So, if the environment's state changes at the next cycle, a different action can be chosen to respond to the change. However, if T contains a sub-tactic T′ of the form SEQ(T1, ..., Tn), things become more complicated. If T′ is selected, the agent has to execute the whole sequence, which will take at least n cycles, before it can repeat the whole T again. This makes the execution flow of a tactic non-trivial. We therefore have to deviate from the standard BDI deliberation [36].
Aplib deliberation cycle. Imagine an agent A = (s, E, Π, β). At the start, A inspects its goal structure Π to determine which goal g, with associated tactic T, it should pursue, and calculates how much of the budget β should be allocated for achieving g (βg). A will then repeatedly apply T over multiple cycles until g is achieved, or βg is exhausted. At every cycle, A does the following:

1. Sensing. The agent asks the Environment to provide fresh state information.
2. Reasoning. The agent determines which actions α in T are executable on the current state s. This is the case if α is both guard-enabled on s and control-enabled. The definition of the latter is somewhat involved, so let us explain it with an example instead. Suppose T = ANYof(α0, SEQ(α1, α2), α3). The first time T is considered for execution, α0, α1 and α3 become control-enabled, but not α2. If α0 turns out not to be guard-enabled, while α1 and α3 are, only the latter two are executable. Suppose α1 is chosen for execution. At the next cycle only α2 is control-enabled. If it is also guard-enabled it can be executed; otherwise it remains control-enabled for the next cycle. After α2 is executed, the execution of the whole T is completed, and it can be repeated again. If no action is executable, the agent will sleep until the next cycle. Note that since the game under test runs autonomously, it may in the meantime move to a new state, and hence in the next cycle some actions may become enabled.
3. Execution and resolution. Let α be the selected action; it is then executed. If its result v is non-null, it is considered a candidate solution to be checked against the current goal g. If v achieves g (so g is solved), the agent inspects the remaining goals in Π to decide on the next one to handle. The whole cycle is repeated, but with the new goal. If no goal is left, the agent is done. If g is not achieved, it is maintained and the whole cycle is repeated.
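The overall shape of the cycle can be sketched as a loop. The sketch below is a drastically simplified, hypothetical model (aplib's real implementation involves the Environment, tactic trees, and control-enabledness): one action per cycle, each cycle consuming budget, until the goal predicate holds or the budget runs out.

```java
// Simplified, illustrative deliberation loop (not the real aplib code).
import java.util.function.Function;
import java.util.function.Predicate;

class MiniAgent {
    // Repeatedly: sense, execute one action, deduct budget, check the goal.
    static int run(int state,
                   Predicate<Integer> goal,
                   Function<Integer, Integer> action,
                   double budget, double costPerCycle) {
        while (budget > 0 && !goal.test(state)) { // until solved or budget gone
            int sensed = state;                   // 1. sensing (trivial here)
            state = action.apply(sensed);         // 2+3. pick and execute the
                                                  //      single enabled action
            budget -= costPerCycle;               // work consumes budget
        }
        return state;
    }
}
```

For instance, with goal "state ≥ 5" and an increment action, the agent reaches the goal in five cycles; with an action that makes no progress, the budget runs out and the goal is abandoned.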

Test Robustness
Let us now explain more concretely why aplib test automation is more robust. Recall the tactic activateButton1Tac (Fig. 7) to activate button1. Notice that it uses the tactic approachButton1.lift() (defined in Fig. 5) to first approach the button in case the agent is not standing next to it. Notice also that the location is not hard-wired in this tactic, but instead queried from the button itself. Let us furthermore replace the call to moveTowards in line 3 in Fig. 5 with navigateTo. This will cause the agent to use aplib's 3D-space path finding to guide itself towards the given location. If the game designer now moves the button elsewhere, e.g. to swap its position with the far button in Fig. 3, the tactic will still work, as long as there is a path that reaches the button. The tactic approachButton1 requires, however, that the button is already in the agent's belief, which would not be the case if the developer moves it to a new position that is initially not visible to the agent. Fortunately the enclosing tactic activateButton1Tac can deal with that, by falling back to the 'explore' tactic to first search for the button.
If the level contains some random fire hazard, we can replace approachButton1 in activateButton1Tac with a more adaptive variant, e.g.: FIRSTof(avoidHazardTac, approachButton1.lift()). If the agent now detects fire while on its way to button1, it will first try to evade the fire before resuming its navigation to button1. Importantly, since tactic executability is re-checked at every deliberation cycle, the agent will be able to invoke the above re-planning in a timely manner.

Proof of Concept
As a proof of concept, we tried aplib on the Lab Recruits game. The game is developed by a group of students using an established game development framework called Unity 3D. It consists of about 5300 lines of C# code, 8100 lines of meta-files, and various other assets. The game allows a 'level' to be defined, consisting of rooms, on one or multiple floors, populated with in-game objects such as tables and chairs. Some of them are interactable, such as doors and buttons. Some of them represent hazards, such as fire. When a level is created or modified, a typical testing task is to verify that certain rooms in the level are reachable from the player character's initial state. Access to rooms may be guarded by doors, which in turn can only be opened (or closed) by activating specific in-game buttons. Activating or deactivating a button requires the player character to stand next to it. We were able to use an aplib agent to automate such tasks, which would otherwise involve manual work that can be substantial if the level is large and the tasks have to be repeated, e.g. for retesting.
To enable the use of aplib agents, game developers do have to implement an instance of the Environment (see again the architecture in Fig. 2). The implementation of this interface is game-specific; for Lab Recruits it takes about 1000 lines. Additionally, we found it convenient to build a library of game-specific common tactics, e.g. for Lab Recruits tactics to explore the nearest unseen area or to navigate to a certain interactable. For Lab Recruits these take about 300 lines of code. The effort is indeed substantial, but it is a one-off investment, after which it can be used over and over to test any Lab Recruits level, regardless of its complexity and size.
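The role of such a game-specific Environment can be sketched as follows. The class and method names below are purely illustrative assumptions, not the actual aplib Environment interface or the real Lab Recruits binding: the point is that the subclass encapsulates how observations are pulled from, and commands pushed to, the running game.

```java
// Hypothetical sketch of a game-specific Environment (illustrative names,
// not the real aplib or Lab Recruits interfaces).
abstract class Environment {
    abstract String observe(String agentId);                  // pull fresh game state
    abstract boolean sendCommand(String agentId, String cmd); // push an interaction
}

class LabRecruitsEnvironment extends Environment {
    // In a real binding these calls would go over a socket to the running
    // Unity game; here they are stubbed for illustration.
    String observe(String agentId) {
        return "{\"agent\":\"" + agentId + "\"}";
    }
    boolean sendCommand(String agentId, String cmd) {
        // Pretend the game only understands these two commands.
        return cmd.equals("moveForward") || cmd.equals("interact");
    }
}
```

Because all game specifics live behind this interface, the same agents and tactics can then be reused across levels without change.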

Related Work
Software agents have been employed in various domains, e.g. computer games, health care, and control systems [25,28,26]. With aplib we have another use case, namely automated testing. Using agents for software testing has been attempted before [34,31,5,33]. However, these works use agents to test services or web applications, which are software types that can already be handled by non-agent techniques such as model-based [41] or search-based [19,1] testing, whereas we argued that the high interactivity of computer games poses a different level of challenge for automated testing.
To program agents without having to do everything from scratch, we can either use an agent 'framework', which essentially provides a library, or a dedicated agent programming language. Examples of agent frameworks are JADE [6] and aplib for Java, HLogo [7] for Haskell, and PROFETA [17] for Python. Examples of dedicated agent languages are JASON [8], 2APL [9], GOAL [23], JADEL [24], and SARL [37]. HLogo is an agent framework specialized for developing agent-based simulations. On the other hand, JADE and aplib are generic agent frameworks that can be connected to any environment. Aplib is lightweight compared to JADE; e.g. the latter supports distributed agents and FIPA compliance, which aplib does not. JADE does not natively offer BDI agency, though BDI agency, e.g. as offered by JADEL, can be implemented on top of JADE. In contrast, aplib and PROFETA are natively BDI agent frameworks.
Among the dedicated agent programming languages, some are dedicated to programming BDI agents. On the plus side, they offer Prolog-style declarative programming. On the down side, the available data types are restricted (e.g. no support for collections or polymorphism), which is a serious hindrance if we are to use them for large projects. One language with a very rich set of features (collections, polymorphism, OO, lambda expressions) is SARL, though it is non-BDI. PROFETA and aplib are somewhere in between. Both are BDI DSLs, but they are embedded DSLs rather than native languages like SARL. Their host languages, Python and Java respectively, are feature-rich, giving them the kind of strength SARL has and that agent languages like JASON and GOAL cannot offer.
Aplib also offers the fluency of an embedded Domain Specific Language (DSL). It makes heavy use of design patterns such as Fluent Interface [18] and Strategy Pattern [20] to improve its fluency.
Aplib's distinguishing feature compared to other implementations of BDI agency (e.g. JACK, JASON, 2APL, GOAL, JADEL, PROFETA) is its tactical programming of plans (through tactics) and goals (through goal structures). An agent is essentially a set of actions. The BDI architecture does not traditionally impose a rigid control structure on these actions, hence allowing agents to react adaptively to a changing environment. However, there are also goals that require certain actions to be carried out in a certain order over multiple deliberation cycles. Or, when given a hard goal to achieve, the agent might need to try different strategies, each of which needs to be given enough commitment by the agent; conversely, it should be possible to abort a strategy so that another can be tried. All this implies that tactics and strategies require some form of control structure, though not as rigid as in e.g. procedures. None of the aforementioned BDI implementations provide control structures beyond intra-action control structures. This shortcoming was already observed by [16], who state that domains like autonomous vehicles need agents with tactical ability. They went even further, stating that Agent Oriented Software Engineering (AOSE) methodologies in general do not provide a sufficiently rich representation of goal control structures. While inter-action and inter-goal control structures can be encoded through pushing and popping of beliefs or goals in the agent's state, such an approach would clutter the programs and be error prone. An existing solution for tactical programming of agents is to use the Tactics Development extension [16] of the Prometheus agent development methodology [32]. This extension allows tactics to be graphically modelled, and template implementations in JACK can be generated from the models. In contrast, aplib provides these features directly at the programming level.
It provides the additional control structures suitable for tactical programming on top of the usual rule-based style of programming BDI agents. When programming test agents, having the option to exert control helps the tester narrow the agents' search space, which may benefit their performance; this matters when we start to accumulate a large number of tests.

Conclusion & Future Work
We have presented aplib, a BDI agent programming framework featuring multi-agency and novel tactical and strategic goal-level programming. We chose to offer aplib as a Domain Specific Language (DSL) embedded in Java, hence making the framework very expressive. Despite some loss of fluency compared to a dedicated language, we believe this embedded DSL approach to be better suited for large-scale programming of agents, while avoiding the high expense and long-term risk of maintaining a dedicated agent programming language.
With the above features, aplib would be a good choice as a framework for programming test agents to test highly interactive software such as computer games. Our experience so far with the Lab Recruits case study (Fig. 1) shows that even a simple test agent that can navigate within a closed terrain already introduces automation that was previously not possible. Larger and more thorough case studies remain future work. We would also like to explore the use of emotion modelling frameworks such as FAtiMA [13] alongside aplib agents, to allow us to test user experience (e.g. whether the game becomes too boring too quickly), an aspect of great concern in the game industry.
While in many cases relying on reasoning-based intelligence is enough, there are also cases where it is not. Recently we have seen rapid advances in learning-based AI. As future work we seek to extend aplib to let programmers hook learning algorithms to their agents, to teach the agents to make the right choices, at least in some situations, as an alternative when rule-based reasoning becomes too complicated (e.g. when it involves recognizing visual or audio patterns).