Distributed Software Development Modelling and Control Framework

With the rapid progress of internet technology, more and more software projects adopt e-development to support the software development process in a world-wide context. However, distributed software development is itself a complex orchestration: it involves many people working together across differences of time and space. How to efficiently monitor and control software e-development from a global perspective therefore becomes an important issue for any internet-based software development project. In this paper, we present a novel approach to this issue based on controlling the e-development process, collaborative task progress and communication quality. We also present our prototype e-development supporting environment, Caribou, to demonstrate the viability of our approach.


INTRODUCTION
During the past decade, more and more software systems have been developed for new technology platforms [11][12][15]. Meanwhile, internet-based collaborative software development has become one of the most significant practices adopted by software engineering practitioners [6,8][16][13]. However, large-scale collaboration over the web lacks sufficient supporting techniques to efficiently monitor and control web collaboration activities [14][18]. In this paper, we propose a novel approach to this critical issue. We mainly focus on the modelling and control of web-based distributed software development activity. The ultimate goal of this research is to facilitate co-operative work for web collaboration in an e-development environment.
Figure 2: Caribou Workflow Generation Process
Figure 2 shows the Caribou workflow generation process. In the communication control module, a quantitative communication quality measurement mechanism is established to meet the need for e-development communication quality control. In this module, the performance of communication between developers is enhanced through the control model and the reinforcement model. The major function of the task progress control module is to automate and monitor e-development task progress, and thus to provide an accurate measurement of project advancement. The supporting tools control module hosts the related third-party tools and provides their services to the participants, thus facilitating the detailed e-development activities.
The project metric collection module automatically collects and analyses a wide range of process and project metrics to support quantitative control of the e-development project.
The challenges of developing the e-Development Process Workflow Module in Caribou lie in the following aspects:
1. Design and implement a full-fledged distributed workflow management system (WfMS). This embedded workflow server will later supply the automatic e-development process control. Being distributed means that such a WfMS is internet-enabled, which maximizes the flexibility of distributed application practice.
2. Model e-development activities using a workflow definition formalism to specify their tasks and procedural constraints. Furthermore, the module will support dynamic modification and adjustment of the e-development process. Figure 3 illustrates the e-development workflow modelling capability in Caribou.

COLLABORATIVE COMMUNICATION MODELLING AND CONTROL
In this section, we present the e-Development communication control module in Caribou. We first analyse and model human communication in an e-development environment. This model later helps us to monitor and control participants' communication activities within Caribou. Together with the task progress control module, it gives users a comprehensive view of the whole collaboration process through a precise perception of development progress.
People are the primary factor in e-development. Collaboration among participants takes two major forms: one is working together concretely to accomplish a task, the other is discussing with each other to solve difficult problems [24]. The efficiency of participants' cooperation largely affects the progress and quality of the whole e-development project.

Networked Collaborative Software e-Development Communication Model
Meetings, discussions and pair programming are among the various forms of collaboration. The fundamental medium that conveys these exchanges of information is communication. In a traditional environment, such exchanges are easily achieved through spoken language. In a networked e-development environment, however, this type of information exchange is less convenient. In most cases, the lack of an efficient way to monitor and measure communication activities remains one of the major obstacles of networked collaboration [ in providing the demanded information required by others, how can we adopt more efficient actions to largely avoid such situation?
Furthermore, if we have an urgent question and do not know who is responsible for it or potentially able to answer it, the question becomes: with whom should we discuss it? And after we have published such an urgent question on the e-bulletin board and have not received a satisfactory answer, what should we do next? If the proper person simply does not have time to browse the discussion board, then even though we know that someone must have the answer, we still cannot reach him or her. This will eventually undermine the collaboration effort. Therefore, how to effectively convey the concerned messages to the proper people, how to secure the information solicitation mechanism, and how to accurately measure the quality of collaboration by means of communication in a networked e-development environment are the three major goals for our research prototype.
To realize these objectives, we first build our networked collaborative software e-development communication model. We then apply this model to deal with four common forms of communication in a collaborative e-development environment. In such an environment, the communication support program helps developers to exchange information and hold discussions easily [5,6]. In the Caribou e-development supporting environment, this type of collaboration is computerized and recorded in order to evaluate the quality and progress of the whole project. Corresponding automated control aligns performance with predefined standards. When a questioner has a question or needs to solicit information, she may invoke a question. Based on the network personnel matching mechanism (elaborated in the following section), the question is automatically conveyed to the related person(s).
In a typical communication transaction scenario, the questioner fills in a predefined question head form to estimate some key metrics of the inquiry, such as its degree of urgency, importance and difficulty. This information is used to assist the automated control of the communication process.
A standard evaluation check form helps the receiver to evaluate the question. The questioner in turn evaluates the reply by simply checking a quality table. The inquiry transaction may repeat several times until the problem is solved or deadlocked. All of these activities are recorded by the e-Development communication monitoring module. The results are processed by the automated control module to trigger the corresponding control actions.
For example, a deadlocked question is forwarded to a higher-level group leader or technical coordinator; a question of wide concern may be presented to the project manager with a request for a general solution or suggestion; an excessive delay on a question with a high degree of urgency generates a caution message to the group leader; and so on.
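The control actions listed above can be pictured as a small rule table. The following is a minimal sketch under stated assumptions, not Caribou's actual implementation: the `QuestionStatus` fields, the `WIDE_CONCERN_THRESHOLD` value and the action strings are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuestionStatus:
    """Monitored state of one question (all field names are hypothetical)."""
    deadlocked: bool = False
    interested_people: int = 0            # participants concerned by the question
    urgency: str = "medium"               # "less urgent", "medium" or "very urgent"
    hours_waiting: float = 0.0
    expected_response_hours: float = 24.0


# Assumed cut-off for what counts as a "widely concerned" question.
WIDE_CONCERN_THRESHOLD = 5


def control_actions(q: QuestionStatus) -> list[str]:
    """Return the control actions that the three example rules would trigger."""
    actions = []
    if q.deadlocked:
        actions.append("escalate to group leader or technical coordinator")
    if q.interested_people >= WIDE_CONCERN_THRESHOLD:
        actions.append("request general solution from project manager")
    if q.urgency == "very urgent" and q.hours_waiting > q.expected_response_hours:
        actions.append("send caution message to group leader")
    return actions


print(control_actions(QuestionStatus(deadlocked=True)))
```

Keeping each rule as an independent condition lets several actions fire for the same question, for instance when a very urgent question is both overdue and deadlocked.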

Automation of Cooperative e-Development Communication in Caribou
To observe human collaborative communication activities inside software e-development projects, we have built the networked collaboration communication model presented in the previous section. We now apply this model to automate the monitoring and control of personal communication activities in a distributed collaborative e-development environment.
e-Development Personnel Matching Mechanism: here we provide our solution to the first goal of our prototype, which is to effectively convey concerned messages to the proper people. The solution for the second goal is presented later in this section.
To automate the monitoring of cooperative communication activities, one important issue is finding the proper person to whom an inquiry should be delivered. Caribou defines four main communication forms in which a participant may be involved. Based on these four forms, we have designed network personnel matching methods that pair people through a communication channel. This mechanism implements the automated delivery of questions to suitable person(s).

1.Direct Personnel:
In this type of communication, the questioner knows who should be asked. The matching mechanism simply uses the specified staff name or ID to deliver the questioner's request directly to those who are expected to answer.

2.Direct Task-individual:
In this communication form, the questioner does not know who should be asked, but he or she may know which tasks are related to the information being sought.
The matching mechanism uses the related tasks to trace the individuals who may have the answer. In the general case, if the task number is x, then all persons who have participated in task-x are considered potential receivers. When the questioner has marked more than one task as related to the question, the matching mechanism selects the persons who have participated in the most of these tasks, since they are the most likely to have a comprehensive view of the question domain. The system divides people into groups according to the number of question-related tasks in which they have participated. Those who have participated in the most tasks are considered the most likely receivers. As illustrated in figure 5, suppose that three tasks are marked as related to the question. The matching process first searches the project database for the participants of these tasks. It then generates groups that hold personnel according to the number of tasks in which they have participated. Here we get three groups, namely G1, G2 and G3, which hold personnel who have contributed to three, two and one tasks respectively. Finally, selection strategies may be applied to select potential receivers. As a result, the persons in G1 are considered the most likely question receivers, while those in G3 are considered the least likely. The rationale behind this strategy is that someone who knows more of the tasks the questioner is inquiring about may be better placed to provide pertinent answers.
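The grouping step described above can be sketched in a few lines. This is an illustrative sketch only: the dictionary standing in for the project database, the task identifiers and the participant names are assumptions, and the selection strategy applied afterwards is left out.

```python
from collections import defaultdict


def group_by_task_overlap(task_participants, related_tasks):
    """Group people by how many of the related tasks they took part in.

    Returns a list of groups ordered from most to fewest shared tasks,
    so the first group corresponds to G1 in the text. Counts for which
    nobody qualifies produce no group.
    """
    counts = defaultdict(int)
    for task in related_tasks:
        for person in task_participants.get(task, ()):
            counts[person] += 1

    groups = defaultdict(set)
    for person, n in counts.items():
        groups[n].add(person)

    return [sorted(groups[n]) for n in sorted(groups, reverse=True)]


# Hypothetical project database: task id -> participants.
project_db = {
    "task-1": {"ann", "bob", "eve"},
    "task-2": {"ann", "bob"},
    "task-3": {"ann", "dan"},
}

ranked = group_by_task_overlap(project_db, ["task-1", "task-2", "task-3"])
print(ranked)  # [['ann'], ['bob'], ['dan', 'eve']]
```

In this example the first group plays the role of G1 (three tasks), the second of G2 and the third of G3.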

3.Unknown Receptor:
In some cases, the questioner may not know who should be asked, or even which tasks are related to her question. The personnel matching mechanism transfers this type of question to the e-bulletin board, group leader, technical coordinator, etc.

4.Public Informing:
When the questioner just wants to provide some useful information or an announcement to the public, the personnel matching module conveys it to the public bulletin board.
The above four personnel matching mechanisms for distributed e-development ensure that there is a delivery route for each question raised during the development process, as illustrated in figure 6.
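Taken together, the four forms amount to a dispatch on the communication form. The sketch below is a hedged illustration, assuming a hypothetical `Form` enumeration and a `route` function whose names and signature do not come from Caribou; the task-based receivers are expected to be computed separately by the task matching step.

```python
from enum import Enum


class Form(Enum):
    """The four communication forms described in the text."""
    DIRECT_PERSONNEL = 1
    DIRECT_TASK_INDIVIDUAL = 2
    UNKNOWN_RECEPTOR = 3
    PUBLIC_INFORMING = 4


def route(form, named_receivers=(), task_matched=()):
    """Return the destinations for a message of the given form.

    named_receivers: staff names or IDs given by the questioner.
    task_matched: people found through the related tasks.
    """
    if form is Form.DIRECT_PERSONNEL:
        return list(named_receivers)
    if form is Form.DIRECT_TASK_INDIVIDUAL:
        return list(task_matched)
    if form is Form.UNKNOWN_RECEPTOR:
        return ["e-bulletin board", "group leader", "technical coordinator"]
    return ["public bulletin board"]


print(route(Form.UNKNOWN_RECEPTOR))
```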
Electronic copy available at: https://ssrn.com/abstract=3611601
When the questioner (leftmost) publishes a question in a collaborative e-development environment, the question itself is first recorded in the communication database; a search engine then queries this database for information related to the question. The search criterion could be, for example, vocabulary shared with former question/answer pairs. Meanwhile, the personnel matching mechanism (center) builds a channel to the proper person(s). The request enforcement mechanism (rightmost) ensures that responses are elicited from the targeted receivers. The answer is also recorded in the communication database for later reference. The feedback from the database and from the receivers is presented to the questioner.
The communication interaction may recur several times until the problem is solved or deadlocked; a deadlock is brought to the attention of manager-level personnel. The automated control module also deals with abnormal situations within the Caribou communication module.
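The database lookup mentioned above, where shared vocabulary serves as the search criterion, could look roughly like the following. This is a simple word-overlap sketch rather than the search engine used in Caribou; the function names, the `min_shared` parameter and the sample question/answer pairs are assumptions, and no stop-word filtering is attempted.

```python
import re


def tokens(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_related(question, qa_pairs, min_shared=2):
    """Return former (question, answer) pairs sharing vocabulary with `question`.

    Pairs are ordered by the number of shared words, largest first.
    """
    q_words = tokens(question)
    scored = []
    for past_q, past_a in qa_pairs:
        shared = q_words & (tokens(past_q) | tokens(past_a))
        if len(shared) >= min_shared:
            scored.append((len(shared), past_q, past_a))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(q, a) for _, q, a in scored]


# Hypothetical content of the communication database.
history = [
    ("How do I configure the build server?", "Edit the build configuration file."),
    ("Where is the test report?", "In the reports folder."),
]

print(find_related("build server fails to start", history))
```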

Request Enforcement: Performance Control
In this section, we address our goal of securing the solicitation of desired information in e-development. This means that answering a question is no longer optional or a spontaneous reaction; it is effectively a mandatory request in a typical distributed software development project. Moreover, the resulting performance data are recorded to quantify the evaluation of collaborative e-development quality.
To fulfil this objective, we have established a request enforcement mechanism that helps the questioner obtain a high-quality answer, as shown in Figure 7. The Caribou e-development supporting environment contains a set of predefined handling solutions. Each solution contains a collection of action scripts that invoke corresponding actions based on the length of elapsed time. The action could be a reminder to the receiver after a short period has passed; if the elapsed time is much longer than the expected response period, a reinforcement action is taken.
For each question, before it is sent out, the questioner has to mark estimated fuzzy values for three metrics, namely urgency degree (less urgent, medium, very urgent), importance degree (unimportant, medium, important) and difficulty degree (easy, medium, difficult). Based on these values, the Caribou communication control module uses fuzzy logic to determine the suitable solution and triggers the corresponding actions based on the waiting time intervals. These actions are labelled with weight values. Each weight represents the degree of tolerance of the action when the waiting interval exceeds a certain threshold.
The solution set (left) includes many solutions to deal with different types of conditions. Each solution includes several actions to cope with its condition based on waiting time intervals. The control module uses fuzzy logic and the fuzzy sets of the three metrics (right) to calculate the number of the most suitable solution.
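To make the flow from the three marked metrics to timed actions concrete, here is a deliberately simplified sketch. It is not the fuzzy inference used in Caribou, whose rules are not given here: the linguistic values are mapped to plain numbers, urgency and importance are averaged into a "pressure" score, and difficulty merely stretches the waiting-time thresholds. All numeric values, solution names and function names are assumptions made for illustration.

```python
# Assumed numeric stand-ins for the linguistic values marked by the questioner.
URGENCY = {"less urgent": 0.0, "medium": 0.5, "very urgent": 1.0}
IMPORTANCE = {"unimportant": 0.0, "medium": 0.5, "important": 1.0}
# Assumption: harder questions are given proportionally more time.
DIFFICULTY_SCALE = {"easy": 1.0, "medium": 1.5, "difficult": 2.0}

# Hypothetical solution set: (waiting-time threshold in hours, action).
SOLUTIONS = {
    "relaxed": [(48, "remind receiver"), (96, "reinforcement action")],
    "normal": [(24, "remind receiver"), (48, "reinforcement action")],
    "strict": [(4, "remind receiver"), (8, "reinforcement action")],
}


def choose_solution(urgency, importance):
    """Pick a solution name from the averaged urgency/importance score."""
    pressure = (URGENCY[urgency] + IMPORTANCE[importance]) / 2
    if pressure < 0.4:
        return "relaxed"
    if pressure < 0.7:
        return "normal"
    return "strict"


def due_actions(solution, difficulty, hours_waiting):
    """Return the actions whose scaled threshold has been exceeded."""
    scale = DIFFICULTY_SCALE[difficulty]
    return [action
            for threshold, action in SOLUTIONS[solution]
            if hours_waiting >= threshold * scale]


solution = choose_solution("very urgent", "important")
print(solution, due_actions(solution, "easy", 5))
```

A real fuzzy controller would replace `choose_solution` with membership functions and a rule base; the table of thresholds and actions could stay as it is.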
Therefore, with the help of the Caribou personnel matching and communication request enforcement mechanisms, we are able to convey the concerned messages effectively to the proper people and to secure the solicitation of information. The dynamic performance data are later used to measure quantitatively the quality of collaboration by means of communication in the Caribou e-development supporting environment.

COLLABORATIVE DEVELOPMENT TASK MONITORING
In this section, we present the e-Development Task Progress Control Module. Monitoring and controlling collaboration status is one of the most important issues in a distributed development project. Dynamic supervision of on-going tasks has to be deployed. It regulates both collaborative group members and external management, so as to keep the development project on schedule and to adopt proper actions in case of schedule slippage. We discuss our approach to fulfilling this objective, in which autonomous task agents facilitate the automation work. The goal is to strengthen the quantitative control of distributed development collaboration and to reduce the inconsistency between different practitioners and organizations.

Web Collaboration
Like any engineering practice, distributed software development is a progressive process. There is a time line that distinguishes different stages of achievement. This line also shows the progress curve of advance. It helps practitioners to track historical performance as well as to monitor and control the activities currently under way. Unfortunately, for a web-based collaborative software e-development project, the time line is not easy to define because of the distributed and dynamic nature of the web collaboration environment. Web collaboration places all participants in a virtual development venue. Practitioners therefore need an efficient way to monitor and control their widely dispersed development activities. We try to solve this complex problem by defining a visual collaboration progress model. Furthermore, we apply autonomous agents to collect quantified data for measuring the progress metrics of an e-development project. In Caribou, we have also defined a set of control measures corresponding to different progress deviations, and we authorize agents to automate the actions that control development progress, thus providing real-time support for e-development collaboration.

Collaboration Task Progress Modelling
In a collaborative software development project, the collaboration consists of many tasks that involve different individuals. To monitor the whole collaboration, we need a clear view of the progress status of each task. A task can generally be viewed as an elemental work unit, a conceptual whole that is performed by one or several persons. It normally includes concrete detailed functions, performance steps and a final objective. The major difference between a task and a function is that a task is a logical definition whose participants may number more than one, while a function is a detailed minimum job that can be accomplished by a single person. The challenge is how to measure the advance of a task's progress at any given moment. Our approach is to use an agent to collect task attribute information dynamically, use a predefined progress model to determine the progress metrics, and visualize them to reflect the progress status of each task.
To illustrate our approach, we first analyse task attributes. A collaborative e-development task has two types of attributes: static and dynamic.
Static attributes: these include task name/id_code, type, scheduled time, participants (number, id/names), anticipated output functions (name/code, task result description, number), steps (name/id_code, description, number), scheduled resources (name/id_code, type, quantity), ultimate objective (description), etc.
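As a small illustration of how the static attributes listed above might be held in memory, the following sketch uses a plain data class. The class name, field names, field types and the unit of the scheduled time are assumptions; only the list of attributes itself comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class StaticTaskAttributes:
    """Static attributes of a collaborative e-development task (illustrative)."""
    name: str
    id_code: str
    task_type: str
    scheduled_time_days: float                                  # unit is an assumption
    participants: list[str] = field(default_factory=list)      # ids or names
    output_functions: list[str] = field(default_factory=list)  # anticipated outputs
    steps: list[str] = field(default_factory=list)
    scheduled_resources: dict[str, int] = field(default_factory=dict)  # name -> quantity
    ultimate_objective: str = ""


task = StaticTaskAttributes(
    name="Implementing Class C",
    id_code="T-001",                 # hypothetical identifier
    task_type="implementation",
    scheduled_time_days=5,
    participants=["dev-1", "dev-2"],
)
print(len(task.participants), len(task.steps))
```

The counts mentioned in the attribute list (number of participants, steps and output functions) are derived from the list lengths rather than stored separately.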
We note that, under certain conditions, some static attributes can turn into dynamic attributes. For example, when an emergency event occurs at a given time, the static attribute of participant number may become a dynamic attribute. A visual model, shown in figure 8, illustrates the collaborative e-development task progress model in Caribou. It visually represents the degree of progress of each task within the whole collaborative e-development process.
At any given time, it shows the static (right) and dynamic (left) attributes of the observed task (center). It also shows the dynamic aspects in terms of advance and progress rate. The advance histogram (center) indicates the advance of the task, and the progress histogram (bottom) shows the abstract progress during a period of time. At the top, beside the <<task>> stereotype, are the three most important metrics, namely the importance, urgency and health index of the task.
Advance index: indicates the advancement of a given task at any time. It is measured by a group of metrics which together reflect the degree of advance of the task. They are defined as follows:
Steps Index = (undergoing / scheduled)
Extra_Resource_Index = (extra resource request / scheduled) (Resource)
Progress Index: the activity progress of a given task is represented by visualizing the difference in "advance index" between the start and end states of a period of time, see figure 9. The metrics work together as a sign of progress to show the total activity increment. Assume that the time interval between two given moments is ∆t = t2 - t1, where t1 = start time and t2 = end time. To measure a certain task, or a group of tasks, we assume that the observation period ∆t is the same. Therefore, we define the progress index for each metric (at moment t = t(n)) as:
Progress Index(t(n)) = Metric(t(n)) - Metric(t(n-1))
Figure 9. Calculation of Progress Index
At a given moment time = t(n), when the progress index value is negative, a black column above the absolute value indicates the negative increase, as for the progress index of the "Importance Metric" in our case (right graph (iii), the leftmost column, shown in yellow). Because the importance metric value at moment t = t(n) is smaller than it was one moment earlier (time = t(n-1)), the progress index value is negative, which shows a decrease in progression.
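A minimal sketch of the progress index calculation follows. It assumes, from the description of a negative value arising when a metric is smaller at t(n) than at t(n-1), that the progress index of a metric is simply the change of that metric between two successive observations taken ∆t apart. The function name and the sample metric values are hypothetical.

```python
def progress_index(previous, current):
    """Change of each metric between moment t(n-1) and moment t(n).

    Both arguments map metric names to values; only metrics present
    in both observations are compared.
    """
    return {name: current[name] - previous[name]
            for name in current if name in previous}


# Hypothetical observations of one task at two successive moments.
at_t_prev = {"steps_index": 0.25, "importance": 0.75}
at_t_now = {"steps_index": 0.5, "importance": 0.5}

print(progress_index(at_t_prev, at_t_now))
# {'steps_index': 0.25, 'importance': -0.25}
```

The negative value for the importance metric corresponds to the case that the figure marks with a black column.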
Health Index: shows, in general, the quantified degree of health of a task. We use a more sophisticated method to obtain its value. To avoid bias, we use both advance and progress index metrics to compute the assessment of more subjective metrics. Furthermore, these metrics can also be estimated according to the cooperating organization's policy strategy. We define the "Health Index" metric as:
Health Index = Σ (Pi*WPi) + Σ (Aj*WAj)
{i=1..N, j=1..M | N = the number of progress indices; M = the number of advance indices}. P is the progress index and A is the advance index. WP is the weight of the progress index; WA is the weight of the advance index. Each weight represents the importance of the corresponding aspect from the organization's point of view.

Real-time Visual Collaboration Network
After building models for collaborative e-development tasks, our next objective is to monitor the current status of distributed software e-development activities in Caribou. One crucial step towards automatic monitoring is the real-time collection of collaboration activity data. The Caribou e-development support system works as a general platform to serve collaboration among institutions.
Meanwhile, individual participants use the Caribou framework either to access project resources or to collaborate with others regardless of differences in time and space, see figure 10. Their performance is monitored and coordinated by the Caribou environment. A task agent is used to automatically collect dynamic task attribute data and event information.
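Once a task agent has collected the progress and advance index values, the Health Index formula Σ (Pi*WPi) + Σ (Aj*WAj) can be evaluated directly. The sketch below is only an illustration: the function name, argument order and sample numbers are assumptions, and the weights would in practice come from the organization's policy.

```python
def health_index(progress, progress_weights, advance, advance_weights):
    """Weighted sum of progress indices plus weighted sum of advance indices."""
    if len(progress) != len(progress_weights):
        raise ValueError("each progress index needs exactly one weight")
    if len(advance) != len(advance_weights):
        raise ValueError("each advance index needs exactly one weight")
    weighted_progress = sum(p * w for p, w in zip(progress, progress_weights))
    weighted_advance = sum(a * w for a, w in zip(advance, advance_weights))
    return weighted_progress + weighted_advance


# Hypothetical values: N = 2 progress indices, M = 1 advance index.
value = health_index(
    progress=[0.25, -0.25], progress_weights=[1.0, 2.0],
    advance=[0.5], advance_weights=[2.0],
)
print(value)  # 0.75
```

Here the result is 0.25*1.0 + (-0.25)*2.0 + 0.5*2.0 = 0.75.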

DISCUSSION AND FUTURE WORK
In the Caribou prototype, we use graphic diagrams to visualize the status of collaborative tasks, their relations and the historic progress performance of the whole e-development project. In distributed collaborative development, the task network has two dynamic aspects. One is the continuous generation and expiration of tasks; the other is the constant change inside existing tasks. At a given time, the development may have a certain number and variety of tasks under way. After a period of time, some tasks may have finished, while new ones have been generated to suit the current needs of the project. Therefore, the whole collaboration network regularly changes its shape and organization. Each task has its own life cycle and may have many people cooperating within it; various events can affect its progress route. The Caribou environment uses the dynamically collected activity data to generate the visual web collaboration network diagram, which facilitates the monitoring and control of an active e-development project. Each task also has its own output functionalities, namely the results it generates. For example, a task named "Implementing Class C" may have two functionalities: one is the usable class and the other is its usage documentation. Such functionalities are the produced results. A task may use the output functionality produced by others. We use a line to connect the requesting task to the output functionality of the service task (represented by a tiny circle). Two tasks may "cooperate" by providing services to each other. For example, the tasks "Testing Component A" and "Testing Component B" may mutually use the functionalities provided by each other. When a task is finished, all of its expected functionalities should have been produced. To use the result provided by a historical task that does not lie on the same layer as the active tasks, a dotted line is used to connect the two tasks across the two layers.
The same applies to participating personnel. The collaboration task network visually depicts the dynamic organization and status of tasks and personnel. Several people may work together on the same task, and one person can participate in more than one task. The lines between tasks thus represent the conceptual supporting relationships among them. Based on the monitoring data obtained by the task agent, we are able to simulate both the past and the present course of the web-based collaborative software development process. Furthermore, we may later be able to use these two types of results (present and past) to analyse collaborative development performance.
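The task network described in this section can be sketched as tasks that produce and consume output functionalities. The code below is an assumption-laden illustration rather than Caribou's data model: the `Task` class, the helper names and the functionality labels such as "test report A" are hypothetical, and layers and personnel are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A task with the functionalities it produces and those it uses."""
    name: str
    outputs: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)


def service_links(tasks):
    """Return (requesting task, functionality, providing task) triples."""
    producer = {f: t.name for t in tasks for f in t.outputs}
    return [(t.name, f, producer[f])
            for t in tasks
            for f in sorted(t.uses)
            if f in producer]


def cooperating_pairs(tasks):
    """Return pairs of tasks that use each other's functionalities."""
    links = {(req, prov) for req, _, prov in service_links(tasks)}
    return sorted({tuple(sorted((a, b)))
                   for a, b in links
                   if a != b and (b, a) in links})


network = [
    Task("Implementing Class C", outputs={"class C", "usage documents"}),
    Task("Testing Component A", outputs={"test report A"},
         uses={"test report B", "class C"}),
    Task("Testing Component B", outputs={"test report B"},
         uses={"test report A"}),
]

print(cooperating_pairs(network))
# [('Testing Component A', 'Testing Component B')]
```

Each triple returned by `service_links` corresponds to one line in the diagram, drawn from the requesting task to the small circle of the providing task's functionality.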
Web-based collaborative software e-development involves many people working together across differences of time and space. However, large-scale collaboration in a typical e-development project lacks sufficient supporting techniques to efficiently monitor and control the distributed collaboration activities. In this paper, we have presented our novel approach to this important issue together with the Caribou prototype, a supporting environment for distributed collaborative development. In addition, we provide solutions to automate the dynamic control of cooperation, with the goal of optimizing e-development performance. In future work, we aim to build service models for these collaboration monitoring and control services, and thus to provide a general-purpose collaboration assistant service over the internet.