Optimizing Effort Parameter of COCOMO II Using Particle Swarm Optimization Method



Introduction
Software cost estimation is one of the major challenges in managing software development projects. Accurate cost estimates serve as a reference for the project manager in managing activities across the Software Development Life Cycle (SDLC). A main task of a software project manager is to ensure that the project meets the aim that "high-quality software must be produced at a low cost of concern in time and budget". In addition, a good project manager can appropriately forecast the cost and other resources of the project. Before the project cost is obtained, the effort required to complete the software development project is usually determined first. Effort is expressed in person-months. Cost and effort estimation aims at an accurate result, neither an overestimate nor an underestimate. It can also manage the application's quick configuration [1]. Accuracy is determined by several variables, or cost drivers, of the method used, so obtaining an accurate software cost estimate requires control of the variables or cost drivers that affect it.
Estimating software costs in the early phases increases control over project activities, namely planning, budgeting, and monitoring. An appropriate estimate supports decision-making and helps every process in a software development project run efficiently and effectively, because a project generally has limited resources. The main difficulties in estimating software costs are the inherent uncertainty of software development and the complex, dynamic interplay of factors that influence development effort. Several techniques and procedures exist for dealing with the problem. Both algorithmic and non-algorithmic methods can be used to forecast software cost [2,3]. Algorithmic techniques estimate software cost with a mathematical equation, whereas non-algorithmic techniques do not rely on such an equation [3]. The latter include the analogy method [4], artificial neural networks [5,6], fuzzy logic [7,8], genetic algorithms [9], and Cuckoo optimization [10]. The estimated cost is expressed in terms of currency ($). The cost estimate of a software development project is usually preceded by an effort estimate, expressed as the amount of person-time needed to finish the project. Generally, the more effort is used, the more cost is spent.
Many researchers have introduced and proposed cost estimation methods in recent decades. Among all software cost estimation methods, COCOMO is the most well known and the most widely applied [11]. Problems arise, however, regarding the accuracy of the method when it is used to produce a software cost estimate. Heuristic techniques are used to overcome the limitations of this method and to refine its application [12]. A variety of heuristic optimization methods are used for optimization problems, and they can be applied to software cost estimation as well. These methods include Particle Swarm Optimization [13][14][15][16][17], Genetic Algorithm [9,18], Firefly Algorithm [19], and others.
This paper studies PSO as an optimization algorithm. It is used to optimize the parameters of the COCOMO II model, resulting in more accurate and precise estimates of effort, cost, and development time. The remaining sections of this paper are organized as follows. Section 2 briefly describes related work on estimating effort with different methods and with the PSO approach. Section 3 describes the working methodology, consisting of the steps used in this experiment. Section 4 presents and discusses the results obtained. Section 5 concludes that the accuracy of the estimates can be improved by adopting the PSO approach.

Related Work
Calibrating COCOMO II coefficients to improve the accuracy of estimation results has been proposed in previous research. Sarno and Sidabutar [5] investigated the role of software size, stated in Lines of Code (LOC), and of the Effort Multipliers (EM) in increasing the accuracy of the estimate. Fuzzy Logic with a Gaussian Membership Function (GMF) is applied to the EM of COCOMO II. The GMF produces a smoother transition, which means more accurate effort multipliers. They also implemented a Neural Network (NN) approach, a multi-layer feed-forward neural network trained with the backpropagation learning algorithm [6]. The proposed model provides a significant improvement over the basic fuzzy model and the original COCOMO model. Baiquni et al. [20] proposed local calibration using models based on Fuzzy Logic and Tabu Search. They refine precision by fuzzifying the cost drivers in COCOMO II, using the GMF of Fuzzy Logic to redesign the EM. New parameter values for the COCOMO II model are found by local calibration with Calico and Tabu Search. The new values significantly improve precision, that is, they reduce errors.
Parkas and Kamabir [15] proposed optimizing the coefficients of the COCOMO II model on a NASA dataset using PSO techniques. Their study found that optimization problems could be solved efficiently and that uncertainty was lower with PSO than with the original coefficient values. Likewise, Kumar et al. [13] and Sheta et al. [19] analyzed PSO optimization together with Linear Regression and Fuzzy Logic, composing a collection of linear models to reduce cost estimation errors. Their research applies COCOMO II models to the NASA18 data set.
PSO provides an efficient technique for optimizing effort estimation, while linear regression gives good results but takes more time. Reddy et al. [14] introduced new models that add PSO with a constriction factor for tuning the parameters of COCOMO II. The new model can handle uncertain and improper inputs and can improve the reliability of the cost estimate. Their experiments showed that PSO with a constriction factor gave satisfactory results. Reddy et al. [14] also offered Multi-Objective Particle Swarm Optimization (MOPSO) as a new model for software cost estimation. According to their study, the model gives better results than the original COCOMO II.

Constructive Cost Model (COCOMO) II
Several software cost estimation models have been proposed to provide accurate forecasts that help project managers make correct decisions about their projects [5]. One of the best known and most widely used effort estimation models is the Constructive Cost Model (COCOMO), first introduced by Barry Boehm in 1981 [21]. COCOMO is used as a model to estimate effort, schedule, and cost when planning software development activities. The model was constructed from a dataset of 63 software projects, in which each data item consists of sixteen variables (cost drivers). The cost drivers in COCOMO are categorized into three aspects: Lines of Code (LOC), Scale Factors (SF), and Effort Multipliers (EM). Together, the cost drivers yield the effort in person-months (PM). COCOMO II was introduced by Barry Boehm in 2000 as a more accurate model with improvements to some of the cost drivers.
COCOMO II includes several software attributes, namely 17 Effort Multipliers (EM), 5 Scale Factors (SF), and Software Size (in KLOC), which are used to estimate effort in the COCOMO II Post-Architecture model. The Effort Multipliers are grouped into four categories, and there are 5 Scale Factors (SF).
In the COCOMO II model [4], software development effort is calculated with equation (1):
$$ PM = A \times \mathrm{Size}^{E} \times \prod_{i=1}^{17} EM_i \tag{1} $$
where A is a multiplicative constant with a value of 2.94 that scales effort according to particular project conditions, and Size is the estimated size of the software in Kilo Source Lines of Code (KSLOC). E is the scaling exponent for effort, an exponential factor that accounts for the relative economies or diseconomies of scale encountered as software projects increase in size, and $EM_i$ is the Effort Multiplier, where i = 1, 2, 3, ..., 17. The exponent E is computed from the Scale Factors with equation (2):
$$ E = B + 0.01 \times \sum_{j=1}^{5} SF_j \tag{2} $$
where B is an exponential constant with a value of 0.91 and $SF_j$ is a Scale Factor, where j = 1, 2, 3, 4, 5. This paper optimizes two parameters of the COCOMO II model: the multiplicative constant A and the exponential constant B.
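As an illustration of equations (1) and (2), the following minimal Python sketch computes the exponent E and the effort in person-months from a size, five scale factors, and seventeen effort multipliers. The function names and the sample project are hypothetical and are not taken from the dataset used in this paper; the defaults A = 2.94 and B = 0.91 are the values stated above.

```python
from math import prod

A_DEFAULT = 2.94  # multiplicative constant A stated in the text
B_DEFAULT = 0.91  # exponential constant B stated in the text


def scale_exponent(scale_factors, b=B_DEFAULT):
    """Equation (2): E = B + 0.01 * sum of the scale factors."""
    return b + 0.01 * sum(scale_factors)


def cocomo2_effort(ksloc, scale_factors, effort_multipliers,
                   a=A_DEFAULT, b=B_DEFAULT):
    """Equation (1): PM = A * Size^E * product of the effort multipliers."""
    e = scale_exponent(scale_factors, b)
    return a * ksloc ** e * prod(effort_multipliers)


# Hypothetical project: 10 KSLOC, illustrative scale-factor values,
# and all seventeen effort multipliers set to 1.0.
sf = [3.72, 3.04, 4.24, 3.29, 4.68]
em = [1.0] * 17
effort_pm = cocomo2_effort(10.0, sf, em)
print(f"E = {scale_exponent(sf):.4f}, effort = {effort_pm:.2f} PM")
```

Passing different values for `a` and `b` is how a calibrated pair of coefficients would be plugged into the same formula.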

Particle Swarm Optimization (PSO)
PSO is based on swarm behavior in nature, such as schools of fish and flocks of birds, known as swarm intelligence. PSO was introduced and developed by Kennedy and Eberhart in 1995 [22] and has become one of the most widely used swarm-intelligence-based algorithms due to its simplicity and flexibility. Instead of using mutation, crossover, or pheromones, it uses randomness and global communication among the swarm particles [23].
The PSO algorithm searches the space of an objective function by adjusting the trajectory of each agent, called a particle, as a piecewise path formed by position vectors in a quasi-stochastic manner [6,23]. The movement of a swarming particle consists of two main components: a stochastic component and a deterministic component. Each particle is attracted toward the current global best position $g^*$ and toward its own best location $x_i^*$ in its history, while at the same time it tends to move randomly. When a particle finds a location that is better than any previously found location, it updates that location as the new current best for particle i. There is a current best for all n particles at any time t during the iterations. The aim is to find the global best among all the current best solutions until the objective no longer improves or after a certain number of iterations. The movement of the particles is shown schematically in Figure 1, where $x_i^*(t)$ is the current best for particle i, and $g^* \approx \min\{f(x_i)\}$ for i = 1, 2, ..., n is the current global best at t.
Figure 1. Schematic representation of particle motion in PSO, moving towards the global best $g^*$ and the current best $x_i^*$ for every particle i.
Let $x_i$ and $v_i$ be the position vector and the velocity vector of particle i. The new velocity vector is computed using equation (3):
$$ v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \odot (x_i^* - x_i^{t}) + c_2 r_2 \odot (g^* - x_i^{t}) \tag{3} $$
The initial locations of all particles should be distributed uniformly so that the particles can sample most regions of the search space. The initial velocity of a particle can be set to zero, that is, $v_i^{t=0} = 0$. The new location is then computed with equation (4):
$$ x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{4} $$
where $x_i^{t}$ is the current search position, $x_i^{t+1}$ is the updated search position, $v_i^{t}$ is the current velocity, $v_i^{t+1}$ is the updated velocity, $x_i^*$ is the best experience of the particle, $g^*$ is the global best, w is a weighting function, $r_1$ and $r_2$ are two random vectors whose entries take values between 0 and 1, and $\odot$ denotes the entrywise product. The parameters $c_1$ and $c_2$ are acceleration constants, or learning parameters, which can usually be taken as $c_1 \approx c_2 \approx 2$. In swarm optimization, solutions are sought in a solution space in the range [-x, x]. Although $v_i$ can take any value, it is usually bounded in some range [0, $v_{max}$].
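A single update of one particle can be sketched in Python as follows, assuming the standard form of the update with an inertia weight w, in which the new velocity is w times the old velocity plus c1·r1·(personal best − position) plus c2·r2·(global best − position), and the new position is the old position plus the new velocity. The function name `pso_step` and its arguments are hypothetical, and the optional symmetric clamp to [-v_max, v_max] is an assumption of this sketch rather than the range given in the text.

```python
import random


def pso_step(x, v, p_best, g_best, w=1.0, c1=2.0, c2=2.0,
             v_max=None, rng=random):
    """One velocity and position update for a single particle.

    x, v, p_best and g_best are equal-length lists of floats. A fresh
    random number in [0, 1) is drawn per dimension for r1 and r2.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull toward personal best + pull toward global best.
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        if v_max is not None:
            # Assumed symmetric velocity limit.
            vel = max(-v_max, min(v_max, vel))
        new_v.append(vel)
        new_x.append(xi + vel)  # position update
    return new_x, new_v


# Usage: a particle at (0, 0) with zero initial velocity.
position, velocity = pso_step([0.0, 0.0], [0.0, 0.0],
                              p_best=[1.0, 1.0], g_best=[2.0, 2.0])
print(position, velocity)
```

Because the particle starts at the origin with zero velocity, the printed position equals the printed velocity; the actual numbers depend on the random draws.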

Research Method
In this paper, the COCOMO II model parameters are optimized using Particle Swarm Optimization. PSO is a suitable technique for handling the uncertainty of the data set and for optimizing the values relevant to the effort in little time, and it does not require the values to be predicted in advance. The search starts from an initial fitness value and tracks two kinds of best values, Pbest and Gbest, over a series of iterations until the best fitness score is reached. Each particle modifies its current position and velocity according to the distance between its current position and Pbest, and the distance between its current position and Gbest. The inputs are software size, actual effort, EM, and SF, while the outputs are the values of parameters A and B for local calibration. The steps of the proposed Particle Swarm Optimization are:
a. Initialize n particles with random positions $P_i$ and velocity vectors $V_i$ for the optimization parameters. A velocity range [Vmin, Vmax] is also needed. The starting position of each particle is its personal best (Pbest).
b. Initialize the weight function w to 1, and set the personal acceleration coefficient c1 and the social acceleration coefficient c2, both with the standard value 2.0.
c. For i = 1, 2, 3, ..., n, evaluate the fitness function at every particle position using the optimization parameter values. The fitness function is the Mean Magnitude of Relative Error (MMRE) in equations (9) and (10) and the Manhattan distance (MD) in equation (11). The goal is to minimize both MMRE and MD by selecting the best values from the range stated in step a.
d. Establish Pbest for every particle by comparing the actual effort and the estimated effort for the current and previous parameter values. If fitness(p) is better than fitness(Pbest), then set p as Pbest.
e. Set the best of the Pbest values as the global best (Gbest). The particle for which the difference between actual effort and estimated effort is smallest is selected as the Gbest particle.
f. Update the velocity and position of the optimization parameters based on equations (3) and (4). The update formulas for parameter A are given in equations (5) and (6), and similarly for parameter B in equations (7) and (8).
The equations for updating the velocity and position of parameter A are:
$$ v_{A,i}^{t+1} = w\, v_{A,i}^{t} + c_1 r_1 (\mathrm{Pbest}_{A,i} - A_i^{t}) + c_2 r_2 (\mathrm{Gbest}_{A} - A_i^{t}) \tag{5} $$
$$ A_i^{t+1} = A_i^{t} + v_{A,i}^{t+1} \tag{6} $$
The equations for updating the velocity and position of parameter B are:
$$ v_{B,i}^{t+1} = w\, v_{B,i}^{t} + c_1 r_1 (\mathrm{Pbest}_{B,i} - B_i^{t}) + c_2 r_2 (\mathrm{Gbest}_{B} - B_i^{t}) \tag{7} $$
$$ B_i^{t+1} = B_i^{t} + v_{B,i}^{t+1} \tag{8} $$
g. Give the best value as the optimal solution.
h. Repeat steps c through g until the user-defined number of iterations or the stopping condition on the particles is reached.
We propose using the Mean Magnitude of Relative Error (MMRE) and the difference between actual effort and estimated effort (Manhattan Distance, MD) as the fitness functions of the proposed method.
The key to the successful use of an estimation method is that its predictions are accurate. The deviation between actual effort and estimated effort should be as small as possible, because a large difference has a significant impact on cost planning in software development projects. In this study, we use the Magnitude of Relative Error (MRE), a common evaluation criterion in software cost estimation, to evaluate the accuracy of the estimated effort. Equation (9) gives the MRE for each observation (each project):
$$ \mathrm{MRE}_i = \frac{|\mathrm{ActualEffort}_i - \mathrm{EstimatedEffort}_i|}{\mathrm{ActualEffort}_i} \times 100\% \tag{9} $$
MRE measures the accuracy of each prediction individually. Averaging it over the N projects gives the Mean MRE (MMRE) [3], as stated in equation (10):
$$ \mathrm{MMRE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MRE}_i \tag{10} $$
The Manhattan distance (MD) measures the absolute difference between actual effort and estimated effort over the N projects, as given in equation (11):
$$ \mathrm{MD} = \sum_{i=1}^{N} |\mathrm{ActualEffort}_i - \mathrm{EstimatedEffort}_i| \tag{11} $$
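The two evaluation criteria can be sketched in Python as follows, assuming that MRE is the absolute difference between actual and estimated effort divided by the actual effort, expressed as a percentage, that MMRE is its mean over all projects, and that MD is the sum of the absolute differences. The function names and the three sample projects are hypothetical and do not come from the dataset.

```python
def mre(actual, estimated):
    """Magnitude of Relative Error for one project, in percent."""
    return abs(actual - estimated) / actual * 100.0


def mmre(actuals, estimates):
    """Mean MRE over all projects, in percent."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def manhattan_distance(actuals, estimates):
    """Sum of absolute differences between actual and estimated effort."""
    return sum(abs(a - e) for a, e in zip(actuals, estimates))


# Hypothetical efforts in person-months for three projects.
actual_effort = [10.0, 20.0, 40.0]
estimated_effort = [12.0, 18.0, 40.0]
print(f"MMRE = {mmre(actual_effort, estimated_effort):.2f}%")
print(f"MD   = {manhattan_distance(actual_effort, estimated_effort):.2f} PM")
```

MMRE is scale-free, whereas MD is expressed in the same unit as the effort, so a single large project can dominate MD without dominating MMRE.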
The program parameter settings are shown in Table 1. The maximum number of iterations is set to 100, the population size (swarm size) is set to 50, the acceleration coefficients are set to 2.0 and 2.0, the inertia coefficient is set to 1 and 0.99, the maximum velocity is set to 100, and the minimum velocity is set to -10. The experiment applies the PSO technique to optimize the coefficients of the COCOMO II model using the Turkish Software Industry dataset, which consists of twelve data instances. Each data instance has twenty-five attributes: a Project ID, five Scale Factors and seventeen Effort Multipliers, each rated in the range from VeryLow to ExtraHigh, the size of the project in thousands of lines of code (KLOC), and the measured actual effort. Details of the dataset are shown in Table 2. All data are used for calibration. The calibration results can be applied to subsequent projects that have similar properties.
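The calibration procedure as a whole can be sketched in Python as below. This is a minimal illustration under several assumptions, not the Matlab implementation used in the experiments: the projects are synthetic and generated from known coefficients, the inertia setting "1 and 0.99" is read as an initial weight of 1 multiplied by 0.99 after each iteration, the search bounds and the velocity limit are chosen for illustration, only MMRE is used as fitness, and one particle is seeded at the default coefficients (2.94, 0.91) so that the result is never worse than the defaults. All names are hypothetical.

```python
import random


def make_projects(true_a=3.5, true_b=1.0):
    """Synthetic projects: (size KLOC, sum of SF, product of EM, actual PM)."""
    specs = [(2, 16.0, 1.0), (5, 18.0, 1.1), (10, 19.0, 0.9),
             (20, 17.0, 1.2), (50, 20.0, 1.0), (100, 18.5, 0.95)]
    return [(s, sf, em, true_a * s ** (true_b + 0.01 * sf) * em)
            for s, sf, em in specs]


def mmre(params, projects):
    """Fitness: mean relative error (percent) of COCOMO II with (A, B)."""
    a, b = params
    total = 0.0
    for size, sf_sum, em_prod, actual in projects:
        estimate = a * size ** (b + 0.01 * sf_sum) * em_prod
        total += abs(actual - estimate) / actual * 100.0
    return total / len(projects)


def calibrate(projects, n_particles=50, n_iter=100, c1=2.0, c2=2.0,
              w=1.0, w_damp=0.99, bounds=((0.5, 6.0), (0.5, 1.5)), seed=0):
    rng = random.Random(seed)
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed velocity limit
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds]
          for _ in range(n_particles)]
    xs[0] = [2.94, 0.91]  # seed one particle at the default coefficients
    vs = [[0.0, 0.0] for _ in range(n_particles)]  # zero initial velocity
    p_best = [x[:] for x in xs]
    p_fit = [mmre(x, projects) for x in xs]
    g_idx = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g_idx][:], p_fit[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel = (w * vs[i][d]
                       + c1 * r1 * (p_best[i][d] - xs[i][d])
                       + c2 * r2 * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-v_max[d], min(v_max[d], vel))
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            fit = mmre(xs[i], projects)
            if fit < p_fit[i]:  # update personal best
                p_best[i], p_fit[i] = xs[i][:], fit
                if fit < g_fit:  # update global best as soon as it improves
                    g_best, g_fit = xs[i][:], fit
        w *= w_damp  # damp the inertia weight
    return g_best, g_fit


projects = make_projects()
(best_a, best_b), best_mmre = calibrate(projects)
default_mmre = mmre((2.94, 0.91), projects)
print(f"A = {best_a:.4f}, B = {best_b:.4f}, MMRE = {best_mmre:.4f}%")
print(f"MMRE with default coefficients = {default_mmre:.4f}%")
```

A fixed seed makes a run repeatable, and replacing `make_projects` with real project records would leave the rest of the sketch unchanged.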

Results and Discussion
This section presents the results of the experiments obtained by applying the proposed method to the dataset. The purpose of this optimization is to reduce the uncertainty of the coefficients in the COCOMO II model. Parameters A and B were obtained by applying the PSO technique, and the results were then compared with those obtained using the default coefficient values and the coefficient values from the Tabu Search method [14].
The method was implemented in Matlab, with source code modified from the code provided by Yang [23]. PSO is applied to the Turkish Software Industry dataset. The experiments apply the PSO technique for optimization using equations (1) and (2). The velocity and position updates use equations (5) and (6) for parameter A and equations (7) and (8) for parameter B. The calibrated parameters A and B can significantly improve the accuracy of software project estimates. Figure 2 shows the PSO convergence process after each iteration. Population sizes of 10, 20, 30, and 40 were explored to observe the performance of the process. We found that PSO converged in all experiments with the same minimum error (e.g., MMRE equal to 34.1939).
Figure 3 compares the effort estimates. The graph shows that the effort calculated using PSO is much smoother and closer to the actual effort than the effort predicted with the default coefficient values and with Tabu Search.
Figure 3. Graph of the actual effort, the effort estimated using COCOMO II, the effort estimated using Tabu Search, and the effort estimated using PSO.
The fitness functions in equations (9), (10), and (11) are used to calculate the accuracy of the experiments. The values of MMRE and MD are much better when estimating the actual effort. For projects 2, 8, and 11, the actual efforts are 2, 5, and 1. With the proposed method, we obtain estimates of 2.0001, 5.0012, and 0.9789. This means that the method reduces the errors to 0.00%, 0.02%, and 2.11% of the actual effort.
As shown in Table 3, Table 4, and Figure 4, the results of MMRE and MD indicate that the effort estimated by the proposed method gives better results than COCOMO II and Tabu Search.
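The percentage errors quoted for projects 2, 8, and 11 can be checked with a few lines of Python, assuming the relative error is the absolute difference divided by the actual effort, times 100. The dictionary name and layout are hypothetical; the effort values are the ones reported in the text.

```python
# Project number -> (actual effort, effort estimated with PSO), as reported.
reported = {2: (2.0, 2.0001), 8: (5.0, 5.0012), 11: (1.0, 0.9789)}

relative_error = {
    project: abs(actual - estimated) / actual * 100.0
    for project, (actual, estimated) in reported.items()
}

for project, error_pct in relative_error.items():
    print(f"Project {project}: relative error = {error_pct:.3f}%")
```

The errors come out at about 0.005%, 0.024%, and 2.11%, which agrees with the reported 0.00%, 0.02%, and 2.11% at two decimal places.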

Conclusion
Accurate effort and cost estimation for software projects has been a challenge for both the software industry and the academic community. A more accurate cost estimate allows development resources to be controlled more effectively and efficiently. Many software estimation models can be applied to estimate software development costs. This paper studied the efficiency of PSO, a nature-inspired optimization algorithm, in increasing the precision of COCOMO II. The proposed PSO method is used to optimize the parameters using the Turkish Software Industry dataset as test data.
The proposed method was assessed using the evaluation criteria. With MMRE as the evaluation criterion, the results show that the PSO model achieves a reduction of 698.9461% compared with the regular COCOMO II model and of 104.876% compared with the Tabu Search method. With MD as the evaluation criterion, the PSO model achieves a reduction of 542.6984% compared with the regular COCOMO II model and of 47.3320% compared with the Tabu Search model. Therefore, the COCOMO II model with parameters optimized by the PSO method provides a better estimate than the original COCOMO II model.