Published December 17, 2020 | Version v1
Journal article Open

A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FOR BUILDING DURATION ESTIMATION MODELS: CASE STUDY OF GITHUB

  • 1. University of Maroua, Cameroon
  • 2. LaRI Lab, University of Maroua, Cameroon

Description

Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. Although organizations maintain their own historical data repositories, it is difficult to obtain access to them because of privacy and competitive concerns. To overcome this lack of public access to private data repositories, this study proposes an algorithm that extracts sufficient data from GitHub repositories for building duration estimation models. More specifically, the study extracts and analyses historical data on WordPress projects to estimate open source software (OSS) project duration, using commits as an independent variable together with an improved classification of contributors based on the number of active days for each contributor within a release period. The results indicate that duration estimation models built on data from OSS repositories perform well and partially solve the problem of the lack of data encountered in empirical software engineering research.
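
As an illustration only, the per-release quantities named in the description (commit count, active days per contributor, release duration) could be derived from a commit log roughly as sketched below. This is not the algorithm from the paper: the function names, the two class labels, and the `core_threshold` cutoff are assumptions made for this example, and the commit records are assumed to have already been fetched from the repository as (author, date) pairs.

```python
from collections import defaultdict
from datetime import date


def active_days_per_contributor(commits, release_start, release_end):
    """Count the distinct days on which each author committed
    within the release period [release_start, release_end]."""
    days = defaultdict(set)
    for author, day in commits:
        if release_start <= day <= release_end:
            days[author].add(day)
    return {author: len(d) for author, d in days.items()}


def classify_contributor(active_days, core_threshold=3):
    """Assign a class from the active-day count.
    The labels and the threshold are hypothetical, not taken from the paper."""
    return "core" if active_days >= core_threshold else "occasional"


def release_summary(commits, release_start, release_end, core_threshold=3):
    """Collect the variables a duration model could be fitted on for one release."""
    in_window = [c for c in commits if release_start <= c[1] <= release_end]
    active = active_days_per_contributor(commits, release_start, release_end)
    return {
        "duration_days": (release_end - release_start).days,
        "commit_count": len(in_window),
        "active_days": active,
        "classes": {a: classify_contributor(n, core_threshold)
                    for a, n in active.items()},
    }


if __name__ == "__main__":
    # Invented sample data, for demonstration only.
    sample = [
        ("alice", date(2020, 1, 1)),
        ("alice", date(2020, 1, 1)),  # same day: still one active day
        ("alice", date(2020, 1, 2)),
        ("alice", date(2020, 1, 5)),
        ("bob", date(2020, 1, 3)),
        ("carol", date(2020, 2, 15)),  # outside the release period
    ]
    print(release_summary(sample, date(2020, 1, 1), date(2020, 1, 31)))
```

Counting distinct dates rather than commits means that several commits on one day contribute a single active day, which matches the notion of "active days for each contributor within a release period" used above.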

Files

11620ijsea03.pdf (897.5 kB)
md5:2cd1e2133caabe6b886f95b15b88965f