A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FOR BUILDING DURATION ESTIMATION MODELS: CASE STUDY OF GITHUB
Authors/Creators
- 1. University of Maroua, Cameroon; 2. LaRI Lab, University of Maroua, Cameroon
Description
Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. Although organizations maintain their own historical data repositories, their collaboration is difficult to obtain because of privacy and competitive concerns. To overcome the lack of public access to private data repositories, this study proposes an algorithm that extracts sufficient data from GitHub for building duration estimation models. More specifically, the study extracts and analyses historical data on WordPress projects to estimate open source software (OSS) project duration, using commits as an independent variable together with an improved classification of contributors based on the number of days each contributor was active within a release period. The results indicate that duration estimation models built on data from OSS repositories perform well and partially solve the problem of data scarcity encountered in empirical software engineering research.
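As a rough illustration of the two quantities named above (commits per release period and active days per contributor), the following minimal sketch may help. It is not the algorithm from the paper: the function names, the `(author, date)` input format, the class labels, and the thresholds `core_min` and `regular_min` are all assumptions chosen for illustration, and the sample data is invented.

```python
from collections import defaultdict
from datetime import date


def active_days(commits, start, end):
    """Count, per contributor, the distinct days with at least one commit
    inside the release period [start, end] (both bounds inclusive)."""
    days = defaultdict(set)
    for author, day in commits:
        if start <= day <= end:
            days[author].add(day)
    return {author: len(d) for author, d in days.items()}


def classify(n_days, core_min=3, regular_min=2):
    """Assign a contributor class from active days.
    The labels and thresholds are hypothetical, not taken from the paper."""
    if n_days >= core_min:
        return "core"
    if n_days >= regular_min:
        return "regular"
    return "occasional"


def summarize_release(commits, start, end):
    """Build one data point for a release: its duration, commit count,
    and the class of every contributor active in the period."""
    in_period = [(a, d) for a, d in commits if start <= d <= end]
    per_author = active_days(commits, start, end)
    return {
        "duration_days": (end - start).days + 1,
        "commits": len(in_period),
        "contributors": {a: classify(n) for a, n in per_author.items()},
    }


# Invented sample data: (author, commit date) pairs.
sample = [
    ("alice", date(2020, 1, 1)),
    ("alice", date(2020, 1, 1)),  # same day: counts once as an active day
    ("alice", date(2020, 1, 2)),
    ("alice", date(2020, 1, 5)),
    ("bob", date(2020, 1, 3)),
    ("carol", date(2020, 2, 1)),  # outside the release period, ignored
]
summary = summarize_release(sample, date(2020, 1, 1), date(2020, 1, 10))
print(summary)
```

One row like `summary` per release would then form the data set on which a duration model could be fitted; how the commit records are fetched from GitHub is left out of this sketch.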
Files
| Name | Size |
|---|---|
| 11620ijsea03.pdf (md5:2cd1e2133caabe6b886f95b15b88965f) | 897.5 kB |