There is a newer version of the record available.

Published June 20, 2021 | Version v1
Dataset Open

What are the characteristics of highly-used packages? A case study on the npm ecosystem

  • 1. Concordia University
  • 2. Queen's University

Description

With the popularity of software ecosystems, the number of open source components (a.k.a. “packages”) has been growing rapidly. Identifying high-quality and well-maintained packages to depend on from a large pool of packages is a basic and important problem, as it benefits applications such as package recommendation and package search. However, no systematic and comprehensive work has so far addressed this problem; it has been covered only in online discussions, informal literature, and interviews. To fill this gap, in this paper, we conduct a mixed qualitative and quantitative analysis to understand how developers identify and select relevant open source packages. In particular, we start by surveying 118 JavaScript developers from the npm ecosystem to qualitatively understand the factors that make a package highly-used within the ecosystem. The survey results show that JavaScript developers believe that highly-used packages are well-documented, receive a high number of stars on GitHub, have a large number of downloads, and do not suffer from vulnerabilities. Then, we conduct an experiment to quantitatively validate the developers' perception of the factors that make a package highly-used. In this analysis, we collect and mine historical data from 2,427 packages divided into highly-used and low-used packages. For each package in the dataset, we collect quantitative data to represent the factors studied in the developers' survey. Next, we use regression analysis to quantitatively explain which of the studied factors are the most important. Our regression analysis supports the developers' beliefs about highly-used packages. In particular, the results show that highly-used packages tend to be impacted by the number of downloads, the number of stars, and the size of the package's README file.

Files

badges_for_sampled_packages.csv

Files (228.7 MB)

Checksum (MD5)  Size
md5:098f85b2cc1e273961012103737f4f1e  1.0 MB
md5:365279954dfbb8544480cfadec6cf4a0  1.2 MB
md5:23086512337dab553d550d45d6d42517  211.5 MB
md5:ddfb6bc9b86373b61bd217a93d82e2a3  77.8 kB
md5:c5d59ca6562ea1bebf2e3a5f5ce37aa3  35.7 kB
md5:b8df70653bbcda008147a5e9ac567332  111.4 kB
md5:4868ce30643600652098c7291f4c7199  111.7 kB
md5:d901ba84c3694f60eefafdc9ddeb8459  821.2 kB
md5:2d45c552e9fa637186912dd551ce375e  1.4 MB
md5:b167ae79384ae66e615dec0003a018e8  354.0 kB
md5:4f15e11e9c6aca6b8b90cf7dd3413f04  982.8 kB
md5:20f322b172aa691941ac94ef76a01b02  331.9 kB
md5:0a20a46e72ae36a18e62eb1b4a60a680  488.9 kB
md5:1c18a54424ccf6169aafbae91e83043a  10.2 MB
md5:0b64bacd6863b8aa34d4a97add68a04b  32.0 kB
md5:bb7da9c961ccfacbac7e51f4f219f1e6  1.8 kB
md5:32c62fccf24b57a27402678f7c31c2d8  29.4 kB
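
The MD5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch, not part of the record itself: the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing does not show which file name belongs to which checksum, the pairing has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large files (e.g. the 211.5 MB one) are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum string as shown in the
    listing, e.g. 'md5:098f85b2cc1e273961012103737f4f1e'.
    The 'md5:' prefix is optional and the comparison ignores case."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:098f85b2cc1e273961012103737f4f1e")` should return `True` only if `local_path` points to an unmodified copy of the file that the record pairs with that checksum.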