Published May 28, 2021 | Version 1.0.1
Dataset Open

NCQ Dataset (NPM Package Data)

  • The University of Adelaide

Description

Dataset for use in Node Code Query (NCQ). It contains package information in a tab-separated CSV file. The unzipped size is ~700 MB.

You do not need to download this file manually for use in NCQ; the setup scripts handle this automatically.
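For use outside NCQ, the file can be streamed row by row rather than loaded whole, which matters at ~700 MB. The following is a minimal sketch, not part of NCQ. It assumes the file is UTF-8 encoded and uses standard CSV quoting; the function name `iter_rows` is hypothetical, and whether the first row is a header is not stated here, so the caller should check.

```python
import csv
import sys
from typing import Iterator, List


def iter_rows(path: str) -> Iterator[List[str]]:
    """Yield each row of a tab-separated file as a list of strings."""
    # Fields holding README snippet arrays can be long, so raise the
    # per-field size cap above the csv module's default.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    # Assumption: UTF-8 encoding and default CSV quoting rules.
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.reader(handle, delimiter="\t")
```

Calling `next(iter_rows(path))` on the unzipped file returns the first row, which is a quick way to inspect the column layout before processing the rest.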

The dataset contains the following fields:

Mined from the NPM registry:

  • Package name
  • Description
  • Keywords
  • License
  • repositoryUrl
  • timeModified

Derived from data on the NPM registry:

  • Array of Node.js code snippets extracted from the package README using https://github.com/Brittany-Reid/npm-code-snippets
  • Number of markdown code blocks in the README (unfiltered, so this number may be larger than the number of Node.js snippets)
  • Number of lines in the README
  • If an install example exists in the README (if a code block exists with `npm install` or an `install` header exists)
  • If a run example exists in the README (if a code block exists with `npm run` or a usage header exists)
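The install and run checks described above can be approximated as follows. This is a sketch under stated assumptions, not the code used to build the dataset: it assumes fenced (triple-backtick) code blocks and `#`-style markdown headers, matches header words case-insensitively, and the function names are hypothetical.

```python
import re

# Matches a fenced markdown code block and captures its body.
FENCE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)
# Matches a '#'-style header line and captures its title text.
HEADER = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def _has_example(readme: str, command: str, header_word: str) -> bool:
    """True if a code block contains `command` or a header contains `header_word`."""
    in_code = any(command in body for body in FENCE.findall(readme))
    # Remove code blocks first so shell comments are not mistaken for headers.
    prose = FENCE.sub("", readme)
    in_header = any(header_word in title.lower() for title in HEADER.findall(prose))
    return in_code or in_header


def has_install_example(readme: str) -> bool:
    return _has_example(readme, "npm install", "install")


def has_run_example(readme: str) -> bool:
    return _has_example(readme, "npm run", "usage")
```

Indented code blocks and underline-style headers are ignored here, so results may differ from the values stored in the dataset.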

Mined from GitHub for packages with a GitHub repository (values are 0 or false for packages missing this data):

  • Number of stars
  • Is a fork?
  • Number of forks
  • Number of watchers
  • If a test directory exists (if the top level directory contains a folder called `test` or `tests`)
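The test directory check can be reproduced on a local checkout of a repository. The sketch below is an illustration only: the dataset values were mined from GitHub, whereas this hypothetical `has_test_directory` helper inspects a directory on disk.

```python
from pathlib import Path


def has_test_directory(repo_root: str) -> bool:
    """True if the top level of repo_root contains a folder named test or tests."""
    root = Path(repo_root)
    # Only the top level is checked; nested test folders do not count.
    return any((root / name).is_dir() for name in ("test", "tests"))
```

On case-insensitive file systems a folder named `Test` would also match, which may differ from the rule applied when the dataset was built.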

This updated version also contains:

  • The confidence value of an 'able to install' prediction.

Files

dataset.zip (197.2 MB)
md5:4a8ce38e49841ddc98e4d24ed6eed958
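If the archive is downloaded manually, its integrity can be checked against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` and the local path `dataset.zip` are assumptions.

```python
import hashlib

# Checksum published for dataset.zip on this record.
EXPECTED_MD5 = "4a8ce38e49841ddc98e4d24ed6eed958"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file("dataset.zip")` with `EXPECTED_MD5` indicates whether the download completed without corruption.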