There is a newer version of the record available.

Published May 28, 2021 | Version 1.0.0
Dataset Open

NCQ Dataset (NPM Package Data)

  • 1. The University of Adelaide

Description

Dataset for use in Node Code Query (NCQ). It contains package information in a single tab-separated CSV file. The unzipped size is ~700 MB.
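Because the unzipped file is large, it is usually easier to stream it row by row than to load it in one go. The following is a minimal sketch of reading a tab-separated file with Python's standard `csv` module. The record does not state whether the file has a header row or how fields are quoted, so the default quoting is an assumption here, and the three-column sample is made up for illustration and is not the dataset's real column layout.

```python
import csv
import io

# README-derived fields can be long, so raise the csv module's
# per-field size limit above its default.
csv.field_size_limit(2**31 - 1)


def iter_rows(handle):
    """Yield each row of a tab-separated text stream as a list of strings."""
    # Assumption: default csv quoting rules apply; adjust if the real file differs.
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        yield row


# Hypothetical three-column sample, not the dataset's actual columns.
sample = "left-pad\tString left pad\tMIT\nexpress\tWeb framework\tMIT\n"
rows = list(iter_rows(io.StringIO(sample)))
print(rows[0][0])  # prints "left-pad"
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the extracted file was placed.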

You do not need to download this file manually to use NCQ; the setup scripts handle this automatically.

The dataset contains the following fields:

Mined from the NPM registry:

  • Package name
  • Description
  • Keywords
  • License
  • repositoryUrl
  • timeModified

Derived from data on the NPM registry:

  • Array of Node.js code snippets extracted from the package README using https://github.com/Brittany-Reid/npm-code-snippets
  • Number of markdown code blocks in the README (this count is unfiltered, so it may be larger than the number of Node.js snippets)
  • Number of lines in the README
  • Whether an install example exists in the README (a code block containing `npm install`, or an `install` header)
  • Whether a run example exists in the README (a code block containing `npm run`, or a usage header)
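The install and run heuristics above can be approximated as follows. This is an illustrative sketch only, not the code used to build the dataset: the regular expressions, the function names, and the choice to match headers by substring are all assumptions, and only fenced code blocks and `#`-style headers are considered.

```python
import re

# Three backticks, built programmatically so this example stays easy to embed.
TICKS = "`" * 3

# Fenced code block: opening fence with optional info string, body, closing fence.
FENCE = re.compile(re.escape(TICKS) + r"[^\n]*\n(.*?)" + re.escape(TICKS), re.DOTALL)
# ATX-style markdown header such as "## Installation".
HEADER = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def code_blocks(readme):
    """Return the bodies of all fenced code blocks in the README."""
    return FENCE.findall(readme)


def headers(readme):
    """Return lower-cased header titles, ignoring '#' lines inside code blocks."""
    without_code = FENCE.sub("", readme)
    return [title.lower() for title in HEADER.findall(without_code)]


def has_install_example(readme):
    """True if a code block mentions `npm install` or a header mentions install."""
    return (any("npm install" in block for block in code_blocks(readme))
            or any("install" in title for title in headers(readme)))


def has_run_example(readme):
    """True if a code block mentions `npm run` or a header mentions usage."""
    return (any("npm run" in block for block in code_blocks(readme))
            or any("usage" in title for title in headers(readme)))
```

Stripping code blocks before scanning for headers avoids counting shell comments such as `# install deps` as headers; whether the original extraction does the same is not stated in this record.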

Mined from GitHub, for packages with a GitHub repository (values are 0 or false for packages missing this data):

  • Number of stars
  • Whether the repository is a fork
  • Number of forks
  • Number of watchers
  • Whether a test directory exists (the top-level directory contains a folder called `test` or `tests`)
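The test-directory rule can be reproduced against a local checkout of a repository. The sketch below is an assumption-laden stand-in: the dataset values were mined from GitHub, whereas this hypothetical `has_test_directory` helper inspects a directory on disk.

```python
from pathlib import Path


def has_test_directory(repo_root):
    """True if the top level of repo_root contains a folder named test or tests."""
    root = Path(repo_root)
    # is_dir() is False for regular files, so a file called "test" does not count.
    return any((root / name).is_dir() for name in ("test", "tests"))
```

Only the top level is checked, matching the description above; nested folders such as `src/tests` are not counted.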

Files

dataset.zip (195.8 MB)

md5:e072688ea9fb4d13c0f546c2d0ec214f
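If you do download the archive manually, you can check it against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib`, reading in chunks so the whole archive is never held in memory. The function names are illustrative, and the path passed in is an assumption about where you saved the file.

```python
import hashlib

# Checksum published on this record for dataset.zip.
EXPECTED_MD5 = "e072688ea9fb4d13c0f546c2d0ec214f"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file's MD5 digest equals the checksum listed for dataset.zip."""
    return md5_of(path) == EXPECTED_MD5
```

For example, `matches_published_checksum("dataset.zip")` should return `True` for an intact download of this version saved in the current directory.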