Published May 31, 2018 | Version v1
Dataset Open

Zenodo Machine Learning

Description

The Zenodo-ML dataset is a collection of just under 10K records from Zenodo, a service that generates digital object identifiers (DOIs) for software and associated digital resources. In human terms, someone writes a codebase for their software and links it to Zenodo so that others can find and cite it. For this dataset, it means that we can find these codebases and do the following:

  • convert each script file into a set of 80x80 images, with each character converted to its ordinal value, for use with machine learning
  • generate a file hierarchy tree for graph analysis
  • extract complete metadata, such as domain, authors, and description, for the software
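
As a rough illustration of the first step, the sketch below turns the text of one script into 80x80 grids of character ordinals. It is not the code used to build the dataset. The function name text_to_images is hypothetical, and the choices to pad with spaces and to wrap lines longer than 80 characters onto extra rows are assumptions; the dataset itself may handle short and long lines differently.

```python
from typing import List

SIZE = 80        # image width and height, taken from the description above
PAD = ord(" ")   # assumption: short rows and short images are padded with spaces


def text_to_images(text: str, size: int = SIZE, pad: int = PAD) -> List[List[List[int]]]:
    """Split text into size x size grids of character ordinals (hypothetical helper)."""
    rows: List[List[int]] = []
    for line in text.splitlines():
        codes = [ord(ch) for ch in line]
        # Assumption: a line longer than `size` wraps onto additional rows.
        # An empty line still produces one all-padding row.
        chunks = [codes[i:i + size] for i in range(0, len(codes), size)] or [[]]
        for chunk in chunks:
            rows.append(chunk + [pad] * (size - len(chunk)))

    images: List[List[List[int]]] = []
    for start in range(0, len(rows), size):
        block = rows[start:start + size]
        # Pad the last image with blank rows so every image is size x size.
        block += [[pad] * size for _ in range(size - len(block))]
        images.append(block)
    return images


# Usage: a two-line script fits in a single 80x80 image.
images = text_to_images("import os\nprint(os.getcwd())\n")
first_row = images[0][0]   # starts with the ordinals of "import"
```

Plain nested lists keep the sketch free of dependencies; for training, each grid could be loaded into an array type of your choice.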

Files (20.7 GB)

md5:bff9f8ca3632fa7372f0b9e440b85c5a (20.7 GB)