There is a newer version of the record available.

Published September 22, 2020 | Version v0.1
Dataset Open

ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference

  • 1. Delft University of Technology

Description

  • See the file ManyTypes4PyDataset.spec for the repository URLs and their commit SHAs. The dataset was gathered on Sep. 17th, 2020.
  • The dataset has more than 5.4K Python repositories that are hosted on GitHub.
  • It contains more than 1.1M type annotations.
  • Please note that this is the first version of the dataset. In the second version, we will provide processed Python projects as JSON files that contain relevant features and hints for the ML-based type inference task.
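
Because this version ships only repository URLs and commit SHAs, reproducing the dataset means cloning each repository and checking out the pinned commit. The sketch below shows one way to do that in Python. It assumes each line of ManyTypes4PyDataset.spec holds a repository URL and a commit SHA separated by whitespace; that layout is an assumption, so check the file before relying on it. The helper names `parse_spec` and `checkout_commands`, and the sample URL and SHA, are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_spec(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (repository URL, commit SHA) pairs from spec lines.

    ASSUMPTION: one whitespace-separated "URL SHA" pair per line.
    Lines that do not match this shape are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or unexpected layout
        url, sha = parts
        entries.append((url, sha))
    return entries


def checkout_commands(url: str, sha: str, dest: str) -> List[List[str]]:
    """Build the git commands that clone `url` into `dest` and pin it to `sha`."""
    return [
        ["git", "clone", "--quiet", url, dest],
        ["git", "-C", dest, "checkout", "--quiet", sha],
    ]


# Hypothetical sample standing in for the real spec file contents.
sample_lines = [
    "https://github.com/example-user/example-repo.git 0123456789abcdef0123456789abcdef01234567",
    "",
]

entries = parse_spec(sample_lines)
commands = [
    checkout_commands(url, sha, dest=f"repos/{index}")
    for index, (url, sha) in enumerate(entries)
]
```

To fetch the repositories, read the lines from the real spec file instead of `sample_lines` and pass each command list to `subprocess.run(cmd, check=True)`. Checking out the recorded SHA, rather than the default branch, keeps the source files identical to the state in which the dataset was gathered.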

Files

Files (473.1 kB)

Checksum: md5:b5f0f0ec5570bd36a414f9f639d1ec2b
Size: 473.1 kB

Additional details

Funding

FASTEN – Fine-Grained Analysis of Software Ecosystems as Networks (grant 825328)
European Commission