Published September 2, 2021 | Version 0.0.1
Dataset Open

Natural Language-Guided Programming User Study


This dataset contains the user study data used in the Natural Language-Guided Programming paper, which was accepted at Onward! 2021. A preprint can be found here. The dataset consists of the following files:

  • benchmark.json contains 201 test cases. Each test case consists of context code, a natural language intent, and target code. The test cases are intended to evaluate a model that predicts code given a piece of context code and a natural language intent. They were derived from Jupyter notebooks crawled from GitHub projects with permissive licenses. The project_metadata field holds information about the original project, such as its git URL and license.

  • predictions-annotated.json contains predictions of the three models used in the paper for 100 of the test cases in benchmark.json. Each prediction is accompanied by qualitative assessments from three annotators.

  • train-index.jsonl is the list of GitHub projects that were used for training the models.

  • eval-index.jsonl is a list of GitHub projects that we kept separate for evaluation. benchmark.json was created from a random subset of the projects in this list.
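A minimal sketch of how these files might be loaded and inspected in Python. Only the file names and the project_metadata field come from the description above. The exact layout of each file is not documented here, so the sketch assumes that the .json files hold a top-level list of objects and that the .jsonl files hold one JSON object per line. Instead of guessing the remaining field names, the hypothetical helper key_counts reports which keys actually occur.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Load a regular JSON file such as benchmark.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_jsonl(path):
    """Load a JSON Lines file such as train-index.jsonl (one object per line)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def key_counts(records):
    """Count how often each top-level key occurs across a list of objects.

    Hypothetical helper for discovering the schema; non-dict entries are ignored.
    """
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    benchmark_path = Path("benchmark.json")
    if benchmark_path.exists():
        test_cases = load_json(benchmark_path)
        # Assumption: the top level is a list of test-case objects.
        print(len(test_cases), "entries")
        print(key_counts(test_cases))
```

Printing the key counts first makes it easy to confirm the real field names (for example, whether project_metadata is present on every test case) before writing any evaluation code against them.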

For more details, see the paper.



Files (16.9 MB)

Four files: 4.2 MB, 632.6 kB, 6.4 MB, and 5.7 MB.