Dataset Open Access

Natural Language-Guided Programming User Study

Heyman, Geert; Huysegems, Rafael; Justen, Pascal; Van Cutsem, Tom


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.5384768</identifier>
  <creators>
    <creator>
      <creatorName>Heyman, Geert</creatorName>
      <givenName>Geert</givenName>
      <familyName>Heyman</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-6276-424X</nameIdentifier>
      <affiliation>Nokia Bell Labs</affiliation>
    </creator>
    <creator>
      <creatorName>Huysegems, Rafael</creatorName>
      <givenName>Rafael</givenName>
      <familyName>Huysegems</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-6244-9864</nameIdentifier>
      <affiliation>Nokia Bell Labs</affiliation>
    </creator>
    <creator>
      <creatorName>Justen, Pascal</creatorName>
      <givenName>Pascal</givenName>
      <familyName>Justen</familyName>
      <affiliation>Nokia Bell Labs</affiliation>
    </creator>
    <creator>
      <creatorName>Van Cutsem, Tom</creatorName>
      <givenName>Tom</givenName>
      <familyName>Van Cutsem</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-4116-4290</nameIdentifier>
      <affiliation>Nokia Bell Labs</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Natural Language-Guided Programming User Study</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2021</publicationYear>
  <subjects>
    <subject>code completion</subject>
    <subject>code prediction</subject>
    <subject>natural language-guided programming</subject>
    <subject>example-centric programming</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2021-09-02</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/5384768</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.5384767</relatedIdentifier>
  </relatedIdentifiers>
  <version>0.0.1</version>
  <rightsList>
    <rights rightsURI="https://opensource.org/licenses/BSD-3-Clause">BSD 3-Clause "New" or "Revised" License</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;In this dataset you find the&amp;nbsp;user study data that was used in the &lt;strong&gt;&lt;em&gt;Natural Language-Guided Programming&lt;/em&gt;&lt;/strong&gt; paper, which is accepted for Onward! 2021. A preprint can be found here&amp;nbsp;&lt;a href="https://arxiv.org/pdf/2108.05198.pdf"&gt;https://arxiv.org/pdf/2108.05198.pdf&lt;/a&gt;. The dataset consists of the following files:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;
	&lt;p&gt;benchmark.json contains 201 test cases. Each test case consists of a context, a natural language intent, and target code. The test cases are intended to evaluate a model that predicts code given a piece of context code and a natural language intent. They were derived from Jupyter notebooks crawled from GitHub projects with permissive licenses. The project_metadata field contains information about the original project, such as its git url&amp;nbsp;and&amp;nbsp;license.&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;predictions-annotated.json contains the predictions of the three models used in the paper for 100 of the test cases in benchmark.json. Each prediction is accompanied by qualitative assessments from three annotators.&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;train-index.jsonl is the list of GitHub projects that were used for training the models.&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;eval-index.jsonl is the list of GitHub projects that we kept separate for evaluation. benchmark.json was created from a random subset of the projects in this list.&lt;/p&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details we refer to the paper.&lt;/p&gt;</description>
  </descriptions>
</resource>
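
The JSON and JSON Lines files described in the abstract above can be inspected with a few lines of Python. The snippet below is a minimal sketch: the per-test-case field names (context, intent, target_code, project_metadata) are assumptions based on the abstract, not verified key names, so check them against the actual files after downloading.

import json

# Load the 201 test cases (sketch; adjust paths and keys to the actual files).
with open("benchmark.json") as f:
    benchmark = json.load(f)

case = benchmark[0]
# The keys below are assumed from the abstract, not confirmed by the dataset.
print(case.get("context"))           # code context preceding the prediction point
print(case.get("intent"))            # natural language intent
print(case.get("target_code"))       # reference code to be predicted
print(case.get("project_metadata"))  # original project info, e.g. git url and license

# Predictions of the three models plus annotator assessments for 100 test cases.
with open("predictions-annotated.json") as f:
    predictions = json.load(f)

# The *.jsonl indexes hold one GitHub project record per line.
with open("train-index.jsonl") as f:
    train_projects = [json.loads(line) for line in f if line.strip()]
with open("eval-index.jsonl") as f:
    eval_projects = [json.loads(line) for line in f if line.strip()]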
                  All versions   This version
Views                       66             66
Downloads                    4              4
Data volume            16.9 MB        16.9 MB
Unique views                54             54
Unique downloads             1              1
