Leader: 00000nmm##2200000uu#4500
Record ID: 5148586
DOI: 10.5281/zenodo.5148586
OAI identifier: oai:zenodo.org:5148586

Title: A Code Token Type Taxonomy-enhanced dataset with pre-computed token types for Python150k

Creators:
  Rashidi, Gabriel (Heidelberg University)
  Andrzejak, Artur (Heidelberg University)
  Le, Kim Tuyen (Heidelberg University)

Access rights: info:eu-repo/semantics/openAccess (open)
License: Creative Commons Attribution 4.0 International
License URL: https://creativecommons.org/licenses/by/4.0/legalcode
License identifier: cc-by-4.0 (spdx)

Keywords: code completion; accuracy evaluation; code token types

Description:
  Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion approaches. We publish the CT3-enhanced dataset, which contains a pre-computed token type for each token in the Python150k dataset (https://www.sri.inf.ethz.ch/py150).

  The dataset was produced in the empirical study for the following paper:
  Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In KDD Workshop on Programming Language Processing (PLP), August 14-18, 2021 (Virtual).

  See the README.txt file for details on the structure of the enhanced dataset.

Publisher: Zenodo
Publication date: 2021-07-30
Resource type: info:eu-repo/semantics/other
Last modified: 20211128212446.0

File: https://zenodo.org/records/5148586/files/CT3-dataset-20210729.zip
File size (bytes): 1050354220
Checksum: md5:f0bd6c8b0e13ccf1ef039f46640a6c5a

Related identifier: isVersionOf doi:10.5281/zenodo.5148585
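
The record gives an MD5 checksum for CT3-dataset-20210729.zip, so a downloaded copy can be checked before it is unpacked. The following is a minimal sketch, assuming the archive has already been saved under its original file name in the current directory. The names md5_of_file and matches_record and the 1 MiB chunk size are illustrative choices, not part of the dataset or its README.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed in the record for CT3-dataset-20210729.zip.
EXPECTED_MD5 = "f0bd6c8b0e13ccf1ef039f46640a6c5a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for an archive of about 1 GB.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string, ignoring case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("CT3-dataset-20210729.zip")  # assumed local download path
    if archive.exists():
        print("checksum OK" if matches_record(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download. MD5 is adequate for catching that kind of transfer error, although it is not a defence against deliberate tampering.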