Dataset Open Access
Le, Kim Tuyen; Rashidi, Gabriel; Andrzejak, Artur
<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Le, Kim Tuyen</dc:creator>
  <dc:creator>Rashidi, Gabriel</dc:creator>
  <dc:creator>Andrzejak, Artur</dc:creator>
  <dc:date>2021-11-28</dc:date>
  <dc:description>Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion approaches. We published the CT3-enhanced dataset with pre-computed token types for each token in the Python150k dataset. The dataset was obtained from an empirical study of the below paper: Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In Special Issue on Programming Language Processing, Data Mining and Knowledge Discovery. Please read the README.txt file for detailed information of structuring the enhanced dataset.</dc:description>
  <dc:identifier>https://zenodo.org/record/5733013</dc:identifier>
  <dc:identifier>10.5281/zenodo.5733013</dc:identifier>
  <dc:identifier>oai:zenodo.org:5733013</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.5148585</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>code completion</dc:subject>
  <dc:subject>accuracy evaluation</dc:subject>
  <dc:subject>code token types</dc:subject>
  <dc:title>A Code Token Type Taxonomy-enhanced dataset with pre-computed token types for Python150k</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
Metric | All versions | This version |
---|---|---|
Views | 50 | 30 |
Downloads | 2 | 2 |
Data volume | 2.1 GB | 2.1 GB |
Unique views | 37 | 23 |
Unique downloads | 2 | 2 |