Dataset Open Access
Le, Kim Tuyen; Rashidi, Gabriel; Andrzejak, Artur
<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"> <identifier identifierType="DOI">10.5281/zenodo.5733013</identifier> <creators> <creator> <creatorName>Le, Kim Tuyen</creatorName> <givenName>Kim Tuyen</givenName> <familyName>Le</familyName> <affiliation>Heidelberg University</affiliation> </creator> <creator> <creatorName>Rashidi, Gabriel</creatorName> <givenName>Gabriel</givenName> <familyName>Rashidi</familyName> <affiliation>Heidelberg University</affiliation> </creator> <creator> <creatorName>Andrzejak, Artur</creatorName> <givenName>Artur</givenName> <familyName>Andrzejak</familyName> <affiliation>Heidelberg University</affiliation> </creator> </creators> <titles> <title>A Code Token Type Taxonomy-enhanced dataset with pre-computed token types for Python150k</title> </titles> <publisher>Zenodo</publisher> <publicationYear>2021</publicationYear> <subjects> <subject>code completion</subject> <subject>accuracy evaluation</subject> <subject>code token types</subject> </subjects> <dates> <date dateType="Issued">2021-11-28</date> </dates> <resourceType resourceTypeGeneral="Dataset"/> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/5733013</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.5148585</relatedIdentifier> </relatedIdentifiers> <rightsList> <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights> <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList> <descriptions> <description descriptionType="Abstract"><p>Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based 
code completion approaches.</p> <p>We published the CT3-enhanced dataset, which contains pre-computed token types for each token in the <a href="https://www.sri.inf.ethz.ch/py150">Python150k dataset</a>.</p> <p>The dataset was obtained from the empirical study in the following paper:</p> <p>Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In <em>Special Issue on Programming Language Processing, Data Mining and Knowledge Discovery</em>.</p> <p>Please read the README.txt file for details on the structure of the enhanced dataset.</p></description> </descriptions> </resource>
Metric | All versions | This version |
---|---|---|
Views | 50 | 30 |
Downloads | 2 | 2 |
Data volume | 2.1 GB | 2.1 GB |
Unique views | 37 | 23 |
Unique downloads | 2 | 2 |