{
  "DOI": "10.5281/zenodo.5733013",
  "abstract": "Code Token Type Taxonomy (CT3) is a methodology for the refined evaluation of ML-based code completion approaches.\n\nWe published the CT3-enhanced dataset, which provides a pre-computed token type for each token in the Python150k dataset.\n\nThe dataset was produced in the empirical study of the following paper:\n\nKim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In Special Issue on Programming Language Processing, Data Mining and Knowledge Discovery.\n\nPlease see the README.txt file for details on the structure of the enhanced dataset.",
  "author": [
    {
      "family": "Le",
      "given": "Kim Tuyen"
    },
    {
      "family": "Rashidi",
      "given": "Gabriel"
    },
    {
      "family": "Andrzejak",
      "given": "Artur"
    }
  ],
  "id": "5733013",
  "issued": {
    "date-parts": [
      [
        "2021",
        "11",
        "28"
      ]
    ]
  },
  "publisher": "Zenodo",
  "title": "A Code Token Type Taxonomy-enhanced dataset with pre-computed token types for Python150k",
  "type": "dataset"
}