Published February 27, 2022 | Version v2
Software documentation Open

On the Importance of Building High-quality Training Datasets for Neural Code Search

  • 1. Monash University
  • 2. Tongji University

Description

In our paper "On the Importance of Building High-quality Training Datasets for Neural Code Search", we propose a data cleaning framework for the training datasets of neural code search models. The framework is implemented as a third-party Python library, Natural Language Query Filter (NLQF). Given a set of comment-code pairs, NLQF filters out noisy data based on the syntax and semantics of the comments. We would like to apply for the Reusable and Available badges for NLQF. Evaluating this artifact requires Python programming skills and a GPU-enabled environment.
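To make the idea of filtering comment-code pairs by comment syntax more concrete, here is a minimal sketch in Python. It is not NLQF's API or its actual rule set: the function names (`looks_like_query`, `filter_pairs`), the three rules, and the `MIN_WORDS` threshold are all assumptions chosen for illustration. It covers only the syntactic side and leaves out the semantic filtering mentioned above.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical rules for illustration only; NOT NLQF's real rule set or API.
URL_RE = re.compile(r"https?://\S+")
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")
MIN_WORDS = 3  # assumed threshold, not taken from the paper


def looks_like_query(comment: str) -> bool:
    """Return True if the comment passes every illustrative syntactic rule."""
    text = comment.strip()
    # Rule 1 (assumed): very short comments carry too little intent.
    if len(text.split()) < MIN_WORDS:
        return False
    # Rule 2 (assumed): URLs and HTML tags rarely appear in search queries.
    if URL_RE.search(text) or HTML_TAG_RE.search(text):
        return False
    # Rule 3 (assumed): keep ASCII-only text as a crude English-only check.
    if not text.isascii():
        return False
    return True


def filter_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (comment, code) pairs whose comment passes the rules."""
    return [(comment, code) for comment, code in pairs if looks_like_query(comment)]
```

Calling `filter_pairs` on a list of `(comment, code)` tuples returns the subset whose comments pass all three checks. In a sketch like this, each rule stays a separate, readable condition so that individual rules can be switched off or replaced when a dataset calls for it.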

Files

ICSE-Artifact-Final.zip

Files (33.6 MB)

md5:ebe6b9f1facfb65fde2e45b3ad21dcd7 (1.3 MB)
md5:f00bdd9e7472cfdc76849bcc0db0cc65 (32.3 MB)
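The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the listing does not show which file name belongs to which checksum, the path passed in is left to the reader.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:ebe6...'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(downloaded_path, "md5:f00bdd9e7472cfdc76849bcc0db0cc65")` returns `True` only if the file at `downloaded_path` (a placeholder for wherever the archive was saved) hashes to the listed value.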