Published February 27, 2022
Version v2
Software documentation
Open access
On the Importance of Building High-quality Training Datasets for Neural Code Search
Creators
- 1. Monash University
- 2. Tongji University
Description
In our paper "On the Importance of Building High-quality Training Datasets for Neural Code Search", we propose a data cleaning framework for the training datasets of neural code search models. The framework is implemented as a third-party Python library, Natural Language Query Filter (NLQF), which filters noisy data out of a set of comment-code pairs based on the syntax and semantics of the comments. We are applying for the Reusable and Available badges for NLQF. Evaluating this artifact requires Python programming skills and a GPU-enabled environment.
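To make the idea of syntax-based filtering of comment-code pairs concrete, here is a minimal sketch under stated assumptions. It covers only the syntax side, using a few hypothetical rules: keep the first sentence of the comment, strip URLs and HTML tags, require a word count within a range, and use an ASCII check as a crude proxy for English text. The rules, thresholds, and the names `clean_comment`, `is_query_like`, and `syntactic_filter` are illustrative assumptions, not NLQF's actual API or rule set. Semantic filtering is not shown.

```python
import re
from typing import List, Tuple

# A comment-code pair: (natural-language comment, code snippet).
Pair = Tuple[str, str]

# Hypothetical cleaning patterns, for illustration only (not NLQF's rules).
URL_RE = re.compile(r"https?://\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_comment(comment: str) -> str:
    """Return the first sentence of the comment's first paragraph,
    with URLs and HTML tags removed and whitespace normalized."""
    text = comment.strip().split("\n\n")[0]   # first paragraph only
    text = URL_RE.sub("", text)               # drop URLs
    text = HTML_TAG_RE.sub("", text)          # drop HTML tags
    text = " ".join(text.split())             # collapse whitespace
    return SENTENCE_END_RE.split(text, maxsplit=1)[0]  # first sentence


def is_query_like(comment: str, min_words: int = 3, max_words: int = 50) -> bool:
    """Hypothetical syntactic checks that a cleaned comment resembles a query."""
    words = comment.split()
    if not (min_words <= len(words) <= max_words):
        return False
    if not comment.isascii():  # crude proxy for non-English text
        return False
    return re.match(r"[A-Za-z]", comment) is not None  # must start with a letter


def syntactic_filter(pairs: List[Pair]) -> List[Pair]:
    """Keep pairs whose cleaned comment passes the checks; the code is unchanged."""
    kept: List[Pair] = []
    for comment, code in pairs:
        cleaned = clean_comment(comment)
        if is_query_like(cleaned):
            kept.append((cleaned, code))
    return kept
```

For example, a pair whose comment is "Returns the sum of a and b. See https://example.com for details." is kept with the comment reduced to "Returns the sum of a and b.", while a comment that is just "TODO" is dropped for being too short.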
Files
ICSE-Artifact-Final.zip
Total size: 33.6 MB

| Name | Size | MD5 checksum |
|---|---|---|
| | 1.3 MB | ebe6b9f1facfb65fde2e45b3ad21dcd7 |
| | 32.3 MB | f00bdd9e7472cfdc76849bcc0db0cc65 |