Python Annotated Code Search (PACS) Datasets & Pretrained Models
Description
This upload contains the datasets and pre-trained models used in the paper "Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent". Code for loading these datasets and models will be made available at http://github.com/nokia/codesearch
Datasets
There are three types of datasets:
- snippet collections (code snippets + natural language descriptions): so-ds-feb20, staqc-py-cleaned, conala-curated
- code search evaluation data (queries linked to relevant snippets of one of the snippet collections): so-ds-feb20-{valid|test}, staqc-py-raw-{valid|test}, conala-curated-0.5-test
- training data (datasets used to train code retrieval models): so-duplicates-pacs-train, so-python-question-titles-feb20
The staqc-py-cleaned snippet collection and the conala-curated datasets were derived from existing corpora:
- staqc-py-cleaned was derived from the Python StaQC snippet collection. See https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset, LICENSE.
- conala-curated was derived from the conala corpus. See https://conala-corpus.github.io/, LICENSE.
The other datasets were mined directly from a recent Stack Overflow dump (https://archive.org/details/stackexchange, LICENSE).
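Since the evaluation data links each query to the relevant snippets in a snippet collection, a retrieval model can be scored by checking where those snippets land in its ranked output. The following is a minimal sketch, not the paper's evaluation code: the metric choices (a hit-based recall@k and mean reciprocal rank), the function names, and the in-memory dictionaries are all assumptions made for illustration, and the actual file layout of the datasets is not shown here.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any relevant snippet id is in the top k results, else 0.0."""
    return 1.0 if any(sid in relevant for sid in ranked[:k]) else 0.0


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant snippet, or 0.0 if none is retrieved."""
    for rank, sid in enumerate(ranked, start=1):
        if sid in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 3,
) -> Dict[str, float]:
    """Average both metrics over all queries that have ground truth."""
    n = len(ground_truth)
    recall = sum(recall_at_k(rankings[q], ground_truth[q], k) for q in ground_truth) / n
    mrr = sum(reciprocal_rank(rankings[q], ground_truth[q]) for q in ground_truth) / n
    return {"recall@k": recall, "mrr": mrr}


# Hypothetical toy data: query ids mapped to ranked snippet ids and to relevant ids.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
toy_ground_truth = {"q1": {"b"}, "q2": {"w"}}
toy_scores = evaluate(toy_rankings, toy_ground_truth, k=3)
```

In the toy data, q1 finds its relevant snippet at rank 2 and q2 finds none, so both queries contribute to the averages even when retrieval fails.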
Pre-trained models
Each model can embed queries and (annotated) code snippets in the same space. The models are released under a BSD 3-Clause License.
- ncs-embedder-so-ds-feb20
- ncs-embedder-staqc-py
- tnbow-embedder-so-ds-feb20
- use-embedder-pacs
- ensemble-embedder-pacs
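Because queries and snippets are embedded in the same space, retrieval can be reduced to a nearest-neighbour lookup between a query vector and the snippet vectors. The sketch below illustrates that idea with cosine similarity, which is an assumption here: the description does not state which similarity measure the released models use. The vectors are hand-made placeholders rather than model outputs, and no API of the codesearch library is shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_snippets(
    query_vec: Sequence[float],
    snippet_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (snippet id, score) pairs sorted from most to least similar."""
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in snippet_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder 2-d embeddings; real embeddings would come from one of the models.
toy_query = [1.0, 0.0]
toy_snippets = {"s1": [0.0, 1.0], "s2": [2.0, 0.0], "s3": [1.0, 1.0]}
toy_ranking = rank_snippets(toy_query, toy_snippets)
```

Here s2 points in the same direction as the query and ranks first, s3 is at 45 degrees and ranks second, and the orthogonal s1 ranks last.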
Files
Total size: 3.0 GB
MD5 checksum | Size |
---|---|
f9a23e7c58bcbf4c896b890a98501cf6 | 135.8 kB |
0d82a1b5713b9e0f963cc28edd951c73 | 20.1 kB |
01d87a5d124d4cc3ae894d388324c851 | 392 Bytes |
d3e03f182aefc6567abf5bb24a882763 | 758.2 MB |
18d95bf20a8a86fd47c8afbad4df68af | 836.6 MB |
8f3edaffa04c8c4cadfb797297377ae6 | 20.0 MB |
db5d54fb1b7a3cb73f790a8c471b86df | 49.6 kB |
3cd13b2a9017bd802c873155c5b31ac9 | 42.0 kB |
19be23d9c56c40ba66e9c7deaf9e9b29 | 1.8 MB |
618a9d44d709c7b406e79df2168ae34a | 10.6 MB |
613a54309f4822d663a9fb97e27065f3 | 27.3 MB |
bd5f77beb928b37f4b0168a3e944d2b6 | 30.7 MB |
9dc608fa6990f005377241fbb4ffa9f6 | 105.5 kB |
06ad578a6c2f5806feded4940c2fdd3f | 85.1 kB |
e52211856e268be0a0f97942951e3b1b | 764.8 MB |
7620b07cb1385465dff49efefc162b2b | 549.9 MB |
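The MD5 checksums listed for the files can be used to confirm that a download completed without corruption, which matters most for the larger archives. A minimal sketch using Python's standard hashlib module follows; the helper names are hypothetical, and the path passed in should be wherever the file was saved locally.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed checksum, with or without an 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_download(local_path, "md5:e52211856e268be0a0f97942951e3b1b")` returns True only if the file at `local_path` matches the checksum listed for the 764.8 MB file.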