Published October 13, 2020 | Version 1.0
Dataset Open

GitHub Java Corpus - Function Identifiers

  • 1. Université de Montréal

Description

This dataset contains function identifiers extracted from the GitHub Java Corpus (http://groups.inf.ed.ac.uk/cup/javaGithub/).

Each line corresponds to one method declaration: the name of the declared method, followed by the function identifiers (i.e., the function calls) contained in the method body.
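As an illustration of the line layout described above, the following minimal Python sketch splits one line into the method name and its function calls. It assumes that tokens are separated by whitespace, which the description does not state; the `MethodRecord` type, the `parse_line` helper, and the sample line are hypothetical and are not taken from the dataset or the repository.

```python
from typing import List, NamedTuple


class MethodRecord(NamedTuple):
    """One dataset line: a method name and the calls found in its body."""
    name: str
    calls: List[str]


def parse_line(line: str) -> MethodRecord:
    # Assumption: tokens are whitespace-separated; adjust the split
    # if the actual files use a different delimiter.
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    # First token is the declared method, the rest are function identifiers.
    return MethodRecord(name=tokens[0], calls=tokens[1:])


# Hypothetical sample line, made up for illustration only.
record = parse_line("readFile open read close")
print(record.name, record.calls)
```

A method whose body contains no calls would yield an empty `calls` list under this reading.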

The file embeddings_train.json can be used to train a word/sentence embedding model using the code in the GitHub repository (link below).

The corpus was used for the experiments in the paper "Combining Code Embedding with Static Analysis for Function-Call Completion".

GitHub repository to replicate the experiments: https://github.com/mweyssow/cse-saner

Files

embeddings_train.json

Files (2.7 GB)

Checksum Size
md5:5ef4e84f8c7d06a476f640ea400d1616 1.7 GB
md5:d32baa4d9c5a9ed2ba36ddd7f66c2751 1.0 GB
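The MD5 checksums listed for the files can be used to confirm that a download completed intact. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks so that a multi-gigabyte file does not have to fit in memory. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and which checksum belongs to which file is not stated here, so the pairing in any real call is an assumption to verify.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: Union[str, Path], listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("embeddings_train.json", "md5:5ef4e84f8c7d06a476f640ea400d1616")` returns `True` only if the local file is a byte-for-byte match for the file that checksum was computed from.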