Published May 26, 2021 | Version v3
Dataset Open

Datasets & utils for paper USING PRE-TRAINED MODELS TO PARTIALLY AUTOMATE CODE REVIEW ACTIVITIES

Authors/Creators

  • 1. Università della Svizzera italiana
  • 1. Università della Svizzera italiana

Description

Raw and processed datasets and configuration files for pre-training and fine-tuning T5 models.

  • Pre-Training dataset: obtained by mining Stack Overflow and CodeSearchNet data.
  • Fine-Tuning datasets: we fine-tune our T5 small model on different datasets obtained by mining code review data from Gerrit and GitHub repositories.
    • Fine-Tuning dataset v1 (Small): the same dataset used by Tufano et al., with abstracted code and raw comments.
    • Fine-Tuning dataset v2 (Small): the same dataset used by Tufano et al., with non-abstracted code and cleaned comments.
    • Fine-Tuning dataset (Large): our new large dataset.

Files

dataset.zip (2.8 GB)
md5:85542554e8b5cf0ae3ababc7a3c8a1d7
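
Since the record lists an MD5 checksum for dataset.zip, a downloaded copy can be checked against it before unpacking. The following is a minimal sketch, not part of the original record: the local path `dataset.zip` and the helper names `md5_of_file` and `verify_download` are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path

# Checksum listed on the record for dataset.zip.
EXPECTED_MD5 = "85542554e8b5cf0ae3ababc7a3c8a1d7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value
    (hypothetical helper; the comparison is case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dataset.zip")  # assumed location of the downloaded archive
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.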