Published May 26, 2021 | Version v3
Dataset
Open
Datasets & utils for paper USING PRE-TRAINED MODELS TO PARTIALLY AUTOMATE CODE REVIEW ACTIVITIES
Contributors
- 1. Università della Svizzera italiana
Description
Raw and processed datasets and configuration files for pre-training and fine-tuning T5 models.
- Pre-Training dataset: obtained by mining Stack Overflow and CodeSearchNet data.
- Fine-Tuning dataset: we fine-tune our T5 small model on different datasets obtained by mining code review data from Gerrit and GitHub repositories.
  - Fine-Tuning dataset v1 (Small): same dataset used by Tufano et al., with abstracted code and raw comments.
  - Fine-Tuning dataset v2 (Small): same dataset used by Tufano et al., with non-abstracted code and cleaned comments.
  - Fine-Tuning dataset (Large): our new large dataset.
Files
| Name | Size |
|---|---|
| dataset.zip (md5:85542554e8b5cf0ae3ababc7a3c8a1d7) | 2.8 GB |
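Since the record publishes an MD5 checksum for `dataset.zip`, it can be worth verifying a 2.8 GB download before unpacking it. The following is a minimal sketch rather than an official utility of this dataset: the helper names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the archive was saved as `dataset.zip` in the current working directory.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for dataset.zip in the Files table of this record.
EXPECTED_MD5 = "85542554e8b5cf0ae3ababc7a3c8a1d7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="dataset.zip", expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("dataset.zip")
    if not archive.exists():
        print(f"{archive} not found")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH: the download may be incomplete or corrupted")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the archive again is the simplest remedy.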