Towards Generating the Rationale for Code Changes
Description
This is the replication package of the paper "Towards Generating the Rationale for Code Changes".
In this paper, we explain the process we adopt for generating the rationale for code changes.
In the existing literature, the focus of techniques addressing the automated generation of commit messages has predominantly been on describing the changes made in a commit rather than explaining why the changes were necessary. To tackle this challenging issue, our work is dedicated to providing the rationale for code changes. Using a generative model for this purpose is a natural approach; however, it presents the challenge of creating a comprehensive training set that documents the rationale behind code commits.
To address this, we devised a pipeline that automatically generates a dataset, demonstrating its capability to (i) identify commits that document the rationale with approximately 81% precision and (ii) create the necessary ⟨𝐶𝑑𝑖𝑓𝑓, {𝐶𝑟}⟩ pairs, where {𝐶𝑟} comprises the sentences expressing the rationale for the code change 𝐶𝑑𝑖𝑓𝑓, with around 75% of the resulting pairs being meaningful.
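Concretely, one mined pair might look like the following sketch (the field names, diff, and rationale sentence are illustrative only, not the exact schema of the released dataset):

```python
# Hypothetical example of one <C_diff, {C_r}> pair.
# Field names and content are illustrative, not the released schema.
pair = {
    "C_diff": (
        "@@ -12,7 +12,7 @@\n"
        "-    timeout = 30\n"
        "+    timeout = 60\n"
    ),
    # {C_r}: the sentence(s) from the commit message expressing the rationale.
    "C_r": [
        "Increased the timeout because slow CI machines caused flaky failures."
    ],
}
```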
Subsequently, we implemented this pipeline in real-world scenarios, generating a dataset for training a CodeT5+ model to perform the task of rationale generation. However, it was observed that CodeT5+ faced challenges in this task, especially when dealing with low-confidence predictions. Despite encouraging results under certain conditions, these (partially) negative outcomes highlight the need for further research and investigation into this area.
This repository contains the datasets and code used, as well as the obtained results and manually validated samples:
- Rationale Finder - This folder contains:
- Bi-LSTMClassifier.py - the script we used to replicate the work by Tian et al. to identify commit messages containing a rationale for the code changes.
- requirements.txt - the dependencies for getting started with the script.
- resultsBiLSTM.txt - the results obtained in the various configurations.
- sampled messages.csv - the dataset used to train and test the BiLSTM classifier.
- Explicit Rationale Filter - This folder contains:
- clf.pkl - the Random Forest classifier we developed to identify the commit messages featuring an explicit rationale.
- explicitWhyExtraction.ipynb - the notebook we used to train and test the various Random Forest classifiers.
- onlyWhyLabeledManualCheck.xlsx - the manually validated dataset containing the commit messages featuring an explicit rationale.
- requirements.txt - the dependencies for getting started with the script.
- resultRF.txt - the results we obtained from the various configurations of the Random Forest classifier.
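clf.pkl is a serialized scikit-learn model. As a rough illustration of the TF-IDF + Random Forest pattern (the actual feature extraction, hyperparameters, and training data of the released classifier differ), such a classifier can be built as follows:

```python
# Minimal sketch of an explicit-rationale classifier in the spirit of clf.pkl.
# The toy messages, labels, and hyperparameters below are assumptions for
# illustration only, not the authors' actual setup.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

messages = [
    "Fix typo in README",                        # no rationale
    "Use a cache because recomputing is slow",   # explicit rationale
    "Bump version to 1.2",                       # no rationale
    "Switch to HTTPS since HTTP is deprecated",  # explicit rationale
]
labels = [0, 1, 0, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
clf.fit(messages, labels)

# Classify an unseen commit message (0 = no rationale, 1 = explicit rationale).
pred = clf.predict(["Add retry logic because the API is flaky"])
```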
- Mining Explicit Why Commits - This folder contains:
- Mined Commits - this folder contains the output of whycommit_miner.py on project_500_commits_10_contributors_no_forks.secondrun.csv:
- bulk0.csv to bulk9.csv - all the mined commit messages.
- commit_diff_analysis.ipynb - the notebook containing the analysis we performed on the mined commit messages, the generation of the 665 sampled commit messages used to validate the pipeline and to create the training set for the Rationale Extractor, as well as the 400 sampled commit messages used to evaluate the Rationale Extractor.
- commit1hunk10linechanged.xlsx - the dataset containing the 96,034 instances representing commit messages featuring code changes impacting 1 file, 1 diff hunk, and at most 10 lines of code.
- commit1hunk10linechanged0.json and commit1hunk10linechanged1.json - the dataset containing 96,034 instances representing commit messages featuring code changes impacting 1 file, 1 diff hunk, and at most 10 lines of code. This is in JSON format and partitioned into 2 files to better handle the extraction of rationale through the Rationale Extractor.
- commits_diff_0.csv to commits_diff_4.csv - the mined commit messages for which we extracted the diff hunk.
- sampled_dataset665.xlsx - the manually inspected commit messages we used to evaluate the performance of the pipeline we adopted to mine commit messages featuring an explicit rationale.
- sampled400.xlsx - the dataset we used to evaluate the performance of the Rationale Extractor.
- project_500_commits_10_contributors_no_forks.secondrun.csv - the list of projects from which we extracted commit messages.
- requirements.txt - the dependencies for getting started with the script.
- whycommit_miner.py - the script we used to mine the commit messages featuring an explicit rationale.
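The exact heuristics of whycommit_miner.py live in the script itself; the general idea of flagging commit messages that contain explicit rationale markers can be sketched with a simple keyword filter (the marker list below is illustrative, not the one the miner actually uses):

```python
import re

# Hypothetical rationale markers; the real miner's heuristics differ.
RATIONALE_MARKERS = re.compile(
    r"\b(because|since|so that|in order to|the reason)\b",
    re.IGNORECASE,
)

def looks_like_explicit_why(message: str) -> bool:
    """Return True if the commit message contains a rationale marker."""
    return bool(RATIONALE_MARKERS.search(message))
```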
- Rationale Extractor - This folder contains:
- Manual Labeling - this folder contains the results of the manual extraction of the rationale from commit messages, listed in sampled_dataset665.xlsx, to create the dataset with ⟨ 𝐶m , {𝐶𝑟} ⟩ pairs, where 𝐶m is the commit message and {𝐶𝑟} the sentences expressing the rationale:
- Aut1_labels.xlsx - the rationale extracted by the first author.
- Aut2_labels.xlsx - the rationale extracted by the second author.
- checkedConflicts.ipynb - the notebook containing the analysis of the conflicts on the manual extraction of rationale.
- evalLLama.json - data used to evaluate LLaMA performance.
- evalQA.json - data used to evaluate BERT-based models fine-tuned on the Question Answering (QA) task.
- labelled605.xlsx - the rationales extracted by the other authors for the commit messages on which the first two authors had conflicting selections of the sentences indicating the rationale.
- sample400.json - the dataset we used to evaluate the performance of LLaMA, in JSON format.
- trainLLama.json - data used to fine-tune LLaMA.
- trainQA.json - data used to fine-tune QA models.
- WMGCMdata.xlsx - manually labeled dataset from Tian et al. containing 271 commit messages for which the what and why sentences are extracted.
- dsCleaning.ipynb - the notebook we used to handle the rationales featured in the 96,034 instances in commit1hunk10linechanged.xlsx, extracted through LLaMA in the 15-shot setting, saved in extractedRationaleNew0.csv and extractedRationaleNew1.csv, and appended to extractedRationaleProcessed0.csv and extractedRationaleProcessed1.csv (commits already processed by LLaMA), respectively. In this way, we parallelized two LLaMA models running on two different GPUs. The notebook also includes the cleaning phase described in Section 5.1 and the partitioning of the created dataset into train, test, and eval sets.
- example15FewShot.py - the script that we used in LLaMA to test it in 15-shot mode. Replace example.py in LLaMA repository with this file and then follow the commands in Finetuning_LLama_for_Rationale_Extraction.ipynb.
- exampleFINETuning.py - the script that we used in LLaMA to test it in fine-tuning mode. Replace example.py in LLaMA-Adapterv1 repository with this file and then follow the commands in Finetuning_LLama_for_Rationale_Extraction.ipynb.
- extractedRationaleNew0.csv - the new extracted rationales with LLaMA in 15-shot mode running on the first GPU.
- extractedRationaleNew1.csv - the new extracted rationales with LLaMA in 15-shot mode running on the second GPU.
- extractedRationaleProcessed0.csv - the already processed commit messages with LLaMA in 15-shot mode on the first GPU.
- extractedRationaleProcessed1.csv - the already processed commit messages with LLaMA in 15-shot mode on the second GPU.
- Finetuning_LLama_for_Rationale_Extraction.ipynb - the notebook containing the commands we used to test LLaMA in fine-tuning and few-shot modes, as well as the code to compute the metrics we adopted to measure the performance.
- Finetuning_QA_Roberta.ipynb - the notebook we used to train and test QA BERT-based models.
- Results_LLAMA_FineTuning.txt - the results we obtained with LLaMA with different prompts in fine-tuning mode.
- Results_LLAMA_FEW_SHOT.txt - the results we obtained with LLaMA with different prompts in Few-shot mode.
- test.csv - the dataset we used to test CodeT5+ and get the performances.
- train.csv - the dataset we used to train CodeT5+.
- val.csv - the dataset we used to evaluate CodeT5+ in training phase.
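The partitioning into train.csv, val.csv, and test.csv performed in dsCleaning.ipynb can be sketched as a seeded random split (the 80/10/10 ratios and the seed below are assumptions for illustration, not the authors' exact settings):

```python
import random

def split_dataset(items, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle items deterministically and split into train/val/test.

    The ratios and seed are illustrative assumptions, not the
    replication package's actual partitioning parameters.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```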
- Trained Rationale Generator - This folder contains:
- conflictAnalysis.ipynb - the notebook we used to analyze the conflicts between the first and second author in labeling the semantically equivalent rationales.
- evalRes.txt - the results we obtained by running runEval.py to evaluate CodeT5+ over the different checkpoints.
- read_result.ipynb - the notebook we used to compute the metrics we adopted and to evaluate the general performance of CodeT5+.
- results_test_1.csv to results_test_10.csv - the results we obtained with CodeT5+ adopting different beam sizes (1, 3, 5 and 10).
- runEval.py - the script we used to evaluate each CodeT5+ training checkpoint.
- runInference.py - the script we used to leverage the trained CodeT5+ and get the predictions on the input commit diffs.
- sample363ByConfidence Labelled.xlsx - the manually validated sample used to analyze the clarity and semantic equivalence of the rationales generated from code changes by CodeT5+.
- tune_codet5p_seq2seq.py - the script that we used to train CodeT5+. Replace tune_codet5p_seq2seq.py in CodeT5+ repository with this file and then follow the instructions provided.
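read_result.ipynb computes the adopted metrics over the results_test_*.csv files. As a minimal illustration, an exact-match check over a results file of this kind could look as follows (the column names and the sample rows are hypothetical, not the actual file layout):

```python
import csv
import io

# Hypothetical shape of a results_test_*.csv file; the real column names
# and contents may differ.
sample_csv = """target,prediction
fix race condition on shutdown,fix race condition on shutdown
avoid repeated parsing for speed,cache the parsed value
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Fraction of predictions that exactly match the reference rationale.
exact_match = sum(
    r["target"].strip() == r["prediction"].strip() for r in rows
) / len(rows)
```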
Files

Towards Generating the Rationale for Code Changes.zip (1.9 GB, md5:d973ec9696d66f99385a63d9ee42d6f4)