Published December 9, 2023
Version v1
Computational notebook
Open access
Text-to-text Generation for Issue Report Classification
Authors/Creators
- TCS Research
Description
Submission for the NLBSE Issue Report Tool Competition
This package accompanies the submission titled "Text-to-text Generation for Issue Report Classification" to the NLBSE Issue Report Tool Competition. It provides the resources needed to replicate the experiments and results reported in that submission.
Description of ZIP Files:
- issue_classification_t5: This archive contains the code for replicating the study: downloading the pre-trained model, fine-tuning it, and running inference.
  - code: Contains all the code files.
    - finetuning.py: Code for fine-tuning the VMware/flan-t5-large-alpaca model on the issue report classification task. Embedded comments explain how to run the fine-tuning; be sure to read them before running the script.
    - inference.py: Code for running inference with the fine-tuned model. As in the fine-tuning script, instructions for running inference are given as comments within the file.
    - download_plm.py: Code for downloading VMware/flan-t5-large-alpaca from https://huggingface.co/VMware/flan-t5-large-alpaca .
    - requirements.txt: The Python packages, and their versions, required to run the provided code.
  - data: Contains the NLBSE issue report classification data and the model output obtained by running inference.py on issue-report-test.csv.
    - checkpoint-3000-output.csv: Output obtained by fine-tuning the VMware/flan-t5-large-alpaca model for 2 epochs on issue-report-train.csv and then running inference on issue-report-test.csv (F1-score: 0.8297). Column 'label' contains the ground-truth labels; column 'Model generated output' contains the label predicted by the model.
    - issue-report-train.csv: NLBSE24 issue report tool competition training dataset. (Source: https://github.com/nlbse2024/issue-report-classification)
    - issue-report-test.csv: NLBSE24 issue report tool competition test dataset. (Source: https://github.com/nlbse2024/issue-report-classification)
- finetuned_model_checkpoint-3000: This zip file contains the VMware/flan-t5-large-alpaca model fine-tuned for 2 epochs.
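The scripts themselves are not reproduced on this page. As a rough illustration of the text-to-text framing named in the title, and of how the 'label' and 'Model generated output' columns of checkpoint-3000-output.csv could be scored, here is a minimal Python sketch. Everything in it is an assumption for illustration and is not taken from finetuning.py or inference.py: the prompt template in `build_prompt`, the label set (bug, feature, question), the `normalize` helper that maps free-form generated text to a label, and the choice of macro-averaged F1. This page does not state which F1 averaging produced the reported 0.8297.

```python
"""Hypothetical sketch of text-to-text issue classification and its scoring.

Nothing here is taken from finetuning.py or inference.py: the prompt template,
label set, output normalization, and macro-averaged F1 are all assumptions.
"""
import csv
from collections import Counter

# Assumed label set; replace with the labels actually present in the CSV files.
LABELS = ("bug", "feature", "question")


def build_prompt(title: str, body: str) -> str:
    """Turn an issue into a text input for a seq2seq model (hypothetical template)."""
    return (
        "Classify the following GitHub issue as bug, feature, or question.\n"
        f"Title: {title}\n"
        f"Body: {body}"
    )


def normalize(generated: str, labels=LABELS) -> str:
    """Map generated text to a label: exact match first, else earliest label mentioned."""
    text = generated.strip().lower()
    if text in labels:
        return text
    hits = [(text.find(label), label) for label in labels if label in text]
    return min(hits)[1] if hits else "unknown"


def macro_f1(gold, pred, labels=LABELS) -> float:
    """Unweighted mean of per-label F1, with F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # true label gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def score_output_csv(path: str) -> float:
    """Score a file with 'label' and 'Model generated output' columns."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    gold = [row["label"].strip().lower() for row in rows]
    pred = [normalize(row["Model generated output"]) for row in rows]
    return macro_f1(gold, pred)
```

For example, `score_output_csv("checkpoint-3000-output.csv")` would return the macro-F1 of the shipped predictions under these assumptions. Macro-averaging weights each class equally; a micro-averaged F1 over single-label predictions reduces to accuracy when every prediction is one of the labels, so the two figures can differ on imbalanced data.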
Technical info
Environment details:
- Operating System: Ubuntu 22.04
- NVIDIA Driver Version: 470.141.03
- NVIDIA CUDA Version: 12.2.1
- Python version: 3.10
- GPU Name: Nvidia A100
- GPU Memory: 20 GiB
- CPU Memory: 60 GiB
Note: We also fine-tuned the model on a V100 GPU; the results differed slightly, possibly because of differences in GPU architecture. Running inference with the provided fine-tuned model (finetuned_model_checkpoint-3000) on any GPU should, however, reproduce the reported results.
Files (8.5 GB)
| Name | Size | MD5 |
|---|---|---|
| finetuned_model_checkpoint-3000.zip | 8.5 GB | 81ec72040c9ceedc4fb1a541c3c48b3e |
| issue_classification_t5.zip | 2.6 MB | cfc56d32ec2319fe4b162a234f0b6a7e |
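An MD5 checksum is published for each archive. A minimal standard-library Python sketch for checking a download before unpacking it follows. The file names in `EXPECTED_MD5` are an assumption inferred from the archive sizes (8.5 GB for the fine-tuned model, 2.6 MB for the code archive) and should be checked against the actual downloads; `verify_and_extract` is a hypothetical helper, not part of the package.

```python
"""Sketch: check a downloaded archive against its published MD5, then extract it."""
import hashlib
import sys
import zipfile
from pathlib import Path

# Checksums from the file table. The file names are inferred from the archive
# sizes (8.5 GB model, 2.6 MB code archive) and should be double-checked.
EXPECTED_MD5 = {
    "finetuned_model_checkpoint-3000.zip": "81ec72040c9ceedc4fb1a541c3c48b3e",
    "issue_classification_t5.zip": "cfc56d32ec2319fe4b162a234f0b6a7e",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(path, dest=".") -> Path:
    """Hypothetical helper: raise on checksum mismatch, otherwise unzip into dest."""
    path = Path(path)
    expected = EXPECTED_MD5[path.name]
    actual = md5_of(path)
    if actual != expected:
        raise ValueError(f"{path.name}: MD5 mismatch (got {actual}, expected {expected})")
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)
    return Path(dest)


if __name__ == "__main__":
    # Each command-line argument is treated as a downloaded archive path.
    for name in sys.argv[1:]:
        verify_and_extract(name)
        print(f"{name}: OK")
```

Saved under a hypothetical name such as verify_archives.py, it would be run as `python verify_archives.py finetuned_model_checkpoint-3000.zip`. On Ubuntu, `md5sum <file>` from GNU coreutils prints the same digest if only the check is needed.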