Published August 20, 2024 | Version v1
Dataset
Open
Automated Generation of Code Contracts - Generative AI to the Rescue?
Creators
- 1. University of Southern Denmark
- 2. University of Bern
- 3. University of Athens
- 4. feenk GmbH
Description
This replication package provides the setup and results for generating OpenJML code contracts for Java source code by fine-tuning the CodeT5 and CodeT5+ transformer models and applying the resulting models. Our code contract generation setup covers both the training of the models and their application. Furthermore, we analyzed the generated annotations with respect to their logical validity and the types of OpenJML compilation errors. Both analysis methods, together with their results, are also provided.
Source Code Repository (see also scripts-sources.tar):
- https://github.com/SEG-UNIBE/auto-generated-code-contracts
- https://zenodo.org/doi/10.5281/zenodo.13356451
Replication Package, containing the following [archives/folders]:
- Scripts:
  - [scripts-sources.tar]: source code of the following scripts
    - Python scripts that we used for training the models and for adding the OpenJML code contracts to the Java methods
    - automated analyses of the studied source code classes and of the types of compilation errors
- Sourcegraph Search Results:
  - [sourcegraph-results.tar]: the results of the Sourcegraph search queries
- Datasets:
  - [dataset.tar]: the dataset including the Weka project, which contributes two-thirds of the contracts
  - [dataset-withoutweka.tar]: the dataset without Weka, which is significantly smaller and was used to examine the performance bias when training and testing without Weka
- CodeT5 Models:
  - [codet5-contracts.tar]: the best-performing CodeT5 model, which was fine-tuned to create OpenJML annotations for methods
  - [codet5p-contracts.tar]: the best-performing CodeT5+ model, which was fine-tuned to create OpenJML annotations for methods
  - [codet5p-contracts-withoutweka.tar]: the CodeT5+ model that was trained without Weka on the same task
- Analysis Results:
  - [analysis-results.tar/compilability-analysis]: the results of the compilability analysis
    - the subjects to which we applied the best-performing CodeT5+ model
    - the compilation results and their analysis
  - [analysis-results.tar/logical-analysis]: the results of the logical analysis
    - the analysis of the logical validity of SimpleStack and SimpleTicTacToe
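To see which folders an archive contains before extracting it, the top-level entries can be listed with Python's standard `tarfile` module. This is a minimal sketch; the function name `top_level_entries` is illustrative and not part of the replication package, and it assumes that a notation such as [analysis-results.tar/compilability-analysis] refers to a folder inside the archive:

```python
import tarfile
from pathlib import PurePosixPath


def top_level_entries(archive_path):
    """Return the sorted names of the top-level files and folders in a tar archive.

    Hypothetical helper for inspecting a downloaded archive; the internal
    layout of the published archives is an assumption, not verified here.
    """
    tops = set()
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            # PurePosixPath drops "." components such as a leading "./".
            parts = [p for p in PurePosixPath(name).parts if p not in (".", "/")]
            if parts:
                tops.add(parts[0])
    return sorted(tops)
```

Under that assumption, calling `top_level_entries("analysis-results.tar")` on the downloaded file would be expected to include `compilability-analysis` and `logical-analysis`, possibly below a common root folder.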
Files
(8.3 GB)
Checksum | Size |
---|---|
md5:48d0d3d2c29095b27d823e372a98e32a | 6.2 MB |
md5:16c2d4730e09e5f73c7dc4f1a40219cf | 2.7 GB |
md5:c1a9800da8556007c09c0be84ae9dfc3 | 2.7 GB |
md5:0c5ab85660313b0eaf5e79e807057735 | 2.7 GB |
md5:6eb3e5911691f68b4c5d56a7af77c661 | 55.9 MB |
md5:3a84d69eab3f27c54aff72165f625670 | 71.8 MB |
md5:48f6c1733d575339bee50dc466ade1d2 | 65.5 kB |
md5:45cf5ccbbb5fbb0ea0e29ff0294952de | 120.5 MB |
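
Each file above is listed with an MD5 checksum, so a download can be checked locally before it is unpacked. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the replication package:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for the multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:48d0...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:48d0d3d2c29095b27d823e372a98e32a")` should return `True` for an intact copy of the 6.2 MB file, where `local_path` is wherever that file was saved.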
Additional details
Software
- Repository URL: https://github.com/SEG-UNIBE/auto-generated-code-contracts