Published August 20, 2024 | Version v1
Dataset Open

Automated Generation of Code Contracts - Generative AI to the Rescue?

  • 1. University of Southern Denmark
  • 2. University of Bern
  • 3. University of Athens
  • 4. feenk GmbH

Description

This replication package provides the setup and results for generating OpenJML code contracts for Java source code with fine-tuned CodeT5 and CodeT5+ transformer models. The setup covers both the training of the models and their application. We also analyzed the generated annotations with respect to their logical validity and the types of OpenJML compilation errors. Both analyses are provided together with their results.
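To illustrate what attaching an OpenJML contract to a Java method means, the following minimal sketch prepends JML clauses as `//@` comment lines above a method signature. It is not the package's actual script: the helper `add_contract`, the body of the example class, and the `ensures` clause are hypothetical and chosen only for illustration.

```python
def add_contract(java_source, method_signature, clauses):
    """Insert JML clauses as //@ comment lines directly above a method signature.

    Hypothetical helper: matches the signature by simple prefix comparison,
    which is enough for a small illustration but not for real Java parsing.
    """
    out = []
    for line in java_source.splitlines():
        if line.strip().startswith(method_signature):
            # Reuse the method's own indentation for the contract lines.
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(f"{indent}//@ {clause}" for clause in clauses)
        out.append(line)
    return "\n".join(out)


# Hypothetical class body; the SimpleStack subject in the package may differ.
JAVA = """public class SimpleStack {
    private int size;

    public int size() {
        return size;
    }
}"""

annotated = add_contract(JAVA, "public int size()", ["ensures \\result >= 0;"])
print(annotated)
```

Because JML annotations live in comments, the annotated file remains valid Java; only a JML-aware tool such as OpenJML interprets the added lines.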

Source Code Repository (see also scripts-sources.tar): 

Replication Package: contains the following [folders]

  • Scripts:
    • [scripts-sources.tar]: source code of the following scripts
      • Python scripts that we used for training and adding the OpenJML code contracts to the Java methods
      • automated analyses of the studied source code classes and the type of compilation errors
  • Sourcegraph Search Results:
    • [sourcegraph-results.tar]: the results of the Sourcegraph search queries 
  • Datasets:
    • [dataset.tar]: the dataset including the weka project, which contributes two-thirds of the contracts
    • [dataset-withoutweka.tar]: the dataset without weka, which is significantly smaller and was used to examine the performance bias when training and testing without weka
  • CodeT5 Models:
    • [codet5-contracts.tar]: the best-performing CodeT5 model, fine-tuned to create OpenJML annotations for methods
    • [codet5p-contracts.tar]: the best-performing CodeT5+ model, fine-tuned to create OpenJML annotations for methods
    • [codet5p-contracts-withoutweka.tar]: the CodeT5+ model which was trained without weka on the same task
  • Analysis Results:
    • [analysis-results.tar/compilability-analysis]: the results of the compilability analysis
      • the subjects to which we applied the best performing CodeT5+
      • the compilation results and their analysis
    • [analysis-results.tar/logical-analysis]: the results of the logical analysis
      • the analysis of the logical validity of SimpleStack and SimpleTicTacToe
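Before unpacking one of the archives listed above, it can help to check its layout. The sketch below lists the top-level entries of a tar archive without extracting it; the helper name `top_level_entries` is hypothetical, and the internal layout of the archives is an assumption based only on the folder names given in the list.

```python
import tarfile
from pathlib import Path, PurePosixPath


def top_level_entries(archive):
    """Return the sorted top-level names stored in a tar archive."""
    tops = set()
    with tarfile.open(archive) as tar:
        for name in tar.getnames():
            # Ignore a leading "./" or "/" that some tar tools write.
            parts = [p for p in PurePosixPath(name).parts if p not in (".", "/")]
            if parts:
                tops.add(parts[0])
    return sorted(tops)


# Only runs if the archive has been downloaded into the working directory.
archive = Path("analysis-results.tar")
if archive.exists():
    print(top_level_entries(archive))
```

If the layout matches the list, the output should include entries such as `compilability-analysis` and `logical-analysis`, but this has not been verified against the actual archive.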

Files

Files (8.3 GB)

md5:48d0d3d2c29095b27d823e372a98e32a  6.2 MB
md5:16c2d4730e09e5f73c7dc4f1a40219cf  2.7 GB
md5:c1a9800da8556007c09c0be84ae9dfc3  2.7 GB
md5:0c5ab85660313b0eaf5e79e807057735  2.7 GB
md5:6eb3e5911691f68b4c5d56a7af77c661  55.9 MB
md5:3a84d69eab3f27c54aff72165f625670  71.8 MB
md5:48f6c1733d575339bee50dc466ade1d2  65.5 kB
md5:45cf5ccbbb5fbb0ea0e29ff0294952de  120.5 MB
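The md5 values in the file listing can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches` are hypothetical. The listing here does not show which checksum belongs to which archive, so the expected value has to be taken from the entry shown next to the file on the record page.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

A call such as `matches(path_to_archive, listed_value)` returns `True` when the download is byte-identical to the published file; a mismatch usually indicates an interrupted or corrupted download.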

Additional details