Published January 27, 2026 | Version v1
Dataset Open

Automated Type-IV Clone Generation via LLMs and Deterministic Validation (replication package)

Authors/Creators

Description

KaminoDatasetGenerator.zip - contains the replication package for the empirical evaluation. Read the Readme.md in the root folder of the archive for instructions.

  • The experiment results are in /results
  • The pilot study results are inside /results/pilot study
  • The results of exploring the top 50 configurations are inside /results/using top 50 configs

Prompts - these are inside /pipeline/src/utils/prompts.py

Refactorings.pdf - contains a description, with examples, of the seven refactorings used as part of the prompts

Hyperparameters - these are inside /pipeline/src/config.py

kamino_clones_dataset.zip - the final dataset, split into training and testing

Files

Files (962.6 MB)

Checksum                                Size
md5:6d893a862fc630e5c7dcdd20b1a52401    2.1 MB
md5:7e68f144fe0910c23f40261cdd3e44b1    960.4 MB
md5:2b4a7c5ba6e08a24b844743416ffa447    115.8 kB
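
Because the largest archive is close to 1 GB, it may be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch, not part of the replication package. The helper names md5_of_file and matches_listed_checksum are hypothetical, and since the listing does not state which checksum belongs to which file, the check only tests membership in the set of listed digests.

```python
import hashlib

# MD5 digests copied from the file listing of this record.
# The listing does not pair them with file names, so they are kept as a set.
LISTED_MD5 = {
    "6d893a862fc630e5c7dcdd20b1a52401",
    "7e68f144fe0910c23f40261cdd3e44b1",
    "2b4a7c5ba6e08a24b844743416ffa447",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for the ~960 MB archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals one of the listed digests."""
    return md5_of_file(path) in LISTED_MD5
```

For example, matches_listed_checksum("kamino_clones_dataset.zip") should return True for an intact download placed in the current directory. A False result suggests a truncated or corrupted file that should be downloaded again.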