There is a newer version of the record available.

Published August 6, 2025 | Version v5

Execution-Feedback Driven Test Generation from SWE Issues

Authors/Creators

Description

We share 8 folders of generated tests covering 2 benchmarks (SWT-Bench Lite and TDD-Bench Verified), 2 models (GPT-4o and Claude-3.7-Sonnet), and 2 approaches (Otter and e-Otter). Each folder contains 10 tests generated with different prompting techniques (e.g., planner, full, standard), along with the associated logs. We also share the JSON files containing the tests generated by e-Otter++.

Files

claude_e_otter_plus_swt_lite.json

Files (2.8 GB)

Checksum                                Size
md5:f9a6edc1bcfeb80bac940c578b68e787    554.6 kB
md5:11ae14b42b838b38c75f754d6d8ac5be    939.6 kB
md5:99331605793fab49344635c48b195ad3    289.0 MB
md5:006ab1c91b5c3fbad72213c5fc3e3407    287.2 MB
md5:e94da042bc591f858d5ef29b87aab110    387.4 MB
md5:420bb3d2e38aec5dcbc07e24b89a6a61    412.6 MB
md5:477bc387f78178f6bd2a5ea03b4b6589    500.6 kB
md5:6a74b96a279fca6aa44daa6bc8a40247    782.4 kB
md5:30e3ce3e187c57d31802ee8fd1acf1f6    292.2 MB
md5:6237d2dde3b997c72703a1f1ed80e42b    289.1 MB
md5:efa67759a49030169e745ee3296f613f    384.9 MB
md5:57946c7226907d2af86f02dcd4f84e84    411.1 MB
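
Since each file is listed with an md5: checksum, a downloaded copy can be checked against it before use. The following is a minimal sketch using only Python's standard hashlib module. The helper names md5_of_file and matches_checksum are hypothetical and are not part of this record, and the local file path is whatever name the download was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    multi-hundred-megabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a local file against a checksum string as listed on the
    record page, e.g. 'md5:f9a6edc1bcfeb80bac940c578b68e787'.
    Hypothetical helper: the 'md5:' prefix is optional and case is ignored."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum(local_path, "md5:f9a6edc1bcfeb80bac940c578b68e787") returns True only if the file saved at local_path is byte-identical to the entry listed with that checksum. A mismatch usually indicates an incomplete or corrupted download.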