Published 2025 | Version 1.0
Dataset Open

CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories

  • 1. University of Waterloo

Description

Modern programming languages evolve constantly, introducing new language features and APIs to improve software development practices. Software developers often face the tedious task of upgrading their codebases to new programming language versions. Recently, large language models (LLMs) have demonstrated potential in automating various code generation and editing tasks, suggesting that they could also automate code upgrades. However, no benchmark exists for evaluating the code upgrade ability of LLMs, because distilling code changes related to programming language evolution from the commit histories of real-world software repositories is a complex challenge.
In this work, we introduce CoUpJava, the first large-scale dataset for code upgrade, focusing on code changes related to the evolution of Java. CoUpJava comprises 10,697 code upgrade samples, distilled from the commit histories of 1,379 open-source Java repositories and covering Java versions 7–23. The dataset is divided into two subsets: CoUpJava-Fine, which captures fine-grained method-level refactorings towards new language features; and CoUpJava-Coarse, which includes coarse-grained repository-level changes encompassing new language features, standard library APIs, and build configurations. The dataset provides high-quality samples by filtering out irrelevant and noisy changes and verifying that the upgraded code compiles. Moreover, CoUpJava covers diverse code upgrade scenarios, ranging from small, fine-grained refactorings to large-scale repository modifications.

Files

README-zenodo.md

Files (4.2 GB)

Checksum  Size
md5:b93b7270d15af48d0d9e7abc436636d7  637.7 MB
md5:1c8bed6e908d1ab4e44550be5f5634b4  29.5 MB
md5:007c579cf8f6419e3cd6f39830f0e193  3.5 GB
md5:86d3f3a95c324c9479bd8986968f4327  11.4 kB
md5:912539b5603be8e36821ee05a41aef72  2.4 kB
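
The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch in Python, not part of the dataset's own tooling; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the file path passed in is whatever name the downloaded file has locally, since the listing here does not preserve file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that multi-gigabyte archives
    do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Check a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    # Drop the optional 'md5:' prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:912539b5603be8e36821ee05a41aef72")` should return `True` only if the file at `local_path` (a placeholder) is byte-for-byte identical to the 2.4 kB file in the listing.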

Additional details

Software