
Published April 29, 2022 | Version v1.0
Dataset | Open Access

Are Neural Bug Detectors Comparable to Software Developers on Variable Misuse Bugs?

  • 1. University of Oldenburg
  • 2. Technical University of Darmstadt
  • 3. Paderborn University

Description

Artifact for "Are Neural Bug Detectors Comparable to Software Developers on Variable Misuse Bugs?"

Abstract: 

Debugging, that is, identifying and fixing bugs in software, is a central part of software development. Developers are therefore often confronted with the task of deciding whether a given code snippet contains a bug, and if so, where. Recently, data-driven methods have been employed to learn this task of bug detection, resulting (amongst others) in so-called neural bug detectors. Neural bug detectors are trained on millions of buggy and correct code snippets.

Given the “neural learning” procedure, it seems likely that neural bug detectors – on the specific task of finding bugs – have a performance similar to human software developers. For this work, we set out to substantiate or refute such a hypothesis. We report on the results of an empirical study with over 100 software developers, targeting the comparison of humans and neural bug detectors. As detection task, we chose a specific form of bugs (variable misuse bugs) for which neural bug detectors have recently made significant progress. Our study shows that despite the fact that neural bug detectors see millions of such misuse bugs during training, software developers – when conducting bug detection as a majority decision – are slightly better than neural bug detectors on this class of bugs. Altogether, we find a large overlap in the performance, both for classifying code as buggy and for localizing the buggy line in the code. In comparison to developers, one of the two evaluated neural bug detectors, however, raises a higher number of false alarms in our study.

Content: The artifact includes the following components:

  • Web UI: The developer survey was performed online in the participants' browsers. For this, we created a custom web interface tailored to our study task. We include both the frontend implementation (website) and the backend implementation (business logic and database) in this artifact. Therefore, it is not only possible to replicate our survey with the same interface and a new group of participants, but also to extend the interface for future studies.

  • Neural bug detectors: We evaluate the performance of the developers against two neural bug detectors. In this artifact, we include the bug detectors (implementation + trained models) and the evaluation script used for producing our results. Besides replicating our bug detector evaluation, the detectors can also be used in future projects for detecting variable misuse bugs in Java methods (a hypothetical usage sketch is shown below this list).

  • Analysis scripts: After collecting the raw results from the developers and neural bug detectors, we performed several analyses to gain insights into how developers and bug detectors compare on the variable misuse task. We include all analysis steps in the form of Jupyter notebooks in the artifact. With this, it is possible to reproduce all the figures of our paper (a minimal sketch of the kind of comparison performed is also shown below this list).
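
To illustrate how the bundled detectors could be reused, the following is a minimal Python sketch of the expected call pattern only. Every name in it (VariableMisuseDetector, load_pretrained, predict, models/detector_a) is a hypothetical placeholder; the artifact's own scripts and model files define the real interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    is_buggy: bool             # classification: does the method contain a misuse?
    buggy_line: Optional[int]  # localization: line of the suspected misuse, if any


class VariableMisuseDetector:
    """Hypothetical stand-in for one of the two bundled neural bug detectors."""

    @classmethod
    def load_pretrained(cls, model_dir: str) -> "VariableMisuseDetector":
        # The real artifact would load trained model weights here.
        return cls()

    def predict(self, java_method: str) -> Prediction:
        # A real detector scores every variable use in the method; this
        # placeholder always answers "no bug" and only shows the call shape.
        return Prediction(is_buggy=False, buggy_line=None)


# Expected call pattern: one prediction per Java method.
detector = VariableMisuseDetector.load_pretrained("models/detector_a")
print(detector.predict("int max(int a, int b) { return a > b ? a : a; }"))
```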
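
The sketch below shows, with invented toy data, the kind of majority-decision comparison the analysis notebooks perform (developer majority vote vs. detector verdict, accuracy, and false alarms). The data, column names, and exact metrics are assumptions for illustration; the artifact's notebooks contain the actual analysis.

```python
import pandas as pd

# One row per (snippet, participant) survey answer: did the developer flag a bug?
answers = pd.DataFrame({
    "snippet_id":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "participant_id": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "says_buggy":     [True, True, False, False, False, True, True, True, True],
})

# One row per snippet: ground truth and the verdict of a neural bug detector.
snippets = pd.DataFrame({
    "snippet_id":          [1, 2, 3],
    "is_buggy":            [True, False, True],
    "detector_says_buggy": [True, True, True],
})

# Developers' verdict per snippet, taken as a majority decision over participants.
majority = (
    answers.groupby("snippet_id")["says_buggy"]
    .apply(lambda votes: votes.mean() > 0.5)
    .rename("developers_say_buggy")
    .reset_index()
)
df = snippets.merge(majority, on="snippet_id")


def false_alarm_rate(pred, truth):
    """Fraction of clean snippets that were (wrongly) flagged as buggy."""
    clean = ~truth
    return (pred & clean).sum() / clean.sum()


print("developer accuracy:    ", (df["developers_say_buggy"] == df["is_buggy"]).mean())
print("detector accuracy:     ", (df["detector_says_buggy"] == df["is_buggy"]).mean())
print("developer false alarms:", false_alarm_rate(df["developers_say_buggy"], df["is_buggy"]))
print("detector false alarms: ", false_alarm_rate(df["detector_says_buggy"], df["is_buggy"]))
```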

In addition, we provide further artifacts that were successfully evaluated at ASE 2022:

  • ASE 2022 Artifact: 10.5281/zenodo.6958242

  • Virtual machine: 10.5281/zenodo.6957849

Files

fixmyvars_study.zip (5.7 MB)
md5:e94ddea6f3ae829b90b67d6f543ec308