JavaDL: Automatically Incrementalizing Java Bug Pattern Detection
JavaDL Artifact Introduction
JavaDL is a language and toolchain for analyzing Java at the source level. JavaDL is based on MetaDL, a variant of Datalog that adds syntactic pattern matching, here specialized to pattern matching on Java.
This Artifact Overview guide describes the VM image that we provide for artifact evaluation:
- Getting Started
- Step-by-step instructions for using JavaDL and for reproducing our results
- Claims from the paper and how our artifact supports them
Getting Started Guide
For the following steps, we assume a typical modern machine with a recent multi-core CPU and 64 GiB of RAM. The process may take more than 30 minutes on older machines, but most of this will be automated.
The necessary steps are:
- Installing Docker
- Installing and Starting Docker Image
- Basic Testing
- Advanced Testing (optional)
Caveats
Please note that we have only tested the process below on Ubuntu and Debian Linux. The same process may or may not work for OS X or Windows, though we are hopeful that the Docker website provides sufficient guidance to run the "Install Docker" and "Install and Start Docker Image" steps from these platforms, too. To the best of our understanding of Docker, later steps should be independent of the host platform.
Please note also that our terminology in the paper diverges subtly from the terminology in our code. The most important differences are the following:
- Since JavaDL is an instance of MetaDL, our code base (and VM image) often uses "MetaDL" to describe the language that the paper calls "JavaDL".
- In the paper, we describe the "exhaustive" evaluation strategy (also known as "one-shot" or "non-incremental"). In the code base, we often refer to this strategy as "single evaluation". Parts of the code and documentation also use the term "hybrid evaluation" (referring to hybrid Datalog execution in both Soufflé and our own system).
Installing Docker
We are providing the JavaDL artifact as a Docker image. To run such an image, make sure to install the relevant tools:
- For Windows and OS X systems, follow the guidelines on the Docker desktop download site
- On Linux-based systems, install the `docker` command line tool. This tool may be provided by the `docker.io` and/or `docker-ce` packages. If your distribution does not provide these packages, follow the steps here:
  - For Ubuntu
  - For Debian
  - For CentOS
  - For Fedora
  - Users of other distributions can download pre-compiled binaries or build Docker from source (both "cli" and "engine")
Starting Docker
- Download the Docker image from https://doi.org/10.5281/zenodo.5090140 onto your machine.
- Open a command shell in the directory into which you downloaded the image.
- Check the integrity of the docker image (this may take a minute or two):
  - `sha256sum javadl-oopsla21.tgz` should report `49aa9fd6797095f9a33e6013fb582a7cfa1434d58826bcd82dd5df0706f65ad6`
  - `md5sum javadl-oopsla21.tgz` should report `b5892a1bc2579f7be220f08c0064fb3a`
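As a convenience, the checksum comparison can be scripted. The following is a minimal sketch; the helper name `verify_sha256` is ours and not part of the artifact:

```shell
# Compare a file's SHA-256 digest against an expected value.
# Usage: verify_sha256 FILE EXPECTED_DIGEST
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" = "$2" ]; then
    echo "OK"
  else
    echo "MISMATCH: got $actual" >&2
    return 1
  fi
}

# For the artifact image, one would run:
#   verify_sha256 javadl-oopsla21.tgz \
#     49aa9fd6797095f9a33e6013fb582a7cfa1434d58826bcd82dd5df0706f65ad6
```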
- Install the docker image: `docker load -i ./javadl-oopsla21.tgz`
- Start the docker image: `docker run -it javadl:oopsla21`
This should produce the following prompt (except that the hex number after 'root@' may vary):
root@1114245ce565:/#
Basic Testing: Bug Checker Output Comparison
Once you are logged into the docker image, the fastest way to test our system is to run the following:
cd /work
./run_quality.bash
This will set up and run JavaDL, SpotBugs, and Error Prone on the benchmarks that we report on in the paper, and print out the bug comparison table (Fig. 13). On a modern machine, this should take around 15 minutes.
The output should look something like this (you can ignore the `readlink` warning):
root@1114245ce565:/work# ./run_quality.bash
/work/javadl-eval-1 /work
readlink: missing operand
Try 'readlink --help' for more information.
>>> Running the static checkers on the fixed versions from the Defects4j <<<
>>> Timing MetaDL-Hybrid
>>> Timing SpotBugs
>>> Timing ErrorProne
>>> Parsing and serializing output from the static checkers <<<
#MDL #SB #Common Precision Recall
DM_NUMBER_CTOR & 429 & 182 & 162 & 37.76 & 89.01
DM_STRING_CTOR & 5 & 5 & 5 & 100.00 & 100.00
EXPOSE_REP & 3546 & 161 & 108 & 3.05 & 67.08
NM_CLASS_NAMING_CONVENTION & 0 & 0 & 0 & -1.00 & -1.00
NM_FIELD_NAMING_CONVENTION & 874 & 0 & 0 & 0.00 & -1.00
NM_METHOD_NAMING_CONVENTION & 1120 & 38 & 38 & 3.39 & 100.00
SF_SWITCH_NO_DEFAULT & 223 & 87 & 81 & 36.32 & 93.10
UWF_UNWRITTEN_FIELD & 3 & 1 & 0 & 0.00 & 0.00
UWF_UNWRITTEN_PUBLIC_OR_PROTECTED_FIELD & 3 & 0 & 0 & 0.00 & -1.00
MDL vs. SB
Total SB: 474
Total MDL: 6206
Common: 395
MDL only: 5808
SB only: 80
####################################################################################################
#MDL #EP #Common Precision Recall
BoxedPrimitiveConstructor & 429 & 425 & 423 & 98.60 & 99.53
MissingOverride & 4533 & 5341 & 4393 & 96.91 & 82.25
OperatorPrecedence & 84 & 100 & 82 & 97.62 & 82.00
ReferenceEquality & 1176 & 1235 & 1169 & 99.40 & 94.66
TypeParameterUnusedInFormals & 99 & 95 & 95 & 95.96 & 100.00
UnnecessaryParentheses & 257 & 174 & 134 & 52.14 & 77.01
MDL vs. EP
Total EP: 7426
Total MDL: 6578
Common: 6297
MDL only: 281
EP only: 1074
####################################################################################################
/work
The output at the end contains the numbers for Figure 13.
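As a sanity check, the Precision and Recall columns follow directly from the three counts in each row: precision is the common count divided by JavaDL's count, and recall is the common count divided by the other tool's count, both in percent. For instance, for the `DM_NUMBER_CTOR` row:

```shell
# Recompute Precision/Recall for DM_NUMBER_CTOR from the raw counts
# in the table above: #MDL=429, #SB=182, #Common=162.
awk 'BEGIN {
  mdl = 429; sb = 182; common = 162
  printf "Precision %.2f Recall %.2f\n", 100*common/mdl, 100*common/sb
}'
# prints: Precision 37.76 Recall 89.01
```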
Advanced Testing
Optionally, you can manually build and run JavaDL and its test suite. This is not necessary to reproduce our results, but helpful if you want to have a closer look at the system or extend JavaDL.
The following steps will re-run the test suite (all 136 tests should pass):
cd /work/metadl
./gradlew test
Please refer to `/work/metadl/README.org` for more details on manually running JavaDL.
Step by Step Instructions
In this section we describe how to reproduce our performance results and how to manually run JavaDL with custom queries.
Re-running our Benchmarks
To re-run our benchmarks, run the following in your docker image:
cd /work
./run_performance.bash
Depending on your machine, it may take over a day to re-run all benchmarks, so we recommend running on a dedicated benchmarking machine (using e.g. `tmux` or `screen` to be able to disconnect safely).
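For example, one way to keep the run alive across a dropped connection is to detach it with `nohup`; the helper name and the log path below are our choices, not part of the artifact:

```shell
# Start a command in the background, immune to hangups, logging to a file.
# Usage: run_detached LOGFILE CMD [ARGS...]
run_detached() {
  log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo "started pid $!"
}

# Inside the image, one would run e.g.:
#   run_detached /tmp/perf.log /work/run_performance.bash
```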
After running, the output should be similar to the following (concrete numbers may vary greatly depending on the underlying hardware):
proj object_file_generate_relations_total object_file_compile_total hybrid_program_input ... sema_and_type_check general_overhead incremental_driver_overhead update_local_relations
0 d4j-Cli 0.293539 0.056372 0.000000 ... 0.517667 0.091731 0.000000 0.000000
1 d4j-Cli 0.138508 0.032728 0.022318 ... 0.390337 0.123824 0.082171 0.184927
2 d4j-Codec 0.388758 0.074782 0.000000 ... 0.400047 0.080707 0.000000 0.000000
3 d4j-Codec 0.135542 0.037518 0.054627 ... 0.407575 0.132363 0.085168 0.113422
4 d4j-Compress 0.460538 0.063568 0.000000 ... 0.297225 0.103416 0.000000 0.000000
5 d4j-Compress 0.202481 0.030771 0.062943 ... 0.309277 0.098028 0.072345 0.186635
6 d4j-Csv 0.279070 0.055226 0.000000 ... 0.533767 0.090738 0.000000 0.000000
7 d4j-Csv 0.185438 0.037808 0.023777 ... 0.397894 0.129165 0.090716 0.105950
8 d4j-JacksonCore 0.490699 0.060332 0.000000 ... 0.289425 0.086140 0.000000 0.000000
9 d4j-JacksonDatabind 0.662087 0.037427 0.000000 ... 0.125948 0.094318 0.000000 0.000000
10 d4j-JacksonXml 0.309379 0.046775 0.000000 ... 0.494052 0.110116 0.000000 0.000000
11 d4j-JacksonXml 0.149556 0.031302 0.024723 ... 0.390042 0.122664 0.097267 0.162492
12 d4j-Jsoup 0.392209 0.063594 0.000000 ... 0.412381 0.080244 0.000000 0.000000
13 d4j-Jsoup 0.267734 0.032205 0.024790 ... 0.268394 0.087463 0.060649 0.218730
14 d4j-Math 0.336936 0.057061 0.000000 ... 0.448329 0.110705 0.000000 0.000000
15 d4j-Math 0.097620 0.032702 0.039766 ... 0.399697 0.129623 0.084926 0.189194
16 d4j-Mockito 0.498430 0.037437 0.000000 ... 0.324475 0.096073 0.000000 0.000000
17 d4j-Mockito 0.371869 0.014380 0.017237 ... 0.165659 0.053631 0.040926 0.318487
18 d4j-Time 0.549664 0.067943 0.000000 ... 0.208878 0.073654 0.000000 0.000000
19 d4j-Time 0.380325 0.024303 0.033737 ... 0.145743 0.047100 0.035855 0.274611
[20 rows x 11 columns]
['proj' 'run_kind' 'object_file_generate_relations_total'
'object_file_compile_total' 'hybrid_program_input' 'hybrid_program_run'
'hybrid_program_output' 'incremental_driver_init' 'sema_and_type_check'
'update_local_relations' 'incremental_driver_overhead' 'general_overhead']
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
python/src/plot_change_vs_analyzed.py:123: RuntimeWarning: invalid value encountered in long_scalars
all_data["n_files_rel"] = all_data.apply(lambda r : r["n_files"] / max_n_files[(r["proj"], r["run_kind"])], axis=1)
python/src/plot_change_vs_analyzed.py:131: RuntimeWarning: invalid value encountered in long_scalars
max_n_touched_files[(r["proj"], r["run_kind"])], axis=1)
[0 1 2 3 4 5 6 7]
[0 1 2]
python/src/plot_change_vs_analyzed.py:54: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
_, axs = plt.subplots(1, 2, figsize=(8, 4), gridspec_kw = {"width_ratios" : [8, 3]})
[0 1 2 3 4 5 6 7]
[0 1 2]
[0 1 2 3 4 5 6 7]
[0 1 2]
[0 1 2 3 4 5 6 7]
[0 1 2]
/work
The script prints this output before returning to the command prompt. Our system generates a number of plots in `/work/javadl-inc-eval/plots_sb` (comparing against SpotBugs) and `/work/javadl-inc-eval/plots_ep` (comparing against Error Prone). Most of these plots did not make it into the paper; the ones used in the paper are the following:
- `runtime_split.svg`: Proportion of execution time spent on various tasks (Figure 15 in the paper)
- `violin_adjusted_total.svg`: Distribution of running time (Figure 14 in the paper, except that the figures in the paper manually substitute times for the "Math" and "Mockito" tests, as noted in the paper)
- `plots_ep/ecdf.svg`: Distribution of the Relative Degree (Section 5.1) of files across projects (Figure 12 in the paper)
Note that the results will not match the ones in our paper, as we discuss in "Execution Time (Figure 14)" at the end of this document.
Trying custom JavaDL queries
Our VM image includes JavaDL at `/work/metadl`:
cd /work/metadl
Please refer to `/work/metadl/README.org` for more details on manually running JavaDL. The README covers both incremental and exhaustive evaluation, though the style of exhaustive evaluation that we report on in the paper is called "hybrid" evaluation there.
Please also note the following:
- The version of JavaDL in this image uses the notation `` `X `` for metavariables, instead of `#X` as reported in the paper.
- Error reporting is limited in our current implementation:
  - JavaDL will not describe where or why compiling a syntactic pattern failed.
  - JavaDL does not check whether the specification uses built-in predicates correctly (e.g., with the right arity). Arity discrepancies can therefore e.g. trigger array index bugs in internal JavaDL code.
  - JavaDL's type inference cannot infer the types of predicates that are only loaded from external files. For these predicates, make sure to also use them in a context that makes their type explicit (e.g., on the lhs of an inference rule with an unsatisfiable rhs).

For these reasons, we recommend that new users modify an existing `.mdl` file instead of building one from scratch.
List of Claims and Support
Our paper's "Conclusion" section summarizes our central claims from the Evaluation section. We use it here to structure our discussion of our claims and the support we provide for these claims (excluding claims that are unrelated to the artifact).
- "JavaDL [...] can run [a] bug detector on Java from a single specification both exhaustively and incrementally, while automatically rewriting the specification to optimize it for incremental evaluation."
  See subsection "Versatility", below.
- "We have demonstrated that our prototype implementation, based on an existing high-performance Datalog engine, delivers performance that is competitive with state-of-the-practice systems"
  (RQ1.2: Fig. 13 reports statistics that we interpret as that "none of the differences [between the bug checking frameworks and JavaDL] indicate any systematic limitations and could be addressed by changing JavaDL bug patterns or fixing bugs in ExtendJ." Similarly, discussions regarding performance related to RQ2.)
  See subsection "Output Similarity (Figure 13)" and subsection "Execution Time (Figure 14)", below.
- "[...] that our specification language can concisely express typical bug detectors"
  (RQ1.1: Fig. 11 provides statistics that "[indicate] that JavaDL requires fewer lines, and that it is closer to Error Prone in size than SpotBugs.")
  See subsection "Bug Patterns and Code Sizes (Figure 11)", below.
- "[...] that its incremental and exhaustive evaluation are both able to outshine each other in different usage scenarios"
  See (again) subsection "Execution Time (Figure 14)", below.
Versatility
JavaDL can run the same specifications both in incremental and in exhaustive mode.
We support our claim here by `/work/run_performance.bash` and the code that it (transitively) invokes. Critically, `run_performance.bash` runs the same two collections of analyses, `error-prone-metadl.mdl` and `spot-bugs-metadl.mdl`, for both exhaustive (`python/src/single.py`) and incremental (`python/src/incremental.py`) runs.
Output Similarity (Figure 13)
These outputs are produced by the `./run_quality.bash` script. The script directly prints out the relevant statistics in LaTeX format.
Discrepancies from the numbers reported in the Paper
During validation of our artifact evaluation package, we observed slight differences in the numbers reported for Error Prone, for the following benchmarks:
- `MissingOverride`: Error Prone now reports 5341 bugs (as opposed to 5364 in the paper). The overall impact is a slight increase in our tool's relative precision and relative recall (i.e., treating Error Prone as a proxy for the "ground truth").
- `OperatorPrecedence`: JavaDL now reports 84 bugs (as opposed to 82 in the paper). Error Prone's 100 reports only agree with one of these two new reports, so we see a slight increase in relative recall and a slight decrease in relative precision.
- `UnnecessaryParentheses`: Error Prone now reports 174 bugs (as opposed to 175 in the paper). This slightly improves our relative recall.
We believe these changes are due to minor version mismatches in either the tools or the benchmarks that we reported on in the paper, but we have not yet identified the exact cause. We will carefully review these changes and update the paper accordingly for the second submission round, but argue that these discrepancies do not affect the claims in the paper.
Detailed Statistics
For more detailed statistics, you can examine the list of bugs reported by each tool, in `/work/javadl-eval-1/`.

We split these between SpotBugs and Error Prone, reporting overlap for checkers that these tools share with JavaDL. For SpotBugs, the lists are in:

- `mdl_sb_common`: Found by both SpotBugs and JavaDL
- `mdl_not_sb.out`: Found only by JavaDL, not SpotBugs
- `sb_not_mdl.out`: Found only by SpotBugs, not by JavaDL
- `mdl_not_sb_samples` and `sb_not_mdl_samples`: The list of sampled differences that we manually examined (we generate this list from a fixed random seed that we selected blindly, so this list should be consistent between runs)
Analogously for Error Prone (substituting "ep_" for "sb_").
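If each bug report occupies one line, the aggregate overlap counts for any pair of such lists can be recomputed with `comm`. This is a sketch; the helper name and file arguments are illustrative, not artifact filenames:

```shell
# Print overlap statistics for two bug lists (one report per line).
# Usage: overlap_stats FILE_A FILE_B
overlap_stats() {
  sort "$1" > /tmp/_a.sorted
  sort "$2" > /tmp/_b.sorted
  echo "common: $(comm -12 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
  echo "A only: $(comm -23 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
  echo "B only: $(comm -13 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
}

# e.g.: overlap_stats list_a.txt list_b.txt
```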
Bug Patterns and Code Sizes (Figure 11)
For the sizes of bug patterns, we refer to the source code that we examined.
For JavaDL, we list the file name of the check in the following directory on the VM image: /work/javadl-eval-1/static-checkers/metadl/tests/evaluation/metadl-java/
For SpotBugs and Error Prone, we refer to specific files at specific commit times in their revision history (referencing the time of the file's most recent modification, from the perspective of the baseline).
Covariant Equals()
- JavaDL: `bad-covariant-equals.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/NonOverridingEquals.java` (revision fc4d14866e)
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindHEmismatch.java` (revision 05b9805296)

Boxed Primitive Constructor
- JavaDL: `boxed-primitive-constructor.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/BoxedPrimitiveConstructor.java` (revision b3a20389d2)
- SpotBugs:
  - File 1: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/NumberConstructor.java` (revision ac83bb9220)
  - File 2: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/DumbMethods.java` (revision 05b9805296)

Missing @Override
- JavaDL: `missing-override.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/MissingOverride.java` (revision 6b489dd58e)

Complex Operator Precedence
- JavaDL: `operator-precedence.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/OperatorPrecedence.java` (revision e41729a145)

Useless Type Parameter
- JavaDL: `type-param-unused-in-formals.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/TypeParameterUnusedInFormals.java` (revision 46358f3709)

== with Reference
- JavaDL: `reference-equality.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/ReferenceEquality.java` (revision 1c1e233dd)

Field Never Written to
- JavaDL: `unwritten-field.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/UnreadFields.java` (revision 05b9805296)

Missing Switch Default
- JavaDL: `switch-no-default.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/SwitchFallthrough.java` (revision d881dd0)

Expose internal Representation
- JavaDL: `reference-to-mutable-object.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindReturnRef.java` (revision 3a4f4ddbcf)

Naming Convention Violation
- JavaDL: `naming-convention.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/Naming.java` (revision 03cdfb4ece)

Clone Idioms Violated
- JavaDL: `clone-idioms.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/CloneIdiom.java` (revision 03cdfb4ece)
Execution Time (Figure 14)
We support these claims by providing the scripts necessary to re-run our benchmarks (cf. "Re-running our Benchmarks"). Our artifact will not report the same numbers reported in the paper, for the following reasons:
- Changes to `souffle`: for our experiments, we used `souffle`'s default settings, which optimize `souffle` for a specific microarchitecture while dropping support for others. For reusability we relaxed these optimizations (changing `-march=native` to `-mtune=native` in Soufflé's `src/souffle-compile.in`). To fix this, you can edit `/work/souffle/src/souffle-compile.in` and re-build/re-install for the native machine, with `./configure && make clean && make && make install`.
- Subtle differences in running in a VM image vs. our original experimental configuration. To fix this, you can copy our evaluation setup out of the VM container.
- Performance differences due to hardware differences (cf. our paper for the exact setup that we used).