JavaDL: Automatically Incrementalizing Java Bug Pattern Detection
JavaDL Artifact Introduction
JavaDL is a language and toolchain for analyzing Java at the source level. JavaDL is based on MetaDL, a variant of Datalog that adds syntactic pattern matching, here specialized to pattern matching on Java.
This Artifact Overview guide describes the VM image that we provide for artifact evaluation:
- Getting Started
- Step-by-step instructions for using JavaDL and for reproducing our results
- Claims from the paper and how our artifact supports them
Getting Started Guide
For the following steps, we assume a typical modern machine with a recent multi-core CPU and 64 GiB of RAM. The process may take more than 30 minutes on older machines, but most of this will be automated.
The necessary steps are:
- Installing Docker
- Installing and Starting Docker Image
- Basic Testing
- Advanced Testing (optional)
Caveats
Please note that we have only tested the process below on Ubuntu and Debian Linux. The same process may or may not work for OS X or Windows, though we are hopeful that the Docker website provides sufficient guidance to run the "Install Docker" and "Install and Start Docker Image" steps from these platforms, too. To the best of our understanding of Docker, later steps should be independent of the host platform.
Please note also that our terminology in the paper diverges subtly from the terminology in our code. The most important differences are the following:
- Since JavaDL is an instance of MetaDL, our code base (and VM image) often uses "MetaDL" to describe the language that the paper calls "JavaDL".
- In the paper, we describe the "exhaustive" evaluation strategy (also known as "one-shot" or "non-incremental"). In the code base, we often refer to this strategy as "single evaluation". Parts of the code and documentation also use the term "hybrid evaluation" (referring to hybrid Datalog execution in both Soufflé and our own system).
Installing Docker
We are providing the JavaDL artifact as a Docker image. To run such an image, make sure to install the relevant tools:
- For Windows and OS X systems, follow the guidelines on the Docker desktop download site
- On Linux-based systems, install the `docker` command line tool. This tool may be provided by the `docker.io` and/or `docker-ce` packages. If your distribution does not provide these packages, follow the steps here:
  - For Ubuntu
  - For Debian
  - For CentOS
  - For Fedora
  - Users of other distributions can download pre-compiled binaries or build Docker from source (both "cli" and "engine")
Starting Docker
- Download the Docker image from https://doi.org/10.5281/zenodo.5090140 onto your machine.
- Open a command shell in the directory into which you downloaded the image.
- Check the integrity of the docker image (this may take a minute or two):
  - `sha256sum javadl-oopsla21.tgz` should report `49aa9fd6797095f9a33e6013fb582a7cfa1434d58826bcd82dd5df0706f65ad6`
  - `md5sum javadl-oopsla21.tgz` should report `b5892a1bc2579f7be220f08c0064fb3a`
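As a convenience, the checksum comparison can be scripted. The following is a minimal sketch; the helper name `verify_sha256` is ours and not part of the artifact:

```shell
# Compare a file's SHA-256 digest against an expected value.
# Usage: verify_sha256 FILE EXPECTED_DIGEST
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" = "$2" ]; then
    echo "OK"
  else
    echo "MISMATCH: got $actual" >&2
    return 1
  fi
}

# For the artifact image, one would run:
#   verify_sha256 javadl-oopsla21.tgz \
#     49aa9fd6797095f9a33e6013fb582a7cfa1434d58826bcd82dd5df0706f65ad6
```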
- Install the docker image: `docker load -i ./javadl-oopsla21.tgz`
- Start the docker image: `docker run -it javadl:oopsla21`
This should produce the following prompt (except that the hex number after 'root@' may vary):
root@1114245ce565:/#
Basic Testing: Bug Checker Output Comparison
Once you are logged into the docker image, the fastest way to test our system is to run the following:
cd /work
./run_quality.bash
This will set up and run JavaDL, SpotBugs, and Error Prone on the benchmarks that we report on in the paper, and print out the bug comparison table (Fig. 13). On a modern machine, this should take around 15 minutes.
The output should look something like this (you can ignore the `readlink` warning):
root@1114245ce565:/work# ./run_quality.bash
/work/javadl-eval-1 /work
readlink: missing operand
Try 'readlink --help' for more information.
>>> Running the static checkers on the fixed versions from the Defects4j <<<
>>> Timing MetaDL-Hybrid
>>> Timing SpotBugs
>>> Timing ErrorProne
>>> Parsing and serializing output from the static checkers <<<
#MDL #SB #Common Precision Recall
DM_NUMBER_CTOR & 429 & 182 & 162 & 37.76 & 89.01
DM_STRING_CTOR & 5 & 5 & 5 & 100.00 & 100.00
EXPOSE_REP & 3546 & 161 & 108 & 3.05 & 67.08
NM_CLASS_NAMING_CONVENTION & 0 & 0 & 0 & -1.00 & -1.00
NM_FIELD_NAMING_CONVENTION & 874 & 0 & 0 & 0.00 & -1.00
NM_METHOD_NAMING_CONVENTION & 1120 & 38 & 38 & 3.39 & 100.00
SF_SWITCH_NO_DEFAULT & 223 & 87 & 81 & 36.32 & 93.10
UWF_UNWRITTEN_FIELD & 3 & 1 & 0 & 0.00 & 0.00
UWF_UNWRITTEN_PUBLIC_OR_PROTECTED_FIELD & 3 & 0 & 0 & 0.00 & -1.00
MDL vs. SB
Total SB: 474
Total MDL: 6206
Common: 395
MDL only: 5808
SB only: 80
####################################################################################################
#MDL #EP #Common Precision Recall
BoxedPrimitiveConstructor & 429 & 425 & 423 & 98.60 & 99.53
MissingOverride & 4533 & 5341 & 4393 & 96.91 & 82.25
OperatorPrecedence & 84 & 100 & 82 & 97.62 & 82.00
ReferenceEquality & 1176 & 1235 & 1169 & 99.40 & 94.66
TypeParameterUnusedInFormals & 99 & 95 & 95 & 95.96 & 100.00
UnnecessaryParentheses & 257 & 174 & 134 & 52.14 & 77.01
MDL vs. EP
Total EP: 7426
Total MDL: 6578
Common: 6297
MDL only: 281
EP only: 1074
####################################################################################################
/work
The output at the end contains the numbers for Figure 13.
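As a sanity check, the Precision and Recall columns follow directly from the three counts in each row: precision is the common count divided by JavaDL's count, and recall is the common count divided by the other tool's count, both in percent. For instance, for the `DM_NUMBER_CTOR` row:

```shell
# Recompute Precision/Recall for DM_NUMBER_CTOR from the raw counts
# in the table above: #MDL=429, #SB=182, #Common=162.
awk 'BEGIN {
  mdl = 429; sb = 182; common = 162
  printf "Precision %.2f Recall %.2f\n", 100*common/mdl, 100*common/sb
}'
# prints: Precision 37.76 Recall 89.01
```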
Advanced Testing
Optionally, you can manually build and run JavaDL and its test suite. This is not necessary to reproduce our results, but helpful if you want to have a closer look at the system or extend JavaDL.
The following steps will re-run the test suite (all 136 tests should pass):
cd /work/metadl
./gradlew test
Please refer to `/work/metadl/README.org` for more details on manually running JavaDL.
Step by Step Instructions
In this section we describe how to reproduce our performance results and how to manually run JavaDL with custom queries.
Re-running our Benchmarks
To re-run our benchmarks, run the following in your docker image:
cd /work
./run_performance.bash
Depending on your machine, it may take over a day to re-run all benchmarks, so we recommend running on a dedicated benchmarking machine (using e.g. `tmux` or `screen` to be able to disconnect safely).
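For example, one way to keep the run alive across a dropped connection is to detach it with `nohup`; the helper name and the log path below are our choices, not part of the artifact:

```shell
# Start a command in the background, immune to hangups, logging to a file.
# Usage: run_detached LOGFILE CMD [ARGS...]
run_detached() {
  log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo "started pid $!"
}

# Inside the image, one would run e.g.:
#   run_detached /tmp/perf.log /work/run_performance.bash
```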
After running, the output should be similar to the following (concrete numbers may vary greatly depending on the underlying hardware):
proj object_file_generate_relations_total object_file_compile_total hybrid_program_input ... sema_and_type_check general_overhead incremental_driver_overhead update_local_relations
0 d4j-Cli 0.293539 0.056372 0.000000 ... 0.517667 0.091731 0.000000 0.000000
1 d4j-Cli 0.138508 0.032728 0.022318 ... 0.390337 0.123824 0.082171 0.184927
2 d4j-Codec 0.388758 0.074782 0.000000 ... 0.400047 0.080707 0.000000 0.000000
3 d4j-Codec 0.135542 0.037518 0.054627 ... 0.407575 0.132363 0.085168 0.113422
4 d4j-Compress 0.460538 0.063568 0.000000 ... 0.297225 0.103416 0.000000 0.000000
5 d4j-Compress 0.202481 0.030771 0.062943 ... 0.309277 0.098028 0.072345 0.186635
6 d4j-Csv 0.279070 0.055226 0.000000 ... 0.533767 0.090738 0.000000 0.000000
7 d4j-Csv 0.185438 0.037808 0.023777 ... 0.397894 0.129165 0.090716 0.105950
8 d4j-JacksonCore 0.490699 0.060332 0.000000 ... 0.289425 0.086140 0.000000 0.000000
9 d4j-JacksonDatabind 0.662087 0.037427 0.000000 ... 0.125948 0.094318 0.000000 0.000000
10 d4j-JacksonXml 0.309379 0.046775 0.000000 ... 0.494052 0.110116 0.000000 0.000000
11 d4j-JacksonXml 0.149556 0.031302 0.024723 ... 0.390042 0.122664 0.097267 0.162492
12 d4j-Jsoup 0.392209 0.063594 0.000000 ... 0.412381 0.080244 0.000000 0.000000
13 d4j-Jsoup 0.267734 0.032205 0.024790 ... 0.268394 0.087463 0.060649 0.218730
14 d4j-Math 0.336936 0.057061 0.000000 ... 0.448329 0.110705 0.000000 0.000000
15 d4j-Math 0.097620 0.032702 0.039766 ... 0.399697 0.129623 0.084926 0.189194
16 d4j-Mockito 0.498430 0.037437 0.000000 ... 0.324475 0.096073 0.000000 0.000000
17 d4j-Mockito 0.371869 0.014380 0.017237 ... 0.165659 0.053631 0.040926 0.318487
18 d4j-Time 0.549664 0.067943 0.000000 ... 0.208878 0.073654 0.000000 0.000000
19 d4j-Time 0.380325 0.024303 0.033737 ... 0.145743 0.047100 0.035855 0.274611
[20 rows x 11 columns]
['proj' 'run_kind' 'object_file_generate_relations_total'
'object_file_compile_total' 'hybrid_program_input' 'hybrid_program_run'
'hybrid_program_output' 'incremental_driver_init' 'sema_and_type_check'
'update_local_relations' 'incremental_driver_overhead' 'general_overhead']
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
python/src/plot_change_vs_analyzed.py:123: RuntimeWarning: invalid value encountered in long_scalars
all_data["n_files_rel"] = all_data.apply(lambda r : r["n_files"] / max_n_files[(r["proj"], r["run_kind"])], axis=1)
python/src/plot_change_vs_analyzed.py:131: RuntimeWarning: invalid value encountered in long_scalars
max_n_touched_files[(r["proj"], r["run_kind"])], axis=1)
[0 1 2 3 4 5 6 7]
[0 1 2]
python/src/plot_change_vs_analyzed.py:54: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
_, axs = plt.subplots(1, 2, figsize=(8, 4), gridspec_kw = {"width_ratios" : [8, 3]})
[0 1 2 3 4 5 6 7]
[0 1 2]
[0 1 2 3 4 5 6 7]
[0 1 2]
[0 1 2 3 4 5 6 7]
[0 1 2]
/work
The script prints this output before returning to the command prompt. Our system generates a number of plots in `/work/javadl-inc-eval/plots_sb` (comparing against SpotBugs) and `/work/javadl-inc-eval/plots_ep` (comparing against Error Prone). Most of these plots did not make it into the paper; the ones used in the paper are the following:
- `runtime_split.svg`: Proportion of execution time spent on various tasks (Figure 15 in the paper)
- `violin_adjusted_total.svg`: Distribution of running time (Figure 14 in the paper, except that the figures in the paper manually substitute times for the "Math" and "Mockito" tests, as noted in the paper)
- `plots_ep/ecdf.svg`: Distribution of the Relative Degree (Section 5.1) of files across projects (Figure 12 in the paper)
Note that the results will not match the ones in our paper, as we discuss in "Execution Time (Figure 14)" at the end of this document.
Trying custom JavaDL queries
Our VM image includes JavaDL at `/work/metadl`:
cd /work/metadl
Please refer to `/work/metadl/README.org` for more details on manually running JavaDL. The README covers both incremental and exhaustive evaluation, though the style of exhaustive evaluation that we report on in the paper is called "hybrid" evaluation there.
Please also note the following:
- The version of JavaDL in this image uses the notation `` `X `` for metavariables, instead of `#X` as reported in the paper.
- Error reporting is limited in our current implementation:
  - JavaDL will not describe where or why compiling a syntactic pattern failed.
  - JavaDL does not check whether the specification uses built-in predicates correctly (e.g., with the right arity). Arity discrepancies can therefore e.g. trigger array index bugs in internal JavaDL code.
  - JavaDL's type inference cannot infer the types of predicates that are only loaded from external files. For these predicates, make sure to also use them in a context that makes their type explicit (e.g., on the lhs of an inference rule with an unsatisfiable rhs).

For these reasons, we recommend that new users modify an existing `.mdl` file instead of building one from scratch.
List of Claims and Support
Our paper's "Conclusion" section summarizes our central claims from the Evaluation section. We use it here to structure our discussion of our claims and the support we provide for these claims (excluding claims that are unrelated to the artifact).
- "JavaDL [...] can run [a] bug detector on Java from a single specification both exhaustively and incrementally, while automatically rewriting the specification to optimize it for incremental evaluation."
  See subsection "Versatility", below.
- "We have demonstrated that our prototype implementation, based on an existing high-performance Datalog engine, delivers performance that is competitive with state-of-the-practice systems"
  (RQ1.2: Fig. 13 reports statistics that we interpret as that "none of the differences [between the bug checking frameworks and JavaDL] indicate any systematic limitations and could be addressed by changing JavaDL bug patterns or fixing bugs in ExtendJ." Similarly, discussions regarding performance related to RQ2.)
  See subsection "Output Similarity (Figure 13)" and subsection "Execution Time (Figure 14)", below.
- "[...] that our specification language can concisely express typical bug detectors"
  (RQ1.1: Fig. 11 provides statistics that "[indicate] that JavaDL requires fewer lines, and that it is closer to Error Prone in size than SpotBugs.")
  See subsection "Bug Patterns and Code Sizes (Figure 11)", below.
- "[...] that its incremental and exhaustive evaluation are both able to outshine each other in different usage scenarios"
  See (again) subsection "Execution Time (Figure 14)", below.
Versatility
JavaDL can run the same specifications both in incremental and in exhaustive mode.
We support our claim here by `/work/run_performance.bash` and the code that it (transitively) invokes. Critically, `run_performance.bash` runs the same two collections of analyses, `error-prone-metadl.mdl` and `spot-bugs-metadl.mdl`, for both exhaustive (`python/src/single.py`) and incremental (`python/src/incremental.py`) runs.
Output Similarity (Figure 13)
These outputs are produced by the `./run_quality.bash` script. The script directly prints out the relevant statistics in LaTeX format.
Discrepancies from the numbers reported in the Paper
During validation of our artifact evaluation package, we observed slight differences in the numbers reported for Error Prone, for the following benchmarks:
- `MissingOverride`: Error Prone now reports 5341 bugs (as opposed to 5364 in the paper). The overall impact is a slight increase in our tool's relative precision and relative recall (i.e., treating Error Prone as a proxy for the "ground truth").
- `OperatorPrecedence`: JavaDL now reports 84 bugs (as opposed to 82 in the paper). Error Prone's 100 reports only agree with one of these two new reports, so we see a slight increase in relative recall and a slight decrease in relative precision.
- `UnnecessaryParentheses`: Error Prone now reports 174 bugs (as opposed to 175 in the paper). This slightly improves our relative recall.
We believe these changes are due to minor version mismatches in either the tools or the benchmarks that we reported on in the paper, but we have not yet identified the exact cause. We will carefully review these changes and update the paper accordingly for the second submission round, but argue that these discrepancies do not affect the claims in the paper.
Detailed Statistics
For more detailed statistics, you can examine the list of bugs reported by each tool, in `/work/javadl-eval-1/`.

We split these between SpotBugs and Error Prone, reporting overlap for checkers that these tools share with JavaDL. For SpotBugs, the lists are in:

- `mdl_sb_common`: Found by both SpotBugs and JavaDL
- `mdl_not_sb.out`: Found only by JavaDL, not SpotBugs
- `sb_not_mdl.out`: Found only by SpotBugs, not by JavaDL
- `mdl_not_sb_samples` and `sb_not_mdl_samples`: The list of sampled differences that we manually examined (we generate this list from a fixed random seed that we selected blindly, so this list should be consistent between runs)
Analogously for Error Prone (substituting "ep_" for "sb_").
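If each bug report occupies one line, the aggregate overlap counts for any pair of such lists can be recomputed with `comm`. This is a sketch; the helper name and file arguments are illustrative, not artifact filenames:

```shell
# Print overlap statistics for two bug lists (one report per line).
# Usage: overlap_stats FILE_A FILE_B
overlap_stats() {
  sort "$1" > /tmp/_a.sorted
  sort "$2" > /tmp/_b.sorted
  echo "common: $(comm -12 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
  echo "A only: $(comm -23 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
  echo "B only: $(comm -13 /tmp/_a.sorted /tmp/_b.sorted | wc -l)"
}

# e.g.: overlap_stats list_a.txt list_b.txt
```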
Bug Patterns and Code Sizes (Figure 11)
For the sizes of bug patterns, we refer to the source code that we examined.
For JavaDL, we list the file name of the check in the following directory on the VM image: /work/javadl-eval-1/static-checkers/metadl/tests/evaluation/metadl-java/
For SpotBugs and Error Prone, we refer to specific files at specific commit times in their revision history (referencing the time of the file's most recent modification, from the perspective of the baseline).
Covariant Equals()
- JavaDL: `bad-covariant-equals.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/NonOverridingEquals.java` (revision fc4d14866e)
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindHEmismatch.java` (revision 05b9805296)

Boxed Primitive Constructor
- JavaDL: `boxed-primitive-constructor.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/BoxedPrimitiveConstructor.java` (revision b3a20389d2)
- SpotBugs:
  - File 1: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/NumberConstructor.java` (revision ac83bb9220)
  - File 2: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/DumbMethods.java` (revision 05b9805296)

Missing @Override
- JavaDL: `missing-override.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/MissingOverride.java` (revision 6b489dd58e)

Complex Operator Precedence
- JavaDL: `operator-precedence.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/OperatorPrecedence.java` (revision e41729a145)

Useless Type Parameter
- JavaDL: `type-param-unused-in-formals.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/TypeParameterUnusedInFormals.java` (revision 46358f3709)

== with Reference
- JavaDL: `reference-equality.mdl`
- Error Prone: `./core/src/main/java/com/google/errorprone/bugpatterns/ReferenceEquality.java` (revision 1c1e233dd)

Field Never Written to
- JavaDL: `unwritten-field.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/UnreadFields.java` (revision 05b9805296)

Missing Switch Default
- JavaDL: `switch-no-default.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/SwitchFallthrough.java` (revision d881dd0)

Expose internal Representation
- JavaDL: `reference-to-mutable-object.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindReturnRef.java` (revision 3a4f4ddbcf)

Naming Convention Violation
- JavaDL: `naming-convention.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/Naming.java` (revision 03cdfb4ece)

Clone Idioms Violated
- JavaDL: `clone-idioms.mdl`
- SpotBugs: `./spotbugs/src/main/java/edu/umd/cs/findbugs/detect/CloneIdiom.java` (revision 03cdfb4ece)
Execution Time (Figure 14)
We support these claims by providing the scripts necessary to re-run our benchmarks (cf. "Re-running our Benchmarks"). Our artifact will not report the same numbers reported in the paper, for the following reasons:
- Changes to `souffle`: for our experiments, we used `souffle`'s default settings, which optimize `souffle` for a specific microarchitecture while dropping support for others. For reusability we relaxed these optimizations (changing `-march=native` to `-mtune=native` in Soufflé's `src/souffle-compile.in`). To fix this, you can edit `/work/souffle/src/souffle-compile.in` and re-build/re-install for the native machine, with `./configure && make clean && make && make install`.
- Subtle differences in running in a VM image vs. our original experimental configuration. To fix this, you can copy our evaluation setup out of the VM container.
- Performance differences due to hardware differences (cf. our paper for the exact setup that we used).