PUnits: Precise Inference of Expressive Units of Measurement Types

PUnits is a pluggable type system for expressive units of measurement types, together with a precise, whole-program inference approach for these types. PUnits can be used in three modes: (1) modularly type-check a program for unit correctness, (2) verify that a consistent unit typing exists, (3) infer units and annotate the program with them. This image is created for the OOPSLA 2020 artifact evaluation.
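To give a flavor of the type system, below is a small hypothetical example. It assumes unit qualifiers named @m (meters) and @s (seconds) in the units.qual package; the actual qualifier set shipped with PUnits lives in src/units/qual.

    import units.qual.m;
    import units.qual.s;

    class Kinematics {
        // Type-check mode verifies that declared units are used consistently.
        static @m double travel(@m double start, @m double delta) {
            return start + delta;      // OK: meters + meters = meters
        }

        static void demo(@m double d, @s double t) {
            @s double elapsed = t;     // OK: seconds
            // @s double bad = d;      // unit error: meters is not seconds
        }
    }

In inference mode, PUnits determines whether units can be assigned to the unannotated declarations consistently; in annotation mode, it additionally writes the inferred units back into the source.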

Getting Started

Pull the docker image and start the container

docker run -d -it txiang61/punits-artifact:oopsla2020 /bin/bash
docker exec -it <container id> /bin/bash

File Organization

The PUnits project is located in /home/opprop/units-inference.

The benchmark scripts are located in /home/opprop/units-inference/benchmark.

The Coq proof is located in /home/opprop/units-inference/coq-proof.

Building PUnits

Go to /home/opprop/units-inference. The project should already be built, but to be sure, pull the latest changes and compile.

git pull && ./rebuildCFCFI.sh && ./gradlew assemble

Run the test script to ensure PUnits is working properly.

./mini-tests.sh

All individual tests should pass and the overall build should finish successfully.

Proofs of the PUnits Formalization

cd /home/opprop/units-inference/coq-proof && ./compile.sh

This script should compile without any warnings or errors. More details on the PUnits Coq proofs are in /home/opprop/units-inference/coq-proof/README.md in the image.

Running the Benchmarks from the Paper

Please note that depending on your machine's hardware, these benchmarks may take from a few minutes to more than an hour (annotation mode takes the longest). Enter the benchmark folder /home/opprop/units-inference/benchmark.

Running Type-Checking Benchmarks

To run type-checking benchmarks use:

./run-benchmark-typecheck.sh <path to YAML file without the .yml extension e.g. paper-typecheck>

This script runs PUnits in type-check mode on every project listed in the given .yml corpus file. We recommend running paper-typecheck.yml, as it contains all projects mentioned in the paper (except Daikon, which requires special build instructions; see below). The script may take 5 to 15 minutes to run depending on the machine. It creates a folder under the benchmark directory, named after the YAML file, that contains all the projects type-check mode ran on.
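Each corpus .yml file describes which projects to fetch and how to build them. The snippet below is only a hypothetical sketch (all keys and the URL are illustrative assumptions, not the verified schema); consult paper-typecheck.yml itself for the real format.

    projects:
      GasFlow:
        giturl: https://github.com/example/GasFlow.git   # illustrative
        clean: mvn clean
        build: mvn compile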

The expected result of running paper-typecheck is: Failed projects are: ['react', 'jReactPhysics3D', 'GasFlow', 'imgscalr', 'jblas', 'exp4j', 'JLargeArrays']. These errors are expected for unannotated projects.

To run units type-check on Daikon:

./run-daikon-typecheck.sh <master/unit-error/error-fixed>

This script may take 10 to 30 minutes to run depending on the machine. It runs type-check mode on three versions of the Daikon project. The master version is unannotated and will therefore issue type errors. The unit-error version is fully annotated with the unit bug inserted and should issue one type error. The error-fixed version is fully annotated with the bug fixed. You can ignore the outputs "warning: Did not find stub file /javadoc.astub on classpath or within directory / or at checker.jar/javadoc.astub" and "error: warnings found and -Werror specified", as these are not type errors produced by PUnits.

We claimed in the paper that PUnits is able to detect the bug inserted (Section 5.2 Daikon paragraph). The artifact supports this claim.

To inspect the overview results of type-checking:

../experiment-tools/gen-typecheck-table.sh <path to result folder e.g. paper-typecheck>

The table lists the 4 kinds of type errors found in the projects. These errors are expected in unannotated projects: they arise from flows that propagate units from annotated methods into method parameters or returns that default to @Dimensionless.
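For example, the following sketch shows such a flow (assuming an @m qualifier; in an unannotated project, the qualifier would come from an annotated library or stub file):

    import units.qual.m;

    class Example {
        static @m double reading;          // annotated: holds meters

        static void log(double value) { }  // unannotated: parameter defaults to @Dimensionless

        static void run() {
            log(reading);  // type error: @m flows into a @Dimensionless parameter
        }
    }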

To inspect the detailed report for each project:

cat <path to result folder>/<project name>/logs/infer.log

We recommend inspecting the detailed report for the GasFlow project. We claim in the paper that PUnits is able to detect unit-related errors where encapsulation-based units APIs like JScience have failed (Section 5.1.1). We also claim that PUnits enforces good coding practice by warning about type-unsafe heterogeneous methods (Section 5.1.3). The type errors issued by PUnits also match the GasFlow errors mentioned in the paper (Section 5.2, GasFlow paragraph). The artifact supports these claims.

We claim that PUnits is able to type-check all eight projects and that the errors issued are expected because the projects are unannotated (Section 5.2, first paragraph). The artifact supports this claim.

Running Whole-Program-Inference Benchmarks

To run whole-program-inference without annotation mode:

./run-benchmark-infer.sh <path to YAML file without the .yml extension e.g. paper-inference>

This script runs PUnits in whole-program-inference mode on every project listed in the given .yml corpus file. It may take 5 to 30 minutes to run depending on the machine. We recommend running paper-inference.yml, as it contains the 6 projects mentioned in the paper that we run inference on.

The expected result from running paper-inference.yml is: Successful projects are: ['react', 'jReactPhysics3D', 'imgscalr']. Failed projects are: ['jblas', 'exp4j', 'JLargeArrays']. The failures/UNSATs are expected. Note that in the paper jReactPhysics3D is evaluated to UNSAT, whereas the artifact evaluates it to SAT. As stated in the paper (Section 5.2, jReactPhysics3D paragraph), the project was UNSAT because PUnits assumed the raw type Iterator is actually Iterator<@UnitsTop Object>, and a value obtained from this iterator flows into a parameter that expects a @Dimensionless value. PUnits now bounds the raw type at @Dimensionless, and thus the project reaches SAT.
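The code shape involved is roughly the following (a reconstruction for illustration, not the actual jReactPhysics3D source):

    import java.util.Iterator;
    import java.util.List;

    class RawIteratorExample {
        static void consume(double value) { }  // unannotated: defaults to @Dimensionless

        static void drain(List values) {       // raw List, so values.iterator() is a raw Iterator
            Iterator it = values.iterator();
            while (it.hasNext()) {
                // Under the old assumption the element type was @UnitsTop Object,
                // so this flow into a @Dimensionless parameter was unsatisfiable;
                // with the raw type now bounded by @Dimensionless, it is SAT.
                consume((Double) it.next());
            }
        }
    }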

Running whole-program-inference with annotation mode:

./run-benchmark-inference.sh true <path to YAML file without the .yml extension e.g. paper-annotation>

This will take more than an hour to run. This script runs PUnits in whole-program-inference with annotation mode on every project listed in the given .yml corpus file. We recommend running paper-annotation.yml, as it contains the 5 projects mentioned in the paper that we run annotation mode on.

The expected result of running paper-annotation is: ----- Inference successfully inferred all 5 projects. -----

To inspect the overview results of whole-program inference:

../experiment-tools/gen-inference-summary.sh <path to result folder e.g. paper-annotation>
../experiment-tools/gen-inference-table.sh <path to result folder e.g. paper-annotation>

This result supports the claims in Figures 14 and 15 of the paper. Please note that the numbers of variables and constraints generated may differ slightly from those in the paper, as PUnits and its dependencies, the Checker Framework and Checker Framework Inference, have evolved. The final paper will use numbers consistent with the final artifact.

To inspect the detailed report for each project:

cat <path to result folder>/<project name>/logs/infer.log

To inspect the annotated source code for each successfully inferred project:

cat <path to result folder>/<project name>/annotated
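As an illustration, annotation mode inserts the inferred qualifiers at declarations. The example below is hypothetical (it assumes @m, @s, and an @mPERs alias for meters per second); the real output depends on the project:

    import units.qual.m;
    import units.qual.mPERs;
    import units.qual.s;

    class Motion {
        // Originally unannotated; annotation mode inserted the three qualifiers.
        static @mPERs double velocity(@m double distance, @s double time) {
            return distance / time;
        }
    }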

GasFlow Performance Benchmark

To see the performance numbers, run:

./run-GasFlow-performance.sh

This script invokes the main method of the GasFlow project to measure the time and memory performance difference between using an encapsulation-based units API like JScience and using PUnits. The first run uses the JScience library to enforce unit correctness; the second run uses PUnits. This result supports the claim in Section 5.1.2 of the paper that PUnits reaps the performance benefits of using primitive types instead of abstract data types for unit-wise consistent scientific computations.
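The two styles differ roughly as follows: JScience wraps every value in an Amount object whose unit is checked at run time, whereas PUnits checks units statically on plain primitives. The sketch below assumes the JScience 4.x API and an @m qualifier:

    import javax.measure.quantity.Length;
    import javax.measure.unit.SI;
    import org.jscience.physics.amount.Amount;
    import units.qual.m;

    class Styles {
        // Encapsulation-based: boxed values, run-time unit checks.
        static Amount<Length> totalJScience(Amount<Length> a, Amount<Length> b) {
            return a.plus(b);
        }

        // PUnits: plain primitives, units checked at compile time.
        static @m double totalPUnits(@m double a, @m double b) {
            return a + b;
        }
    }

The second method compiles to a plain double addition with no allocation, which is the source of the performance difference measured by the script.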

PUnits Performance Benchmark

To get the OpenJDK compilation times for the projects:

./run-benchmark-compile.sh <path to result folder e.g. paper-typecheck>

If run with the argument paper-typecheck, the project imgscalr will fail to compile. This is expected: the project fails to compile because of its source and target Java version settings, which does not affect the PUnits evaluation.

Timing logs are created when running the benchmark scripts. compileTiming.log contains the OpenJDK compilation time. typecheckTiming.log contains the type-checking time. inferTiming.log contains the inference or annotation time. To view the timing logs:

grep -r "Time taken" <path to result folder> | sort

You can compare the type-checking and inference/annotation times to the OpenJDK compilation times.

The performance overhead varies depending on the projects and the machine used to run. The paper claims that its performance is adequate for use in a real-world software development environment (Section 5.3). Overall, this artifact supports this claim.

A Step-by-Step Guide for Using PUnits

  1. Go to the PUnits project folder /home/opprop/units-inference

  2. Custom base units and aliases (optional step)

    a. Check the currently supported base units with ./experiment-tools/get-num-baseunits.sh. See if any desired base units are missing.

    b. src/units/qual contains all base units and unit aliases used. You can move unneeded base units and aliases to src/units/notusedqual.

    c. Create new base units and new unit aliases. Look at existing files for reference; a sketch of a new base unit appears after this list.

  3. Annotate JDK specifications and libraries (optional step)

    a. All .astub files in src/units are the annotated JDK and library specifications.

    b. Create your own .astub files. Look at existing stub files for reference; a stub file sketch appears after this list.

    c. Add the files to the @StubFiles list in src/units/UnitChecker.java. You can comment out files that you won't need.

  4. Build PUnits with ./gradlew assemble

  5. Run PUnits

    a. Run type-check mode on .java files:

    /home/opprop/units-inference/script/run-units-typecheck.sh <java files>

    b. Run type-check mode on a Java project: Go to the project folder and remember to clean the project first (to ensure everything is re-checked):

    /home/opprop/units-inference/script/run-dljc-typecheck.sh "<build command>"

    c. Run inference mode on .java files:

    /home/opprop/units-inference/script/run-units-infer.sh false <java files>

    d. Run inference mode on a Java project: Go to the project folder and remember to clean the project:

    /home/opprop/units-inference/script/run-dljc-inference.sh false "<build command>"

    e. Run annotation mode on .java files:

    /home/opprop/units-inference/script/run-units-infer.sh true <java files>

    f. Run annotation mode on a Java project: Go to the project folder and remember to clean the project:

    /home/opprop/units-inference/script/run-dljc-inference.sh true "<build command>"
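For step 2c, each base unit is a Java annotation in src/units/qual. The sketch below uses only the standard Checker Framework qualifier meta-annotations and an illustrative @g unit for grams; copy any additional PUnits-specific meta-annotations from an existing file in that package.

    package units.qual;

    import java.lang.annotation.Documented;
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Illustrative new base unit for grams.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE_USE, ElementType.TYPE_PARAMETER})
    public @interface g {}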
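For step 3b, stub files use the Checker Framework's stub file format: Java-like declarations carrying the qualifiers you want library signatures to have. A minimal sketch, assuming an @ms (milliseconds) alias exists among the units:

    import units.qual.ms;

    package java.lang;

    class Thread {
        static void sleep(@ms long millis) throws InterruptedException;
    }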

Summary of Claims in the Paper

Please see the benchmark sections above for details on how and why these claims are, or are not, supported.

Supported Claims

  • PUnits successfully runs type-check mode on the 8 projects mentioned in the paper
  • PUnits can detect bugs in real-world applications (e.g., GasFlow and Daikon)
  • PUnits can detect unit errors where encapsulation-based units APIs have failed
  • PUnits enforces good coding practice by issuing warnings during type-checking, or by evaluating the project to UNSAT, for type-unsafe heterogeneous methods or arrays
  • PUnits reaps the performance benefits of using primitive types instead of abstract data types for unit-wise consistent scientific computations
  • PUnits successfully runs inference mode on the 6 projects mentioned in the paper
  • PUnits successfully infers precise units with annotation mode on the 5 projects mentioned in the paper; the units inferred by the artifact are consistent with the paper
  • The inferred annotations are inserted back into the source code for human inspection
  • Performance is adequate for use in a real-world software development environment

Claims Not Supported

  • During inference and annotation mode, the numbers of variables and constraints generated may differ slightly from those specified in the paper, as PUnits and its dependencies, the Checker Framework and Checker Framework Inference, have evolved. The final paper will use numbers consistent with the final artifact.
  • In the paper, jReactPhysics3D is evaluated to UNSAT during inference mode. The artifact evaluates it to SAT because of changes in PUnits after the paper submission. The final paper will be consistent with the final artifact.
  • The measured performance overhead may not match the numbers claimed in the paper, as performance varies with the project and the machine used.