Probabilistic Energy Profiler For Java
Introduction
The artifact contains the whole dataset and all code files used in the article "Probabilistic Energy Profiler for Java" by Joel Nyholm, Wojciech Mostowski, and Christoph Reichenbach.
The paper presents an energy profiler for Java bytecode patterns, where a pattern is the bytecode sequence that a Java source statement compiles to. For example, an int declaration such as "int x = 1;" compiles to an iconst_1 instruction followed by an istore instruction. We focus on variable declarations and additions of the int, long, float, and double data types. All operations were executed on a Raspberry Pi 5, and their energy was measured using the Keithley 2602 SourceMeter. Reproducing the data collection therefore requires access to this measurement hardware.
The dataset has three files: a class diagram for the benchmark harness, a VirtualBox image ("Debian.ova"), and a compressed directory ("Data and code.zip"). The ZIP directory contains all the code and data used in the paper. The VM image contains the same code and data, and has R, RStudio, Java, and Python installed so that the code files can be executed.
Without any external hardware, the VirtualBox image can execute the Python script that converts the raw data (execution time, voltage, and current measurements) to a CSV file and calculates each operation's energy in joules. It can also execute the R script that constructs the Bayesian model and generates the plots in the paper.
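As a rough illustration of how time, voltage, and current readings can be turned into energy, the sketch below integrates power (voltage times current) over time with the trapezoidal rule. It is not taken from toCSV.py, which may compute energy differently; the function name energy_joules, its argument names, and the assumption that all three series share the same timestamps are hypothetical.

```python
from typing import Sequence


def energy_joules(
    timestamps_s: Sequence[float],
    voltages_v: Sequence[float],
    currents_a: Sequence[float],
) -> float:
    """Integrate power P = V * I over time using the trapezoidal rule.

    Assumption (not from the artifact): the three sequences are aligned,
    i.e. sample k of each was taken at timestamps_s[k] (in seconds).
    """
    if not (len(timestamps_s) == len(voltages_v) == len(currents_a)):
        raise ValueError("all three series must have the same length")
    if len(timestamps_s) < 2:
        return 0.0  # no interval to integrate over

    # Instantaneous power in watts at each sample.
    powers_w = [v * i for v, i in zip(voltages_v, currents_a)]

    energy = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        # Area of the trapezoid between two neighbouring samples.
        energy += 0.5 * (powers_w[k] + powers_w[k - 1]) * dt
    return energy
```

With a constant 5 V and 1 A over two seconds, the function returns 10 J, which matches the product of power and duration.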
Executing experiments in the VM
If needed, the password for the vboxuser account on the VM is "user".
To run the Python script, use: python3 toCSV.py
The script constructs a CSV file from one of the directories in the "Raw Data" directory. Which directory it reads as input is set on line 24 of toCSV.py.
To run the R script, use: rstudio 'R scripts and data'/model.r
Running the command above will open RStudio, where the probabilistic model used in the paper is pre-loaded.
To execute the code and generate all plots, do one of the following in RStudio: press Alt+Ctrl+R, or choose Code > Run Region > Run All from the menu bar.
The construction of the Bayesian model is commented out; if you wish to rebuild the model, you must remove the comments (#) in lines 289-296.
Code files/directories
Java Code (benchmark harness)
ClassDiagram.png shows the class diagram for the Java code. The Main class is the entry point for the benchmark harness. It uses the Addition, Variable, TestIntAdd, TestIntVarDecl, and TestBytecodeSequence classes. The classes starting with Test contain the code used to test against the Bayesian model's predictions. The Addition and Variable classes contain the microbenchmarks used to create the Bayesian model. The Bytecode class handles the ordering and execution of these microbenchmarks. The Multimeter class and its subclass Keithley2602 handle communication with the Keithley 2602 SourceMeter through RS-232.
Scripts
There are three scripts in the "Data and code" directory:
- CPU_counter_test.sh: Executes the benchmark harness and reports counters for the L1 caches and the branch predictor (external hardware is needed)
- run_Java_profiler.sh: Executes the benchmark harness (external hardware is needed)
- toCSV.py: Converts the execution time, voltage, and current data into a CSV file containing the energy in joules
The "R scripts and data" directory contains two additional script files and one data file (model_data.RData) that holds the data used for the paper:
- model.R: The R code for executing the Bayesian model and representing its results with plots
- model-non-centered.stan: The Stan code that constructs the Bayesian model
Data directories
The "Data and code" directory contains two folders with the data for the experiment. The Raw data directory contains six directories, each holding execution time, voltage, and current results, plus one CSV file:
- results_50additions: Contains results for 50 additions (TestIntAdd) in growing order: 1 addition, 2 additions, ..., 50 additions
- results_50Vardecl: Contains results for 50 variable declarations (TestIntVarDecl) in growing order: 1 declaration, 2 declarations, ..., 50 declarations
- results_pi5_dev1: Contains the results from executing the microbenchmarks (addition and variable declarations) for device 1
- results_pi5_dev2: Contains the results from executing the microbenchmarks (addition and variable declarations) for device 2
- results_testfunctions_dev2: Contains the results from executing the TestBytecodeSequence for device 2
- results_testfunctions_dev1: Contains the results from executing the TestBytecodeSequence for device 1
- perfData.csv: Contains the counters for the L1 cache and branch misses
The "R scripts and data" directory holds the aggregated data from each directory above (produced by the toCSV.py script) as CSV files. The "results_testfunctions" and "results_pi5" data each come from two raw directories, one per device in the experiment. In the aggregated form, each pair is merged into a single file: results_testfunction.csv for the "results_testfunctions" directories and results.csv for the "results_pi5" directories.
Checksums
Data and code.zip:
SHA256: 56caf51be85b0d5f509c15856771246c7f9028069955d38087ba5c9d1ed839f8
md5: 5ea72d397bceb69744aa8d36480d574e
Debian.ova:
SHA256: a1ab335b5a354ae09c3dd172195bdaf2040cefb05bac91d93884073651b560a2
md5: 77caf3d07dd2cc110f7499905e4f5c2c
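One possible way to check a downloaded file against the checksums above is sketched below using Python's standard hashlib module. The helper names file_digest and verify are hypothetical, and the file paths in the EXPECTED table assume both files sit in the current working directory.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# SHA256 values copied from the Checksums section; the paths are assumptions.
EXPECTED = {
    "Data and code.zip": "56caf51be85b0d5f509c15856771246c7f9028069955d38087ba5c9d1ed839f8",
    "Debian.ova": "a1ab335b5a354ae09c3dd172195bdaf2040cefb05bac91d93884073651b560a2",
}
```

For example, verify("Debian.ova", EXPECTED["Debian.ova"]) returns True when the download is intact; passing algorithm="md5" together with the listed md5 value works the same way.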