lu-cs-sde/IntraJSCAM2021: IEEESCAM2021
Description
This repository contains the artifact for the following paper:
- A Precise Framework for Source-Level Control-Flow Analysis, Idriss Riouak, Christoph Reichenbach, Görel Hedin and Niklas Fors. To appear at the 21st IEEE International Working Conference on Source Code Analysis & Manipulation, 2021 (SCAM 2021).
The repository contains:
- A snapshot of IntraCFG (192c02c), i.e., a language-independent framework for building control-flow graphs (CFGs) using reference attribute grammars, as described in section 2 in the paper.
- A snapshot of IntraJ (479e927), i.e., a tool that applies IntraCFG to construct control-flow graphs (CFGs) for Java source programs, as described in section 3 in the paper. IntraJ is built as an extension to the ExtendJ Java compiler, which is in turn implemented using JastAdd (a metacompiler supporting reference attribute grammars).
- In addition to CFG construction, IntraJ contains two example client analyses that make use of the CFG, as described in section 4 in the paper: DAA - detection of Dead Assignments in the codebase, and NPA - detection of occurrences of Null Pointer exceptions.
- Test cases and evaluation scripts are also included. Using this artifact you can rerun the experiments presented in section 5 in the paper. This includes running IntraJ on a suite of subject codebases, and running competing tools (JastAddJ-Intraflow and SonarQube) on the same codebases.
You can reuse this artifact in various ways. For example:
- You can run IntraJ on other Java codebases (in Java-4, Java-5, Java-6, and Java-7) in order to construct CFGs and get DAA and NPA analysis results. More can be read about reusability here.
Get the IntraJ artifact
We provide three different ways of getting and running IntraJ:
- You can download the pre-built Docker image (recommended).
- Build your own Docker image using the Dockerfile script.
- Download and build IntraJ from the artifact source code.
Docker
We provide a Docker image that contains IntraJ and evaluation scripts, packaged together with all the necessary dependencies. To run such an image, make sure to install the relevant tools:
- For Windows and OS X systems, follow the guidelines on the Docker Desktop download site.
- On Linux-based systems, install the docker command-line tool. This tool may be provided by the docker.io and/or docker-ce packages. If your distribution does not provide these packages, follow the installation steps in the official Docker documentation.
Download pre-built Docker image
Download the pre-built image here. Then, anywhere in your workspace run
docker load < Downloads/intraj_scam21.tar.gz
Build your own Docker image
Clone the IntraJSCAM2021 repository by running the following command:
git clone https://github.com/lu-cs-sde/IntraJSCAM2021.git
Once you have cloned the repository, build the image:
cd IntraJSCAM2021/Docker
docker build -t intraj:scam21 .
⚠️ Note: It might take several minutes to build the Docker image.
Run the image
Run the image using:
docker run -it --network="host" --expose 9000 --expose 9001 --memory="10g" --memory-swap="16g" intraj:scam21
❗️ Very Important ❗️: SonarQube requires a large amount of memory. We tested the container with 10 GB of memory and 10 GB of swap memory. If you are running the container on Windows or Mac, the command-line options that set the memory available to the container (i.e., --memory="10g", --memory-swap="16g") are ignored. Please set these two parameters from the GUI. Read more about it here: Windows - Mac
You will be logged in with the user SCAM21. Once logged in, run the following commands to launch the evaluation:
cd workspace/intraj/
./eval.sh 50 50
The results are saved in: ~/workspace/intraj/evaluation/YYYYMMDD_HHMMSS
⚠️ Note: The command eval.sh 50 50 will run IntraJ 2500 times for each analysis. Therefore, computing the evaluation can take several hours.
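The figure of 2500 is consistent with the two arguments being loop counts that multiply. The sketch below shows that arithmetic; `total_runs` is a hypothetical helper that is not part of the artifact, and it assumes the inner loop runs once per outer iteration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the artifact): number of IntraJ runs per
# analysis, assuming the inner loop is executed once per outer iteration.
total_runs() {
  outer=$1
  inner=$2
  echo $((outer * inner))
}

total_runs 50 50   # prints 2500
```

Smaller arguments, for example `./eval.sh 5 5`, would give a much shorter trial run under the same assumption.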
❗️ Very Important ❗️: Do not close the shell or kill the container, otherwise the results will be lost!
Saving the results
To save the results on your own machine, run the following command in a new terminal:
docker ps
This will print:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4d882c86b5ab intraj:scam21 "bash" x Up x seconds random_name
With your CONTAINER ID run the following command:
docker cp 4d882c86b5ab:workspace/intraj/evaluation/YYYYMMDD_HHMMSS /PATH/IN/YOUR/MACHINE
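The container ID can also be picked out of the `docker ps` listing with a small filter instead of being copied by hand. This is only a sketch: `container_id_for_image` is a hypothetical helper, and it assumes the container was started from an image named intraj:scam21, as in the run command above.

```shell
#!/bin/sh
# Hypothetical helper: read `docker ps` output on stdin and print the ID of
# the first container whose IMAGE column equals the given image name.
container_id_for_image() {
  # NR > 1 skips the header row; column 1 is the ID, column 2 the image.
  awk -v img="$1" 'NR > 1 && $2 == img { print $1; exit }'
}

# Usage (needs a running container; both paths are placeholders):
#   id=$(docker ps | container_id_for_image intraj:scam21)
#   docker cp "$id":workspace/intraj/evaluation/YYYYMMDD_HHMMSS /PATH/IN/YOUR/MACHINE
```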
Build IntraJ from the source code
Prerequisites
We have run IntraJ on the following Java version:
- Java SDK version 7 (tested with SDK 7.0.292-zulu; see sdkman).
If you also want to run the competing tool SonarQube (for checking the evaluation section), you will additionally need the following Java version:
- Java SDK version 11 (tested with SDK 11.0.9.fx-zulu; see sdkman).
It is possible to generate PDFs that show the CFGs visually. For this you need:
1) Dot (Graphviz) - PDF generation
2) Vim - PDF generation
3) Python 3.x with the following dependencies:
* **PyPDF2 v1.26.0** - _PDF generation_
* **numpy v1.20.1** - _Evaluation and Plots generation_
* **pandas v1.2.4** - _Evaluation and Plots generation_
* **matplotlib v3.3.4** - _Evaluation and Plots generation_
* **seaborn v0.11.1** - _Evaluation and Plots generation_
* **ipython v7.26.0** - _Evaluation and Plots generation_
The evaluation script uses sdkman. To run the evaluation you need:
- The scripts eval.sh and evaluation/run_eval.sh use sdkman. If you don't have sdkman installed but have Java SDK 7 installed, you can comment out all the lines starting with sdk in eval.sh and in evaluation/run_eval.sh. You can install sdkman by running the following commands:
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 7.0.292-zulu
sdk use java 7.0.292-zulu
To install all the necessary Python dependencies, follow the instructions in the "Python Dependencies" section below.
Build
To clone the IntraJ code, run, in your working directory:
git clone https://github.com/lu-cs-sde/IntraJSCAM2021.git
Move to the IntraJ directory:
cd IntraJSCAM2021
To generate all the JARs necessary for the evaluation, execute
./gradlew build
To run all the tests, execute:
./gradlew test
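After ./gradlew build, it can be useful to confirm that the JARs the evaluation relies on were actually produced. The sketch below checks for the five JAR names listed in the evaluation section of this README; `missing_jars` is a hypothetical helper, and the directory in which the build places the JARs is an assumption (it defaults to the current directory).

```shell
#!/bin/sh
# Hypothetical helper: print the names of expected JARs that are missing
# from a directory. The JAR names come from the evaluation section of this
# README; the output directory is an assumption (default: current directory).
missing_jars() {
  dir=${1:-.}
  for jar in intraj.jar intraj_bl.jar intraj_cfg.jar intraj_cfgdda.jar intraj_dda.jar; do
    [ -f "$dir/$jar" ] || echo "$jar"
  done
}

# Usage, from the directory that holds the JARs:
#   missing_jars .    # prints nothing when all five JARs are present
```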
Python Dependencies
To install the Python dependencies, run:
cd resources
pip3 install -r requirements.txt
Repository overview
The top-level structure of the repository:
.
├── build                        # Compiled files
├── evaluation                   # Scripts and dependencies for evaluation
├── extendj                      # ExtendJ source code
├── resources                    # Scripts and logo
├── src                          # IntraJ source code
|    ├── jastadd
|    |    ├── CFG                 # CFG spec in Jastadd
|    |    └── DataFlow            # Data flow analyses spec
|    └── java
|         ├── utils              # General helpers for visualisation
|         └── test               # JUnit test spec
├── tools                        # IntraJ source code
|    └── jastadd-2.3.6-custom    # Custom version of Jastadd
├── testfiles                    # Automated test files
|    ├── DataFlow
|    └── CFG
├── eval.sh                      # Evaluation entry point
├── LICENSE
└── README.md
The entry point of IntraJ (main) is defined in: extendj/src/frontend-main/org/extendj/IntraJ.java.
The evaluation folder
The directory is structured as follows:
.
├── antlr-2.7.2                  # ANTLR Benchmark (Paper §5)
├── pmd-4.2.5                    # PMD Benchmark (Paper §5)
├── jfreechar-1.0.0              # JFC Benchmark (Paper §5)
├── fop-0.95                     # FOP Benchmark (Paper §5)
├── Results.xlsx                 # Analyses results in Excel (Paper §5)
├── Results.htm                  # Analyses results in HTML
├── plots.py                     # Script that generates plots
├── run_eval.sh                  # Called by ../eval.sh
└── YYYYMMDD_HHMMSS              # Evaluation results
The jastadd folder
.
└── jastadd
    ├── CFG
    |    ├── IntraCFG
    |    |    ├── CFG.ast                 # Lang-independent nodes
    |    |    └── IntraCFG.jrag           # IntraCFG spec in Jastadd (Paper §2.b)
    |    ├── java4                        # (Paper §3)
    |    |    ├── Cache.jrag              # Cache settings
    |    |    ├── Exception.jrag          # Exception spec (Paper §3.c)
    |    |    ├── Initializer.jrag        # Initializers spec (Paper §3.b)
    |    |    ├── Java4.jrag              # Java4 spec
    |    |    └── ImplictNodes.ast        # Reified nodes
    |    ├── java5                        # (Paper §3)
    |    |    └── Java5.jrag              # Java5 spec
    |    └── java7                        # (Paper §3)
    |         └── Java7.jrag              # Java7 spec
    └── DataFlow                          # Data flow analyses spec (Paper §4)
        ├── Analysis.jrag                 # Collection attributes
        ├── DeadAssignment.jrag           # DAA spec (Paper §4.c)
        ├── LiveVariableAnalysis.jrag     # LVA spec (Paper §4.b)
        └── NullAnalysis.jrag             # NPA spec (Paper §4.a)
⚠️ Note: There is no subdirectory for java6, since features introduced in Java 6 do not affect the construction of the CFG.
Available options to IntraJ:
- -help: prints all the available options.
- -genpdf: generates a PDF with the AST structure of all the methods in the analysed files. It can be combined with -succ and -pred.
- -succ: generates a PDF with the successor relation for all the methods in the analysed files. It can be combined with -pred.
- -pred: generates a PDF with the predecessor relation for all the methods in the analysed files. It can be combined with -succ.
- -statistics: prints the number of CFGRoots, CFGNodes and CFGEdges in the analysed files.
- -nowarn: suppresses the warning messages.
-------------- ANALYSIS OPTIONS --------------------
Available analyses:
- DAA: Detects dead assignments
- NPA: Detects occurrences of Null Pointer Dereferencing
Options (where id corresponds to one of the analyses above):
- -Wid: enables a given analysis, e.g., -WDAA
- -Wall: enables all the available analyses
- -Wexcept=id: enables all the available analyses except id, e.g., -Wexcept=DAA
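To look at each analysis in isolation, the -Wid option can be applied once per analysis. The sketch below only prints the corresponding command lines and does not run IntraJ; `per_analysis_commands` is a hypothetical helper, and it assumes intraj.jar is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print (without running) one IntraJ command per
# available analysis for the given Java file, using the -W<id> option.
per_analysis_commands() {
  for id in DAA NPA; do
    echo "java -jar intraj.jar $1 -W$id"
  done
}

per_analysis_commands PATH/TO/Example.java
# prints:
#   java -jar intraj.jar PATH/TO/Example.java -WDAA
#   java -jar intraj.jar PATH/TO/Example.java -WNPA
```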
Example of running IntraJ
Suppose you would like to analyze a file Example.java located in your workspace:
public class Example {
    int example() {
        Integer m = null;
        m.toString();
        int x = 0;
        x = 1;
        return x;
    }
}
By running the following command:
java -jar intraj.jar PATH/TO/Example.java -Wall -succ -statistics
IntraJ will print the following information:
[NPA - PATH/TO/Example.java:4,4] The object 'm' may be null at this point.
[DAA - PATH/TO/Example.java:5,9] The value stored in 'x' is never read.
[INFO]: CFG rendering
[INFO]: DOT to PDF
[INFO]: PDF file generated correctly
[STATISTIC]: Elapsed time (CFG + Dataflow): 0.11s
[STATISTIC]: Total number
[STATISTIC]: Number roots:3
[STATISTIC]: Number CFGNodes:16
[STATISTIC]: Number Edges:13
[STATISTIC]: Largest CFG in terms of nodes:12
[STATISTIC]: Largest CFG in terms of edges:11
IntraJ also generates a PDF that shows the successor relation for the analysed methods.
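The warning lines in the sample output above share a regular shape, `[ID - file:line,column] message`, which makes them easy to post-process. The sketch below counts the warnings of one analysis; `count_warnings` is a hypothetical helper, and it assumes every warning line starts with `[ID - ` exactly as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: count the warnings of one analysis in IntraJ output
# read from stdin. Assumes each warning line starts with "[<ID> - ".
count_warnings() {
  awk -v id="$1" 'index($0, "[" id " - ") == 1 { n++ } END { print n + 0 }'
}

# Usage (append 2>&1 before the pipe if the warnings are written to stderr):
#   java -jar intraj.jar PATH/TO/Example.java -Wall | count_warnings NPA
```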
How to run the evaluation
1) Follow the instructions in "Prerequisites" and "Build" above.
2) Run the command ./gradlew build. This generates the following jar files:
- intraj.jar
- intraj_bl.jar
- intraj_cfg.jar
- intraj_cfgdda.jar
- intraj_dda.jar
3) Start the evaluation by executing "zsh eval.sh N_iter_outerloop N_iter_innerloop". For the paper we used N_iter_outerloop = N_iter_innerloop = 50.
All the results are stored in evaluation/YYYYMMDD_HHMMSS.
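Because every evaluation run writes to its own timestamped directory, the most recent results can be located by sorting the directory names. The sketch below does this; `latest_results` is a hypothetical helper, and it assumes timestamp-style names such as YYYYMMDD_HHMMSS, which sort chronologically as plain strings.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent results directory inside the
# given evaluation directory. Assumes timestamp-style directory names
# (digits, underscore, digits), which sort chronologically as strings.
latest_results() {
  for d in "$1"/[0-9]*_[0-9]*/; do
    [ -d "$d" ] && echo "${d%/}"
  done | sort | tail -n 1
}

# Usage, from the IntraJ directory:
#   latest_results evaluation
```

The benchmark directories (for example pmd-4.2.5) do not start with a digit, so the pattern skips them.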
Related repositories/links
- IntraJ: main repository for IntraJ (control-flow analysis for Java)
- IntraCFG: main repository for IntraCFG (language-independent framework for control-flow analysis)
- JastAdd: meta-compilation system that supports Reference Attribute Grammars. We used a custom JastAdd version which better supports interfaces.
- ExtendJ: extensible Java compiler built using JastAdd. We built IntraJ as a static analysis extension of ExtendJ. More can be found here.
- SonarQube: platform developed by SonarSource for continuous inspection of code quality
- JastAddJ-Intraflow: an earlier approach to implementing intra-procedural control flow, dataflow, and dead assignment analysis for Java, also using JastAdd.
Additional details
Related works
- Is supplement to
- https://github.com/lu-cs-sde/IntraJSCAM2021/tree/IEEESCAM2021 (URL)