These instructions describe how to use the artifact for our accepted OOPSLA'20 paper, "Precise Static Modeling of Ethereum 'Memory'".
Our artifact is bundled for AMD64 Linux using an Ubuntu 18.04 Docker image. If you do not already have Docker installed, follow the official installation instructions.
Decompress our Docker image:

```
bunzip2 eth-memory-modeling.tar.bz2
docker load -i eth-memory-modeling.tar
```
Then launch a container:

```
docker run -it eth-memory-modeling
```

This starts a shell with `/home/reviewer/gigahorse-toolchain` as the working directory. Except where specified, the remainder of the commands in this README should be run in this Docker shell with this path as the working directory.
This README is also available in markdown form at `~/README.md`.
To facilitate testing, a random subset of 2K unique smart contract bytecodes is bundled with the Docker image in the directory `~/contracts_2k_random`.
Each file in `~/contracts_2k_random` contains the ASCII hexadecimal representation of one EVM bytecode program (i.e., smart contract). The examples in this README operate on this bundled subset of contracts; they may be used to approximate the results of our full experiments.
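As a quick way to confirm the input format, the following Python snippet (ours, not part of the artifact) decodes the sample contract used later in this README; the `0x`-prefix handling is a defensive assumption, as the files may or may not carry the prefix:

```python
# Quick sanity check (not part of the artifact): decode one of the bundled
# .hex files to confirm it is the ASCII hex encoding of EVM bytecode.
from pathlib import Path

path = Path.home() / "contracts_2k_random" / "0440797b18a56d76a93f4d4059765f38.hex"
text = path.read_text().strip()
if text.startswith("0x"):  # defensive assumption: some dumps carry a 0x prefix
    text = text[2:]
code = bytes.fromhex(text)  # raises ValueError on any non-hex character
print(f"{path.name}: {len(code)} bytes of EVM bytecode")
```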
Alternatively, all 70,063 contracts of our dataset are contained in `~/contracts-all.tar.bz2`. Running the following commands will place our full evaluation set in `~/contracts_all`:

```
cd ~
bunzip2 ~/contracts-all.tar.bz2
tar -xvf contracts-all.tar
```
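If you want to verify the extraction, a one-off Python check such as the following can be used (the `.hex` extension is an assumption carried over from the bundled subset):

```python
# Optional check (not part of the artifact): confirm that extraction
# produced the full evaluation set of 70,063 bytecode files.
from pathlib import Path

n = sum(1 for _ in (Path.home() / "contracts_all").glob("*.hex"))
print(f"{n} contract files found (expected 70063)")
```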
Running our analysis consists of three stages: fact generation, decompilation, and execution of the client analyses.
To run our memory modeling analysis and its clients on a single contract, a helper script is provided at `~/gigahorse-toolchain/runSingle.sh`.

Note that the commands in the `~/gigahorse-toolchain/runSingle.sh` script invoke Souffle's interpreted mode, which runs significantly slower than the compiled mode that the bulk analysis (used when evaluating our approach) employs.
Running `~/runSingle.sh` on a sample contract (e.g., `~/contracts_2k_random/0440797b18a56d76a93f4d4059765f38.hex`) will result in the execution of the following commands:

```
1 cd ~/gigahorse-toolchain/
2 ./generatefacts ~/contracts_2k_random/0440797b18a56d76a93f4d4059765f38.hex decomp-facts
3 \rm -rf decomp-out
4 mkdir decomp-out && cd decomp-out
5 souffle -F ../decomp-facts ../logic/decompiler.dl
6 souffle ../../clients/function_inliner.dl
7 souffle ../../clients/ethainter-new.dl
8 souffle ../../clients/repeatedcalls.dl
9 souffle ../../clients/eip1884impact.dl
```
The command at line 2 translates the input contract bytecode into a set of Souffle fact files in the directory `./decomp-facts`. The command at line 5 runs the Gigahorse decompiler, loading input fact files from `./decomp-facts`. It produces a set of output relations as CSV files in `./decomp-out`.
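To get a feel for the decompiler's output, a short survey like the following can be run after `runSingle.sh` completes (this is our illustration, not a bundled tool; it relies on Souffle's one-tuple-per-line output format):

```python
# Small helper (ours, not part of the artifact): survey the decompiler's
# exported relations by counting the tuples (one per line) in each CSV
# file under decomp-out.
from pathlib import Path

out_dir = Path.home() / "gigahorse-toolchain" / "decomp-out"
for csv_file in sorted(out_dir.glob("*.csv")):
    tuples = sum(1 for _ in csv_file.open())
    print(f"{csv_file.name}: {tuples} tuples")
```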
The next command runs the function inliner, transforming the analysis facts in a non-reversible manner by performing 2x function inlining. The last 3 commands run the 3 memory modeling clients described in Section 5 of our paper. Each client writes its output relations to disk as CSV files. An entry in one of these relations corresponds to one flagged occurrence of that vulnerability/bad smell in the input contract.
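As an illustration of how such a relation can be consumed, the sketch below reads one client output file; the exact relation file name (`RepeatedCalls.csv`) is our assumption based on the client's name:

```python
# Illustrative sketch (not a bundled tool): read one client's output
# relation and print its flagged occurrences. Souffle writes
# tab-separated values by default. The relation file name is an
# assumption based on the client's name.
import csv
from pathlib import Path

relation = Path.home() / "gigahorse-toolchain" / "decomp-out" / "RepeatedCalls.csv"
with relation.open() as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(f"{relation.name}: {len(rows)} flagged occurrence(s)")
for row in rows:
    print("  ", row)
```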
To perform a batch decompilation and analysis of all 2K example contracts, Gigahorse provides a convenience script called `bulk_analyze.py`, which can be run as follows:

```
python3.8 bulk_analyze.py -j 8 -d ~/contracts_2k_random -C ../clients/function_inliner.dl,../clients/memory_modeling.dl
```
Here, we have selected to run 8 parallel analysis processes to process all bytecode files in `~/contracts_2k_random`. After decompilation, function inlining will be performed, followed by the core memory modeling analysis, for each contract.
The analysis script will print progress to stdout and an aggregate summary of results upon completion, with more detailed per-contract results written to the file `results.json`.

An example of the summary indicating successful execution of the previous command is the following:
```
...
1994: b2509f5c1ed4f9bac31ae80cf01e6e07.hex completed in 0.07 + 0.95 + 0.12 secs
1998: 55138f5c77517e4ab9a8385e1a0ad1a9.hex completed in 0.03 + 0.43 + 0.06 secs
1999: da5836e2c0dda0dcb3162529646b95be.hex completed in 0.02 + 0.45 + 0.06 secs
ad2e0e312a3c9b11d3ac7ffdf8cba162.hex timed out.
6eb5c177c74262b721685f6893eff290.hex timed out.
7faccca439d5e5f3d8cf6bbfa14c4fb3.hex timed out.
aa155a2dc6961f896023bc27be867a96.hex timed out.
Finishing...
2000 of 2000 contracts flagged.
ActualReturnArgs: 71.30%
AllCALLsClassified: 98.20%
...
TAC_Var: 99.10%
TAC_Variable_BlockValue: 99.10%
TAC_Variable_Value: 99.10%
TIMEOUT: 0.90%
Verbatim_AllVsModeledMLOADs: 99.10%
Verbatim_AllVsModeledMSTOREs: 99.10%
Verbatim_CDLAllVSStaticVSArr: 99.10%
Verbatim_MemConsStmtsLengths: 99.10%
assertVarIsArray: 2.20%
bytecode: 99.10%
inliner: 832.55%
inliner2: 485.75%
preTrans: 0.25%
```
The percentages next to the relation names indicate the percentage of contracts for which the rule produced some results (i.e., the output CSV files were non-empty).
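If you want to recompute these percentages yourself, the idea is a simple count over `results.json`; since its schema is an implementation detail of `bulk_analyze.py`, the field access in the following sketch is an assumption that may need adjusting:

```python
# Sketch for recomputing the summary percentages from results.json.
# ASSUMPTION: each entry is assumed to carry, at index 1, the list of
# relations that produced non-empty output for that contract; the real
# schema is internal to bulk_analyze.py, so adjust as needed.
import json
from collections import Counter

with open("results.json") as f:
    entries = json.load(f)

counts = Counter(flag for entry in entries for flag in entry[1])
for relation, n in sorted(counts.items()):
    print(f"{relation}: {100 * n / len(entries):.2f}%")
```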
We ran the artifact on two different setups:

- Setup 1: `~/contracts_all` using 24 concurrent processes.
- Setup 2: `~/contracts_2k_random` using 8 concurrent processes.

We perform the bulk analysis in two stages in order to separate the fact generation performed using the Gigahorse decompiler from the execution of our memory-modeling-based client analyses, and to inspect their running times and timeouts individually.
Setup 1:

```
python3.8 bulk_analyze.py -j 24 -d ~/contracts_all -C ../clients/function_inliner.dl --restart
python3.8 bulk_analyze.py -j 24 -d ~/contracts_all -C ../clients/ethainter-new.dl,../clients/eip1884impact.dl,../clients/repeatedcalls.dl --rerun_clients
```

Setup 2:

```
python3.8 bulk_analyze.py -j 8 -d ~/contracts_2k_random -C ../clients/function_inliner.dl --restart
python3.8 bulk_analyze.py -j 8 -d ~/contracts_2k_random -C ../clients/ethainter-new.dl,../clients/eip1884impact.dl,../clients/repeatedcalls.dl --rerun_clients
```
After running the two stages of the bulk analysis, the file `results.json` will contain detailed per-contract results.

The following table contains the running times of our two setups for the two stages of the bulk analysis:
| | Setup 1 | Setup 2 |
|---|---|---|
| decompilation | 109 mins | 13.5 mins |
| client analyses | 20 mins | 4.5 mins |
Executing the commands above and inspecting the running times supports the claims made in the Analysis Scalability paragraph in Section 6.1.
After performing the bulk analysis of the memory-modeling-based clients, their results can be inspected in the summary printed by `bulk_analyze.py` at the end of its execution.
The claims regarding the number or percentage of contracts that report at least one result for the respective analyses can be found at:

- Tainted ERC20 transfers (relation `Vulnerability_TaintedERC20Transfer`): Section 6.2.1 (lines 922-923).
- EIP-1884 impact (relation `FallbackWillFail`): Section 6.3 (lines 953-954). Note that in the originally submitted version of the paper the number of smart contracts reported refers to smart contract instances and not unique bytecodes, unlike the statistics provided in the rest of the paper. We will fix this inconsistency in the revised version of the paper. In terms of unique bytecodes, the expected number is 195 (0.27%) when analyzing our complete dataset, and 3 (0.15%) when analyzing the subset of 2K contracts.
- Repeated calls (relation `RepeatedCalls`): Section 6.4 (line 995).

Keep in mind that, when running on the provided subset of contracts, these numbers will be approximations of what is reported in the paper.
A helper script, `~/print-metrics.py`, is provided, which parses the JSON output file of `bulk_analyze.py` and prints the metrics presented in Section 6.1, populating Table 1 and Figures 4 and 5. It can be invoked as:

```
python3.8 ~/print-metrics.py results.json
```
Apart from the statistics about the results of the `Vulnerability_TaintedERC20Transfer` relation mentioned earlier, we also provide the contract sources and bytecodes of the manually inspected contracts (Figure 6), available under `~/manual-inspection/TaintedERC20Transfer`.
In order to compare different implementations of Ethainter on the same set of contracts, the script `~/infoflow-comparison.py` accepts the JSON outputs of the two implementations and prints their differences regarding the supported Ethainter vulnerabilities (contracts flagged by one and not the other).
First, the old Ethainter client (`~/clients/ethainter-old-inlined.dl`, without our memory modeling) needs to be used to analyze the evaluation set.

Setup 1:

```
python3.8 bulk_analyze.py -j 24 -d ~/contracts_all -C ../clients/ethainter-old-inlined.dl --rerun_clients -r ethainter-old.json
```

Setup 2:

```
python3.8 bulk_analyze.py -j 8 -d ~/contracts_2k_random -C ../clients/ethainter-old-inlined.dl --rerun_clients -r ethainter-old.json
```
On startup of the bulk analysis script, Souffle will report many warnings. These do not affect the results of the analysis and can be ignored.
Then, run `~/infoflow-comparison.py`:

```
python3.8 ~/infoflow-comparison.py results.json ethainter-old.json
```
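Conceptually, the comparison boils down to a set difference per vulnerability relation. The sketch below shows the idea under an assumed (and possibly inaccurate) `results.json` schema; the bundled script remains the authoritative implementation:

```python
# Minimal sketch of the comparison performed by ~/infoflow-comparison.py
# (the bundled script is authoritative; this is only illustrative).
# ASSUMPTION: each results file is a JSON list whose entries hold the
# contract file name at index 0 and its flagged relations at index 1.
import json
import sys

def flagged(path, relation="Vulnerability_TaintedERC20Transfer"):
    with open(path) as f:
        return {entry[0] for entry in json.load(f) if relation in entry[1]}

new_results = flagged(sys.argv[1])  # e.g., results.json
old_results = flagged(sys.argv[2])  # e.g., ethainter-old.json
print("flagged only with memory modeling:", sorted(new_results - old_results))
print("flagged only by the old client:", sorted(old_results - new_results))
```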
The results of the FallbackRetracer tool cannot be reproduced because they require access to an archival Ethereum node, which takes up over 4.5 TB of SSD storage and has a sync time of around a month.
The contract sources and bytecodes of the manually inspected contracts (Figure 8) are available under `~/manual-inspection/RepeatedCallsOurs` and `~/manual-inspection/RepeatedCallsSecurify`.
As this experiment relies on manual inspection, we ran Securify on contract sources to obtain source mappings. However, this complicates the workflow significantly: the correct Solidity compiler version had to be identified and installed, and Securify configured to use it, manually and on a per-contract basis by the human agents carrying out the experiment. The results were also filtered to only include warnings for code that is in the "main" contract or any super-contract. Due to these conditions, we cannot provide a mechanism to automatically replicate the results of this experiment.