Bastian, Théophile
Kell, Stephen
Zappa Nardelli, Francesco
2019-08-16
<p>This VM contains the tools to reproduce the experiments of the submission:</p>
<p> Reliable and Fast DWARF-based Stack Unwinding</p>
<p>Since reviewers asked for a more complete evaluation along the lines<br>
proposed in the author response, we include a draft of the revised<br>
paper, reporting complete evaluation data. The text below refers to<br>
the revised paper.</p>
<p>Quick start<br>
-----------</p>
<p>Open a terminal and execute:</p>
<p>$ cd ~/DWARF<br>
$ make test-pager</p>
<p>A list of test options is displayed by running:</p>
<p>$ make</p>
<p>What is tested, and how to interpret the output<br>
-----------------------------------------------</p>
<p>The test suite follows closely the three-part structure of the paper.</p>
<p>Section 2.1, validation<br>
***********************</p>
<p>Section 2.1 describes a tool that performs dynamic validation of DWARF<br>
stack-unwind instructions along one execution path.</p>
<p>The tool is tested on a binary with correct DWARF (good.bin) and a test<br>
with hand-crafted incorrect DWARF (bad.bin). The tool is also tested<br>
on the binary miscompiled by clang (discovered by fuzzing clang with<br>
the csmith and creduce tools) reported in Figure 3.</p>
<p>The tool should report that good.bin has a correct .eh_frame table,<br>
while it should find errors when run on bad.bin and llvm-bug.bin<br>
binaries.</p>
<p><br>
For further manual testing, and to evaluate reusability, the validator<br>
tool can be invoked on a binary <file.bin> with the command:</p>
<p>$ run_validator <file.bin></p>
<p>For example, to run one of the tests of the test suite:</p>
<p>$ run_validator ~/DWARF/validation/testsuite/good/good.bin</p>
<p><br>
Section 2.2, synthesis<br>
**********************</p>
<p>Section 2.2 describes a tool that performs synthesis of eh_frame<br>
tables from binaries.</p>
<p>1) The tool is tested on 600 CSmith-generated programs, divided into 6<br>
groups of 100 tests each. Each group is compiled with either gcc or<br>
clang, at one of the optimisation levels -O0, -O1, or -O2. For each<br>
binary, the synthesised table is then compared against the<br>
compiler-generated one.</p>
<p>For binaries compiled with gcc, all 300 tests succeed.</p>
<p>For binaries compiled with clang, 295 tests succeed. The five<br>
remaining tests are expected to fail, as described in Section 2.2.<br>
These have been manually investigated, and the failures are due to one<br>
of the following:</p>
<p>- [ case identified by the "dead code" tag in the output ]</p>
<p> clang may generate dead code. Our synthesis algorithm cannot<br>
synthesise unwind tables for dead code because the forward dataflow<br>
analysis does not have initial values to propagate. This is not an<br>
issue, as unwinding will never be called for a dead IP.</p>
<p>- [ identified by the "no possible valid DWARF" tag in the output ]</p>
<p> When handling abort paths, clang may generate code for which no<br>
correct eh_frame entry exists. This is arguably a bug in clang's<br>
linearisation algorithm. We will report it to the clang developers.</p>
<p>- [ identified by the "clang generated invalid unwinding data" tag ]</p>
<p> Our tool synthesises a correct unwind table, but the clang-generated<br>
one is incorrect. We will report it to the clang developers.</p>
<p>The present virtual machine is shipped with pre-synthesized unwind<br>
tables, because the generation of those 600 tables takes a few minutes.<br>
These, however, have been synthesized with the version of the tool<br>
shipped in this virtual machine.</p>
<p>Re-generation of those tables can be forced by first cleaning them:</p>
<p>$ make -C ~/DWARF/synthesis/testsuite/csmith/ clean-synthesis</p>
<p>then running the synthesis test again with:</p>
<p>$ make synthesis-test-pager</p>
<p>2) The tool is tested on a common Unix tool, gzip.</p>
<p> Synthesis is invoked on a gzip (v1.10) binary compiled statically with<br>
the default shipped compilation options. The size of the binary is<br>
1.2 MB, and the size of the .text section is 698 KB.</p>
<p> Synthesis is performed correctly for the whole binary, including the<br>
statically linked subset of glibc, except for the function<br>
"x2nrealloc". This is due to a bug in BAP: a function call to<br>
"xalloc_die" never returns, but BAP CFG construction adds a<br>
fall-through branch to some other address. As a result, a merge<br>
error is generated where no merge is actually expected. This bug<br>
has been reported to BAP developers.</p>
<p><br>
For further manual testing, and to evaluate reusability, the synthesis<br>
tool can be invoked on a binary <file.bin> with the command:</p>
<p>$ run_synthesis <file.bin> <output_file.bin></p>
<p>The following command strips a binary of all its unwinding data:</p>
<p>$ strip_unwinding_data <file.bin> <output_file.bin></p>
<p>The synthesised tables can be manually inspected with the command:</p>
<p>$ readelf -wF <file.bin></p>
<p>To invoke the checker, suppose that two versions of the binary are<br>
available: one, called <file.orig.bin>, with unwinding tables<br>
considered valid (such as those generated by the compiler) and<br>
another, called <file.eh.bin>, with the synthesized tables.</p>
<p>The checker can then be invoked with:</p>
<p>$ run_synthesis_checker <path/to/file></p>
<p>where <path/to/file> is the path _without_ suffixes. For instance, the<br>
command below runs the checker on one of the CSmith cases:</p>
<p>$ run_synthesis_checker ~/DWARF/synthesis/testsuite/csmith/gcc_O2/01</p>
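<p>As a rough sketch of how the checker could be run over a whole CSmith<br>
group, the shell helpers below look for the *.orig.bin files and strip the<br>
suffix, following the naming convention described above. The function<br>
names and the CSMITH_ROOT variable are hypothetical. Only the gcc_O2<br>
directory is confirmed by this text: the other five group names are<br>
guessed from that pattern. The sketch also assumes that the checker<br>
signals failure through its exit status.</p>

```shell
# List the six group names, assuming the <compiler>_<level> pattern
# suggested by the gcc_O2 directory (the other five names are guesses).
csmith_groups() {
    for compiler in gcc clang; do
        for level in O0 O1 O2; do
            printf '%s_%s\n' "$compiler" "$level"
        done
    done
}

# Run the checker on every test of one group.
# Returns non-zero if any invocation fails (assumed exit-status convention).
check_group() {
    root="${CSMITH_ROOT:-$HOME/DWARF/synthesis/testsuite/csmith}"
    status=0
    for orig in "$root/$1"/*.orig.bin; do
        [ -e "$orig" ] || continue   # the glob matched nothing
        # The checker expects the path without the .orig.bin suffix.
        run_synthesis_checker "${orig%.orig.bin}" || status=1
    done
    return "$status"
}
```

<p>Inside the VM, all groups could then be checked with:<br>
$ for g in $(csmith_groups); do check_group "$g"; done</p>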
<p><br>
Section 3, speed-up<br>
*******************</p>
<p>To evaluate the unwind speed-up obtained by precompiling DWARF unwind<br>
tables, we consider the "gzip" and "hackbench" binaries. Tables for<br>
these two binaries (and all the libraries they depend on) are<br>
precompiled. We measure the time spent by the perf profiler to unwind<br>
the stack while profiling these two binaries, and compare it with the<br>
time needed if the standard DWARF based libunwind is used.</p>
<p>Observed speedup on this VM Image running on a MacBook Pro is ~13x for<br>
"perf gzip", and ~23x for "perf hackbench". These numbers may vary<br>
depending on the hardware used to reproduce the test.</p>
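<p>The speed-up is the ratio of the unwinding time with standard libunwind<br>
to the unwinding time with the precompiled tables. A minimal sketch of<br>
that arithmetic follows. The helper name is hypothetical, and the numbers<br>
in the usage line are placeholders, not measurements from this artifact.</p>

```shell
# speedup BASELINE OPTIMISED
# Print BASELINE / OPTIMISED with one decimal, followed by "x".
# Both arguments must be times expressed in the same unit.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1fx\n", base / opt }'
}
```

<p>For instance, with placeholder timings of 10 and 4 time units:<br>
$ speedup 10 4<br>
2.5x</p>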
<p><br>
We also report statistics on the calls made to the<br>
function that unwinds each stack frame, as described in Section 3 of<br>
the paper.</p>
<p>The totals coincide on gzip, but differ slightly on hackbench because<br>
hackbench uses threads and our unwinder does not unwind below a<br>
pthread_exit function (while libunwind does). The "fail to unwind<br>
errors" are due to truncated stack frames, and eh_elf and standard<br>
libunwind report the same errors. The line "fallback to DWARF"<br>
reports the cases where DWARF used complex expressions that are not<br>
supported by eh_elfs (in particular in PLT tables); in these cases our<br>
unwinder falls back to the standard one to interpret the complex<br>
expressions. The line "fallback to libunwind heuristics" covers cases<br>
where no unwind information is available, and libunwind relies on<br>
heuristics to proceed.</p>
<p><br>
For further manual testing, and to evaluate reusability,<br>
precompilation of an unwind table can be invoked with the command:</p>
<p>$ generate_eh_elfs <file.bin> <eh_elfs output directory></p>
<p>This generates the eh_elf files for <file.bin> and the binaries against<br>
which it is dynamically linked in the specified output directory. Those<br>
files are used by libunwind-eh_elf, our modified version of libunwind<br>
that supports precompiled tables.</p>
<p>Then, to run `perf` on <file.bin>, select an environment:</p>
<p>$ source ~/DWARF/speedup/dwarf-assembly/env/apply <flavour> <debug></p>
<p>where <flavour> is either<br>
* "eh_elf": our modified, sped-up version;<br>
* "vanilla": standard libunwind;<br>
* "vanilla-nocache": same as "vanilla" with caching mechanisms disabled</p>
<p>and <debug> is either<br>
* "release": version with debugging capabilities removed for better<br>
performance;<br>
* "dbg": version with debugging capabilities enabled.</p>
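<p>Because the two arguments only accept the values listed above, a small<br>
wrapper can reject typos before the environment script is sourced. This<br>
is only a sketch: the function name and the APPLY_SCRIPT override are<br>
hypothetical and not part of the artifact.</p>

```shell
# select_unwind_env FLAVOUR DEBUG
# Validate both arguments, then source the environment script.
select_unwind_env() {
    case "$1" in
        eh_elf|vanilla|vanilla-nocache) ;;
        *) echo "unknown flavour: $1" >&2; return 1 ;;
    esac
    case "$2" in
        release|dbg) ;;
        *) echo "unknown debug mode: $2" >&2; return 1 ;;
    esac
    # APPLY_SCRIPT is a hypothetical override, handy outside the VM;
    # the default is the script path given in this text.
    . "${APPLY_SCRIPT:-$HOME/DWARF/speedup/dwarf-assembly/env/apply}" "$1" "$2"
}
```

<p>Usage, inside the VM:<br>
$ select_unwind_env eh_elf release</p>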
<p>Then, capture some perf data on a run of <file.bin>:</p>
<p>$ perf record --call-graph dwarf,4096 <file.bin> <arguments...></p>
<p>Note that the generated <perf.data> file can be used with any flavour of<br>
libunwind afterwards: there is no need to re-capture data when switching<br>
environments.</p>
<p>Performance readings for the unwinding can then be obtained by running:</p>
<p>$ LD_LIBRARY_PATH="<eh_elfs output directory>:$LD_LIBRARY_PATH" \<br>
perf report 2>&1 >/dev/null</p>
https://doi.org/10.5281/zenodo.3369915
oai:zenodo.org:3369915
eng
Zenodo
https://doi.org/10.5281/zenodo.3369914
info:eu-repo/semantics/openAccess
BSD 3-Clause Clear License
http://labs.metacarta.com/license-explanation.html#license
Compilers, Debug information
Reliable and Fast DWARF-based Unwinding (Artifact)
info:eu-repo/semantics/other