Bastian, Théophile; Kell, Stephen; Zappa Nardelli, Francesco
This VM contains the tools to reproduce the experiments of the submission:
Reliable and Fast DWARF-based Stack Unwinding
Since reviewers asked for a more complete evaluation along the lines
proposed in the author response, we include a draft of the revised
paper, reporting complete evaluation data. The text below refers to
the revised paper.
Open a terminal and execute:
$ cd ~/DWARF
$ make test-pager
A list of test options is displayed by running:
What is tested, and how to interpret the output
The test suite follows closely the three-part structure of the paper.
Section 2.1, validation
Section 2.1 describes a tool that performs dynamic validation of DWARF
stack-unwind instructions along one execution path.
The tool is tested on a binary with correct DWARF (good.bin) and on a
binary with hand-crafted incorrect DWARF (bad.bin). The tool is also tested
on the binary miscompiled by clang (discovered by fuzzing clang with
the csmith and creduce tools) reported in Figure 3.
The tool should report that good.bin has a correct .eh_frame table,
while it should find errors when run on bad.bin and llvm-bug.bin.
For further manual testing, and to evaluate reusability, the validator
tool can be invoked on a binary <file.bin> with the command:
$ run_validator <file.bin>
For example, to run one of the tests of the test suite:
$ run_validator ~/DWARF/validation/testsuite/good/good.bin
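To validate every binary of the test suite in one go, the validator can be
wrapped in a small loop. This is only a sketch, not one of the shipped
tools: `validate_all` is a hypothetical helper, and it assumes that the test
binaries are the *.bin files found below ~/DWARF/validation/testsuite (only
the location of good.bin is given above).

```shell
# Hypothetical helper (not shipped in the VM): run the validator on every
# *.bin file below a directory. Assumed layout: testsuite/<case>/<case>.bin
validate_all() {
    local root="${1:-$HOME/DWARF/validation/testsuite}"
    # Sort the file list so that the output order is stable.
    find "$root" -type f -name '*.bin' | sort | while IFS= read -r bin; do
        echo "=== $bin"
        # Redirect stdin so the validator cannot consume the file list.
        run_validator "$bin" < /dev/null
    done
}
```

Usage: `validate_all`, or `validate_all <directory>` for another set of
binaries.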
Section 2.2, synthesis
Section 2.2 describes a tool that performs synthesis of eh_frame
tables from binaries.
1) The tool is tested on 600 CSmith-generated programs, divided into 6
groups of 100 tests each. Each group is compiled either with gcc or
clang, at an optimisation level among -O0, -O1, and -O2. For each
binary, the synthesised table is then compared against the one
generated by the compiler.
For binaries compiled with gcc, all 300 tests succeed.
For binaries compiled with clang, 295 tests succeed. The five
remaining tests are expected to fail, as described in Section 2.2.
These have been manually investigated, and the failures are due to one
of the following:
- [ case identified by the "dead code" tag in the output ]
clang may generate dead code. Our synthesis algorithm cannot
synthesise unwind tables for dead code because the forward dataflow
analysis does not have initial values to propagate. This is not an
issue as unwinding will never be called for a dead IP.
- [ identified by the "no possible valid DWARF" tag in the output ]
When handling abort paths, clang may generate code for which no
correct eh_frame entry exists. This is arguably a bug in clang's
linearisation algorithm. We will report it to the clang developers.
- [ identified by the "clang generated invalid unwinding data" tag ]
Our tool synthesises a correct unwind table, but the one generated by
clang is incorrect. We will report it to the clang developers.
The present virtual machine is shipped with pre-synthesized unwind
tables, because the generation of those 600 tables takes a few minutes.
These, however, have been synthesized with the version of the tool
shipped in this virtual machine.
Re-generation of those tables can be forced by first cleaning them:
$ make -C ~/DWARF/synthesis/testsuite/csmith/ clean-synthesis
then running the synthesis test again with:
$ make synthesis-test-pager
2) The tool is tested on a common Unix tool, gzip.
Synthesis is invoked on a gzip (v1.10) binary compiled statically with
the default shipped compilation options. The size of the binary is
1.2 MB, and the size of the .text section is 698 KB.
Synthesis is performed correctly for the whole binary, including the
statically linked subset of glibc, except for the function
"x2nrealloc". This is due to a bug in BAP: a function call to
"xalloc_die" never returns, but BAP CFG construction adds a
fall-through branch to some other address. As a result, a merge
error is generated where no merge is actually expected. This bug
has been reported to BAP developers.
For further manual testing, and to evaluate reusability, the synthesis
tool can be invoked on a binary <file.bin> with the command:
$ run_synthesis <file.bin> <output_file.bin>
The following command strips a binary of all its unwinding data:
$ strip_unwinding_data <file.bin> <output_file.bin>
The synthesised tables can be manually inspected with the command:
$ readelf -wF <file.bin>
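The commands above can be chained to produce, from a single binary, the
two files the checker works on: a copy keeping the compiler's tables
(<prefix>.orig.bin) and a copy with synthesised tables (<prefix>.eh.bin).
`synthesize_pair` is a hypothetical helper, and it assumes that
run_synthesis should be given the stripped binary, which the text above
does not state explicitly.

```shell
# Hypothetical helper (not shipped in the VM). From <file.bin>, produce:
#   <prefix>.orig.bin      copy of the input, with the compiler's tables
#   <prefix>.stripped.bin  input with all unwinding data stripped
#   <prefix>.eh.bin        stripped binary with synthesised tables
# Assumption: run_synthesis is meant to be fed the stripped binary.
synthesize_pair() {
    local input="$1" prefix="$2"
    cp "$input" "$prefix.orig.bin" || return 1
    strip_unwinding_data "$input" "$prefix.stripped.bin" || return 1
    run_synthesis "$prefix.stripped.bin" "$prefix.eh.bin" || return 1
}
```

For instance, `synthesize_pair <file.bin> /tmp/file` followed by
`readelf -wF /tmp/file.eh.bin` shows the synthesised tables, and the same
/tmp/file prefix can then be passed to the checker.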
To invoke the checker, suppose that two versions of the binary are
available: one, called <file.orig.bin>, with unwinding tables
considered valid (such as those generated by the compiler) and
another, called <file.eh.bin>, with the synthesized tables.
The checker can then be invoked with:
$ run_synthesis_checker <path/to/file>
where <path/to/file> is the path _without_ suffixes. For instance, the
command below runs the checker on one of the CSmith cases:
$ run_synthesis_checker ~/DWARF/synthesis/testsuite/csmith/gcc_O2/01
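Since the checker takes a path without suffixes, checking a whole group of
CSmith cases amounts to listing the *.orig.bin files and stripping that
suffix. The sketch below assumes that each group directory (such as
gcc_O2) holds pairs named <n>.orig.bin and <n>.eh.bin, following the
naming convention above; `check_group` is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped in the VM): run the synthesis checker on
# every case of one group, e.g. ~/DWARF/synthesis/testsuite/csmith/gcc_O2
check_group() {
    local dir="$1" orig prefix
    for orig in "$dir"/*.orig.bin; do
        [ -e "$orig" ] || continue        # no match: the glob stays literal
        prefix="${orig%.orig.bin}"        # path without suffixes
        if [ ! -e "$prefix.eh.bin" ]; then
            echo "missing $prefix.eh.bin, skipping" >&2
            continue
        fi
        run_synthesis_checker "$prefix"
    done
}
```

All groups can then be checked with
`for g in ~/DWARF/synthesis/testsuite/csmith/*/; do check_group "${g%/}"; done`.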
Section 3, speed-up
To evaluate the unwind speed-up obtained by precompiling DWARF unwind
tables, we consider the "gzip" and "hackbench" binaries. Tables for
these two binaries (and all the libraries they depend on) are
precompiled. We measure the time spent by the perf profiler to unwind
the stack while profiling these two binaries, and compare it with the
time needed if the standard DWARF-based libunwind is used.
Observed speedup on this VM Image running on a MacBook Pro is ~13x for
"perf gzip", and ~23x for "perf hackbench". These numbers may vary
depending on the hardware used to reproduce the test.
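The speed-up is the ratio between the time spent unwinding with standard
libunwind and the time spent with eh_elfs. Given two such readings in the
same unit, taken from the perf report output on your own hardware, a small
helper can compute it; `speedup` is a hypothetical name and the numbers in
the usage line are illustrative only.

```shell
# Hypothetical helper: speed-up = baseline (libunwind) time / eh_elf time.
# Both arguments must be numbers in the same unit; awk does the
# floating-point division and prints one decimal place.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN {
        if (fast <= 0) { print "invalid eh_elf time" > "/dev/stderr"; exit 1 }
        printf "%.1fx\n", base / fast
    }'
}
```

For example, `speedup 1300 100` prints `13.0x`.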
We also report statistics on the calls that have been performed to the
function that unwinds each stack frame, as described in Section 3 of
the paper.
The totals coincide on gzip, but differ slightly on hackbench because
hackbench uses threads and our unwinder does not unwind below a
pthread_exit function (while libunwind does). The "fail to unwind
errors" are due to truncated stack frames, and eh_elf and standard
libunwind report the same errors. The line "fallback to DWARF"
reports the cases where DWARF used complex expressions that are not
supported by eh_elfs (in particular in PLT tables); in these cases our
unwinder falls back to the standard one to interpret the complex
expressions. The "fallback to libunwind heuristics" line covers cases where
no unwind information is available, and libunwind relies on heuristics.
For further manual testing, and to evaluate reusability,
precompilation of an unwind table can be invoked with the command:
$ generate_eh_elfs <file.bin> <eh_elfs output directory>
This generates the eh_elf files for <file.bin> and the binaries against
which it is dynamically linked in the specified output directory. Those
files are used by libunwind-eh_elf, our modified version of libunwind
that supports precompiled tables.
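To anticipate which eh_elf files generate_eh_elfs should produce, one can
list the binary together with the shared libraries it is dynamically
linked against, as resolved by ldd. This sketch assumes the set of
libraries handled by generate_eh_elfs matches ldd's view; `ldd_paths` and
`expected_eh_elf_inputs` are hypothetical helper names.

```shell
# Print the absolute paths found in ldd-style output read on stdin, e.g.
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
#   /lib64/ld-linux-x86-64.so.2 (0x00007f...)
# Entries without a path (such as linux-vdso.so.1) are skipped.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }'
}

# The binary itself, followed by the shared libraries it is linked against.
expected_eh_elf_inputs() {
    printf '%s\n' "$1"
    ldd "$1" | ldd_paths
}
```

Comparing `expected_eh_elf_inputs <file.bin>` with the contents of the
output directory gives a quick sanity check.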
Then, to run `perf` on <file.bin>, select an environment:
$ source ~/DWARF/speedup/dwarf-assembly/env/apply <flavour> <debug>
where <flavour> is either
* "eh_elf": our modified, sped-up version;
* "vanilla": standard libunwind;
* "vanilla-nocache": same as "vanilla" with caching mechanisms disabled
and <debug> is either
* "release": version with debugging capabilities removed for better performance;
* "dbg": version with debugging capabilities enabled.
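Since a typo in <flavour> or <debug> is easy to make, the apply script can
be wrapped so that only the values listed above are accepted before it is
sourced. `select_env` is a hypothetical wrapper, and the APPLY variable is
an assumption introduced here to make the script path overridable.

```shell
# Hypothetical wrapper around the apply script: accept only the flavour and
# debug values listed above, then source the script in the current shell.
select_env() {
    case "$1" in
        eh_elf|vanilla|vanilla-nocache) ;;
        *) echo "unknown flavour: $1" >&2; return 1 ;;
    esac
    case "$2" in
        release|dbg) ;;
        *) echo "unknown debug setting: $2" >&2; return 1 ;;
    esac
    source "${APPLY:-$HOME/DWARF/speedup/dwarf-assembly/env/apply}" "$1" "$2"
}
```

Usage: `select_env eh_elf release`.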
Then, capture some perf data on a run of <file.bin>:
$ perf record --call-graph dwarf,4096 <file.bin> <arguments...>
Note that the generated <perf.data> file can be used with any flavour of
libunwind afterwards: there is no need to re-capture data when switching
flavours.
Performance readings for the unwinding can then be obtained by running:
$ LD_LIBRARY_PATH="<eh_elfs output directory>:$LD_LIBRARY_PATH" \
perf report 2>&1 >/dev/null
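Because the same perf.data can be reused, the readings for all three
flavours can be collected in one pass. The command above discards stdout
and keeps stderr, which suggests the readings are printed there; the
sketch below saves each flavour's stderr to report.<flavour>.txt in the
current directory. `compare_flavours` and the APPLY variable are
hypothetical, and each flavour is applied in a subshell so that its
environment does not leak into the next one.

```shell
# Hypothetical helper: run `perf report` on ./perf.data once per libunwind
# flavour (release builds) and keep each run's stderr in report.<flavour>.txt.
APPLY="${APPLY:-$HOME/DWARF/speedup/dwarf-assembly/env/apply}"

compare_flavours() {
    local eh_elf_dir="$1" flavour
    for flavour in eh_elf vanilla vanilla-nocache; do
        (
            # Subshell: environment changes made by apply stay local.
            source "$APPLY" "$flavour" release || exit 1
            # Prepend the eh_elfs directory without leaving an empty entry.
            LD_LIBRARY_PATH="$eh_elf_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
                perf report 2> "report.$flavour.txt" > /dev/null
        ) || return 1
    done
}
```

Usage: `compare_flavours <eh_elfs output directory>`, run from the
directory containing perf.data.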