Published September 12, 2020 | Version 2020-01-17
Software Open

Software Artefact for the OOPSLA'20 Paper Titled "How Do Programmers Use Unsafe Rust?"

  • 1. ETH Zurich
  • 2. University of British Columbia

Description

Abstract

Rust’s ownership type system enforces a strict discipline on how memory locations are accessed and shared.
This discipline allows the compiler to statically prevent memory errors, data races, inadvertent side effects
through aliasing, and other errors that frequently occur in conventional imperative programs. However, the
restrictions imposed by Rust’s type system make it difficult or impossible to implement certain designs, such
as data structures that require aliasing (e.g., doubly-linked lists and shared caches). To work around this
limitation, Rust allows code blocks to be declared as unsafe and thereby exempted from certain restrictions of
the type system, for instance, to manipulate C-style raw pointers. Ensuring the safety of unsafe code is the
responsibility of the programmer. However, an important assumption of the Rust language, which we dub the
Rust hypothesis, is that programmers use Rust by following three main principles: use unsafe code sparingly,
make it easy to review, and hide it behind a safe abstraction such that client code can be written in safe Rust.


Understanding how Rust programmers use unsafe code and, in particular, whether the Rust hypothesis
holds is essential for Rust developers and testers, language and library designers, as well as tool developers.
This paper studies empirically how unsafe code is used in practice by analysing a large corpus of Rust projects
to assess the validity of the Rust hypothesis and to classify the purpose of unsafe code. We identify queries
that can be answered by automatically inspecting the program’s source code, its intermediate representation
MIR, as well as type information provided by the Rust compiler; we complement the results by manual
code inspection. Our study supports the Rust hypothesis partially: While most unsafe code is simple and
well-encapsulated, unsafe features are used extensively, especially for interoperability with other languages.
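
As an illustration of the three principles named in the abstract (use unsafe code sparingly, keep it easy to review, and hide it behind a safe abstraction), the following minimal sketch may help. It is not taken from the paper or from Qrates; the function name `first_and_last_mut` is hypothetical and chosen only for this example. The single unsafe block uses raw pointers to hand out two non-overlapping mutable references, something the borrow checker would reject in this form if written with plain indexing, while callers stay in safe Rust.

```rust
/// Returns mutable references to the first and last elements of a slice,
/// or `None` if the slice has fewer than two elements.
/// (Hypothetical helper, written only to illustrate encapsulated unsafe code.)
fn first_and_last_mut<T>(items: &mut [T]) -> Option<(&mut T, &mut T)> {
    let len = items.len();
    if len < 2 {
        return None;
    }
    let ptr = items.as_mut_ptr();
    // SAFETY: `len >= 2`, so offsets 0 and `len - 1` are in bounds and refer
    // to two distinct elements; the returned references therefore never alias.
    unsafe { Some((&mut *ptr, &mut *ptr.add(len - 1))) }
}

fn main() {
    let mut values = vec![1, 2, 3, 4];
    // Client code is entirely safe Rust.
    if let Some((first, last)) = first_and_last_mut(&mut values) {
        std::mem::swap(first, last);
    }
    println!("{:?}", values); // prints [4, 2, 3, 1]
}
```

The unsafe block is a single expression with its justification stated next to it, which is roughly what "easy to review" would look like in practice under these assumptions.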

Artefact

This artefact contains a virtual machine with Qrates, the framework we used for the analysis, as well as the data itself. The instructions are in the README.md file.

If you are interested in building on top of our research results, you can find the latest version of Qrates in our GitHub repository: https://github.com/rust-corpus/qrates/.


Files

licenses.csv

Files (47.4 GB)

Size        Checksum
465 Bytes   md5:7d59ce8e919a7e6d13b62207bdefa282
6.8 MB      md5:e26b5726983c3dad10336597ee7ef956
1.5 kB      md5:2cc766e0e4df1affc68a1cb5d96922fb
2.7 GB      md5:cfa990db1b4eb9b62883bcea2aeca090
12.8 GB     md5:973258248c27df38eee58d0022dd6ea7
32.0 GB     md5:0dd23b30261d6b5c29b1d371294c7688

Additional details

Funding

Swiss National Science Foundation
From Type Capabilities to Permissions for Program Verification (and back again) 200021_169503