Static Analysis of Endian Portability by Abstract Interpretation

We present a static analysis of endian portability for C programs. Our analysis can infer that a given program, or two syntactically close versions thereof, compute the same outputs when run with the same inputs on platforms with different byte-orders, a.k.a. endiannesses. We target low-level C programs that abuse C pointers and unions, hence rely on implementation-specific behaviors undefined in the C standard. Our method is based on abstract interpretation, and parametric in the choice of a numerical abstract domain. We first present a novel concrete collecting semantics, relating the behaviors of two versions of a program, running on platforms with different endiannesses. We propose a joint memory abstraction, able to infer equivalence relations between little- and big-endian memories. We introduce a novel symbolic predicate domain to infer relations between individual bytes of the variables in the two programs, which has near-linear cost, and the right amount of relationality to express (bitwise) arithmetic properties relevant to endian portability. We implemented a prototype static analyzer, able to scale to large real-world industrial software, with zero false alarms.


Introduction
There is no consensus on the representation of a multi-byte scalar value in computer memory [9]. Some systems store the least-significant byte at the lowest address, while others do the opposite. The former are called little-endian, the latter big-endian. Such systems include processor architectures, network protocols and data storage formats. For instance, Intel processors are little-endian, while internet protocols and some legacy processors, such as SPARC, are big-endian. As a consequence, programs relying on assumptions on the encoding of scalar types may exhibit different behaviors when run on platforms with different byte-orders, a.k.a. endiannesses. The case occurs typically with low-level C software, such as device drivers or embedded software. Indeed, the C standard [18] leaves the encoding of scalar types partly unspecified. The precise representation of types is standardized in implementation-specific Application Binary Interfaces (ABI), such as [2], to ensure the interoperability of compiled programs, libraries, and operating systems. Although it is possible to write fully portable, ABI-neutral C code, the vast majority of C programs rely on assumptions on the ABI of the platform, such as endianness. Therefore, the typical approach used, when porting a low-level C program to a new platform with opposite endianness, is to eliminate most of the byte-order-dependent code, and to wrap the remainder, if any, in conditional inclusion directives, which results in two syntactically close endian-specific variants of the same program. A desirable property, which we call endian portability, is that a program computes the same outputs when run with the same inputs on the little- and big-endian platforms. By extension, we also say that a program is endian portable if two endian-specific variants thereof compute the same outputs when run with the same inputs on their respective platforms.
In this paper, we describe a static analysis which aims at inferring the endian portability of large real-world low-level C programs.
Motivating example. For instance, Example 1 features a snippet of code for reading network input. The sequence of bytes read from the network is first stored into integer variable x. Assume variable y has the same type. x is then either copied, or byte-swapped into y, depending on the endianness of the platform. Our analysis is able to infer that Example 1 is endian portable, i.e. both endian-specific variants compute the same value for y, whatever the values of the bytes read from the network. This property is expressed by the assertion at line 8.
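The listing of Example 1 is not reproduced in this excerpt; the sketch below is entirely ours, with hypothetical names, and merely illustrates the idiom it describes: the bytes received from the network are byte-swapped into y on the little-endian variant, and used as-is on the big-endian one, so both variants compute the same y.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the Example 1 idiom, on 2-byte integers.
   b0, b1 are the first and second bytes read from the network
   (network order is big-endian). Both endian variants are simulated
   here in a host-independent way. */
static uint16_t variant_le(uint8_t b0, uint8_t b1) {
    uint16_t x = (uint16_t)(b0 | (b1 << 8)); /* little-endian load of the bytes */
    return (uint16_t)((x << 8) | (x >> 8));  /* byte-swap into y */
}

static uint16_t variant_be(uint8_t b0, uint8_t b1) {
    return (uint16_t)((b0 << 8) | b1);       /* big-endian load: no swap needed */
}
```

Both variants yield the same y for any input bytes, which is exactly the property checked by the assertion at line 8.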
Our concrete collecting semantics builds on the double program semantics developed for patch analysis [14,15]. Our symbolic predicate domain is based on previous work on predicate domains [25], and symbolic constant propagation [24]. Our domain is also reminiscent of the Slice domain introduced in [8,7] for another purpose, and implemented differently.
Contributions. The main contributions of this work are:
- We present a novel concrete collecting semantics, relating the behaviors of two versions of a program, running on platforms with different endiannesses.
- We propose a joint memory abstraction able to infer equivalence relations between little- and big-endian memories.
- We introduce a novel symbolic predicate domain to infer relations between individual bytes of the variables in the two programs, which has near-linear cost, and the right amount of relationality to express (bitwise) arithmetic properties relevant to endian portability.
- We implemented our analysis on the Mopsa [28,19] platform. Our prototype is able to scale to large real-world industrial software, with zero false alarms.

The paper is organised as follows. Section 2 formalizes the concrete collecting semantics, Sect. 3 describes the memory abstraction, Sect. 4 describes the numerical abstraction and introduces a novel numeric domain, and Sect. 5 presents experimental results with a prototype implementation. Section 6 concludes.

Syntax and concrete semantics
Following the standard approach to abstract interpretation [10], we develop a concrete collecting semantics for a C-like language for double programs. The ∥ operator may occur anywhere in the parse tree to denote syntactic differences between the left (little-endian) and right (big-endian) versions of a double program. However, ∥ operators cannot be nested: a double program only describes a pair of programs. Given a double program P with variables in V, we call its left (resp. right) version P1 = π1(P) (resp. P2 = π2(P)), where π1 (resp. π2) is a version extraction operator, defined by induction on the syntax, keeping only the left (resp. right) side of ∥ symbols. For instance, π1(x ← 1 ∥ y ← 0) = x ← 1, and π2(x ← 1 ∥ y ← 0) = y ← 0, while π1(z ← 0) = z ← 0 = π2(z ← 0).

Syntax
Simple programs P1 and P2 enjoy a standard, C-like syntax presented in Fig. 1. Statements stat are built on top of expressions expr and Boolean conditions cond. The syntax of double statements dstat includes specific assume_sync and assert_sync statements, used for specifications. The former is used to express assumptions on program inputs, while the latter is used to express assertions on program outputs: assume_sync(e) introduces the assumption that expression e evaluates to the same value in double program versions P1 and P2, while assert_sync(e) checks that the value of e is identical in both versions, and fails otherwise. Expression [c1, c2] chooses a value non-deterministically in the interval [c1, c2].

type ::= scalar-type | . . .
scalar-type ::= int-sign int-type | ptr
int-sign ::= signed | unsigned
int-type ::= char | short | int | long | long long

Expressions rely on a C-like type-system. Integer and pointer types are collectively referred to as scalar types. Expressions support pointer arithmetic, expressed as byte-level offset arithmetic. All left-values are assumed to be preprocessed to dereferences *τ e (i.e. *((τ*)e) in C) where τ is a scalar type, and e is a pointer expression. Note that dereferences are limited to scalar types, and the dereferenced type is explicit in the syntax.
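As an illustration of this preprocessing (our own example, not one of the paper's figures), a C left-value such as `((uint16_t*)buf)[i]` is rewritten to a scalar dereference *τ e with explicit byte-level pointer arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* A union guarantees 2-byte alignment of the buffer; dereferencing a
   uint8_t* as uint16_t* is exactly the kind of low-level idiom the
   analysis targets (implementation-defined in ISO C). */
static union { uint8_t b[4]; uint16_t w[2]; } u = { .b = { 1, 2, 3, 4 } };

static uint16_t read_short_at(int i) {
    /* source form:       ((uint16_t*)u.b)[i]
       preprocessed form: *((uint16_t*)((uint8_t*)u.b + 2*i))  */
    return *(uint16_t *)((uint8_t *)u.b + 2 * i);
}
```

The value read depends on the host's byte-order (0x0201 little-endian, 0x0102 big-endian), which is precisely why such dereferences must carry an explicit scalar type in the semantics.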

Semantics of low-level simple C programs
The semantics of simple programs is parameterized by an ABI. In this paper, we assume program versions have the same ABIs, but for endianness. Let A ≜ { L, B } denote the possible endiannesses (little- and big-endian). The sizes of types, in contrast, are the same for both program versions. We thus assume a unique function sizeof ∈ type → N given, which provides these sizes (in bytes).
Pointer values are modeled as (semi-)symbolic addresses of the form ⟨V, i⟩ ∈ Addr ≜ V × Z, which indicate an offset of i bytes from the first byte of V. ⟨p, i⟩ ∈ B denotes the i-th byte in the memory representation of the pointer value p. Expressions manipulate scalar values, which may be numeric (machine integers) or pointer values. We denote the set of values as V ≜ Z ∪ Ptr. The definition of the most concrete semantics requires a family of representation functions benc τ,α ∈ V → P(B*), that convert a scalar value of given type τ ∈ scalar-type and endianness α ∈ A into a sequence of sizeof(τ) byte values. We denote as bdec τ,α ∈ B* → P(V) the converse operation. Note that the benc τ,α and bdec τ,α functions are non-deterministic. For instance, reinterpreting a pointer value as an integer, as in bdec int,L ∘ benc ptr,L (p), returns the full range of type int. We do not detail the definitions of these functions here, for the sake of conciseness. An example may be found in [26, sec. 5.2]. Environments are elements of E ≜ Addr → B. The semantics E⟦expr⟧ ∈ A → E → P(V) and S⟦stat⟧ ∈ A → P(E) → P(E) for simple expressions and statements is defined by standard induction on the syntax. We therefore only show, on Fig. 2, the semantics E⟦*τ e⟧α and S⟦*τ e1 ← e2⟧α for memory reads and writes, given endianness α ∈ A. Bytes are fetched and decoded with bdec τ,α when reading from memory in expression *τ e, while values computed by expression e2 are encoded into bytes with benc τ,α when writing to memory in assignment *τ e1 ← e2. Note that illegal memory accesses are silently omitted to simplify the presentation.
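A deterministic sketch of what benc and bdec compute on 32-bit integers may help fix intuitions. The paper's functions are set-valued and non-deterministic; this simplified version is ours and covers only the integer case:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, deterministic benc/bdec for 32-bit unsigned integers.
   The endianness parameter selects which byte of the value lands at
   which buffer offset. */
typedef enum { LE, BE } endian_t;

static void benc_u32(uint32_t v, endian_t a, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int shift = (a == LE) ? 8 * i : 8 * (3 - i);
        out[i] = (uint8_t)(v >> shift);
    }
}

static uint32_t bdec_u32(const uint8_t in[4], endian_t a) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = (a == LE) ? 8 * i : 8 * (3 - i);
        v |= (uint32_t)in[i] << shift;
    }
    return v;
}

/* decoding with the encoding's endianness is the identity */
static int roundtrip_ok(uint32_t v, endian_t a) {
    uint8_t b[4];
    benc_u32(v, a, b);
    return bdec_u32(b, a) == v;
}

/* decoding with the opposite endianness byte-swaps the value */
static uint32_t reinterpret(uint32_t v, endian_t from, endian_t to) {
    uint8_t b[4];
    benc_u32(v, from, b);
    return bdec_u32(b, to);
}
```

Composing bdec of one endianness with benc of the other is exactly the byte-swap relation that the analysis must track across the two program versions.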

Semantics of double programs
We now lift the simple program semantics S to a double program semantics D. As both simple program versions Pk = πk(P) have concrete states in E, the double program P has concrete states in D ≜ E × E. The semantics of Pk is parameterized by its endianness αk ∈ A. We assume, without loss of generality, that P1 is the little-endian version, and P2 the big-endian one. D⟦s⟧ ∈ P(D) → P(D) describes the relation between input and output states of s, which are pairs of states of simple programs. The definition of D⟦s⟧ is shown on Fig. 3. D leverages previous work on patch analysis [14,15]. It is defined by induction on the syntax, so as to allow for a modular definition and joint analyses of double programs. Note that D is parametric in S.
The semantics for the empty program is the identity function. The semantics D⟦s1 ∥ s2⟧ for the composition of two syntactically different statements reverts to the pairing of the simple program semantics of the individual simple statements s1 and s2. The semantics for assignments is defined with this construct. The semantics of assume_sync and assert_sync statements filters away environments where the left and right versions of a double program may disagree on the value of expression e. The semantics for the sequential composition of statements boils down to the composition of the semantics of individual statements. The semantics for selection statements relies on the filter F⟦e1 ≠ 0 ∥ e2 ≠ 0⟧ to distinguish between cases where both versions agree on the value of the controlling expression, and cases where they do not (a.k.a. unstable tests). There are two stable and two unstable test cases, according to the evaluations of the two conditions. The semantics for stable test cases is standard. The semantics for unstable test cases is defined by composing the left restriction π1(s) ∥ skip and the right restriction skip ∥ π2(t) of the then and else branches. Intuitively, π1(s) ∥ skip means that the left version of the double program executes s, while the right version does nothing. The semantics for (possibly unbounded) iteration statements is defined using the least fixpoint of a function defined similarly.

Properties of interest
We wish to prove the functional equivalence between the left and right versions of a given double program P ∈ dstat, restricted to a set of distinguished outputs, specified with the assert_sync primitive. Let x0 ∈ D be an initial double-program state. The set of states reachable by P is D⟦P⟧{x0}. Let Ω be a set of output left-values of program P. The property of interest is that π1(P) and π2(P) compute equal values for all outputs:

∀⟨ρ1, ρ2⟩ ∈ D⟦P⟧{x0} : ∀l ∈ Ω : E⟦l⟧ L ρ1 = E⟦l⟧ B ρ2

For instance, this property holds for the set of states of Example 1 reachable before line 8. Our concrete collecting semantics D is not computable in general. We will thus rely on computable abstractions, to infer this property by static analysis. Note that the use of assume_sync and assert_sync in specifications allows for both whole-program analysis, and separate analyses of program parts.

Memory abstraction
Though we aim at designing a computable abstract semantics in Sect. 4, we first tailor a (non-computable) abstraction of our memory model. We rely on the Cells memory abstraction of simple programs [23], [26, Sect. 5.2]. In order to handle C programs computing with machine integers of multiple sizes, with byte-level access to their encoding through type-punning, this domain represents the memory as a dynamic collection of scalar variables, termed cells, holding values for the scalar memory dereferences discovered during the analysis. It maintains a consistent abstract state despite the introduction of overlapping cells by type-punning. We lift this memory abstraction to double programs, and we extend it for representing equalities between cells symbolically.

Cells
We first consider the finite universe Cell ≜ V × N × scalar-type × A of cells of one program. A cell ⟨V, o, τ, α⟩ ∈ Cell consists of a variable V, an offset o, and information specifying the encoding of values: a scalar type τ and an endianness α.
To account for both programs, we introduce projected cells as elements of Cell × { 1, 2 }, where 1 (resp. 2) denotes a cell in the memory of P1 (resp. P2). For instance, consider the program in Example 1. We show in Fig. 4 the cells synthesized at the end of the program. We write x₀ᵏ and x₁ᵏ for the 1-byte cells of x at offsets 0 and 1 in program version k; the cells for y are defined in a similar way. Both program versions first call function read_from_network, which reads a stream of bytes from an external source, and writes it into a buffer. The same stream is read by both program versions. A stub for read_from_network is shown in Fig. 5. After completion of the call, we have x₀¹ = x₀² = b₀ and x₁¹ = x₁² = b₁, where b₀ and b₁ are the first and second bytes read from the network, respectively. Then, Program 1 swaps the bytes of x into those of y: x₀¹ = y₁¹ and x₁¹ = y₀¹. Program 2, in contrast, assigns x to y. x is thus read as a 2-byte cell, while only 1-byte cells are present. Therefore, the Cells domain synthesizes x² by adding the constraint x² = 2⁸ x₀² + x₁², following big-endian byte-order, before performing the assignment y² ← x². To sum up, we obtain these constraints, in addition to the cell constraints on x and y. Our goal is to prove that y¹ = y² given such constraints. To do so, we want to leverage numerical domains to abstract the values of cells. However, such constraints require an expressive domain that can hamper the scalability of the analysis. In addition, we note that we need to infer many equalities, most of them between the left and right versions of the same cells. This is no surprise as we expect most variables to hold equal values in the little- and big-endian memories most of the time, with only local differences. Rather than relying completely on the expressiveness of the underlying numeric domain, we first optimize our memory model for this common case, introducing the concept of shared bi-cells, which act as a symbolic representation of cells equality.
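The big-endian synthesis constraint and the resulting equality y¹ = y² can be checked concretely on arbitrary input bytes; the following sanity check is ours, not part of the analysis:

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian cell synthesis: a 2-byte cell is 2^8 times its low-address
   byte cell plus its high-address byte cell. */
static uint16_t synth_be(uint8_t x0, uint8_t x1) {
    return (uint16_t)(256u * x0 + x1);
}

/* Given network bytes (b0, b1): Program 1 (little-endian) stores the
   swapped bytes (b1, b0) into y and reads them little-endian; Program 2
   (big-endian) copies x into y and reads it big-endian. Both yield
   the same value. */
static int y_values_agree(uint8_t b0, uint8_t b1) {
    uint16_t y1 = (uint16_t)(b1 | (b0 << 8)); /* LE value of bytes (b1, b0) */
    uint16_t y2 = synth_be(b0, b1);           /* BE value of bytes (b0, b1) */
    return y1 == y2;
}
```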

Shared bi-cells
We denote as Bicell the set of bi-cells. A bi-cell is either a projected cell, or a pair of projected cells assumed to hold equal values, called a shared bi-cell. Bi-cell sharing allows a single representation, in the memory environment, for two projected cells from different program versions at the same memory location and holding equal values. Abstract memory states of double programs are modeled as a choice of a set of bi-cells C ⊆ Bicell, and a set of scalar environments on C. Let D♯ be the associated abstract domain. An abstract state represents a set of concrete byte-level memories in D = E × E; the values of the bytes of these memories must satisfy all the numeric constraints on bi-cells implied by the environments. In Fig. 6, we depict the bi-cells obtained after analyzing the program shown in Example 1. For variable x, since read_from_network writes the same value to x₀¹ and x₀², we can synthesize the shared bi-cell ⟨x₀¹, x₀²⟩ to represent the equality x₀¹ = x₀². In a similar way, we synthesize the shared bi-cell ⟨x₁¹, x₁²⟩. Therefore, as opposed to the separate representation of the memories of Programs 1 and 2 in Fig. 4, the joint representation induced by bi-cell sharing reduces the burden on numeric domains. In the following, we describe more involved cell synthesis operations that allow us to realize ⟨y¹, y²⟩, and thus to infer that y¹ = y².

Cell synthesis
A cornerstone of our memory model is bi-cell synthesis. In order to read or write a scalar value to a given location of memory, we must create a suitable bi-cell, or retrieve an existing one from the environment. To guarantee the soundness of the analysis when adding a new bi-cell, it is necessary to ensure that values assigned to it are consistent with those of existing overlapping bi-cells. Our memory domain first attempts to synthesize shared bi-cells if an equality can be inferred from the environment, by pattern-matching. In case of failure, it safely defaults to a pair of projected bi-cells, the values of which are set according to those of existing overlapping bi-cells.
We have already used shared bi-cell synthesis implicitly on Fig. 6. When reading variable y at the end of Example 1, the memory domain attempts to synthesize ⟨y¹, y²⟩, as a proof of y¹ = y². To this aim, it searches, among possible patterns, for an existing cell equal to both y¹ and y². x² is a candidate, assuming the equality x² = y² is recorded in (an abstraction of) the environment. Then the domain looks for 1-byte bi-cells for y¹ and x², and finds the four blue and red cells from Fig. 6. As y¹ and x² have opposite endian encodings, it queries the environment for the equalities y₀¹ = x₁² and y₁¹ = x₀². The success of the synthesis thus relies on pattern-matching, and on three equalities which may be inferred by a numerical domain implementing simple symbolic propagation.
where cₚ denotes the 1-byte bi-cell ⟨V, o + p, u8, α⟩ₖ (and respectively for c′ₚ), and αₓ denotes the endianness encoding of x.

Fig. 8. Equality test between projected bi-cells.
Shared bi-cell synthesis. More generally, function φ formalizes the patterns matched when attempting to synthesize a shared bi-cell for a given dereference c ∈ Cell₀ ≜ V × N × scalar-type. An implementation is proposed in Fig. 7. To define the synthesis functions φ₁ and φ₂ ∈ Cell × {1, 2} → P(Bicell) → expr for projected bi-cells, we first need to define a generic cell synthesis function φ ∈ Cell → P(Cell) → expr, such that φ(c)(C) returns a syntactic expression denoting (an abstraction of) the value of the cell c as a function of the cells in C. φ is designed as an extension, to multiple endianness encodings, of the cell synthesis function originally proposed in [26, sec. 5.2].
An example implementation is proposed in Fig. 9. Firstly, if the cell already exists (c ∈ C), it is directly returned by φ. Secondly, φ converts between integers of the same size and different signedness, using function wrap to model wrap-around, and function range for the range of the type. To define φ₁ and φ₂, we project bi-cells of the appropriate side onto cells, apply φ, and lift the resulting cell expression back to a bi-cell expression. Formally, φₖ⟨x, k⟩(C) ≜ (λy ∈ Cell. ιₖ(y, C))(φ(x)(πₖ(C))), where ιₖ ∈ Cell × P(Bicell) → Bicell and πₖ ∈ P(Bicell) → P(Cell). ιₖ(y, C) returns an element of occ(y, C): a shared bi-cell, if any, and a projected bi-cell otherwise. π₁ is defined as π₁(C) ≜ { x | ⟨x, 1⟩ ∈ C ∨ ∃y : ⟨⟨x, 1⟩, ⟨y, 2⟩⟩ ∈ C }. The definition of π₂ is analogous.
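The wrap-around conversion performed by φ between integer cells of equal size and different signedness can be sketched as follows; this illustration is ours, and assumes 8-bit bytes and two's-complement encoding:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of wrap: re-map an integer into the range of the target type,
   modulo 2^8 for 1-byte cells (two's complement assumed). */
static uint8_t wrap_to_u8(int v) {
    return (uint8_t)(((v % 256) + 256) % 256);  /* value mod 2^8 */
}

static int8_t wrap_to_s8(int v) {
    int m = ((v % 256) + 256) % 256;
    return (int8_t)(m < 128 ? m : m - 256);     /* re-center into [-128, 127] */
}
```

Converting a cell between signed char and unsigned char thus changes the numeric value only when the value leaves the target range, which is exactly what the wrap function captures.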
Cell addition. Cell addition, add-cell ∈ Cell₀ → D♯ → D♯, then simply adds the cell(s) and initializes their value(s).

Abstract join
The abstract join must merge environment sets defined on heterogeneous bi-cell sets. We therefore define a unification function unify ∈ (D♯)² → (D♯)². unify(⟨C1, R1⟩, ⟨C2, R2⟩) adds, with add-cell, any missing cells to ⟨C1, R1⟩ and ⟨C2, R2⟩: respectively C2 \ C1 and C1 \ C2. Let ⟨C1, R1⟩ and ⟨C2, R2⟩ be the resulting abstract states. C1 and C2 may include both projected and shared bi-cells. A shared bi-cell that does not occur in both C1 and C2 cannot be soundly included in the unified state, as it conveys equality information that holds for one abstract state only. All such cells are thus removed before unification. The abstract join may then be defined on the unified abstract states.

Semantics of simple statements
Before defining the semantics of double statements in this domain, we first define the semantics Eₖ⟦*τ e⟧ ∈ D♯ → P(V) and Sₖ⟦*τ e1 ← e2⟧ ∈ P(D♯) → P(D♯) for simple memory reads and writes, in program version k ∈ { 1, 2 }.

Evaluations.
To compute Eₖ⟦*τ e⟧⟨C, R⟩, we first resolve *τ e into a set L of projected bi-cells on side k, by evaluating e into a set of pointer values, and gathering the projected bi-cells corresponding to valid pointers. Then, we call add-cell to ensure that all the target cells in L are in the abstract environment, which updates ⟨C, R⟩ to ⟨C0, R0⟩. Finally, the read evaluates to the possible values of the bi-cells in L.

Assignments. The semantics of assignments Sₖ⟦*τ e1 ← e2⟧⟨C, R⟩ involves more steps. As for evaluations, we start by resolving *τ e1 into a set L of projected bi-cells on side k. Then, we realize the cells in L using add-cell: let ⟨C0, R0⟩ be the updated environment. Some of the projected bi-cells in L may have been realized into shared bi-cells. Let S be the set of such shared bi-cells, i.e. the shared bi-cells in C0 \ C. Elements of S represent equalities between bi-cells projected on side k, and on the side opposite to k. Such equalities may no longer hold after an assignment on side k. Therefore, we split the shared bi-cells of S into their left and right projections, in a copy-on-write strategy. Finally, we update the environment for the projected bi-cells written (the elements of L), with the possible values of e2. However, this is not sufficient: it is also necessary to update the environment for any overlapping bi-cells, including shared bi-cells that have been split into pairs of projected cells. A sound and efficient (though possibly coarse) solution is to simply remove them. Indeed, removing any bi-cell is always sound in our memory model: it amounts to losing information, as we lose constraints on the byte-representation of the memory.
Let Ω ⊆ C0 \ L be the set of such bi-cells: elements of Ω are shared bi-cells and projected bi-cells on side k, with offsets and sizes such that they overlap some element of L. The environment is updated by removing the bi-cells in Ω.

Semantics of double statements
We are now ready to define the semantics D♯⟦dstat⟧ ∈ D♯ → D♯ of double statements in this domain. Like D, D♯ is defined by induction on the syntax. We focus on base cases, as inductive cases are unchanged.
The semantics D♯⟦s1 ∥ s2⟧ for two syntactically different statements composes the simple program semantics: D♯⟦s1 ∥ s2⟧ ≜ S₂⟦s2⟧ ∘ S₁⟦s1⟧. The semantics for assume_sync, assert_sync, and the condition filter F⟦·⟧ are mostly unchanged, except for symbolic simplifications taking advantage of the symbolic representation of equalities in our domain, for improved efficiency and precision.
In particular, when e is a deterministic expression, D♯⟦assume_sync(e)⟧ adds shared bi-cells for all dereferences in e to the abstract environment. Consistently, D♯⟦assert_sync(e)⟧ first tests whether e is deterministic, and whether its dereferences evaluate to shared bi-cells. In this case, it raises no alarm. Otherwise, the semantics uses the environment functions ρ to test equalities of bi-cell values, as for D. A similar symbolic simplification is used for the F⟦·⟧ filter: the unstable test cases return ∅ when e is deterministic and all dereferences evaluate to shared bi-cells, which is the common case. For instance, when evaluating D♯⟦if (x < y) then s else t⟧, if the dereferences of variables x and y evaluate to shared bi-cells, the two unstable test cases are ⊥.

Assignments.
In an assignment D♯⟦*τ e1 ← e2⟧⟨C, R⟩, although both programs execute the same syntactic assignment, their semantics differ, as do their endiannesses. In addition, the available bi-cells may differ. By default, double assignments are straightforward extensions of simple assignments: D♯⟦*τ e1 ← e2⟧ = S₂⟦*τ e1 ← e2⟧ ∘ S₁⟦*τ e1 ← e2⟧. We introduce two precision optimizations, taking advantage of the implicit equalities represented by shared bi-cells. We first transform *τ e1 and the dereferences in e2 into sets of bi-cells L and R, respectively. R may be empty, as e2 may be a constant expression. Then, we realize the cells in L and R, using add-cell. Let ⟨C0, R0⟩ be the updated environment. Two optimizations are possible, depending on e1, e2, L, and R.
Optimization 1: Assignment of shared bi-cells. If e1 and e2 are deterministic expressions, and if they evaluate to bi-cells that are all shared, then Programs 1 and 2 write the same value to the same destination. We thus update the shared destination bi-cells (in L), and remove the set Ω ⊆ C0 \ L of (shared or projected) bi-cells overlapping elements of L. The choice of evaluating E₁⟦e2⟧ (rather than E₂⟦e2⟧) is arbitrary, as they are equal. Indeed, endianness α1 = L is not used by E₁⟦e2⟧, as all the necessary cells are materialized before the evaluation of expression e2.
Optimization 2: Copy assignment. If the conditions for optimization 1 are satisfied, and if, in addition, e2 = *τ e′2, and both *τ e1 and *τ e′2 evaluate to single bi-cells (|L| = |R| = 1), then we are dealing with a copy assignment. We may thus soundly copy memory information from the source {r} = R to the destination {l} = L, so as to further improve precision. We therefore copy r, and any smaller bi-cell over the same bytes, to a corresponding bi-cell for the bytes of l. Newly created destination bi-cells have the sides and endiannesses of their sources. The environment is updated accordingly, to reflect the equalities between sources and destinations.

Value abstraction
Connecting to numerical domains. We now rely on numeric abstractions to abstract further D into a computable abstract semantics D♯, resulting in an effective static analysis. Like [26, sec. 5.2], our memory domain translates memory reads and writes into purely numerical operations on synthetic bi-cells, that are oblivious to the double semantics of double programs: each bi-cell is viewed as an independent numeric variable, and each numeric operation is carried out on a single bi-cell store, as if it emanated from a single program. In particular, we notice that the transfer function for simple assignments Sₖ⟦*τ e1 ← e2⟧ described in Sect. 3.5 has the form of that of an assignment in a purely numeric language, where bi-cells play the roles of the numeric variables. This property is a key motivation for the Cell domain and the extension presented in this paper. Bi-cells may thus be fed, as variables, to a numerical abstract domain for environment abstraction. Any standard numerical domain, such as polyhedra [11], may be used. Yet, as we aim at scaling to large programs, we restrict ourselves to combinations of efficient non-relational domains, intervals and congruences [16], together with a dedicated symbolic predicate domain.
We thus assume an abstract domain D♯C given, with concretization γC, for each bi-cell set C ⊆ Bicell. It abstracts P(C → Z) ≃ P(Z^|C|), i.e., sets of points in a |C|-dimensional vector space. A cell of integer type naturally corresponds to a dimension in an abstract element. We also associate a distinct dimension with each cell of pointer type; it corresponds to the offset o of a symbolic pointer ⟨V, o⟩ ∈ Ptr. In order to fully abstract pointer values, we enrich the abstract numeric environment with a map P associating with each pointer cell the set of variables it may point to, where the pointer cells form the subset PC ⊆ C of bi-cells of pointer type. We refer to [26, sec. 5.2] for a formal presentation of the concretization and the abstract operators.
Introducing a dedicated symbolic predicate domain. Recall Example 1 from Sect. 1. Various implementations are possible for the byte-swaps enforcing endian portability of software. Though Example 1 shows an implementation relying on type-punning, implementations relying on bitwise arithmetic are also commonplace. In addition, system-level software, such as [30], often relies on combinations of type-punning and bitwise arithmetic. Example 2 is a simplified instance of such programming idioms: as y has type unsigned char, y|0xff00 and (y<<8)|0xff represent the same 16-bit word in different endiannesses. For a successful analysis of Example 2, the numerical domain must interpret bitwise arithmetic expressions precisely, and infer relations such as: the low-order (respectively high-order) byte of the little-endian (respectively big-endian) version of integer x is equal to y. Then, the interpretation of dereferences of p by the memory domain introduces similar relations between cells, thanks to the bi-cell synthesis function. In this example, it infers that the little-endian version of the low-address (respectively high-address) byte cell of x is equal to the low-order (respectively high-order) byte of x, and the converse for big-endian.
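The claim about Example 2 can be verified byte-wise: storing y|0xff00 in little-endian order and (y<<8)|0xff in big-endian order yields the same byte sequence. The concrete check below is ours, not the paper's listing:

```c
#include <assert.h>
#include <stdint.h>

/* Check that y|0xff00 stored little-endian and (y<<8)|0xff stored
   big-endian produce identical memory bytes, for a given unsigned
   char y. */
static int same_bytes(uint8_t y) {
    uint16_t le_val = (uint16_t)(y | 0xff00);       /* little-endian version */
    uint16_t be_val = (uint16_t)((y << 8) | 0xff);  /* big-endian version */
    uint8_t le_b0 = (uint8_t)(le_val & 0xff);       /* LE: low address = LSB */
    uint8_t le_b1 = (uint8_t)(le_val >> 8);
    uint8_t be_b0 = (uint8_t)(be_val >> 8);         /* BE: low address = MSB */
    uint8_t be_b1 = (uint8_t)(be_val & 0xff);
    return le_b0 == be_b0 && le_b1 == be_b1;
}
```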
Predicate abstract domain. We use a domain based on pattern matching of expressions to detect arithmetic manipulations of byte values commonly implemented as bitwise arithmetic. It is not sufficient to match each expression independently, as computations are generally spread across sequences of statements. We need, in addition, to maintain some state that retains and propagates information between statements. We maintain this state in a predicate domain Pred ≜ C → Bits, which maps each bi-cell c ∈ C ⊆ Bicell to a syntactic expression e in a language Bits, as a symbolic representation of the predicate c = e.
Bits ::= ⊤ | Slice | Slice ⊕ Bits
Slice ::= n | c | c[i,j)→k

⊤ denotes the absence of information. Otherwise, a syntactic predicate expression is either a bit-slice, or a bitwise sum of bit-slices. A bit-slice may be an integer constant n, a bi-cell c, or a slice expression c[i,j)→k denoting the value obtained by shifting the bits of c between positions i and j−1 to position k. Each term of a bitwise sum of bit-slices thus represents an interval of bits.
We assume that bit-intervals do not overlap: each bit of the result comes from a single cell or constant. The ordering is flat, based on syntactic predicate equality. An abstract element X ∈ Pred denotes the set of environments that satisfy all the predicates in X, where predicates are evaluated as expressions. We do not present the abstract operators in this paper. Like those of the related symbolic constant domain [24], they are based on symbolic propagation, and implement simple algebraic simplifications. They exhibit similar, near-linear time cost in our experiments.
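The intended value of a slice term can be sketched as a shift-and-mask operation; this is our reading of the Bits semantics, with the slice's target position written as an explicit parameter k:

```c
#include <assert.h>
#include <stdint.h>

/* Value of the slice term c[i,j)->k: bits i..j-1 of c, moved to
   position k. All other bits of the result are zero. */
static uint64_t slice(uint64_t c, int i, int j, int k) {
    uint64_t mask = (j - i >= 64) ? ~0ULL : ((1ULL << (j - i)) - 1);
    return ((c >> i) & mask) << k;
}
```

A byte-swap of a 16-bit value is then the bitwise sum of two non-overlapping slices, which is exactly the shape of predicate the domain infers for swapped cells.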

Evaluation
We implemented our analysis in the Mopsa platform [28,19], designed to support modular developments of precise static analyses for multiple languages and multiple properties. Our prototype is composed of 3,000 lines of OCaml: 45% for the memory abstraction, 36% for the symbolic predicate domain, and 19% for double program management and iterators. It leverages 31,000 lines (excluding parsers) of elementary functions of Mopsa: framework and utilities (64%), generic iterators and numeric domains for analyses of all languages (11%), and specific iterators and memory domains for the C language (25%). We evaluated our prototype on small idiomatic examples, open source software, and large industrial software. The analyses were run on a 3.4 GHz Intel® Xeon® CPU.

Idiomatic examples
We first check the precision and robustness of our analysis against a collection of small double C programs (between 20 and 100 LOC), inspired by various implementations of byte-swaps in Linux drivers, POSIX htonl functions, and industrial software.
A set of 9 programs illustrates network data processing. These programs are similar to Example 1 of Sect. 1. They receive an integer from the network, increment it, and send the result back. The necessary byte-swaps are implemented in the little-endian versions of these programs. Each example program implements a different byte-swapping technique on a 2-, 4-, or 8-byte integer: type-punning with pointer casts (as in Example 1), unions, or bitwise arithmetics. Refer to Examples 4, 5, and 6 in App. A for the source codes. We also analyze Example 2 from Sect. 4 to demonstrate the efficiency of our symbolic predicate domain.
Our prototype also handles floating-point data, which we omitted from the paper for the sake of conciseness. We developed small floating-point examples representative of the industrial use-cases of Sect. 5.3. They include byte-swappings of single or double precision floating-point numbers sent to or received from the network, on architectures where integers and floats are guaranteed to have the same byte-order. Type-punning is used to reinterpret floats as integers of the same size, which are byte-swapped using bitwise arithmetics. Also, a combination of type-punning and byte-swapping is used to extract exponents from double precision floats. The source codes of Examples 8 and 9 are available in App. A.3. All analyses run in less than 200 ms and report no false alarm.

Open source benchmarks
We then check the soundness, precision, and modularity of our analysis on three benchmarks based on open source software available on GitHub, with multiple commits for bug-fixes related to endianness portability. Refer to Examples 10, 11, and 12 in App. A.4 for relevant source codes excerpts. We analyze slices between 100 and 250 LOC, using primitives assume_sync and assert_sync for modular specifications of program parts.
Our first benchmark is an implementation of a tunneling driver [30] based on an encapsulation network protocol [17], which uses big-endian integers as tunnel identifiers. The driver was introduced in the Linux kernel, and patched several times for endianness-related issues detected by Sparse [6]. Then, a performance optimization introduced a new endianness portability bug, which Sparse failed to detect; it was fixed a year later. Our analysis soundly reports this bug, as well as the previous issues detected by Sparse. It reports no false alarm on the fixed code. Our second benchmark is a core library of a Linux driver [22] for Ethernet and RDMA net devices [21]. We analyze a slice related to a patch, committed to fix an endianness bug introduced 3 years earlier, and undetected by Sparse despite the use of relevant annotations. The fix turned out to be incomplete, and was updated 6 months later. Our analysis soundly reports bugs on the first two versions, and no false alarm on the third. Our third benchmark is extracted from a version of a compressed read-only filesystem [34] in an alternative Android distribution [33]. We analyze a slice related to a patch, committed to fix an endianness bug introduced 3 years earlier, and undetected by Sparse due to a lack of type annotations. Our analysis soundly reports the bug, and no false alarm on the fixed version. All the analyses run within 1 second.

Industrial case study
We analyzed two components of a prototype avionics application, developed at Airbus for a civil aircraft. This application is written in C, and primarily targets an embedded big-endian processor. Nonetheless, it must be portable to little-endian commodity hardware, as its source code is reused as part of a simulator used for functional verification of SCADE [3] models. The supplement to the applicable aeronautical standard [31] related to model-based development [1] mandates, in this case, that "an analysis should provide compelling evidence that the simulation approach provides equivalent defect detection and removal as testing of the Executable Object Code". Airbus, known to rely on formal methods for other verification objectives [12,32,13,27,4], is currently considering the use of static analysis to verify this portability property.
Endianness is the main difference between the ABIs of the embedded computer and the simulator. We thus experimented with our prototype analyzer on the modules of the application integrated into the simulator, to which we refer as A and S. Modules A and S are data-intensive reactive software, processing thousands of global variables, with very flat call graphs. Module A is in charge of acquiring and emitting data through aircraft buses. It is composed of about 1 million LOC, most of which is generated automatically from a description of the avionics network. It handles integers, Booleans, single and double precision floats. The code features bounded loops, memcpys, pointer arithmetics, and type-punning with unions and pointer casts. It also uses bitwise arithmetics, among which several thousand byte-swaps related to endianness portability. Module S is in charge of the main applicative functions. It is composed of about 300,000 LOC, most of which is generated automatically from SCADE models. It handles mostly Booleans and double precision floats. It features bounded loops and bitwise arithmetics, but no type-punning. The target application is required to meet its specifications for long missions. Analysis entry points contain loops with several million iterations to emulate this execution context. Both analyses converge in 5 abstract iterations. The analysis of A runs in 20.4 hours and uses 5.5 GB RAM. The analysis of S runs in 9.7 hours and uses 2.7 GB RAM. We worked with the development and simulation teams to analyze early prototypes, and incorporate findings into the development cycle. On current versions of both modules, both analyses report zero alarms related to endianness.

Conclusion
We presented a sound static analysis of endian portability for low-level C programs. Our method is based on abstract interpretation, and parametric in the choice of a numerical abstract domain. We first presented a novel concrete collecting semantics, relating the behaviors of two versions of a program, running on platforms with different endiannesses. Then we proposed a joint memory abstraction, able to infer equivalence relations between little- and big-endian memories. We introduced a novel symbolic predicate domain to infer relations between individual bytes of the variables in the two programs, which has near-linear cost. We implemented a prototype static analyzer, able to scale to large real-world industrial software, with zero false alarms.
In future work, we aim to extend our analysis to further ABI-related properties, such as portability between different layouts of C types, or different sizes of machine integers. We also anticipate that our bi-cell sharing approach will benefit the analysis of patches [14,15] modifying C data-types, even if the two versions run under the same ABI. Finally, we are considering an industrial deployment of our endian portability analysis, as a means to address avionics certification objectives related to simulation fidelity, as mentioned in Sect. 5.

A Examples
In this section, we further develop the benchmarks introduced in Section 5. Fig. 10 shows analysis times for the benchmarks introduced in Sections 5.1 and 5.2. It also refers to the sections of the current appendix showing related source codes, or source code excerpts. Fig. 10(a) shows analysis times, in milliseconds, for the 9 idiomatic examples of Section 5.1, illustrating network data processing. In the remainder of this section, we show source codes for the benchmarks introduced in Section 5.1, and relevant excerpts from the open source benchmarks introduced in Section 5.2. Note that these source codes exhibit slight differences between the notations used in the paper and the C syntax supported by the analyzer. Mainly, the current version of our prototype static analyzer does not interpret conditional inclusion directives of the C preprocessor directly. We therefore replace them with standard C tests on (constant) double conditions, which evaluate to different constants in the two programs. In addition, the static analysis primitives we use are _mopsa_assume and _mopsa_assert, together with the predicate _sync, rather than assume_sync and assert_sync. For instance, Example 1 from Section 1 is re-written accordingly: the C macro __IS_LITTLE_ENDIAN__ is substituted, by the C preprocessor, with a double condition evaluating to 1 in Program 1 and 0 in Program 2. Recall that Program 1 is the little-endian program, while Program 2 is the big-endian one.
Benchmarks related to network communication share a small set of stub functions, shown in Example 3. Function read_from_network reads a stream of bytes from an external source, and writes it into a buffer.
Type-punning with unions. Example 5 uses union types to implement type-punning: the integer received from the network is converted to host order with a NTOH byte-swap, incremented, converted back, and sent: NTOH(x, y); y.i++; NTOH(y, z); write_to_network(z.b, sizeof(z));

A.3 Endianness of floats
Byte-swapping floats. Example 8 extracts a double precision float from an array of bytes read from the network. Floats are byte-swapped using a combination of type-punning and bitwise arithmetics on 64-bit integers. The implementation relies on the assumption that the byte order is the same for integers and floats, which is the case for most machines. Extracting fields from floats. Example 9 features two ways of extracting the exponent from a double-precision float. The first is portable, the second works only on big-endian machines.

MLX5.
The second benchmark is a core library of a Linux driver [22] for Ethernet and RDMA net devices [21].
An endianness issue was detected on code committed in 2017. A fix was committed in 2020. A second patch of the same code was committed 6 months later, as the first fix was incomplete.