Fault-Enabled Chosen-Ciphertext Attacks on Kyber

Abstract. NIST's PQC standardization process is in the third round, and a first final choice between one of three remaining lattice-based key-encapsulation mechanisms is expected by the end of 2021. This makes studying the implementation-security aspect of the candidates a pressing matter. However, while the development of side-channel attacks and corresponding countermeasures has seen continuous interest, fault attacks are still a vastly underdeveloped field. In fact, a first practical fault attack on lattice-based KEMs was demonstrated just very recently by Pessl and Prokop. However, while their attack can bypass some standard fault countermeasures, it may be defeated using shuffling, and their use of skipping faults makes it also highly implementation dependent. Thus, the vulnerability of implementations against fault attacks and the concrete need for countermeasures is still not well understood. In this work, we shine light on this problem and demonstrate new attack paths. Concretely, we show that the combination of fault injections with chosen-ciphertext attacks is a significant threat to implementations and can bypass several countermeasures. We state an attack on Kyber which combines ciphertext manipulation (flipping a single bit of an otherwise valid ciphertext) with a fault that "corrects" the ciphertext again during decapsulation. By then using the Fujisaki-Okamoto transform as an oracle, i.e., observing whether or not decapsulation fails, we derive inequalities involving secret data, from which we may recover the private key. Our attack is not defeated by many standard countermeasures such as shuffling in time or Boolean masking, and the fault may be introduced over a large execution-time interval at several places. In addition, we improve a known recovery technique to efficiently and practically recover the secret key from a smaller number of inequalities compared to the previous method.


Introduction
The emerging threat of large-scale quantum computers to asymmetric cryptography drives the research on quantum-secure schemes. The standardization process on Post-Quantum cryptography (PQC) of the National Institute of Standards and Technology (NIST) [Natb] is in the third round, and as the process slowly reaches its (preliminary) end, the topic of implementation security of PQC schemes is receiving increased attention. While for classical public-key cryptography, implementation security has seen decades of research and threats are relatively well understood, the situation is much less clear for quantum-secure algorithms.
In this regard, lattice-based key-encapsulation mechanisms (KEMs) are of particular interest. These schemes offer comparably small key- and ciphertext sizes and high speeds. Therefore, they are especially well suited for embedded devices, which are highly susceptible to implementation attacks. Also, NIST expects to pick one of the three third-round lattice KEMs for standardization by the end of 2021, thereby making the study of side-channel and fault attacks and possible mitigations a pressing matter [Nata].
In fact, the side-channel aspect has already seen some analysis. Especially single-trace (or, more generally, profiled) attacks appear to be a focal point of research, e.g., in [PPM17,PP19,ACLZ20,PH16]. Another emerging topic is the combination of side-channel analysis with chosen-ciphertext attacks, as shown in [RRCB20] and [HHP+21]. Almost all finalist KEMs in the NIST process use some variant of the Fujisaki-Okamoto (FO) transform [FO99,HHK17] to achieve CCA security. However, chosen ciphertexts are still highly useful when combined with side-channel leakage from the re-encryption step involved in this transform [RRCB20,RBRC20,GJN20,BDH+21]. There also exist works on side-channel-secured implementations [OSPG18,BDH+21,HP21,RRVV15,RRdC+16].
Fault attacks against these systems, however, are a comparably underdeveloped topic. A potential reason is that the FO transform closes many attack paths through its re-encryption step and ciphertext-equality test. One already well-documented attack option is to skip the equality test and thereby re-enable chosen-ciphertext attacks [VOGR18,OSPG18,BGRR19,XIU+21]. Very recently, however, Pessl and Prokop [PP21] showed that other parts of the decapsulation can also be sensitive to fault injections. By skipping over certain instructions and then observing if the device still computes the correct shared secret (ineffective fault) or not (effective fault), they can gather information on the secret key. After faulting many decapsulation calls and accumulating said information, they can solve for the key. While their attack is practical, it does not come without caveats. For instance, instruction-skip attacks typically require a certain level of knowledge of the used implementation, and they presume that the attacker can locate the targeted instruction in the execution timeline. Also, since the shuffling countermeasure randomizes the order of execution, the specific attack by Pessl and Prokop might already be defeated by this relatively cheap technique. Thus, it is unknown whether such standard countermeasures might already suffice when used on an FO-transformed KEM. It is also unclear which algorithm steps and intermediates require protection, and whether more general data corruption can also be used for attacks.
Our contribution. In this work, we explore these open questions and show that fault-enabled chosen-ciphertext attacks are a powerful tool capable of bypassing many standard fault countermeasures. We also demonstrate that in these scenarios public data (the ciphertext) needs to be secured against manipulation. To show this, we present such a fault-enabled attack on Kyber, but instead of faulting the equality test within the FO transform, we exploit it as an oracle. We flip a single selected bit in an otherwise honestly generated ciphertext. This ciphertext is sent to the device under attack. The device decrypts and then re-encrypts the message before comparing the re-encrypted ciphertext to the sent ciphertext. Anytime between the initial unpacking and the final comparison of the ciphertexts, we correct the induced flip on either the sent or the recomputed ciphertext using a fault and observe whether the following comparison fails. From this, linear inequalities involving the secret key can be derived, akin to the attack by Pessl and Prokop [PP21]. We further improve upon their key recovery; our belief-propagation-based recovery technique significantly reduces the amount of memory required and recovers the secret key using far fewer faults.
Our fault can, e.g., be introduced in memory during virtually the entire decapsulation, or at more specific locations in registers during re-encryption. That is, our attack is flexible in its target choice. Securing the equality check of the FO transform does not protect against our attack. Several countermeasures, such as shuffling (which protects against [PP21]) and also Boolean masking, can be bypassed. In addition, our fault is introduced in public data only, making fault profiling easier. We thereby stress the importance of not only securing certain operations but also protecting the integrity of both secret and public data over most of the execution time. In addition, we show that countermeasures such as first-order masking, as presented, e.g., in [OSPG18], on CCA2-secured LWE-based schemes are not sufficient to protect against fault attacks. Our results in this regard are similar to and in line with the results of [BDH+21], but in a more general setting and not reliant on a faulty comparison. In addition, compared to their approach, we provide a drastically improved key-recovery algorithm.
Outline. We first, in Section 2, give a short overview of the necessary preliminaries, focusing on the Kyber [BDK+18] algorithm for reference in the later sections. We also give a short introduction to the belief propagation algorithm and the prior work of [PP21]. In Section 3, we describe the basic idea of our attack, leading to a description of the practical implementation and simulation in Section 4, where we also give and discuss the results of our simulations. In Section 5, we explore possible countermeasures against our attack.

Kyber PKE. Computations in Kyber take place in R_q and R_q^k, with R_q = Z_q[x]/(x^n + 1), where q = 3329, n = 256, and k ∈ {2, 3, 4} (depending on the parameter set). Sample_Uniform performs coefficient-wise sampling from a uniform distribution over Z_q; Sample_Binom,η denotes coefficient-wise sampling from a centered binomial distribution defined over {−η, −η + 1, ..., η}, where η ∈ {η_1, η_2}. Sampling is deterministic and depends on a seed which is incremented after each call. The functions compress and decompress are given by

Compress(x, d) = ⌈(2^d / q) · x⌋ mod 2^d and Decompress(x, d) = ⌈(q / 2^d) · x⌋.

The concrete values of all parameters are given in Table 1. The functions Encode and Decode interpret a bitstream as a polynomial and vice versa; Decode therefore multiplies each bit of a message m by ⌈q/2⌋. ⌈·⌋ denotes rounding to the nearest integer, and mod q maps an integer x to the element x' ∈ {0, 1, ..., q − 1} such that x' ≡ x mod q. Elements a under the number-theoretic transform (NTT) are denoted by â = NTT(a), and the symbol ∘ denotes pointwise multiplication. Vectors and matrices, i.e., elements in R_q^k and R_q^(k×k), are denoted by bold lowercase and uppercase letters, respectively. For an element a ∈ R_q^k, NTT(a) is defined as applying the NTT component-wise. Key generation (Algorithm 1) uniformly samples a matrix A using a seed ρ and samples vectors s, e from a binomial distribution using a seed σ. The public key is then calculated as t = As + e; the secret key is s.
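To make the rounding conventions concrete, the following is a minimal integer-arithmetic sketch of the two compression functions; the helper names are our own, not the reference implementation's API.

```python
Q = 3329  # Kyber's modulus

def compress(x: int, d: int) -> int:
    """Compress(x, d) = round((2^d / q) * x) mod 2^d, in exact integer
    arithmetic. Since q is odd, exact .5 ties cannot occur."""
    return ((x % Q) * 2**d + Q // 2) // Q % 2**d

def decompress(y: int, d: int) -> int:
    """Decompress(y, d) = round((q / 2^d) * y), for d >= 1
    (rounding halves up, matching the specification's convention)."""
    return (Q * y + 2**(d - 1)) // 2**d
```

The round trip Decompress(Compress(x, d), d) changes a coefficient by at most roughly q/2^(d+1); these per-coefficient differences are the Δ-terms appearing in the error analysis later.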
Encryption (Algorithm 2) reconstructs A, samples r, e_1, e_2 from the binomial distribution, and computes u = A^T r + e_1 and v = t^T r + e_2 + Decode(m). The compressed u and v are returned as ciphertext. The decryption, depicted in Algorithm 3, computes an approximate version (due to compression errors) of v − s^T u, namely

v + Δv − s^T (u + Δu) = e^T r + e_2 + Δv − s^T e_1 − s^T Δu + Decode(m), (1)

and as e, r, e_2, s, and e_1 are sufficiently small (due to being sampled from a narrow binomial distribution), the above yields m when encoded.

Algorithm 3 PKE.Decrypt (simplified)
Input: Secret key sk = ŝ, ciphertext c = (c1, c2)
1: u ← Decompress(c1), v ← Decompress(c2)
2: m ← Encode(v − NTT⁻¹(ŝ ∘ NTT(u))) // Retrieve m
3: return m

The encapsulation (Algorithm 5) returns a shared secret derived from the message m, the public key, and the ciphertext c, where c is generated under PKE.Encrypt of the message, with a seed τ depending on the message and the public key. The decapsulation (Algorithm 6) decrypts the ciphertext using PKE.Decrypt and re-encrypts the message using the randomness retrieved from the message and the public-key hash H(pk). If the re-encrypted ciphertext c' matches the received ciphertext c, the shared secret is returned; otherwise, the secret value z is used for an implicit rejection.

Belief Propagation
Since the belief propagation algorithm is an important part of our attack, we now give a brief introduction based on the description of MacKay [Mac03, Chapter 26]. Consider random variables {x_i}_{i ∈ {1,...,N}} = x on a set X with a joint mass function that factors as

p(x) = ∏_{j=1..K} f_j(x_{I_j}),

where the I_j ⊆ {1, ..., N} are index sets and x_{I_j} denotes the variables indexed by I_j. One is interested in the quantities

Z_n(x_n) = Σ over all configurations with the n-th component fixed to x_n of ∏_{j=1..K} f_j(x_{I_j}),

which are proportional to the marginal distributions of x_n. Naïvely computing Z_n is often computationally infeasible. Belief propagation exploits the factorisation of p to significantly reduce the complexity of computing the marginals.
To compute the Z_n, and thereby the marginal distributions of x_n, for all n, the x_1, ..., x_N are interpreted as variable nodes, which are connected to factor nodes given by f_1, ..., f_K, where the connections are described by the index sets I_1, ..., I_K. In each step, either variable nodes or factor nodes send messages to connected nodes. The message from a variable node (with index) i ∈ {1, ..., N} to a factor node j ∈ {1, ..., K} is computed as

q_{i,j}(x_i) = ∏_{k ≠ j} m_{k,i}(x_i),

where the m_{k,i} are the messages that were sent to the i-th variable node in the previous iteration. The message from a factor node j to a variable node i is computed as

m_{j,i}(x_i) = Σ_{x_{I_j} \ {x_i}} f_j(x_{I_j}) ∏_{i' ∈ I_j, i' ≠ i} q_{i',j}(x_{i'}).

Belief propagation consists of the repeated computation and passing of messages from variable to factor nodes and vice versa. For an acyclic graph, the product of the messages at the n-th variable node converges to Z_n. Belief propagation on cyclic graphs is called loopy belief propagation and often gives useful approximations, even though it might not converge or, in the context of key recovery, might not retrieve the correct result.
Belief propagation has proven to be a powerful tool for side-channel analysis of a wide variety of schemes [VGS14,PPM17,GRO18,PP19,KPP20,GGSB20], recently also in combination with chosen-ciphertext attacks [HHP+21].
In this work, we use belief propagation to find the most likely solution of a system of linear inequalities. This is similar to the usage of belief propagation for decoding, e.g., low-density parity-check (LDPC) codes [Mac03], where one aims at solving a system of (possibly erroneous) linear equations. In our case, every variable node has a so-called prior distribution, which is set to the distribution the corresponding unknown variable was sampled from. In every iteration, either all variable nodes process messages and pass them to the factor nodes, or vice versa. A full iteration is the combination of an iteration from variable to factor nodes and one from factor to variable nodes.
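To illustrate the message-passing schedule, here is a small, deliberately naïve sum-product sketch over dense tabular factors. The node layout and names are our own; the actual attack uses the specialized check nodes described later, not generic factor tables.

```python
import numpy as np

def belief_propagation(factors, n_vars, domain, n_iters=10):
    """factors: list of (index_tuple, table); `table` has one axis of
    length `domain` per variable index. Returns approximate marginals."""
    edges = [(j, i) for j, (idx, _) in enumerate(factors) for i in idx]
    msgs_vf = {e: np.ones(domain) / domain for e in edges}  # variable -> factor
    msgs_fv = {e: np.ones(domain) / domain for e in edges}  # factor -> variable
    for _ in range(n_iters):
        # variable -> factor: product of the other incoming factor messages
        new_vf = {}
        for (j, i) in edges:
            m = np.ones(domain)
            for (jj, ii) in edges:
                if ii == i and jj != j:
                    m = m * msgs_fv[(jj, ii)]
            new_vf[(j, i)] = m / m.sum()
        msgs_vf = new_vf
        # factor -> variable: weight the table by the other incoming
        # variable messages, then marginalize onto the receiving variable
        new_fv = {}
        for j, (idx, table) in enumerate(factors):
            for pos, i in enumerate(idx):
                t = table.astype(float)
                for pos2, i2 in enumerate(idx):
                    if pos2 != pos:
                        shape = [1] * len(idx)
                        shape[pos2] = domain
                        t = t * msgs_vf[(j, i2)].reshape(shape)
                m = t.sum(axis=tuple(p for p in range(len(idx)) if p != pos))
                new_fv[(j, i)] = m / m.sum()
        msgs_fv = new_fv
    # belief at a variable: normalized product of all incoming factor messages
    marginals = []
    for i in range(n_vars):
        m = np.ones(domain)
        for (j, ii), msg in msgs_fv.items():
            if ii == i:
                m = m * msg
        marginals.append(m / m.sum())
    return marginals
```

On an acyclic example (e.g., a prior factor on x_0 and a pairwise factor linking x_0 and x_1), this converges to the exact marginals; on the dense cyclic graph used for key recovery, it is the loopy approximation discussed above.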

The Fault Attack of Pessl and Prokop
In [PP21], Pessl and Prokop introduced an instruction-skipping fault in the Decompress/Decode method of Algorithm 3. They thereby provided proof that the equality check of the FO-transform is not the only critical operation for fault attacks.
By introducing said fault and observing whether the re-encryption comparison in Algorithm 6 fails (by testing if the device still returns the correct shared secret K), they obtain a linear inequality involving the decryption error polynomial (cf. Equation (1)). In the j-th fault introduction, this error polynomial is given by

d_j = e^T r_j + e_{2,j} + Δv_j − s^T (e_{1,j} + Δu_j) (2)

for j ∈ {0, ..., l − 1}, where l is the total number of faults introduced, and where Δu_j and Δv_j denote the compression error introduced to u_j and v_j, respectively, by applying compression and decompression. Note that r, e_1, e_2, Δu, Δv, as well as the true shared secret K, are all known to the attacker, assuming he honestly performs encapsulation. Denoting the t-th component of a vector of polynomials by r^(t), by writing out Equation (2), we get that the i-th coefficient of the error polynomial is given by

d_j[i] = Σ_{t} Σ_{h} σ(h, i) (e^(t)[h] r_j^(t)[τ(h, i)] − s^(t)[h] (e_{1,j} + Δu_j)^(t)[τ(h, i)]) + e_{2,j}[i] + Δv_j[i],

where τ(h, i) = i − h mod n, and σ(h, i) returns 1 if i − h ≥ 0 and −1 otherwise. Using the notation (·)_i to encapsulate all involved sign flips and index shifts, we can restate the above using dot products:

d_j[i] = ⟨(r_j)_i, e⟩ − ⟨(e_{1,j} + Δu_j)_i, s⟩ + e_{2,j}[i] + Δv_j[i].

Assuming that i is constant over all injections, the inequalities involving the error polynomial may thus be written in matrix-vector form, with one row per injection, as Ax ≥ b (with ≥ replaced by ≤ for rows stemming from the opposite oracle answer), where x consists of the coefficients of e and s. Note that these inequalities hold over Z, as all polynomials involved are small enough that no reduction modulo q happens.
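The index bookkeeping via σ(h, i) and τ(h, i) is simply the coefficient formula for multiplication in Z[x]/(x^n + 1); a small sketch (the helper name is ours) makes this explicit:

```python
def negacyclic_coeff(a, b, i):
    """i-th coefficient of a*b in Z[x]/(x^n + 1), computed over the
    integers (no reduction mod q, matching the inequalities over Z)."""
    n = len(a)
    total = 0
    for h in range(n):
        sign = 1 if i - h >= 0 else -1          # sigma(h, i)
        total += sign * a[h] * b[(i - h) % n]   # tau(h, i) = i - h mod n
    return total
```

The sign flip accounts for the wrap-around x^n = −1 in the negacyclic ring.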
Each coefficient of each polynomial of e and s was sampled from a known binomial distribution with small support. To recover the key from the inequalities above, Pessl and Prokop initialize a vector of 2n probability distributions using said distribution. This vector is successively updated in each iteration according to the information given by the system of inequalities represented by A and b.
As previously mentioned, the attack is highly implementation specific and requires a high level of synchronisation, as one needs to skip over a very specific instruction. Also, the attack can likely be prevented by simple shuffling. Hence, the true potential of fault attacks is still unclear.

Enabling Chosen-Ciphertext Attacks with Faults
In this section, we present an attack that improves on the mentioned weaknesses of [PP21]. Concretely, we explain how to use a fault to enable a chosen-ciphertext attack on Kyber. Instead of faulting the decoder, we assume an attacker who is able to introduce a single-bit fault. Sending a manipulated ciphertext and then correcting it back to a valid ciphertext after the decryption step using a fault yields inequalities similar to [PP21]. This is due to the fact that a successful decapsulation implies that decrypting the manipulated ciphertext yielded the original message used to create the valid ciphertext. This means our attack consists of

1. manipulating the polynomial v (in Algorithm 2) of a valid ciphertext during encapsulation in Algorithm 5 by flipping a single bit in the compressed ciphertext,
2. sending the manipulated ciphertext to the device under attack,
3. correcting the ciphertext during decapsulation (Algorithm 6) using a one-bit fault,
4. observing whether a valid shared secret is established,
5. deriving inequalities from repeating the above steps (one inequality per fault),
6. and recovering the secret key from those inequalities.
As shown in Figure 1, being able to introduce one-bit faults anywhere in the red phases enables our attack. The decompression and compression methods are depicted here, even though they belong to the decryption and the re-encryption, as they prevent an earlier manipulation of the re-encrypted ciphertext. The incoming ciphertext c' is the manipulated one, and either c' or the re-encrypted ciphertext c'' needs to be "corrected" using a fault. That is, either c' is faulted such that it matches the unaltered ciphertext c, or c'' is faulted such that it matches the manipulated ciphertext. Introducing a fault in c' can be done by, e.g., flipping a bit in RAM, while manipulating c'' would likely be done using a fault against a value in a register, as values generated during the re-encryption might not be stored in RAM, or only for a short duration and at possibly varying addresses. The decryption/re-encryption corresponds to lines 1 to 3 in Algorithm 6; the compression corresponds to line 5 of Algorithm 2 (called from line 3 of Algorithm 6), and the decompression to line 1 of Algorithm 3 (called from line 1 of Algorithm 6).

Message decoding. During decryption, the message is recovered from

rec = v + Δv − s^T (u + Δu),

where each coefficient is reduced to the range {0, ..., q − 1} and the Δ-terms denote the difference introduced by first compressing and then decompressing a (vector of) polynomial(s). The message is then recovered by mapping each coefficient rec[i], for i ∈ {0, ..., n − 1}, to a 0-bit if rec[i] is closer to 0 or q than to q/2, and to a 1-bit otherwise; i.e., the function φ mapping a coefficient a, reduced to {0, ..., q − 1}, to a bit is given by

φ(a) = 0 if min(|a − q|, |a|) < q/4, and φ(a) = 1 otherwise.
For an honestly generated ciphertext, this yields the message m with high probability, as the error polynomial d, given by

d = e^T r + e_2 + Δv − s^T (e_1 + Δu),

has all coefficients smaller than q/4 in absolute value with high probability. We use this property of φ to manipulate ciphertexts and introduce faults such that we may derive a system of inequalities involving the secret key.
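The decision rule φ can be transcribed directly; the code below is our own sketch of the rule described in the text:

```python
Q = 3329  # Kyber's modulus

def phi(a: int) -> int:
    """Map a coefficient in {0, ..., q-1} to a message bit: 0 if it is
    closer to 0 (or q) than to q/2, and 1 otherwise."""
    a %= Q
    return 0 if min(a, Q - a) < Q / 4 else 1
```

Note the decision boundary sits at q/4 ≈ 832.25, which is exactly why shifting a coefficient by about q/4 can flip the decoded bit depending on the sign of the error term.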
Correct vs. incorrect message. Recall that during decapsulation (Algorithm 6), the randomness τ used for re-encryption is derived by hashing the message m with the hash of the public key. Hence, flipping just a single bit in m yields a completely different re-encrypted ciphertext. However, if the correct m is still computed, then the re-encrypted ciphertext will be equal to the original (non-manipulated) ciphertext c and thus differs in only a single bit from the sent ciphertext.
Introducing and Correcting an Error. The above observations lead to the following method. In the encapsulation step, we first create a valid ciphertext c from u and v as computed in Algorithm 2. We then replace v by v' = v + ⌈q/4⌋ · x^i and compress it to obtain a manipulated ciphertext c', which is sent to the device under attack. The attacked device decompresses c', retrieves a message m', re-encrypts the message to a ciphertext c'', and compares c'' against c'. Before the comparison, we introduce a fault, flipping a bit of c', such that c' = c. Thereby, we achieve that the device performs the decryption on the manipulated ciphertext c' but effectively compares against the honestly generated ciphertext c. Observing whether a shared secret is established then tells us whether the decryption of the manipulated ciphertext c' resulted in the original message m which was used to generate c. We note that this approach (manipulating a single coefficient of v and then testing if a decryption failure occurs) bears some resemblance to the side-channel attack of [BDH+21] on (flawed) algorithms for masked comparison [OSPG18,BPO+20].
Retrieving information from observing encryption failures. In the decryption step of the decapsulation routine, replacing c by the manipulated ciphertext c' results in the message being recovered from

rec' = rec + ⌈q/4⌋ · x^i

(up to the change in compression error): the i-th coefficient is shifted by roughly q/4, so whether the i-th message bit flips, and thus whether decapsulation fails, depends on the sign of the error coefficient d[i].

Restricting to single-bit differences. If c and c' differ by more than one bit, we do not use that ciphertext and re-try with new randomness; otherwise, single-bit faults would not be sufficient. This may take a few tries, but as ciphertexts can be pre-computed and are not sent to the device, we do not regard this as a limitation. Note that allowing multi-bit differences would also result in slightly different inequalities, as in these cases compression changes the error added to v.
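The attacker-side pre-filtering can be sketched as follows, assuming the v-compression width d_v = 4 (as in Kyber512/768); the compression helper and function names are our own, not the PQClean API:

```python
Q = 3329
DV = 4  # bits per compressed v-coefficient (Kyber512/768; Kyber1024 uses 5)

def compress(x: int, d: int) -> int:
    """round((2^d / q) * x) mod 2^d in integer arithmetic."""
    return ((x % Q) * 2**d + Q // 2) // Q % 2**d

def single_bit_manipulation(v, i):
    """Add ~q/4 to coefficient i of v and compress both versions. Return
    the manipulated compressed coefficient if it differs from the original
    in exactly one bit, else None (re-try with fresh randomness)."""
    orig = compress(v[i], DV)
    manip = compress(v[i] + round(Q / 4), DV)
    diff = orig ^ manip
    if diff != 0 and diff & (diff - 1) == 0:  # exactly one bit set
        return manip
    return None
```

Since the shift by q/4 moves the compressed value by about four steps (for d_v = 4), roughly half of the candidate positions yield a single-bit difference; the rest are discarded offline before any interaction with the device.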
Fault location and profiling. A single-bit fault model at a specific time in execution is realistic but requires profiling or a good understanding of the implementation in software as well as hardware [RSDT13,OGM17]. However, in our case, the fault may be introduced over a time interval spanning almost the whole execution time and on different intermediates and methods, e.g., against a value in memory or in a register, as depicted in Figure 1. Note that the attacker knows the value of the bit to be flipped and may discard ciphertexts with an undesired value of the targeted bit. Thus, a bit flip can be achieved if the attacker is able to either set or reset bits, which is a more realistic attacker assumption than straight-up flipping [RSDT13,OGM17]. If a Boolean-masked value is targeted, then setting/resetting a bit is ineffective with a probability of 50%, even if the plain value is known. Still, if we observe a successful decapsulation (no decryption failure), then we can infer that the bit was actually flipped. As we cannot further distinguish the cause of a decapsulation failure (decryption failure or ineffective fault injection), we have to ignore injections yielding this result. Therefore, in this case, the number of required faults is approximately quadrupled.
In addition, the observed outcome of the FO transform can be used for profiling, i.e., finding the correct faulting position in memory. The sent manipulated ciphertext will always be rejected unless properly corrected using a fault. This means that one can sweep over faulting positions and accept the one which leads to a correct decapsulation. Even after finding a proper position, discarding inequalities resulting from observed decapsulation failures allows filtering out fault injections which did not produce the desired effect.
We finally note that the fault target (the ciphertext c) is public. Using, e.g., power analysis, one can find the point in time at which c is written into memory. This can aid in finding its address and physical location in RAM.

Obtaining Inequalities
To obtain inequalities in e and s, we apply the procedure described in the previous section l times, where in the j-th step we obtain an inequality involving the i-th coefficient of the error polynomial, denoted d_j. As described before, in case of the decapsulation failing, we have d_j[i] ≥ 0, and otherwise d_j[i] ≤ 0. The polynomial d_j is given by

d_j = e^T r_j + e_{2,j} + Δv_j − s^T (e_{1,j} + Δu_j), with Δa = ψ(a) − a,

where ψ denotes Decompress(Compress(·)). This means our inequalities are of the form

⟨(r_j)_i, e⟩ − ⟨(e_{1,j} + Δu_j)_i, s⟩ + e_{2,j}[i] + Δv_j[i] ≥ 0 (or ≤ 0),

where i is the coefficient we are manipulating/faulting, j is the index of the current step/fault, and all variables except for e and s are known. As the inequalities are clearly linear in e and s, we may write them as Ax ≥ −b (respectively ≤), where each row of the matrix A, together with the corresponding row of b and the information whether this row corresponds to a smaller or a greater sign, gives a check node as described in the next section. The construction of A and b is analogous to the construction in Section 2.3; x is the vector consisting of the entries of e and s. In contrast to [PP21], we do not directly use the matrix structure: as described in the next section, each inequality, i.e., each row of A and b, is represented by a check node in a belief propagation graph.

Recovering the Secret Key
To recover the secret key from the inequalities, we use belief propagation. Our belief propagation graph consists of variable nodes, representing the unknown coefficients of e and s, and factor nodes, which, inspired by decoding algorithms, we call check nodes, each representing an inequality. Check nodes are connected to all variable nodes. In each iteration, the check nodes send messages to the variable nodes or vice versa. A message represents a probability distribution over {−η_1, ..., η_1} of a key coefficient, where we interpret the key as consisting of e and s and denote the key vector as x. In each step, the messages are combined according to the inequality represented by the respective check node, from which probability distributions for each key coefficient are derived. Figure 2 shows a simplified example with four unknown variables (represented by x_0, x_1, x_2, x_3) and five inequalities (represented by Check 0, ..., Check 4).
The variable nodes are initialized with the distribution e and s were sampled from in Algorithm 1; this is called the prior distribution and is denoted by prior in the following. In the first step, the prior distributions, now called messages, are sent to the check nodes. The check nodes update the distributions according to the inequality they represent: for an unknown coefficient i, all other messages are combined according to the represented inequality, and the resulting probability distribution is used to derive a distribution for the i-th coefficient. To be precise, for each input with index i, every check node computes the distribution of the sum in the inequality, leaving out i. This means a check node with index j, corresponding to an inequality Σ_i a_{ji} x_i ≥ b_j (or ≤) and receiving messages m_0, ..., m_{2n−1} (m_i belongs to the i-th key coefficient), computes the distributions

D_i = distribution of Σ_{i' ≠ i} a_{ji'} x_{i'}, with each x_{i'} distributed according to m_{i'}, (3)

where the addition corresponds to computing the distribution of the sum of the corresponding random variables, i.e., the convolution of the (scaled) m_{i'} (see Section 4); the distribution derived from D_i and the bound b_j is then sent to the i-th variable node. As described in Section 4, for efficiency reasons, the computations of Equation (3) are carried out in the Fourier domain. The variable nodes combine incoming messages by computing the product with one index left out: for incoming messages m_0, ..., m_l (m_j is the message coming from the j-th check node/inequality), the normalized product

prior(x) · ∏_{j' ≠ j} m_{j'}(x), x ∈ {−η_1, ..., η_1},

is sent to the j-th check node. Sending messages from variable nodes to factor nodes or vice versa is called an iteration; sending messages from variable nodes to factor nodes and back is called a full iteration. After each full iteration, we may compute the current resulting probability distribution for the i-th coefficient by multiplying and normalizing the incoming messages at the i-th node.
That is, we compute the normalized distribution

prior(x) · ∏_j m_j(x), x ∈ {−η_1, ..., η_1}.

Note that this process is also related to decoding low-density parity-check (LDPC) codes using belief propagation [Mac03]. In our setting, however, the implied code is not low-density; instead, all variable nodes are connected to all factor nodes.
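The sum distribution of Equation (3) can be prototyped directly with convolutions; the production implementation works in the Fourier domain and reuses partial products (Section 4), but the following naïve sketch (our own helper names) shows the underlying operation:

```python
import numpy as np

def scaled_dist(msg, coeff, eta):
    """PMF of coeff * x for x ~ msg over {-eta, ..., eta}, as an array
    indexed over {-|coeff|*eta, ..., |coeff|*eta}."""
    span = abs(coeff) * eta
    out = np.zeros(2 * span + 1)
    for v in range(-eta, eta + 1):
        out[coeff * v + span] += msg[v + eta]
    return out

def sum_distribution(msgs, coeffs, eta, skip):
    """PMF of sum over j != skip of coeffs[j] * x_j; index m of the result
    corresponds to the value m minus the total span."""
    dist = np.array([1.0])  # point mass at 0
    for j, (m, c) in enumerate(zip(msgs, coeffs)):
        if j == skip:
            continue
        dist = np.convolve(dist, scaled_dist(m, c, eta))
    return dist
```

Each convolution grows the support of the running distribution, which is exactly why the naïve per-node cost is substantial and the Fourier-domain/product-tree optimizations of Section 4 matter.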
The belief propagation varies with the parameters k and η_1 (which depend on the parameter set as described in Table 1) and with l. The parameter k determines the number of variable nodes, given by 2nk; the size of a message is 2η_1 + 1 probabilities. Each of the l observations corresponds to a check node.

Attack Implementation and Simulation
We implemented the simulation of the attack in Python and Rust where we rely on a modified version of PQClean [PQC]. We obtain inequalities by calling the Kyber implementation from Python, process them, and use a Rust implementation of belief propagation to recover the secret key from those inequalities. As single-bit faults are an established fault model and can be achieved e.g. using a laser, we solely rely on a simulation. The implementation is available at https://github.com/juliusjh/fault_enabled_cca.

Introducing Faults
To simulate our attack, we first fix a secret key on the assumed device. The simulated attacker then generates ciphertexts, checking for each ciphertext if the manipulation described in Section 3.1 only affects a single bit in the compressed ciphertext. If this is the case, we call a manipulated decapsulation function, correcting the manipulated ciphertext before the comparison of the ciphertext with the re-encrypted ciphertext. This function corresponds to a call to the device under attack during which we flip one bit of the ciphertext. By observing whether the correct shared secret is returned, we retrieve an inequality as described in Section 3.2. These steps are repeated until a sufficient number of inequalities have been extracted. Note that ciphertexts that differ in more than one bit are not sent to the device but are discarded before any communication happens. For our simulation, we assume a perfect bit-flipping fault. As mentioned in Section 3.1, this assumption can be relaxed at the expense of having to send more manipulated ciphertexts to the device under attack. Depending on the fault model, it might be favorable to allow the manipulated ciphertext to differ in multiple bits, e.g. in a pattern matching some property of the expected introduced fault.

Belief Propagation
For each coefficient of e and s, we initialize a variable node, and for each inequality, we initialize a check node. We then propagate for a maximum of 80 full iterations, i.e., from variable to factor nodes and vice versa, and, after each full iteration, we retrieve the resulting probability distribution at each variable node and sort the coefficients according to several orderings: by entropy, by min-entropy, and by the change of entropy and of min-entropy since the last iteration.
If the first n coefficients are correct in any ordering, we can find the other n coefficients by solving the public-key equation t = As + e using linear algebra. In this case, the belief propagation is aborted and counted as a success. To minimize the runtime for obtaining statistics, we also abort if the number of correct coefficients has not improved after 5 full iterations.
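For the simplest case, where the recovered half consists of all of s, the linear-algebra step reduces to e = t − As over R_q; a schoolbook sketch with our own helper names:

```python
Q = 3329

def poly_mul(a, b):
    """Schoolbook multiplication in Z_q[x]/(x^n + 1), with n = len(a)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] = (c[i + j] + ai * bj) % Q
            else:  # wrap-around: x^n = -1
                c[i + j - n] = (c[i + j - n] - ai * bj) % Q
    return c

def recover_e(A, s, t):
    """Given the full secret vector s, return e = t - A s, one list of
    mod-q coefficients per component polynomial."""
    k, n = len(s), len(s[0])
    e = []
    for row in range(k):
        acc = [0] * n
        for col in range(k):
            p = poly_mul(A[row][col], s[col])
            acc = [(x + y) % Q for x, y in zip(acc, p)]
        e.append([(ti - ai) % Q for ti, ai in zip(t[row], acc)])
    return e
```

In the general case, where the n known coefficients are spread over e and s, the same idea applies but requires solving a linear system over Z_q for the remaining unknowns.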
To compute the distributions D_i, described in Section 3.3, Equation (3), for each variable node i, the convolution of n − 1 messages has to be computed. Naïvely, this results in having to compute (n − 1)n convolutions of distributions (with growing support) at each check node in every full iteration. To avoid inefficient re-computations when computing partial products in check nodes, we use the binary tree algorithm described by Pessl and Prokop in [PP21]. Using an upwardly constructed and a downwardly constructed tree, we avoid recomputations and may compute all D_i at once. This is similar to the classical two-directional pass to compute partial products.
To construct the upward tree, we first initialize the leaf nodes with the ŷ_i. Every following layer then consists of the products of two nodes of the previous layer, up to the (log(2n) − 1)-th layer, which has two nodes. The downward tree is then initialized by swapping the values of the last layer of the upward tree. Every other layer of the downward tree is computed by multiplying the node of the layer above with the sibling from the upward tree. Thus, each node of the downward tree is the product of all leaves except its child nodes. The same algorithm is also used for the variable nodes. As the number of inequalities is not always a power of two, we add an additional node to a layer if the number of nodes in the previous layer is not even. This additional node is equal to the last node of the previous layer; i.e., we implicitly add a node with the value of the multiplicative neutral element to each uneven layer.
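The leave-one-out products computed by the two trees can equivalently be expressed as a prefix/suffix pass; the following sketch (our own formulation, not the paper's exact tree layout) computes the same quantities:

```python
import numpy as np

def leave_one_out_products(ys):
    """For inputs y_0, ..., y_{n-1} (elementwise-multiplicable arrays),
    return P_i = product of all y_j with j != i, without recomputation."""
    n = len(ys)
    prefix = [np.ones_like(ys[0])]         # prefix[i] = y_0 * ... * y_{i-1}
    for y in ys[:-1]:
        prefix.append(prefix[-1] * y)
    out = [None] * n
    suffix = np.ones_like(ys[0])           # suffix = y_{i+1} * ... * y_{n-1}
    for i in range(n - 1, -1, -1):
        out[i] = prefix[i] * suffix
        suffix = suffix * ys[i]
    return out
```

Like the two-tree variant, this uses O(n) multiplications instead of the naïve O(n²), and the padding with a multiplicative neutral element for odd layer sizes becomes unnecessary in this formulation.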

Results
We ran our simulations for all three Kyber parameter sets to determine the number of inequalities, and thus faulted decapsulations, necessary to retrieve the secret key. We first determined the range of inequalities from which on a key recovery becomes possible. We then tested different numbers of inequalities with a step size of 250, 500, or 1000 inequalities, depending on the rate of change of the recovered coefficients in that range. For every number of inequalities, we ran 20 experiments; the success rate is then calculated as the number of successful runs divided by the total number of runs, and the reported number of recovered coefficients is the average over these runs.
Abort criteria. We abort once the key is found, or after at most 80 full iterations. To check for a successful recovery, an attacker would use the ordering described in Section 4.2 and the public-key equation. To obtain statistics, we simply abort if at least n of the coefficients in any of the orderings are correct. In addition, we abort early if there is no improvement for 20 steps.
Success rate. Figure 3 shows the success rate.
Recovered coefficients. In failing cases, we often recover a high number of coefficients, which might be sufficient to retrieve the key using standard lattice-reduction algorithms such as [CN11]. We count recovered coefficients as the longest chain of correct coefficients in one of the orderings described in Section 4.2 after any step. As the number of correct coefficients might decrease over further iterations, especially in corner cases, an attacker should test after every step whether the key can be recovered using the public-key equations. If additional lattice-reduction techniques are used, they should also be applied to intermediate results. Figure 4 shows the average number of recovered coefficients. Given that the number of recovered coefficients rises quickly once more than a certain threshold of inequalities is obtained, we assume that additional lattice-reduction techniques are only useful when obtaining further inequalities is very difficult or expensive.
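Under our reading of this metric, a "chain" is a contiguous run of correct coefficients in a given ordering; it can then be computed as in the following sketch, where `guess` and `truth` are hypothetical coefficient sequences introduced here for illustration:

```python
def longest_correct_run(guess, truth):
    # Length of the longest contiguous run of positions where the
    # recovered coefficient matches the true secret coefficient.
    best = cur = 0
    for g, t in zip(guess, truth):
        cur = cur + 1 if g == t else 0
        best = max(best, cur)
    return best
```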
In all other cases, where lattice-reduction techniques may yield a correct result, retrieving more inequalities should quickly increase the success rate to 1.
Runtime. Our implementation is multithreaded in large parts, with the ability to utilize as many cores as the number of inequalities used. The runtime depends strongly on the number of inequalities, as each inequality is represented by a factor node. Our attack runs on widely available hardware: for example, on a standard laptop, one full iteration for Kyber768 with 7000 inequalities takes about 15 minutes using a single thread of an Intel(R) Core(TM) i7-10510U CPU. Table 2 shows the average number of full iterations and the runtime in minutes for successful runs with 32 and 8 threads on an Intel(R) Xeon(R) Gold 6242 CPU with 16 cores.
Comparison with [PP21]. Pessl and Prokop's technique to recover the secret key from such inequalities is different from, but related to, ours. Note that they analyzed an older version of Kyber. In the context of key recovery from inequalities, the main difference between the versions is the use of η = 3 (instead of η = 2 as in the previous version) for Kyber512, which probably makes recovering the secret key slightly harder in our case.
Pessl and Prokop report a success rate of 1 starting with 7500 inequalities for Kyber512, 10500 inequalities for Kyber768, and 11000 inequalities for Kyber1024. While an exact comparison is lacking, our technique appears to be a clear improvement. We assume that our belief propagation based approach better avoids feedback loops, which might otherwise negatively impact the result. Our implementation also uses significantly less memory than the implementation of [PP21]. While the original recovery technique requires up to 79 GB of RAM, we stay well below 10 GB of RAM usage at all times, depending on the number of inequalities and threads, where we used up to 10000 inequalities and 40 threads. Regarding runtime, our implementation is slightly slower, but the attack may be carried out on a normal laptop.

Conclusion and Countermeasures
In the previous sections, we presented a realistic attack combining a chosen-ciphertext attack with a fault injection, and introduced an improved recovery technique for linear inequalities involving the secret key. The fault may be injected over a long execution-time interval and on different variables holding public data. Our attack depends on the result of computations, not on the computational steps themselves; therefore, securing computations and Boolean masking of inputs do not prevent it. We thereby highlighted the importance of protecting seemingly non-sensitive, public data as well as operations over the whole execution time, and of implementing additional countermeasures against fault-enabled chosen-ciphertext attacks. In addition, we give another exemplary use of the belief propagation algorithm for key recovery and provide further evidence of its importance for side-channel analysis. While we targeted Kyber, we conjecture that variants of this attack are applicable to conceptually similar schemes such as Saber [DKRV18], FrodoKEM [ABD+21], or NewHope [AAB+19]. In this section, we provide an overview of standard countermeasures and discuss whether and how they defend against our attack.
Shuffling. In contrast to [PP21], our attack cannot be mitigated by shuffling the decoder. They introduce a fault in the decoder and need to know which bit has been faulted in order to extract correct inequalities, which explains why shuffling is useful there. Shuffling in time, however, does not affect the physical location of c in RAM, which is why manipulation of this value is not prevented. Still, a second attack path, namely manipulating c during re-encryption (cf. Figure 1), becomes more difficult to exploit: when only using successful decapsulations (they can only occur when the correct coefficient was faulted), the number of required faults is approximately multiplied by the number of shuffling positions.
Redundancy. Introducing redundancy in the storage of c, as depicted in Figure 5, might drastically increase the effort and capabilities required of an attacker. For that, a hash of c is computed directly after receiving the ciphertext. Right after the comparison check of the re-encryption, c is hashed again and compared against the previously computed hash. This mainly protects against faults introduced into c while it is stored in RAM. A manipulation during the compression of c is still possible; the attack is therefore not fully prevented, but becomes significantly harder to carry out.
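The countermeasure can be sketched as follows. This is a structural sketch only: `decrypt`, `encrypt`, `kdf_ok`, and `kdf_fail` are placeholders of our own naming for the scheme's actual decryption, re-encryption (FO transform), and the two key-derivation branches, not Kyber's real API.

```python
import hashlib
import secrets

def decapsulate_with_redundancy(c, decrypt, encrypt, kdf_ok, kdf_fail):
    # Hash the ciphertext immediately on receipt ...
    h_in = hashlib.sha3_256(c).digest()

    m = decrypt(c)          # CPA decryption
    c2 = encrypt(m)         # re-encryption (FO transform)
    ok = secrets.compare_digest(c2, c)

    # ... and re-check it right after the re-encryption comparison:
    # a fault on c while it sat in RAM changes the hash and is detected.
    if hashlib.sha3_256(c).digest() != h_in:
        raise RuntimeError("fault on stored ciphertext detected")

    return kdf_ok(m) if ok else kdf_fail(c)
```

Note that the window between receipt and the first hash, as well as the compression of c inside `encrypt`, remains unprotected, which matches the limitation discussed above.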
The combination of introducing redundancy and shuffling the decoder, together with a secured comparison (via, e.g., double computation), likely prevents our attack with high probability. Additional security can be achieved by randomizing the memory layout, e.g., by storing coefficients in a permuted order, and by shuffling the compression function.