Encrypt-to-self: Securely Outsourcing Storage

Abstract. We put forward a symmetric encryption primitive tailored towards a specific application: outsourced storage. The setting assumes a memory-bounded computing device that inflates the amount of volatile or permanent memory available to it by letting other (untrusted) devices hold encryptions of information that they return on request. For instance, web servers typically hold for each of the client connections they manage a multitude of data, ranging from user preferences to technical information like database credentials. If the amount of data per session is considerable, busy servers sooner or later run out of memory. One admissible solution to this is to let the server encrypt the session data to itself and to let the client store the ciphertext, with the agreement that the client reproduce the ciphertext in each subsequent request (e.g., via a cookie) so that the session data can be recovered when required. In this article we develop the cryptographic mechanism that should be used to achieve confidential and authentic data storage in the encrypt-to-self setting, i.e., where encryptor and decryptor coincide and constitute the only entity holding keys. We argue that standard authenticated encryption represents only a suboptimal solution for preserving confidentiality, as much as message authentication codes are suboptimal for preserving authenticity. The crucial observation is that such schemes instantaneously give up on all security promises in the moment the key is compromised. In contrast, data protected with our new primitive remains fully integrity protected and unmalleable. In the course of this paper we develop a formal model for encrypt-to-self systems, show that it solves the outsourced storage problem, propose surprisingly efficient provably secure constructions, and report on our implementations.


Introduction
We explore techniques that enable a computing device to securely outsource the storage of data. We start by motivating this area of research with four application scenarios where outsourcing storage might prove crucial.
Web Server. We come back to the example considered in the abstract, giving more details. While it is difficult to make general statements about the setup of a web server back-end, it is fair to say that the processing of HTTP requests routinely also includes extracting a session identifier from the HTTP header and fetching basic session-related information (e.g., the user's password, the date and time of the last login, the number of failed login attempts, but also other kinds of data not related to security) from a possibly remote SQL database. To avoid the inherent bottleneck induced by the transmission and processing of the database query, such data can be cached on the web server, the limits of this depending only on the amount of available working memory (RAM). For some types of web applications and a large number of web sessions served simultaneously, these memory-imposed limits might represent a serious restriction to efficiency. This article scouts techniques that allow the web server to securely outsource the storage of session information to the (untrusted) web clients.
Hardware Security Module. An HSM is a computing device that performs cryptographic and other security-related operations on behalf of the owning user. While such devices are internally built from off-the-shelf CPUs and memory chips, a key concept of HSMs is that they are specially encapsulated to protect them against physical attacks, including various kinds of side channel analysis. One consequence of this tamper-proof shielding is that the memory capacity of an HSM can never be physically extended (unlike what would be the case for desktop computers), so that the amount of available working memory might constitute a relevant obstacle when the HSM is deployed in applications with requirements that increase over time (e.g., due to a growing user base). This article scouts techniques that allow the HSM to securely outsource the storage of any kind of valuable information to the (untrusted) embedding host system.
Smartcard. A smartcard, most prominently recognized in the form of a payment card or a mobile phone security token, is effectively a tiny computing device. While fairly potent configurations exist (with 32-bit CPUs and a couple of hundred KB of memory), the cost of producing a smartcard scales roughly linearly with the amount of implemented physical memory, so that, in order to be cost-effective, mass-produced cards tend to come with only a small amount of memory. This article scouts techniques that allow smartcards to securely outsource the storage of valuable information to the infrastructure they connect to, e.g., a banking or mobile phone backbone, or a smartphone.
Trusted Platform Module. A TPM is a discrete security chip that is embedded into virtually all laptops and desktop PCs produced in the past decade. A TPM supports its host system by offering trusted cryptographic services and is typically relied upon by boot loaders and operating systems. TPMs are located conceptually between HSMs and smartcards, and as much as these they benefit from a secure option to outsource storage.
Outsourced Storage based on Symmetric Cryptography. If a computing device has access to some kind of external storage facility (a memory chip wired to it, a connected hard drive, cloud storage, etc.), then, intuitively, it can virtually extend the amount of memory available to it by outsourcing storage, i.e., by serializing data objects and communicating them to the storage facility which will reproduce them on request. In this article we focus on the case where neither the external storage facility nor the connection to it is considered trustworthy. More concretely, we assume that all infrastructure outside of the computing device itself is under control of an adversary that aims at reading or changing the data that is to be externally stored. 3 As a first approximation one might conclude that standard tools from the domain of symmetric encryption are sufficient to achieve security in this setting. Consider for instance the following approach based on authenticated encryption (AE, [16]): The computing device samples a fresh symmetric key; whenever it wants to store internal data on the outsourced storage, it encrypts and authenticates the data by invoking the AE encryption algorithm with its key and hands the resulting ciphertext over to the storage facility; to retrieve the data, it requests a copy of the ciphertext, and decrypts and verifies it. While this simple solution requires further tweaking to thwart replay attacks, 4 as long as the AE key remains private it can be used to protect confidentiality and integrity as expected.
Our Contribution: Secure Outsourced Storage w/ Key Leakage. While we confirm that standard cryptographic methods will securely solve the storage outsourcing problem if the used key material remains private, we argue that satisfactory solutions should go a step further by providing as much security as possible even if the latter assumption (that keys remain private) is not met. Indeed, different attacks against practical systems that lead to partial or full memory leakage continue to regularly emerge (including different types of side channel analysis against embedded systems, 5 cold-boot attacks against memory chips, 6 Meltdown/Spectre-like attacks against modern CPUs, 7,8 etc.), and it is commonly understood that the corruption model considered for cryptographic primitives should always be as strong as possible and affordable. For two-party symmetric encryption (e.g., AE) this strongest model necessarily excludes any type of user corruption 9 as the keys of both parties are identical: Once any party is corrupted, any past or future ciphertext can be decrypted and ciphertexts can be forged for any message, i.e., no form of confidentiality or authenticity remains. We point out, however, that for outsourced storage a stronger corruption model is both feasible and preferable. Clearly, like in the AE case, if the adversary obtains a copy of the used key material then all confidentiality guarantees are lost (the adversary can decrypt what the device can decrypt, that is, everything), but a similar reasoning with respect to integrity protection cannot be made. To see this, consider the encrypt-then-hash (EtH) solution where the computing device encrypts the outsourced data as described above, but in addition to having the ciphertext stored externally it internally registers a hash of it (computed with, say, SHA256). When the device decides to recover externally stored data, it requests a copy of the ciphertext, recomputes its hash value, and decrypts only if the hash value is consistent with the internally registered value. Note that even if the device is corrupted and its keys become public, all successfully decrypted ciphertexts are necessarily authentic.
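The encrypt-then-hash idea can be illustrated with a minimal Python sketch. It is not the construction proposed in this article, and several choices are assumptions made only for illustration: the function names `eth_store` and `eth_retrieve` are hypothetical, and because the Python standard library offers no AES, an HMAC-SHA256 keystream in counter mode stands in for the symmetric cipher.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, n: int) -> bytes:
    """Stand-in stream cipher (assumption): HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def eth_store(key: bytes, data: bytes) -> tuple[bytes, bytes]:
    """Encrypt, then hash. The hash stays on the device, the ciphertext is outsourced.
    The key must be fresh for every stored item because the keystream is deterministic."""
    ciphertext = xor(data, keystream(key, len(data)))
    return hashlib.sha256(ciphertext).digest(), ciphertext


def eth_retrieve(key: bytes, registered_hash: bytes, ciphertext: bytes) -> bytes:
    """Decrypt only if the returned ciphertext matches the registered hash."""
    if not hmac.compare_digest(hashlib.sha256(ciphertext).digest(), registered_hash):
        raise ValueError("ciphertext does not match the registered hash")
    return xor(ciphertext, keystream(key, len(ciphertext)))


key = secrets.token_bytes(32)
registered, outsourced = eth_store(key, b"session data")
assert eth_retrieve(key, registered, outsourced) == b"session data"

# An adversary who knows the key can encrypt a different message,
# but the device still rejects it because the hash does not match.
forged = xor(b"evil payload", keystream(key, 12))
try:
    eth_retrieve(key, registered, forged)
except ValueError:
    print("forgery rejected")
```

The integrity check relies only on the collision resistance of the hash and on the registered value staying authentic inside the device, which is why leaking the key does not help a forger in this sketch.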
The example just given shows that while no solution for secure storage outsourcing can do much about protecting data confidentiality against key leakage attacks, solutions can fully protect the integrity of the stored data in any case. Naive AE-based schemes do not provide this type of security, and the contribution of our work is to fill this gap and to explore corresponding constructions. Precisely, this article provides the following: (1) We identify the new encrypt-to-self (ETS) primitive as the right cryptographic tool to solve the outsourced storage problem and formalize its syntax and security properties. (2) We formalize notions more directly related to the outsourced storage problem and provably confirm that secure solutions based on ETS are indeed immediate. (3) We design provably secure constructions of ETS from established cryptographic primitives. 10 (4) We develop open-source implementations of our constructions that are optimized with respect to security and efficiency.
Related Work. While we are not aware of any former systematic treatment of the encrypt-to-self (ETS) primitive, a number of similar primitives or ad hoc constructions partially overlap with our results. We discuss these in the following, but emphasize that none of them provides general solutions to the ETS problem.
Memory Encryption in Modern CPUs. Recent desktop and server CPUs offer dedicated infrastructure for memory encryption, 11 with the main applications in cloud computing and Trusted Execution Environments (TEEs). Prominent TEE examples include Intel SGX 12 and ARM TrustZone 13 in which every memory access of the processes that are executed within a TEE (aka 'enclave') is conducted through a memory encryption engine (MEE). This effectively implements outsourced data storage, but with quite different access rules and patterns than in the ETS case. While we consider the (stateless) encryption of a message to a ciphertext and then a decryption of a ciphertext back to a message, MEEs are stateful systems that consider the protected physical memory area a single ciphertext that is constantly locally modified with each write operation [8].
Password Managers. A password manager can be seen as a database that stores security credentials in an encrypted form and requires, e.g., a master password to be unlocked. This, too, can be seen as an ETS instance, but the cryptographic design of password managers has a different focus than general outsourced storage. More concretely, the central challenge solved by good password managers is the password-based key derivation, 14 which typically involves invoking a time-expensive derivation function like PBKDF2 [9] or a memory-hard derivation function like ARGON2 [5]. Password-based key derivation is not considered in our treatment of the ETS primitive (we instead assume uniform keys).
Encryptment. A symmetric encryption option that recently emerged as a proposal to protect messages in instant messaging is Encryptment [7]. Its features go beyond regular authenticated encryption in that the tags contained in ciphertexts act as (cryptographically strongly binding) commitments to the encoded messages. This committing feature was deemed helpful for the public resolution of cyber harassment cases by allowing affected parties to appeal to a judging authority by opening their ciphertexts by releasing their keys. At first sight this has nothing to do with our ETS setting (in which only one party holds a key, this key would never be deliberately shared, and a necessity of provably releasing message contents to anybody else is not considered). Interestingly, however, our constructions of ETS are very similar to those of [7]. The intuitive reason for this is that the ETS setting requires that ciphertexts remain unforgeable under key leakage, which somewhat aligns with the committing property of encryptment that is required to survive disclosing keys to a judge. Ultimately, however, the applications and thus security models of ETS and encryptment differ, and our constructions are actually more efficient than those in [7]. 15
Technical Approach. In addition to formalizing the security of the encrypt-to-self (ETS) primitive, in the course of this article we also propose efficient provably-secure constructions from standardized building blocks. As discussed above, the authenticity promises of ETS shall withstand adversaries that have knowledge of the key material. In this setting one cannot hope that standard secret-key authentication building blocks like MACs or universal hash functions will be of help, as generically they lose all security when the key is leaked. We instead employ cryptographic hash functions like SHA256, as they constitute unkeyed authentication primitives. A first candidate construction, already hinted at above, would be the encrypt-then-hash (EtH) approach where the message is first encrypted (using any secret key scheme, e.g., AES-CTR) and the ciphertext is then hashed. Our constructions are more efficient than this: they exploit the structure of Merkle-Damgård (MD) hash functions and make dual use of the properties of their inner building block, the compression function (CF). Intuitively, for authentication we build on the collision resistance of the CF, and for confidentiality we build on a PRF-like property of the CF. More precisely, our message schedule for the CF is such that each invocation provides both confidentiality and integrity for the processed block. This effectively halves the computational costs in comparison to the EtH approach.
We believe that a cryptographic analysis is not complete without also implementing the construction under consideration. This is because only implementing a scheme will enforce making conscious decisions about all its details and building blocks, and these decisions may crucially affect the obtained security and efficiency. We thus realized three ready-to-use instances of the ETS primitive, based on the CFs of the top performing hash functions SHA256, SHA512, and BLAKE2. In fact, observations from implementing the schemes led to considerable feedback to the theoretical design which was updated correspondingly. One example for this is connected to memory alignment: Computations on modern CPUs experience noticeable efficiency penalties if memory accesses are not aligned to specific boundaries. Our constructions reflect this at two different levels: at the register level and at the cache level (64-bit alignment for register-oriented operations, and 256-bit alignment for bulk memory transfers 16).

Notation
All algorithms considered in this article may be randomized. We let N = {0, 1, ...} and N^+ = {1, 2, ...}. For the Boolean constants True and False we either write T and F, respectively, or 1 and 0, respectively, depending on the context. An alphabet Σ is any finite set of symbols or characters. We denote with Σ^n the set of strings of length n and with Σ^{≤n} the set of strings of length up to (and including) n. In the practical parts of this article we assume that |Σ| = 256, i.e., that all strings are byte strings. We denote string concatenation with ‖. If var is a string variable and exp evaluates to a string, we write var ←‖ exp as shorthand for var ← var ‖ exp. Further, if exp evaluates to a string, we write var ‖ var′ ←_n exp to denote splitting exp such that we assign the first n characters from exp to var and assign the remainder to var′. When we do not need the remainder, we write var ←_n exp as shorthand for var ‖ dummy ←_n exp and discard dummy. In pseudocode, if S is a finite set, expression $(S) stands for picking an element of S uniformly at random. Associative arrays implement the 'dictionary' data structure: Once the instruction A[·] ← exp has initialized all items of array A to the default value exp, individual items indexed by expression idx can be updated with A[idx] ← exp and extracted with var ← A[idx].
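For readers who prefer code to pseudocode conventions, the string and array notation above roughly corresponds to the following Python idioms. This is only an informal mapping; the helper names `split` and `sample` are hypothetical and do not appear in the article.

```python
import secrets
from collections import defaultdict


def split(exp: bytes, n: int) -> tuple[bytes, bytes]:
    """Splitting: the first n characters of exp, followed by the remainder."""
    return exp[:n], exp[n:]


def sample(S):
    """$(S): pick an element of the finite set S uniformly at random."""
    return secrets.choice(list(S))


var = b"ab"
var += b"cd"                         # append shorthand: var becomes var concatenated with exp
head, rest = split(b"abcdef", 2)     # first 2 characters and the remainder
head_only, _ = split(b"abcdef", 2)   # remainder discarded (the 'dummy' variable)

coin = sample({0, 1})                # uniformly random element of a finite set

A = defaultdict(lambda: 0)           # all items of A start at the default value 0
A["idx"] = 7                         # update the item indexed by "idx"
value = A["idx"]                     # extract the item indexed by "idx"
```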

Security Games
Security games are parameterized by an adversary, and consist of a main game body plus zero or more oracle specifications. The execution of a game starts with the main game body and terminates when a 'Stop with exp' instruction is reached, where the value of expression exp is taken as the outcome of the game. The adversary can query all oracles specified by the game, in any order and any number of times. If the outcome of a game G is Boolean, we write Pr[G(A)] for the probability that an execution of G with adversary A results in True, where the probability is over the random coins drawn by the game and the adversary. We define macros for specific combinations of game-ending instructions: We write 'Win' for 'Stop with T' and 'Lose' for 'Stop with F', and further 'Reward cond' for 'If cond: Win', 'Promise cond' for 'If ¬cond: Win', 'Require cond' for 'If ¬cond: Lose'. These macros emphasize the specific semantics of game termination conditions. For instance, a game may terminate with 'Reward cond' in cases where the adversary arranged for a situation (indicated by cond resolving to True) that should be awarded a win (e.g., the crafting of a forgery in an authenticity game).
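The game-ending macros can be mimicked in Python with an exception that carries the game outcome. The sketch below is only one possible encoding; the names `Stop`, `run_game`, and `toy_game` are hypothetical, and the toy game is not one of the games of this article.

```python
class Stop(Exception):
    """Models 'Stop with exp': carries the outcome of the game."""

    def __init__(self, outcome: bool):
        super().__init__(outcome)
        self.outcome = outcome


def win():
    raise Stop(True)       # 'Win'  = 'Stop with T'


def lose():
    raise Stop(False)      # 'Lose' = 'Stop with F'


def reward(cond):
    if cond:               # 'Reward cond'  = 'If cond: Win'
        win()


def promise(cond):
    if not cond:           # 'Promise cond' = 'If not cond: Win'
        win()


def require(cond):
    if not cond:           # 'Require cond' = 'If not cond: Lose'
        lose()


def run_game(main_body, adversary) -> bool:
    """Run the main game body until a Stop instruction is reached."""
    try:
        main_body(adversary)
    except Stop as stop:
        return stop.outcome
    raise RuntimeError("game body ended without a Stop instruction")


def toy_game(adversary):
    """Toy example: the adversary wins by outputting the fixed value 4."""
    guess = adversary()
    require(isinstance(guess, int))   # malformed output: adversary loses
    reward(guess == 4)                # correct guess: adversary wins
    lose()


print(run_game(toy_game, lambda: 4))
```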

Handling of Algorithm Failures
Regarding the algorithms of cryptographic schemes, we assume that any such algorithm can fail. Here, by failure we mean that an algorithm doesn't generate output according to its syntax specification, but instead outputs some kind of error indicator (e.g., an AE decryption algorithm that rejects an unauthentic ciphertext or a randomized signature algorithm that doesn't have sufficiently many random bits at its disposal). Instead of encoding this explicitly in syntactical constraints which would clutter the notation, we assume that if an algorithm invokes another algorithm as a subroutine, and the latter fails, then also the former immediately fails. 17 We assume the same for game oracles: If an invoked scheme algorithm fails, then the oracle immediately aborts as well. Further, we assume that the adversary learns about this failure, i.e., the oracle will return the error indicator when it aborts. Note that this implies that if a scheme's algorithms leak vital information through error messages, then the scheme will not be secure in our models. (That is, our models are particularly robust.) We believe that our way to handle errors implicitly rather than explicitly contributes to obtaining definitions with clean and clear semantics.

Memory Alignment
For n a power of 2, we say an address of computer memory is n-byte aligned if it is a multiple of n bytes. We further say that a piece of data is n-byte aligned if the address of its first byte is n-byte aligned. A modern CPU accesses a single (aligned) word in memory at a time. Therefore, the CPU performs reads and writes to memory most efficiently when the data is aligned. For example, on a 64-bit machine, 8 bytes of data can be read or written with a single memory access if the first byte lies on an 8-byte boundary. However, if the data does not lie within one word in memory, the processor would need to access two memory words, which is considerably less efficient. Our scheme algorithms are designed such that when they need to move around data, they exclusively do this for aligned addresses. In practice, the preferred alignment value depends on the hardware used, so for generality in this article we refer to it abstractly as the memory alignment value mav. (A typical value would be mav = 8.)
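The alignment definition and the 8-byte example can be checked with a few lines of Python. The function names `is_aligned` and `words_touched` are hypothetical helpers introduced only for this illustration; the word size of 8 bytes corresponds to the 64-bit machine of the example.

```python
def is_aligned(address: int, n: int) -> bool:
    """True if the address is n-byte aligned, for n a power of 2."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    return address % n == 0


def words_touched(address: int, size: int, word: int = 8) -> int:
    """Number of aligned memory words covered by size bytes starting at address."""
    first = address // word
    last = (address + size - 1) // word
    return last - first + 1


mav = 8  # typical memory alignment value on a 64-bit machine

# 8 bytes starting on an 8-byte boundary fit into a single word ...
assert is_aligned(16, mav) and words_touched(16, 8) == 1
# ... while the same amount of data at an unaligned address spans two words.
assert not is_aligned(20, mav) and words_touched(20, 8) == 2
```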

Tweaking the Compression Functions of Hash Functions
The main NIST hash functions of the SHA2 family (FIPS 180-4, [10]) accomplish their task of hashing a message into a short string by strictly following the Merkle-Damgård framework: All inputs to their core building block, the compression function, are either directly taken from the message or from the chaining state. It has been recognized, however, that options to further contextualize or domain-separate the inputs of compression functions can be of advantage. Indeed, compression functions that are designed according to the alternative, more recent HAIFA framework [4] have a number of additional inputs, for instance an explicit salt input, that allow for weaving some extra bits of context information into the bulk hash operations. A concrete example for this is the compression function of the popular BLAKE2 hash function ([2, 18], a HAIFA design), which takes as an additional input a Boolean finalization flag that is to be set specifically when processing the very last (padded) block of a hash computation. The idea behind making the last invocation "special" is that this effectively thwarts length extension attacks: While conducting extension attacks against the SHA2 hash functions, where the compression functions do not natively support any such marking mechanism, is quite immediate, 18 similar attacks against BLAKE2 are impossible [6]. We note that, generally speaking, an ad hoc way of augmenting the input of a compression function by an additional small number of bits is to XOR predefined constants into the hashing state (e.g., before or while the compression function is executed), with the choice of constants depending on the added bits. For instance, if the finalization flag is set, the BLAKE2 compression function will flip all bits of one of its inputs, but beyond that operate as normal.
While textbook SHA2 does not support contextualizing compression function invocations via additional inputs, we observe that NIST, in order to solve an emerging domain-separation problem in the definition of their FIPS 180-4 standard, employed ad hoc modifications of some SHA2 functions that can be seen as (implicitly) retrofitting a one-bit additional input into the compression function. Concretely, the SHA512/t functions [10], that intuitively represent plain SHA512 truncated to 0 < t < 512 bits, are carefully designed such that for any t_1 ≠ t_2 the functions SHA512/t_1 and SHA512/t_2 are independent of each other. 19 The separation of the individual SHA512/t versions works as follows [10, Sec. 5.3.6]: First XOR the byte value 0xa5 (binary: 0b10100101) into every byte of the initial SHA512 chain state, then, starting from that modified state, compute the SHA512 hash value of the string "SHA-512/t" (where placeholder t is replaced by its decimal encoding), and finally use the resulting hash value as the initial chain state of SHA512/t, continuing with regular SHA512 steps from that state on and truncating the final hash value to t bits. While the XORing step is ad hoc, it arguably represents a fairly robust domain separation method for SHA2.
Our constructions of the encrypt-to-self primitive rely on compression functions that are tweaked with a single bit, that is, that support one bit as an additional input. When we implement this based on SHA2 compression functions, we employ precisely the mechanism scouted by NIST: When the additional tweak bit is set, we XOR constant 0xa5 into all state bytes and continue operation as normal. Our BLAKE2 based construction, on the other hand, uses the already existing finalization bit.
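The one-bit tweak mechanism can be sketched as a thin wrapper around an untweaked compression function. The following Python fragment is a simplified illustration under two assumptions: the 'state' that receives the 0xa5 constant is taken to be the chaining value handed to the compression function, and `toy_cf` is a placeholder built from SHA-256 rather than an actual SHA2 compression function (which the Python standard library does not expose). The names `apply_tweak`, `toy_cf`, and `tweaked_cf` are hypothetical.

```python
import hashlib


def apply_tweak(chain: bytes, t: int) -> bytes:
    """XOR the constant 0xa5 into every state byte if the tweak bit is set."""
    if t not in (0, 1):
        raise ValueError("the tweak is a single bit")
    return bytes(b ^ 0xA5 for b in chain) if t else chain


def toy_cf(block: bytes, chain: bytes) -> bytes:
    """Placeholder for an untweaked compression function (assumption, not SHA2's CF)."""
    return hashlib.sha256(chain + block).digest()


def tweaked_cf(block: bytes, t: int, chain: bytes) -> bytes:
    """One-bit tweakable compression function: tweak the state, then operate as normal."""
    return toy_cf(block, apply_tweak(chain, t))


block, chain = bytes(64), bytes(32)
# The two tweak values lead to different chaining values for the same inputs.
assert tweaked_cf(block, 0, chain) != tweaked_cf(block, 1, chain)
```

Because XORing a constant is an involution, applying the tweak twice restores the original state, so the wrapper adds only a negligible cost per invocation.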

Foundations of Encrypt-to-Self
The overall goal of this article is to provide a secure solution for outsourced storage. We identified the novel encrypt-to-self (ETS) primitive, which provides one-time secure encryption with authenticity guarantees that hold beyond key compromise, as the right tool to construct outsourced storage. 20 In this section we first formalize and study ETS, then formalize outsourced storage, and finally show how the former immediately implies the latter. This allows us to leave the outsourced storage topic aside in the remaining part of the paper and lets us instead fully focus on constructing and implementing ETS.

Syntax and Security of ETS
ETS consists of an encryption and a decryption algorithm, where the former translates a message to a binding tag and a ciphertext, and the latter recovers the message from the tag-ciphertext pair. For versatility the two operations further support the processing of an associated-data input [16] which has to be identical for a successful decryption.
The task of the binding tag is to prevent forgery attacks: A user that holds an authentic copy of the binding tag will never accept any ciphertext they did not generate themselves, even if all their secrets become public. Note that while standard authenticated encryption (AE) does not provide this type of authentication, the encrypt-then-hash construction suggested in Sec. 1 does. In Sec. 4 we provide a considerably more efficient construction that uses a hash function's compression function as its core building block. Here, we define the generic syntax of ETS and formalize its security requirements.

Definition 1.
Let AD be an associated data space and let M be a message space. An encrypt-to-self (ETS) scheme for AD and M consists of algorithms enc, dec, a key space K, a binding-tag space Bt, and a ciphertext space C. The encryption algorithm enc takes a key k ∈ K, associated data ad ∈ AD and a message m ∈ M, and returns a binding tag bt ∈ Bt and a ciphertext c ∈ C. The decryption algorithm dec takes a key k ∈ K, a binding tag bt ∈ Bt, associated data ad ∈ AD and a ciphertext c ∈ C, and returns a message m ∈ M. A shortcut notation for this API is enc: K × AD × M → Bt × C and dec: K × Bt × AD × C → M.
Correctness and Security. We require of an ETS scheme that if a message m is processed to a tag-ciphertext pair with associated data ad, and a message m′ is recovered from this pair using the same associated data ad, then the messages m, m′ shall be identical. This is formalized via the SAFE game in Fig. 1. 21 In particular, observe that if the adversary queries Dec(ad, c) (for the authentic ad and c that it receives in line 02) and the dec procedure produces output m′, the game promises that m′ = m (lines 05,06). Recall from Sec. 2.2 that this means the game stops with output T if m′ ≠ m. Intuitively, the scheme is safe if we can rely on m′ = m, that is, if the maximum advantage Adv^safe(A) := max_{ad∈AD, m∈M} Pr[SAFE(ad, m, A)] that can be attained by realistic adversaries A is negligible. The scheme is perfectly safe if Adv^safe(A) = 0 for all A. We remark that the universal quantification over all pairs (ad, m) makes our advantage definition particularly robust.
Our security notions demand that the integrity of ciphertexts be protected (INT-CTXT), and that encryptions be indistinguishable in the presence of chosen-ciphertext attacks (IND-CCA). The notions are formalized via the INT and IND^0, IND^1 games in Fig. 1, where the latter two depend on some equivalence relation ≡ ⊆ M × M on the message space. 22 For consistency, in lines 07,15,24 we suppress the message in all games if the adversary queries Dec(ad, c). This is crucial in the IND^b games, as otherwise the adversary would trivially learn which message was encrypted, but does no harm in the other games as the adversary already knows m. Recall from Sec. 2.3 that all algorithms can fail, and if they do, then the oracles immediately abort. This property is crucial in the INT game where the dec algorithm must fail for unauthentic input such that the oracle immediately aborts. Otherwise, the game will reward the adversary, that is, the game stops with T (line 14). We say that a scheme provides integrity if the maximum advantage Adv^int(A) := max_{ad∈AD, m∈M} Pr[INT(ad, m, A)] that can be attained by realistic adversaries A is negligible, and that it provides indistinguishability if the same holds for the advantage Adv^ind(A) with which A distinguishes the games IND^0 and IND^1.

Sufficiency of ETS for Outsourced Storage
We define the syntax of an outsourced storage scheme. We model such a scheme as a set of stateful algorithms, where algorithm write is invoked to store data and algorithm read is invoked to retrieve it. We indicate the statefulness of the algorithms by appending the term st to their names, where st is the state variable.
Definition 2.
Let M be a message space. A storage outsourcing scheme for M consists of algorithms gen, write, read, a state space ST, and a ciphertext space C. The state generation algorithm gen takes no input and outputs an (initial) state st ∈ ST. The storage algorithm write takes a state st ∈ ST and a message m ∈ M, and outputs an (updated) state st ∈ ST and a ciphertext c ∈ C. The retrieval algorithm read takes a state st ∈ ST and a ciphertext c ∈ C, and outputs an updated state st ∈ ST and a message m ∈ M. A shortcut notation for this API is gen: → ST, write: ST × M → ST × C, and read: ST × C → ST × M.
Correctness and Security. We require of a storage outsourcing scheme that if a message m is processed to a ciphertext, and subsequently a message m′ is recovered from this ciphertext, then the messages m, m′ shall be identical. This is formalized via the SAFE game in Fig. 2. Observe that the Boolean flag is ('in-sync') tracks whether the attack is active or passive. Initially is = T, i.e., the attack is passive; however, once the adversary requests the reading of a ciphertext that is not the last created one, the game sets is ← F to flag the attack as active (line 11). For passive attacks the game promises that any message returned by the read procedure is the last one that was processed by the write procedure (line 13). Intuitively, the scheme is safe if the maximum advantage Adv^safe(A) := Pr[SAFE(A)] that can be attained by realistic adversaries A is negligible. The scheme is perfectly safe if Adv^safe(A) = 0 for all A.
Our security notions demand that the integrity of ciphertexts be protected (INT-CTXT), and that encryptions be indistinguishable in the presence of chosen-ciphertext attacks (IND-CCA). The notions are formalized via the INT and IND^0, IND^1 games in Fig. 2, where the latter two depend on some equivalence relation ≡ ⊆ M × M on the message space (see also Footnote 22). Recall from Sec. 2.3 that all algorithms can fail, and if they do, the oracles immediately abort. This property is crucial in the INT game where the read algorithm must fail for unauthentic input such that the adversary is not rewarded in the subsequent line in the Read oracle. For consistency we suppress the message in the Read oracle for passive attacks in all games if the adversary queries Dec(ad, c). This is crucial in the IND^b games, as otherwise the adversary would trivially learn which message was encrypted, but does no harm in the other games as the adversary already knows m for passive attacks. Furthermore, we remark that the adversary is only allowed to query the Corrupt oracle if M contains at most 1 message, i.e., the ChWrite oracle was queried for m_0 = m_1. Otherwise, the adversary would be able to run the read procedure and trivially learn m. We say that a scheme provides integrity if the maximum advantage Adv^int(A) := Pr[INT(A)] that can be attained by realistic adversaries A is negligible, and that it provides indistinguishability if the same holds for the advantage Adv^ind(A) := |Pr[IND^1(A)] − Pr[IND^0(A)]|.
Construction from ETS. Constructing secure outsourced storage from ETS is immediate: The write procedure samples a uniformly random key and runs the enc procedure of ETS to obtain a binding tag and ciphertext. It stores the binding tag (and key) in the state and returns the ciphertext. The read procedure gets the key and binding tag from the state, runs the dec procedure of ETS and returns the message. The details of this construction are in Fig. 3. The security argument is obvious.
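The construction of outsourced storage from ETS can be sketched in Python as follows. This is an illustration of the textual description only, not a transcription of Fig. 3. Several details are assumptions: the associated data is fixed to the empty string, the state is modeled as a small dataclass, and the ETS scheme used for the demonstration (`ets_enc`, `ets_dec`) is a toy encrypt-then-hash instance with an HMAC-SHA256 keystream, not one of the constructions of this article.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


def _stream(key: bytes, n: int) -> bytes:
    """Toy keystream (assumption): HMAC-SHA256 in counter mode."""
    blocks = (n + 31) // 32
    return b"".join(hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
                    for i in range(blocks))[:n]


def _tag(ad: bytes, c: bytes) -> bytes:
    """Binding tag: hash over a length-prefixed encoding of (ad, c)."""
    return hashlib.sha256(len(ad).to_bytes(8, "big") + ad + c).digest()


def ets_enc(k: bytes, ad: bytes, m: bytes) -> tuple[bytes, bytes]:
    c = bytes(x ^ y for x, y in zip(m, _stream(k, len(m))))
    return _tag(ad, c), c


def ets_dec(k: bytes, bt: bytes, ad: bytes, c: bytes) -> bytes:
    if not hmac.compare_digest(_tag(ad, c), bt):
        raise ValueError("binding tag mismatch")
    return bytes(x ^ y for x, y in zip(c, _stream(k, len(c))))


@dataclass
class State:
    key: Optional[bytes] = None   # one-time ETS key
    bt: Optional[bytes] = None    # binding tag of the last written ciphertext


def gen() -> State:
    return State()


def write(st: State, m: bytes) -> tuple[State, bytes]:
    key = secrets.token_bytes(32)        # fresh uniformly random key per write
    bt, c = ets_enc(key, b"", m)
    return State(key, bt), c             # key and tag stay inside, c is outsourced


def read(st: State, c: bytes) -> tuple[State, bytes]:
    if st.key is None or st.bt is None:
        raise ValueError("nothing has been written yet")
    return st, ets_dec(st.key, st.bt, b"", c)


st = gen()
st, c_old = write(st, b"balance=100; owner=alice")
st, c_new = write(st, b"balance=250; owner=alice")
st, m = read(st, c_new)
assert m == b"balance=250; owner=alice"
```

In this sketch the state holds only the most recent key and binding tag, so replaying the older ciphertext `c_old` is rejected by `read`.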

Construction of Encrypt-to-Self
We mentioned in Sec. 1 that a generic construction of ETS can be realized by combining standard symmetric encryption with a cryptographic hash function: one encrypts the message and computes the binding tag as the hash of the ciphertext. Here we provide a more efficient construction that builds on the compression function of a Merkle-Damgård hash function. To be more precise, our construction uses a tweakable compression function with tweak space T = {0, 1}, i.e., the domain of the compression function is extended by one bit (see Sec. 2.5). We provide a general definition below.

Definition 3. For Σ an alphabet, c, d ∈ N^+ with c ≤ d, and a tweak space T, we define a tweakable compression function to be a function F : Σ^d × T × Σ^c → Σ^c that takes as input a block B ∈ Σ^d from the data domain, a tweak t ∈ T from the tweak space, and a string C ∈ Σ^c from the chain domain, and outputs a string C′ ∈ Σ^c in the chain domain.
We will write F_t(B, C) as shorthand notation for F(B, t, C). For practical tweakable compression functions the memory alignment value mav (see Sec. 2.4) will divide both c and d. When constructing an ETS scheme from F, because the compression function only takes fixed-size input, we need to map the (ad, m) input to a series of block-tweak pairs (B, t). We will refer to this mapping as the input encoding. We take a modular approach by fixing the encoding independently of the encryption engine, and detail the former in Sec. 4.1 and the latter in Sec. 4.2. Together they form an efficient construction of ETS.
Fig. 4. Example with d = 2c: (B_1, B_2, B_3, B_4) = (ad_1 ad_2, ad_3 m_1, ad_4 m_2, ad_5 ad_6).

We first convey a rough overview of our ETS construction. In Fig. 4 we consider an example with block size d double the chaining value size c. We assume that key k is padded to size d. The first block B_1 only contains associated data and we XOR B_1 with the key k before we feed it into the compression function. From the second block onwards, we start processing message data. We fill the first half of the block with associated data ad_3 and the second half with message data m_1, and XOR with the key. We also XOR m_1 with the current chaining value C_1 to generate a partial ciphertext ct_1. The same happens in the third block and we append ct_2 to the ciphertext. If there is associated data left after processing all message data we can load the entire block with associated data, which occurs in the fourth block. Note that we no longer need to XOR the key into the block after we have processed all message data, because at this point the input to the compression function will already be independent of the message m. After processing all blocks, we XOR an offset ω ∈ {ω_0, ω_1} with the chaining value, where ω_0, ω_1 are two distinct constants. The binding tag will be (a truncation of) the last chaining value C*.²³ Note that the task of the encoding is not only to partition ad and m into blocks B_1, B_2, . . . as described, but also to derive tweak values t_1, t_2, . . . and the choice of the final offset ω in such a way that the overall encoding is injective.

Message Block Encoding
We turn to the technical component of our ETS construction that encodes the (ad, m) input into a series of output pairs (B, t) and the final offset value ω. For authenticity we require that the encoding is injective. For efficiency we require that the encoding is online (i.e., the input is read only once, left-to-right, and with small state), that the number of output pairs is as small as possible, and that the encoding preserves memory alignment (see Sec. 2.4). Syntactically, for the outputs we require that all B ∈ Σ^d, all t ∈ T, and ω ∈ Ω, where quantities c, d are those of the employed compression function, T = {0, 1}, and Ω ⊆ Σ^c is any two-element set. (Note that |T| = 2 allows us to use the tweaking approach from Sec. 2.5; further, in our implementations we use Ω = {ω_0, ω_1} where ω_0 = 0x00^c and ω_1 = 0xa5^c.) Overall, the task we are facing is the following: map each input (ad, m) injectively to a sequence of pairs (B, t) ∈ Σ^d × T together with an offset ω ∈ Ω.

A detailed specification of our encoding (and decoding) function can be found in Fig. 6, but we present it here in text. Our construction does not use the decoding function, but we provide it anyway to show that the encoding function is indeed injective. Roughly, we encode as follows. We fill the first block with associated data and for any subsequent block we load the associated data in the first part of the block and the message in the second part of the block. When we have processed all the message data, we load the full block with ad again. Clearly, we need to pad ad if it runs out before we have processed all message data. We do this by appending a special termination symbol from Σ to ad and then appending null bytes as needed. Similarly, we need to pad the message if the message length is not a multiple of c.
Naturally, one might want to pad the message to a multiple of c. However, this is suboptimal: Consider the scenario where there are d − c + 1 bytes remaining to be processed of associated data and 1 byte of message data. In principle, message and associated data would fit into a single block, but this would not be the case any longer if the message is padded to size c. On the other hand, for efficiency reasons we do not want to misalign all our remaining associated data. If we do not pad at all, when we process the next d bytes of associated data, we can only fit d − 1 bytes in the block and have to put 1 byte into the next block. Therefore, we pad m up to a multiple of the memory alignment value mav. To be precise, we pad the message with null bytes until reaching a multiple of mav. We replace the final (null) byte with the message length |m|; this will uniquely determine where m stops and the padding begins. This restricts us to c ≤ 256 bytes such that |m| always can be encoded into a single byte. As far as we are aware, any current practical compression function satisfies this requirement. In Fig. 5, for the artificially small case with c = 1 and d = 2 we provide four examples of what the blocks would look like for different inputs. The top row shows the encoding of 8 bytes of associated data and an empty message. The second row shows the encoding of empty associated data and 3 bytes of message data. The third row shows the encoding of 6 bytes of associated data and 2 bytes of message data. The final row shows the encoding of 3 bytes of associated data and 3 bytes of message data.

Fig. 5. Example encodings for c = 1 and d = 2, where ⟨term⟩ denotes the termination symbol:
(B_1, B_2, B_3, B_4) = (ad_1 ad_2, ad_3 ad_4, ad_5 ad_6, ad_7 ad_8)
(B_1, B_2, B_3, B_4) = (⟨term⟩ 0x00, 0x00 m_1, 0x00 m_2, 0x00 m_3)
(B_1, B_2, B_3, B_4) = (ad_1 ad_2, ad_3 m_1, ad_4 m_2, ad_5 ad_6)
(B_1, B_2, B_3, B_4) = (ad_1 ad_2, ad_3 m_1, ⟨term⟩ m_2, 0x00 m_3)
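The message padding rule can be illustrated with a small sketch. It reflects one reading of the text above and is not the specification from Fig. 6: we assume that the rule is applied to the final message chunk, that |m| refers to the length of that chunk, and that the "was padded" flag is what the tweak later conveys. The function names `pad_chunk` and `unpad_chunk` are hypothetical.

```python
def pad_chunk(chunk: bytes, mav: int):
    """Pad the final message chunk to a multiple of mav.

    Returns (data, padded). If padding is applied, the chunk is filled with
    null bytes and the last byte is replaced by the chunk length.
    """
    assert 0 < len(chunk) < 256 and mav >= 1
    if len(chunk) % mav == 0:
        return chunk, False                      # already aligned: no padding
    padded_len = -(-len(chunk) // mav) * mav     # round up to a multiple of mav
    body = chunk + bytes(padded_len - len(chunk))
    return body[:-1] + bytes([len(chunk)]), True


def unpad_chunk(data: bytes, padded: bool) -> bytes:
    """Invert pad_chunk; the padded flag is assumed to be known to the decoder."""
    if not padded:
        return data
    return data[:data[-1]]                       # last byte stores the length
```

For example, with mav = 4 a 3-byte chunk grows to 4 bytes whose last byte is 3, while a 4-byte chunk is left untouched and reported as unpadded.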
We have two ambiguities remaining. (1) How to tell whether ad was padded or not? Consider the first row in Fig. 5. What distinguishes the case ad = ad_1 . . . ad_7 from ad = ad_1 . . . ad_7 ad_8 with ad_8 equal to the termination symbol? A similar question applies to the message. (2) How to tell whether a block contains message data or not? Compare e.g., the first row with the third row. This is where the tweaks come into play.
First of all, we tweak the first block if and only if the message is empty. This fully separates the authentication-only case from the case where we have message input.
Next, if the message is non-empty, we use the tweaks to indicate when we switch from processing message data to ad-only: we tweak when we have consumed all of m, but still have ad left. Note that the first block never processes message data, so the earliest block this may tweak is the second block and hence this rule does not interfere with the first rule. Furthermore, observe that this rule never tweaks the final block, as by definition of being the final block, we do not have any associated data left to process.
Next, we need to distinguish between the cases whether m is padded or not. In fact, as the empty message was already taken care of, we need to do this only if m is at least one byte in size. As in this case the final block does not coincide with the first block, we can exploit that its tweak is still unused; we correspondingly tweak the final block if and only if m is padded. Obviously, this does not interfere with the previous rules.
Finally, we need to decide whether ad was padded or not. We do not want to enforce a policy of 'always pad', as this could result in an extra block and hence an extra compression function invocation. Instead, we use our offset output. We set the offset ω to ω_1 if ad was padded; otherwise we set it to ω_0. This completes our description of the encoding function. The decoding function is a technical exercise carefully unwinding the steps taken in the encoding function, which we perform in Fig. 6. We obtain that for all m ∈ M, ad ∈ AD we have decode(encode(ad, m)) = (ad, m). It immediately follows that our encoding function is injective.

For readability we have implemented the core functionality of the encoding in a coroutine called nxt, rather than a subroutine. Instead of generating the entire sequence of (B, t) pairs and returning the result, it will 'Yield' one pair and suspend its execution. The next time it is called (e.g., the next step in a for loop), it will resume execution from where it called 'Yield', instead of at the beginning of the function, with all of its state intact. The encode procedure is a simple wrapper that runs the nxt procedure and collects its output, but our authenticated encryption engine described in Sec. 4.2 will call the nxt procedure directly.
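The coroutine style can be illustrated with a Python generator. The sketch below covers only the authentication-only case (empty message, non-empty ad), where the rules above reduce to: tweak the first block, pad ad with the termination symbol and null bytes if needed, and signal the padding through the offset. The block size d = 4, chain size c = 1, and the byte value 0x80 for the termination symbol are assumptions chosen for illustration; the full encoding is the one in Fig. 6.

```python
D = 4                                # toy block size d in bytes (assumption)
TERM = b"\x80"                       # hypothetical termination symbol
OMEGA_0, OMEGA_1 = b"\x00", b"\xa5"  # offsets for c = 1: 0x00 and 0xa5


def nxt(ad: bytes):
    """Yield (block, tweak) pairs one at a time, then (empty block, offset)."""
    assert len(ad) > 0               # this sketch does not cover empty ad
    padded = len(ad) % D != 0
    if padded:
        ad = ad + TERM               # mark the end of ad ...
        ad += bytes(-len(ad) % D)    # ... and fill the block with null bytes
    for i in range(0, len(ad), D):
        tweak = 1 if i == 0 else 0   # first block is tweaked: message is empty
        yield ad[i:i + D], tweak     # execution suspends here until resumed
    yield b"", (OMEGA_1 if padded else OMEGA_0)


def encode(ad: bytes):
    """Wrapper that runs nxt to completion and collects its output."""
    return list(nxt(ad))
```

A consumer such as an encryption loop can iterate over `nxt(ad)` directly and never needs the whole sequence in memory, which is the point of the coroutine formulation.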

Encryption Engine
We now turn our focus to the encryption engine. We assume that the associated data and message are present in encoded format, i.e., as a sequence of pairs (B, t), where B ∈ Σ^d is a block and t ∈ {0, 1} is a tweak, and additionally an offset ω ∈ {ω_0, ω_1}. To be precise, we will use the nxt procedure that generates the next (B, t) pair on the fly.
We specify the encryption and decryption algorithms in Fig. 6 and assume they are provided with a key of length d. As illustrated in Fig. 4, the main idea is to XOR the key with all blocks that are involved with message processing. For the skeleton of the construction, we initialize the chaining value C to IV and loop through the sequence of pairs (B, t) output by the encoding function, each iteration updating the chaining value C ← F_t(B, C). We now describe each iteration of the enc procedure in more detail, where numbers in brackets refer to line numbers in Fig. 6. If the block is empty [69], we are in the final iteration and do not do anything. Otherwise, we check if we are in the first iteration or if we have message data left [71]. In this case we XOR the key into the block [72]. This ensures we start with an unknown input block and that subsequent inputs are statistically independent of the message block. If we only have ad remaining we can use the block directly as input to the compression function. If we have message data left we will encrypt it starting from the second block [73]. To encrypt, we take a chunk of the message, XOR it with the chaining value of equal size and append the result to the ciphertext [74-77]. We only start encrypting from the second iteration as the first chaining value is public. Finally, we call the compression function F_t to update our chaining value [78]. Once we have finished the loop, the last pair output by nxt consists of the empty block and the offset ω by definition. So we XOR the offset ω with the chaining value C and truncate the result to obtain the binding tag [79]. We return the binding tag along with the ciphertext.
The dec procedure is similar to the enc procedure but needs to be slightly adapted. Informally, the nxt procedure now outputs a block B = (ad ‖ ct) [82] instead of B = (ad ‖ m) [68]. Hence, we XOR with the chaining variable [91,97] such that the block becomes B = (ad ‖ m) and the compression function call receives the same input as in the enc procedure. The case distinction handles the slightly different positioning of ciphertext in the blocks. Finally, there obviously is a check whether the computed binding tag is equal to the stored binding tag [100].
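To make the data flow of enc and dec concrete, here is a heavily restricted Python sketch. It is not the algorithm of Fig. 6: it assumes d = 2c as in Fig. 4, a non-empty message whose length is a multiple of c, and associated data that exactly fills the first block plus the ad half of every message block, so that no padding occurs, all tweaks are 0, and the offset is ω_0. The function `F` is a hypothetical stand-in built from SHA-256 (not the raw compression function), and the all-zero IV and 16-byte tag length are assumptions.

```python
import hashlib
import hmac

C_LEN, D_LEN = 32, 64     # chain size c and block size d in bytes (d = 2c)
HALF = D_LEN - C_LEN      # ad bytes carried by each message block
IV = bytes(C_LEN)         # assumed all-zero initial chaining value
OMEGA_0 = bytes(C_LEN)    # offset for unpadded ad (0x00 repeated c times)
BT_LEN = 16               # assumed truncation length of the binding tag


def F(block: bytes, tweak: int, chain: bytes) -> bytes:
    """Hypothetical stand-in for the tweakable compression function F_t(B, C)."""
    return hashlib.sha256(bytes([tweak]) + block + chain).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _run(key: bytes, ad: bytes, body: bytes, decrypt: bool):
    n = len(body) // C_LEN
    # Toy restriction: no padding and no trailing ad-only blocks.
    assert len(key) == D_LEN and n > 0 and len(body) % C_LEN == 0
    assert len(ad) == D_LEN + n * HALF
    chain = F(xor(ad[:D_LEN], key), 0, IV)      # first block: ad only, keyed
    out = b""
    for i in range(n):
        ad_part = ad[D_LEN + i * HALF:D_LEN + (i + 1) * HALF]
        chunk = body[i * C_LEN:(i + 1) * C_LEN]
        other = xor(chunk, chain)               # m_i XOR C, or ct_i XOR C
        m_i = other if decrypt else chunk       # the block always holds plaintext
        out += other
        chain = F(xor(ad_part + m_i, key), 0, chain)
    return xor(chain, OMEGA_0)[:BT_LEN], out    # (binding tag, ct or m)


def enc(key: bytes, ad: bytes, m: bytes):
    return _run(key, ad, m, decrypt=False)


def dec(key: bytes, bt: bytes, ad: bytes, ct: bytes) -> bytes:
    tag, m = _run(key, ad, ct, decrypt=True)
    if not hmac.compare_digest(tag, bt):
        raise ValueError("binding tag mismatch")
    return m
```

Both directions feed the same keyed block (ad ‖ m) into `F`, which is why dec first recovers m_i from ct_i before updating the chaining value.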

Security Analysis
In order to prove security, we need further assumptions on our compression function beyond the standard assumptions of preimage resistance and collision resistance. For example, we need F to be difference unpredictable. Roughly, this notion says it is hard to find a pair (x, y) such that F(x) = F(y) ⊕ z for a given difference z. Moreover, we truncate the binding tag, so actually it should be hard to find a tuple such that this equation holds for the first |bt| bits. We note that collision resistance of F does not imply collision resistance of a truncated version of F [3]. However, such assumptions could be justified when one considers the compression function as a random function. Hence, instead of several ad hoc assumptions, we prove our construction secure directly in the random oracle model.
As described in Sec. 2.5 we tweak the SHA2 compression function by modifying the chaining value depending on the tweak. Let F be the tweakable compression function in Fig. 6. We write F′ for the SHA2 compression function that will take as input the block and the (modified) chaining value. Let H : Σ^d × Σ^c → Σ^c be a random oracle. In the security analysis of the SHA2 construction, we will substitute H for F′ in our construction.
We remark that the BLAKE2b compression function is a tweakable compression function and it can be substituted directly for a random oracle with an extended input space, that is, a random oracle H̃ : Σ^d × {0, 1} × Σ^c → Σ^c. Hence, in the security analysis of the BLAKE2b construction, we will substitute H̃ for F in our construction.
We remark that we cannot treat our tweaked SHA2 compression function F in this way as it would be distinguishable from the random oracle H̃. To see this, observe that querying F on the unmodified chaining variable with tweak t = 1 yields the same result as querying F on the modified chaining variable with t = 0. In the random oracle H̃ these two queries are completely independent.
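The distinguishing observation can be checked with a short sketch. The concrete choices here are assumptions for illustration only: the tweak is modelled as XORing a fixed constant `DELTA` into the chaining value (the actual modification is the one from Sec. 2.5), and SHA-256 over the concatenated inputs stands in both for the underlying compression function and for a random oracle with the tweak in its domain.

```python
import hashlib

C_LEN = 32
DELTA = bytes([0xFF]) * C_LEN   # hypothetical constant used to modify the chain


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def plain_f(block: bytes, chain: bytes) -> bytes:
    """Stand-in for the untweaked compression function."""
    return hashlib.sha256(block + chain).digest()


def tweaked_f(block: bytes, tweak: int, chain: bytes) -> bytes:
    """Tweak realized by modifying the chaining value before the call."""
    if tweak == 1:
        chain = xor(chain, DELTA)
    return plain_f(block, chain)


def tweakable_oracle(block: bytes, tweak: int, chain: bytes) -> bytes:
    """Stand-in for a random oracle whose domain includes the tweak bit."""
    return hashlib.sha256(bytes([tweak]) + block + chain).digest()
```

With these definitions, `tweaked_f(B, 1, C)` and `tweaked_f(B, 0, xor(C, DELTA))` coincide for every B and C, whereas the two corresponding queries to `tweakable_oracle` give unrelated outputs. A distinguisher only needs this one pair of queries.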
In the random oracle model, our ETS construction from Fig. 6 with a non-tweakable / tweakable compression function provides integrity (Thm 1 / Thm 3) and indistinguishability (Thm 2 / Thm 4), assuming sufficiently large tag and key lengths. Here, we briefly state the theorems for instantiations with a non-tweakable compression function, with the adaptations for instantiations with a tweakable compression function in brackets. We provide the full theorem statements and security proofs in Appendix A.

Theorem 1 (3).
Let π be the construction given in Fig. 6, H (H̃) a random oracle replacing the (tweakable) compression function, A an adversary, Adv^int_π(A) the advantage that A has against π in the integrity game of Fig. 1 and q the number of random oracle queries, either directly or indirectly via Dec. We have,

Theorem 2 (4). Let π be the construction given in Fig. 6, H (H̃) a random oracle replacing the (tweakable) compression function, A an adversary, Adv^ind_π(A) the advantage that A has against π in the indistinguishability games of Fig. 1 and q the number of random oracle queries, either directly or indirectly via Dec. We have,

Implementation of Encrypt-to-Self
We implemented three versions of the EtS primitive. We developed optimized C code for the padding scheme and encryption engine from Fig. 6, based on the compression functions of common hash functions. Specifically, our EtS implementations are based on the compression functions of SHA256, SHA512, and BLAKE2 [10,18]. We chose these functions as all three of them are ARX designs (Add-Rotate-Xor), which makes them particularly efficient in software implementations. While SHA256 and SHA512 are more widely standardized and used than BLAKE2, only the latter is a HAIFA construction and tweakable without ad-hoc modifications. Note that due to the internal register size of 32 bits, SHA256 is most competitive on 32-bit CPUs; in contrast, SHA512 and BLAKE2 use 64-bit registers and thus perform best on 64-bit CPUs. We implemented all components of EtS in plain C, including the compression functions, the encoding schemes, and the EtS framework. In addition we implemented a range of self-tests and provide test vectors. We note that while in particular the compression functions would be good candidates for being re-implemented in assembly for further efficiency improvements, we believe that, as all three compression functions are ARX designs, the penalty of not hand-optimizing is not too drastic.
We released the source code of our implementation as open source software. The terms of use are those granted by the Apache license.²⁴ The code is available at https://github.com/cryptobertram/encrypt-to-self.
We conducted timing measurements for our implementations. We measured on two devices: on a roughly 9-year-old CPU that identifies itself as Intel Core i3-2350M CPU @ 2.30GHz, and on a more recent CPU of the type Intel Core i5-7300U CPU @ 2.60GHz. The results are shown in Table 1. The timings were taken for various message lengths, with a 16-byte associated data input in all cases. Note that the BLAKE2 based version clearly outperforms the others for all tested message lengths. Further, SHA512 is generally faster than SHA256 (except for messages that are so short that one SHA256 compression function invocation is sufficient to fully encrypt the message).

… a pair (ad′, c′) ≠ (ad, c) such that dec(k, bt, ad′, c′) succeeds, which only happens if bt′ = bt. Recall that the encoding function outputs a sequence S of (B, t) pairs and an offset ω. Because the encoding function is injective we must have S′ ≠ S or ω′ ≠ ω. Let us first assume S′ = S. Let C_n denote the final chaining variable. Because the sequences are equal, we will arrive at C′_n = C_n. We must have ω′ ≠ ω, but clearly C_n ⊕ ω_0 is not equal to C_n ⊕ ω_1 (even after truncation), that is, bt′ ≠ bt. We have a contradiction and conclude S′ ≠ S. For the case S′ ≠ S, let us now assume the subcase ω′ ≠ ω. The first |bt| bits of C′_{n′} must equal the first |bt| bits of C_n ⊕ ω ⊕ ω′, i.e., A must find a partial preimage. Because H is a random oracle, A would succeed with probability at most q · 2^−|bt|, where q is the number of queries. In the other subcase we have ω′ = ω. Then the first |bt| bits of C′_{n′} must equal the first |bt| bits of C_n, i.e., the first |bt| bits of H(B′_{n′}, Ĉ′_{n′−1}) must equal the first |bt| bits of H(B_n, Ĉ_{n−1}), where Ĉ′_{n′−1}, Ĉ_{n−1} are the chaining values C′_{n′−1}, C_{n−1} after applying tweaks t′_{n′}, t_n, respectively. If the inputs are not equal, A has found a partial second preimage. Since H is a random oracle, A would succeed with probability at most q · 2^−|bt|, where q is the number of oracle queries. However, if the inputs are equal we know Ĉ′_{n′−1} = Ĉ_{n−1}, and we repeat the argument one step earlier in the chain. If we eventually conclude C′_{n′−δ} = C_{n−δ} = IV, we know one of the sequences is longer, i.e., n′ − δ > 0 or n − δ > 0. Otherwise the sequences would be equal, which is excluded by the injectivity of the encoding function. In the case n − δ > 0, there has been a collision in the hash function; we have already bounded this probability above. Thus, let us assume n′ − δ > 0. We have H(B′_{n′−δ}, Ĉ′_{n′−δ−1}) = IV. Thus A has found a preimage of IV. Because H is a random oracle, A would succeed with probability at most q · 2^−c.

Theorem 2. Let π be the construction given in Fig. 6, H a random oracle replacing the compression function, A an adversary, Adv^ind_π(A) the advantage that A has against π in the indistinguishability games of Fig. 1 and q the number of random oracle queries (either directly or indirectly via Dec). We have,

Proof. Other than the challenge pair (ad, c), we can assume the decryption oracle rejects all queries by A. Otherwise A would immediately win the integrity game and the theorem holds. Encryption is done by XORing the message with the chaining variable. As long as the chaining variable never repeats, each input to H is a fresh query that has not been seen before. Then H will provide fresh, uniformly random output, as it is a random oracle. By a standard birthday argument we can bound the probability of a collision by q^2 · 2^−c. Now let us assume there is no collision. Each chaining variable that is used to encrypt is the output of a query to H that XORed the key k with the input. Additionally each block that has message data as input is also XORed with the key k. Thus if A does not know k it cannot query H to obtain the chaining variable. The key is only used with input to the compression function, and since H is a random oracle, A can only learn it by guessing the input and checking the random oracle output. However, this has a success probability of at most q · 2^−|k|.
Let H̃ : Σ^d × {0, 1} × Σ^c → Σ^c be a random oracle. We now consider an instantiation with a tweakable compression function F. We replace F with the random oracle H̃.

Theorem 3. Let π be the construction given in Fig. 6, H̃ a random oracle replacing the tweakable compression function, A an adversary, Adv^int_π(A) the advantage that A has against π in the integrity game of Fig. 1 and q the number of random oracle queries (either directly or indirectly via Dec). We have,

Proof. For all ad ∈ AD, m ∈ M we will show that

Fig. 1. Games for ETS. For the values ad, c provided by the adversary we require that ad ∈ AD, c ∈ C. Assuming ⊥ ∉ M, we encode suppressed messages with ⊥. For the meaning of instructions Stop with, Lose, Promise, Reward, and Require see Sec. 2.2.

Fig. 2. Games for outsourced storage. For all values m, m_0, m_1, c provided by the adversary we require that m, m_0, m_1 ∈ M and c ∈ C. Assuming ⊥ ∉ M, we encode suppressed messages with ⊥. Boolean flag is ('in-sync') tracks whether the attack is active or passive. For the meaning of instructions Stop with, Lose, Promise, and Require see Sec. 2.2.
Pr[INT(ad, m, A)] ≤ q^2 · 2^−c + q · 2^−|bt|. Let ad ∈ AD be associated data and let m ∈ M be a message. The game INT(ad, m, A) samples a uniformly random key k ∈ K and computes (bt, c) = enc(k, ad, m). A wins the INT game if it provides

Table 1. Timings (in microseconds) of EtS implementation