Poly-Logarithmic Range Queries on Encrypted Data with Small Leakage

Privacy-preserving range queries allow encrypting data while still enabling queries on ciphertexts if their corresponding plaintexts fall within a requested range. This gives a data owner the possibility to outsource data collections to a cloud service provider without sacrificing privacy or losing the ability to filter this data. However, existing methods for range queries either leak additional information (like the ordering of the complete data set) or slow down the search process tremendously by requiring a scan of every ciphertext in the data collection. We present a novel scheme that only leaks the access pattern while supporting amortized poly-logarithmic search time. Our construction is based on the novel idea of enabling the cloud service provider to compare requested range queries. By doing so, the cloud service provider can use the access pattern to speed up search time for range queries in the future. On the one hand, values that have fallen within a queried range are stored in an interactively built index for future requests. On the other hand, values that have not been queried do not leak any information to the cloud service provider and stay perfectly secure. In order to show its practicability, we have implemented our scheme and give a detailed runtime evaluation.


INTRODUCTION
Cloud computing allows a data owner to outsource her data while enabling her to access this data collection with arbitrary devices anytime. Even devices with small computation power can be used to access an enormous data collection. This is possible by delegating computationally expensive operations like searching to the cloud service provider. Then only a small subset matching the search query is processed directly by the client's device.
In order to preserve data privacy, the outsourced data must be encrypted. However, standard encryption schemes are not suitable for this scenario since they prevent processing encrypted data. As a result, the complete encrypted data collection must be transferred to the client's device and decrypted and processed locally. Advanced encryption schemes allow the cloud to perform search operations like exact pattern matching or range queries on ciphertexts. In more detail, the data owner can encrypt her files augmented with additional information (e.g., keywords, timestamps). The data owner transfers the ciphertexts created by this scheme to the cloud service provider. Using the secret key, the data owner can create a search token (e.g., for exact pattern matching of a keyword, or for a range the timestamp should fall within) and pass it to the cloud service provider. Using this search token the cloud service provider can filter for all ciphertexts that match the search token. All previous schemes providing this functionality either have linear search time or leak the complete order of all outsourced values and thus are vulnerable to the simple yet effective attacks on property-preserving encryption presented recently by Naveed et al. in [25]. In this paper we present a novel approach for implementing privacy-preserving range queries with poly-logarithmic search time that only leaks the access pattern and hence prevents such powerful attacks. In our scheme we enable the cloud service provider to compare range tokens that have already been queried in previous search requests. This enables the cloud service provider to decrease its amortized search time for range queries.
While initial search time for a range query is linear in the number of indexed files we can speed up future queries as follows: In the first, initial search the cloud service provider learns the result set of the range query; given a range query in a second search request that is a subrange of the already queried range in the first step, it is sufficient to scan this previously learned result set. This downscaling of the possible search space results in a tremendous speed-up for the search operation. Furthermore, using this approach for every new range query the cloud service provider can construct and update an encrypted search index in an interactive protocol between the client and the server. As a result, the scheme achieves decreased search time. In addition, ciphertexts that have never fallen within any queried range are not contained in any access pattern, hence, using a suitable encryption scheme, these unqueried ciphertexts do not leak any information at all.
By implementing a prototype in Python 3 we demonstrate the performance benefits of our construction after a short period of queries. Furthermore, by changing parameters that influence how our index is organized we can decrease computational effort for the client, but increase it on the server side. This combination of different trade-off parameters allows suitable deployments for different use cases. We contribute a new encryption scheme for privacy-preserving range queries, whose properties can be summarized as follows:
• secure: We define and prove security using a simulation-based approach in a widely accepted formal model. In more detail, we define leakage functions that give an upper bound for information that is leaked by our construction.
• efficient: Our scheme has amortized poly-logarithmic runtime. This is achieved by interactively building a search index. The implementation shows the benefits of this change already after a short period of queries.
• modular: We build our scheme on a black box interface for functional encryption for secure inner product evaluation. Hence, we can profit from any performance improvements in this active research area. To evaluate this approach we have implemented our scheme based on different functional encryption schemes.
This paper is structured as follows. We give an overview of related work in Section 2. In Section 3 we give a definition of the problem, present two naive solutions with their drawbacks and define the security we want to achieve. Then we present the actual implementation including a proof for its security in Section 4. We go on with a practical evaluation in Section 5 and conclude in Section 6.

RELATED WORK
The problem of secure data outsourcing while still enabling computation can be addressed using fully homomorphic encryption [11]. However, due to performance shortcomings of this universal solution, a variety of algorithms and protocols for specific use-cases have been published, e.g., benchmarking [16,17,22], RFID tracking [20], reputation systems [18], e-commerce [7].
In this work we focus on search over encrypted data as first proposed by Song et al. in [29]. A scheme for a similar scenario in the public key setting was presented in [5]. Although deterministic encryption can be used and the same functionality has been proposed in [2], searchable symmetric encryption provides better security properties. The main reason why implementations of encrypted databases like CryptDB [26] nevertheless use deterministic encryption is the low deployment overhead. In particular, indexing techniques provided by the database engine can result in a huge search time speed-up. Goh published the first scheme using indexing techniques for searchable encryption in [12]. Further improvements for indexing searchable encrypted data are presented in [8,9]. Recently an idea has been published by Hahn and Kerschbaum in [14] where the index for exact pattern matching is constructed in an incremental way by using information of already searched tokens. From a high-level perspective we extend their idea from privacy-preserving exact pattern matching to range queries.
For the functionality of range queries a similar trade-off between security and processing time is possible by building search indexes or by accepting additional information leakage. The idea of order-preserving encryption was introduced by Agrawal et al. in [1]. In more detail, this kind of encryption has the following characteristic: given two plaintexts x and y with property x ≥ y, the same property Enc(x) ≥ Enc(y) holds for their corresponding ciphertexts. A first concrete implementation of order-preserving encryption was introduced by Boldyreva et al. in [4] and optimized in [21]. However, privacy properties of such encryption schemes might be questionable for highly sensitive data. Although addressed by work like [19], recent work published by Naveed et al. [25] demonstrates concrete attacks on order-preserving encrypted values with low entropy in practice. The paradigm of searchable encryption for exact pattern matching, i.e., hiding as much information as possible by only unveiling tokens corresponding to requested predicates, can be transferred into encryption schemes supporting secure range queries. One solution in the secret key setting has been published in [27]. Solutions for the public key setting exist and have been published in [6,28]. In [10] this construction has been revisited and realized with searchable encryption for exact pattern matching. This leads to faster execution time but leakage from queries increases, e.g., information about the covered subranges is unveiled. The first approach of building search indexes for range queries has been introduced by Lu in [23]; however, this index reveals the order of all indexed elements. A trade-off between privacy and performance for range queries is proposed in [15] by using bucketization of indexed ciphertexts. Other tree index approaches have been published by Wang et al. in [30]; however, again bucketization of indexed ciphertexts is leaked. In [31] an encryption is used that leaks the relative distance of all indexed ciphertexts to build an R-tree as index for ciphertexts. The leakage of all these indexes makes them vulnerable to the aforementioned attacks such as those published by Naveed et al. A comparison of different approaches for secure range queries is presented in Table 1.

Scheme                  Sublinear Search Time   Index Leakage
Boneh, Waters [6]       no                      n/a
Shi et al. [28]         no                      n/a
Shen et al. [27]        no                      n/a
Lu [23]                 yes                     Order
Wang et al. [30]        yes                     Bucketization
Wang et al. [31]        yes                     Distance
Demertzis et al. [10]   yes                     -
This paper              yes                     -
Table 1: Comparison of different schemes for privacy-preserving range queries.

DEFINITIONS
Let N denote the set of natural numbers. We denote by [i, j], with i ≤ j and i, j ∈ N, the set of integers starting at i and including j, i.e., the set {i, . . . , j}. The output z of a (possibly probabilistic) algorithm A is written as z ← A. Throughout, λ denotes the security parameter. A function f : N → R is called negligible (in x) if for every positive polynomial p(·) there exists an x0 such that for all x > x0 it holds that f(x) < 1/p(x). Given matrix M , we denote M

Problem description
A scheme for secure and efficient range queries is composed of the following (partly probabilistic) polynomial-time algorithms: SRQ-Setup, SRQ-Enc, SRQ-IndexFiles, SRQ-Token, SRQ-Search. In the initial step, the data owner creates public parameters and a master key for a desired value domain by running SRQ-Setup. We assume the public parameters are known by all parties and omit them for the sake of simplicity in the remainder of the work. In the next step a message collection is encrypted and indexed under given value points by running SRQ-Enc; each value point has to lie in the value domain used in the initial setup step. The result, consisting of an encrypted index and a ciphertext collection, is transferred to a server using SRQ-IndexFiles. From this moment on the data owner holding the master key mk is able to create range tokens by calling SRQ-Token. Given such a range token, the server can run SRQ-Search to filter all (encrypted and indexed) messages associated with value points falling within the requested range.
⟨γ, C⟩ ← SRQ-IndexFiles((⟨ID(mi), ci⟩)i∈[1,n]): is a deterministic algorithm that takes n tuples ⟨ID(mi), ci⟩ as input. A secure search index γ and a ciphertext collection C are output.
τQ ← SRQ-Token(mk, Q): is a probabilistic algorithm that takes master key mk and range [q^(s), q^(e)] = Q ⊆ [0, D − 1] as input and outputs a search token τQ for range Q.
IDQ ← SRQ-Search(τQ, γ): is a deterministic algorithm that takes a range token τQ for range Q and index γ as input and outputs IDQ.

OPE and RPE
In the following we describe two solutions for building an encrypted search index that supports range queries as described before: i) sorting all indexed ciphertexts beforehand with order-preserving encryption, which allows logarithmic search time, or ii) scanning all indexed ciphertexts linearly using range predicate encryption.
In more detail, the first solution utilizes a scheme OPE = (OPE-Setup, OPE-Enc, OPE-Dec) where OPE-Enc(x) ≤ OPE-Enc(y) if and only if x ≤ y. All index entries of the form ci = ⟨OPE-Enc(vi), ID(mi)⟩ are sorted by the OPE-encrypted values. For a search query for range [q^(s), q^(e)], a range token is implemented as a tuple τQ = ⟨OPE-Enc(q^(s)), OPE-Enc(q^(e))⟩. Given τQ, the server storing the search index is able to obtain the set {⟨OPE-Enc(vi), ID(mi)⟩ : OPE-Enc(q^(s)) ≤ OPE-Enc(vi) ≤ OPE-Enc(q^(e))} in logarithmic time by running binary search. However, even indexed but not queried points can be compared with all other indexed (queried and not queried) points. This can be exploited for concrete attacks and can result in a total data breach in the worst case, as demonstrated by Naveed et al. in [25].
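The OPE-based index can be illustrated with a minimal Python sketch. The function toy_ope_enc is a hypothetical, insecure stand-in for OPE-Enc (any strictly increasing map preserves order); build_index, ope_token and search are illustrative names that do not appear in the scheme itself.

```python
from bisect import bisect_left, bisect_right

def toy_ope_enc(key: int, x: int) -> int:
    """Hypothetical stand-in for OPE-Enc: strictly increasing in x, NOT secure."""
    assert key > 0
    return key * x + 7

def build_index(key, points):
    """points: list of (value, file_id). Entries are sorted by ciphertext."""
    entries = sorted((toy_ope_enc(key, v), fid) for v, fid in points)
    cts = [ct for ct, _ in entries]
    ids = [fid for _, fid in entries]
    return cts, ids

def ope_token(key, q_start, q_end):
    """Range token: the OPE encryptions of both limiting points."""
    return toy_ope_enc(key, q_start), toy_ope_enc(key, q_end)

def search(index, token):
    """Binary search for both range borders on the sorted ciphertexts."""
    cts, ids = index
    lo = bisect_left(cts, token[0])
    hi = bisect_right(cts, token[1])
    return ids[lo:hi]

index = build_index(3, [(5, "a"), (12, "b"), (20, "c"), (9, "d")])
print(search(index, ope_token(3, 8, 12)))
```

The sketch also makes the drawback visible: the sorted ciphertext list reveals the order of all indexed values, whether they have been queried or not.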
One approach to mitigate this attack vector, i.e., to hide the information about the order, is Range Predicate Encryption (RPE), introduced in [28] in the public key setting. Later, RPE was transferred to the private key setting in [23] using techniques from [27]. In our work we utilize the approach of range predicate encryption, hence we describe its design and security properties in more detail in this paragraph. An RPE scheme consists of the following algorithms.
• c ← RPE-Enc(k, v) on input of a key k and an attribute value v outputs a ciphertext c.
• tkQ ← RPE-Token(k, Q) on input of key k and range Q outputs range token tkQ.
• {0, 1} ← RPE-Match(tkQ, c) on input of range token tkQ and ciphertext c = RPE-Enc(k, v) outputs 1 if v ∈ Q and 0 otherwise.
Security for an RPE scheme guarantees plaintext privacy (cf. Definition 2) on the one hand, and predicate privacy (cf. Definition 3) on the other hand.
Definition 2. Let RPE be a range predicate encryption scheme. Consider the following security game between attacker A and a challenger consisting of the phases described below: where it wishes to be challenged.
1. Token query: The challenger generates a token by running tkQi ← RPE-Token(k, Qi) and outputs it.
2. Ciphertext query: On the i-th query, A submits a value zi. The challenger encrypts value point zi by running RPE-Enc(k, zi) and returns the output.
Query Phase 2: A adaptively issues further queries with the same restrictions as in Phase 1.
Guess: A outputs a guess b′ of b.
We say RPE has selective secure plaintext privacy if, for all probabilistic polynomial-time attackers A running this security game, it holds that |Pr[b′ = b] − 1/2| ≤ ε, where ε is negligible in λ.
Definition 3. Let RPE be a scheme for range predicate encryption. Consider the following security game between attacker A and a challenger consisting of the phases described below: where it wishes to be challenged.

Setup: The challenger generates a secret key k.
Query Phase 1: A adaptively issues queries, where each query is one of two types:
1. Token query: On the i-th query, Qi ⊂ [0, D − 1] is submitted. The challenger generates a token by running τQi ← RPE-Token(k, Qi) and outputs τQi.
2. Ciphertext query: On the i-th query, value point zi is submitted such that zi ∈ R0 ∧ zi ∈ R1 or zi ∉ R0 ∧ zi ∉ R1. The challenger encrypts value point zi by running RPE-Enc(k, zi) and returns the output.
Query Phase 2: A adaptively issues further queries with the same restrictions as in Phase 1.
We say RPE has selective secure predicate privacy if the advantage of every probabilistic polynomial-time attacker A running this security game is negligible in λ.
Given an RPE scheme with such security properties one can construct a scheme with small leakage but linear runtime. More precisely, for message mi = (fi, vi) the attribute vi is encrypted to ci = RPE-Enc(k, vi) and the tuple (ci, ID(mi)) is indexed. For a range query of range Q a token tkQ is created by the data owner holding the master key using RPE-Token. Given this token, the server creates IDQ by returning all entries ID(mj) with RPE-Match(tkQ, cj) = 1. Note that it is necessary to scan the complete index, hence runtime is linear in the number of all indexed files.
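The linear-time construction can be sketched as follows. The classes ToyCiphertext and ToyToken and the functions rpe_enc, rpe_token and rpe_match are hypothetical mock-ups that only mimic the RPE interface; they store plaintexts and offer no security whatsoever. The point of the sketch is the control flow of the server, which must evaluate RPE-Match once per indexed entry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCiphertext:
    _v: int  # a real RPE ciphertext would hide this value

@dataclass(frozen=True)
class ToyToken:
    _lo: int  # a real RPE token would hide the range borders
    _hi: int

def rpe_enc(k, v):
    """Mock of RPE-Enc(k, v); the key is ignored in this toy version."""
    return ToyCiphertext(v)

def rpe_token(k, q):
    """Mock of RPE-Token(k, Q) for a range Q = (start, end)."""
    return ToyToken(q[0], q[1])

def rpe_match(tk, c):
    """Mock of RPE-Match: 1 if the encrypted value lies in the range, else 0."""
    return 1 if tk._lo <= c._v <= tk._hi else 0

def index_points(k, messages):
    """Index each message (file_id, value) as a tuple (ciphertext, file_id)."""
    return [(rpe_enc(k, v), file_id) for file_id, v in messages]

def linear_search(tk, index):
    """Server side: one RPE-Match evaluation per entry, i.e. O(n) per query."""
    return [file_id for c, file_id in index if rpe_match(tk, c) == 1]

index = index_points(None, [("f1", 3), ("f2", 10), ("f3", 7)])
print(linear_search(rpe_token(None, (5, 10)), index))
```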

Security definition
In order to increase search speed, messages have to be indexed in a suitable way, but we want this index to leak as little information as possible. In the next definition we present a framework to formalize leakage using the simulation-based definition as introduced by Curtmola et al. in [9]. Simulator S returns an f-tuple C and a q-tuple TK to the adversary. Finally, A returns a bit b that is output by the experiment.
We say SRQ is (L1, L2)-secure against non-adaptive chosen-range attacks if for all probabilistic polynomial-time algorithms A there exists a probabilistic polynomial-time simulator S such that the advantage of A is negligible in λ.

DESIGN
Now we are ready to describe how to organize the search index in order to increase search speed but minimize the leakage of the indexed encrypted values. We tackle these contradictory requirements by updating the index every time the server learns new information. This knowledge, leaked in the form of the access pattern and the corresponding search token, is then used to refine the encrypted search index for future searches. First, we explain our ideas and design decisions on plaintext data and then transfer them to encrypted values in the subsequent section.

Searching on plaintexts . . .
Search index γ consists of the following two components: The point list denoted as P is a linear list of all indexed points. This list enables the server to answer all queries in linear time.
The tree list denoted as T is a list of search trees, each tree covering one coherent and already searched range. Whenever a new search is executed, existing trees are updated or a new tree is added to the list. This enables the server to answer range queries that are subranges of already queried ranges in logarithmic time. Tree list T contains R-trees [13]. Each R-tree Γ covers one coherent range completely. More precisely, each inner node holds up to t entries. Each entry has the form ⟨p, R⟩, where R is a range and p is a pointer to another node (either an inner node or a leaf) covering this range; hence pointer p points to a subtree. We denote by Γ[p] the subtree of Γ pointed to by p. For simplicity we write Γ ⊂ S for a range S if the covered range of Γ is a subset of S, and vice versa S ⊂ Γ. In addition, for any two entries ⟨p1, R1⟩ and ⟨p2, R2⟩ of the same node it holds that R1 ∩ R2 = ∅, i.e., the ranges in one node do not overlap. For every entry ⟨p, R⟩ it holds that the subtree rooted at the node pointed to by p covers range R, i.e., Γ[p] = R. Furthermore, all leaves consist of up to t entries; every entry has the form ⟨obj, R⟩, where R is a range and obj points to IDR. Given a queried range Q = [q^(s), q^(e)], the server holding an R-tree Γ covering a superset of Q (i.e., Q ⊂ Γ) can calculate IDQ by using Algorithm 1 in logarithmic time.
An example is given in Figure 1: The initial search index γ consisting of point list P and tree list T contains one tree Γ covering [4,16] as depicted in Figure 1a and 1b. Let us assume the next query is range [17,23]. In the initial step, the server checks if there exists a tree Γ ∈ T that covers the queried range, i.e., [17,23] ⊂ Γ, to search in logarithmic time. Since this is not the case, the server scans all entries in P linearly to construct ID[17,23]. This new information is then added to the search index for future queries and results in an updated version of Γ covering [4,23] as depicted in Figure 1c.
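The interplay of point list and tree list on plaintexts can be sketched in Python. This is a simplified illustration under stated assumptions, not the construction itself: each "tree" is replaced by a sorted list searched with binary search instead of an R-tree, no refinement of already covered ranges takes place, and the names PlainIndex, search and _update are hypothetical.

```python
from bisect import bisect_left, bisect_right

class PlainIndex:
    """Plaintext sketch of index γ = (P, T); sorted lists stand in for R-trees."""

    def __init__(self, points):
        self.P = list(points)  # point list: (value, file_id) pairs
        self.T = []            # covered ranges: dicts with start, end, values, ids

    def search(self, qs, qe):
        # Fast path: one covered range contains the queried range completely.
        for t in self.T:
            if t["start"] <= qs and qe <= t["end"]:
                lo = bisect_left(t["values"], qs)
                hi = bisect_right(t["values"], qe)
                return t["ids"][lo:hi], "tree"
        # Slow path: linear scan of the point list, then update the index.
        hits = sorted((v, fid) for v, fid in self.P if qs <= v <= qe)
        self._update(qs, qe, hits)
        return [fid for _, fid in hits], "scan"

    def _update(self, qs, qe, hits):
        start, end, merged, keep = qs, qe, set(hits), []
        for t in self.T:
            # Ranges that overlap Q or touch it without a gap are merged.
            if t["start"] <= qe + 1 and qs - 1 <= t["end"]:
                start, end = min(start, t["start"]), max(end, t["end"])
                merged.update(zip(t["values"], t["ids"]))
            else:
                keep.append(t)
        entries = sorted(merged)
        keep.append({"start": start, "end": end,
                     "values": [v for v, _ in entries],
                     "ids": [fid for _, fid in entries]})
        self.T = keep

idx = PlainIndex([(5, "a"), (12, "b"), (18, "c"), (22, "d"), (30, "e")])
print(idx.search(4, 16))   # first query: linear scan
print(idx.search(17, 23))  # adjacent range: linear scan, ranges merge to [4, 23]
print(idx.search(10, 20))  # subrange of [4, 23]: answered from the index
```

The three queries mirror the example above: after ranges [4,16] and [17,23] have been queried, a single structure covers [4,23] and any subrange is answered without touching the point list.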

. . . and doing so on encrypted values
Note that all functionality needed for such range queries is the following: first, checking if range R and range Q intersect and second, checking if range Q is a subrange of range R. This functionality can be provided by a slightly modified RPE scheme, and hence every range query can also be answered over trees that consist of ranges encrypted by this modified RPE scheme. Every token for range Q created by RPE-Token must additionally be augmented with its limiting points (that is, start and end point) encrypted using RPE-Enc. This modified version of RPE combined with the idea presented in the previous section can define an SRQ scheme with poly-logarithmic runtime, which is analyzed formally later in this work.
• c ← SRQ-Enc(mk, m): on input of master key mk = (k1, k2, k3) and message m = ⟨f, v⟩ do the following: Finally, output c = ⟨c1, c2⟩.
• ⟨γ, C⟩ ← SRQ-IndexFiles((⟨ID(mi), ci⟩)i∈[1,n]): Initialize an empty search index γ = (P, T) that contains an empty point list P, an empty tree list T and an empty ciphertext collection C. For each i ∈ [1, n]: parse ci = (c^i_1, c^i_2) and add tuple ⟨ID(mi), c^i_2⟩ to C. Furthermore, add tuple ⟨ID(mi), c^i_1⟩ to point list P. Output ciphertext collection C and secure search index γ.
If case 1 does not occur, scan all ciphertexts (ID_fi, c^i_1) ∈ P using RPE-Match(tkQ, c^i_1) = ri and store ID_fi in the result set IDQ iff ri = 1. In order to maintain logarithmic search time for future queries that are a subrange of already queried ranges, call an interactive procedure SRQ-UpdateIndex(τQ, IDQ, Γ^(s), Γ^(e), T) (described in Section 4.3). Finally, output IDQ as result.
Given these algorithms it is possible to outsource encrypted data but still support range queries: The initial algorithm SRQ-Setup creates a master key and defines a possible value domain. Next the data owner encrypts the file collection by calling SRQ-Enc; each file is indexed under a value point. The encrypted files and value points are transferred to the server and added to the index via SRQ-IndexFiles. Later, the data owner holding the master key can create search tokens for ranges by calling SRQ-Token. Note that the server can compare different range tokens without knowing the master key. The server can profit from this capability to speed up future requests by storing previously queried range tokens together with the corresponding result set in an encrypted index structure. More precisely, given two tokens τQ = c R , tkR, cR the server is able to check for the following properties:
Using SRQ-Search the server getting a range token τQ for range Q = [q^(s), q^(e)] searches for all files associated with values falling within the range Q.
In the initial step the server checks if it has extracted enough information from previous queries to answer the current query and, if that is not the case, decides how to update the search index; each tree Γi ∈ T is tested for being a subrange of Q or intersecting with Q: All entries ⟨p1, τR1⟩, . . . , ⟨pmi, τRmi⟩ con- [...] Q resulting in a tree Γ^(e) ∈ T and Γ^(e) = ⊥ otherwise. Depending on the result there are multiple update strategies for SRQ-UpdateIndex, described in Section 4.3 in more detail:
1. One tree covers the complete queried range Q, that is Γ^(s) = Γ^(e), so Q ⊂ Γ^(s). If this is the case, the server does not need to perform a search over the complete point list P; searching over the value points indexed by Γ^(s) is sufficient. This is done by Algorithm 1. Finally, SRQ-UpdateIndex has to refine indexed ranges by using information gained from the current range query.
2. No intersection of the current range query and previously queried ranges, so Γ^(s) = Γ^(e) = ⊥ and T = ∅. If this is the case, the server does not know anything about the current range query. As a result, the server has to scan all points indexed in point list P. Finally, SRQ-UpdateIndex has to create a new search tree covering the queried range, which is added to tree list T.
3. Only a part of the queried range is covered by indexed search trees. Either Γ^(s) = ⊥ or Γ^(e) = ⊥. If this is the case, the server cannot know if there are values in point list P that fall within Q but are not covered by Γ^(s) resp. Γ^(e). As a result, the server has to scan all points indexed in point list P. Finally, SRQ-UpdateIndex has to extend the one tree covering the queried range partly (the tree that is not ⊥).
4. The values fall within different trees. If this is the case, the server cannot be sure that there is no "not indexed gap" between the two trees, i.e., there could be values in P that fall within neither Γ^(s) nor Γ^(e) but fall within range Q. As a result, the server has to scan all points indexed in point list P. Finally, SRQ-UpdateIndex has to merge these two trees Γ^(s) and Γ^(e) since the gap has been closed by the current range query.
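The case distinction above can be summarized in a short Python sketch. It assumes that the server has already determined the trees containing the start point and the end point of Q (None plays the role of ⊥) and, as a simplification, it ignores trees that lie completely inside Q. The names UpdateCase, classify and needs_full_scan are hypothetical.

```python
from enum import Enum
from typing import Optional

class UpdateCase(Enum):
    REFINE = 1  # one tree covers Q completely
    CREATE = 2  # no tree contains a limiting point of Q
    EXTEND = 3  # exactly one limiting point falls within a tree
    MERGE = 4   # the limiting points fall within different trees

def classify(tree_s: Optional[object], tree_e: Optional[object]) -> UpdateCase:
    """tree_s / tree_e: tree containing the start / end point of Q, or None."""
    if tree_s is None and tree_e is None:
        return UpdateCase.CREATE
    if tree_s is None or tree_e is None:
        return UpdateCase.EXTEND
    if tree_s is tree_e:
        return UpdateCase.REFINE
    return UpdateCase.MERGE

def needs_full_scan(case: UpdateCase) -> bool:
    """Only case 1 avoids the linear scan over point list P."""
    return case is not UpdateCase.REFINE

gamma_a, gamma_b = object(), object()
print(classify(gamma_a, gamma_a), classify(None, gamma_b))
```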

Updating the encrypted index
From a high-level perspective, a new range token contains new information given to the server, namely the result set IDQ and the set relation to all previous result sets. This newly gained information is implicit in the search token and access pattern. Note that all efficient searchable encryption schemes leak this information and we use this leakage to update the encrypted search index for accelerating future queries. For a formal security analysis of this additional knowledge given to the server we refer to Section 4.5.
As noted in Section 4.2, four different update situations can occur in SRQ-UpdateIndex, where the server has to either refine one tree, create a new tree, extend one tree, or merge trees. In addition, trees that are covered completely by Q (i.e., contained in T) are composed using a combination of tree extension and tree merges. Since most operations make it necessary to create new range tokens for encrypted trees and this creation is only possible with the master key, these updates are interactive protocols between server and data owner. We denote steps performed at the client side as c : client_operation;. This could be necessary because the operation must be performed on plaintext or the creation of new range tokens is necessary. Furthermore, most operations add new entries to one or more existing trees; these operations require a rebalancing step (cf. Algorithm 2) to guarantee every node's size is lower than threshold t afterwards. Again, rebalancing a tree requires the creation of new range tokens, so this is also an interactive protocol.

Algorithm 3: Refining a tree.
RefineTree
  Input: Tree Γ, token τQ
  Output: Refined tree Γ
  for q ∈ {q^(s), q^(e)} do
    Search leaf that contains token τR with q ∈ R in Γ;
    Send τR and τQ to the client;
    c : Calculate Q1 = R ∩ Q and Q2 = R \ Q;
    c : Send back τQ1, τQ2 ← SRQ-Token;
    Divide the list IDQ that is pointed to by obj into new lists IDQ1, IDQ2 covering Q1 resp. Q2;
    In leaf replace (obj, τR) with two new entries (obj1, τQ1), (obj2, τQ2);
    RebalanceTree(Γ, leaf);
  end
Refine a tree: The server sends the new range token and previous range tokens that intersect with this new token to the data owner, asking for help. The data owner decrypts the range tokens, creates (up to) four non-intersecting but more refined ranges, and sends back their tokens generated by SRQ-Token. Now the server can replace the old range tokens with the new, more refined tokens, and the indexed file lists are segmented according to these new tokens. For a formal description see Algorithm 3. Since this replacement increases the number of entries in a node, the server finally runs RebalanceTree.
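The client-side range arithmetic of the refinement step, i.e., computing R ∩ Q and R \ Q on decrypted ranges, can be sketched in Python for integer ranges given as (start, end) tuples. The helpers split_range and partition are hypothetical names; in particular, partition works on plaintext values here, which is a simplification for illustration only.

```python
def split_range(r, q):
    """Return (R ∩ Q, pieces of R \\ Q) for closed integer ranges r and q."""
    rs, r_end = r
    qs, q_end = q
    lo, hi = max(rs, qs), min(r_end, q_end)
    if lo > hi:                       # R and Q do not intersect
        return None, [r]
    rest = []
    if rs < lo:                       # part of R left of Q
        rest.append((rs, lo - 1))
    if hi < r_end:                    # part of R right of Q
        rest.append((hi + 1, r_end))
    return (lo, hi), rest

def partition(entries, pieces):
    """Split a list of (value, file_id) entries according to the new ranges."""
    return {p: [fid for v, fid in entries if p[0] <= v <= p[1]] for p in pieces}

inter, rest = split_range((4, 16), (10, 23))
print(inter, rest)
print(partition([(5, "a"), (12, "b")], [inter] + rest))
```

If both limiting points of Q fall within the same indexed range R, the remainder R \ Q consists of two pieces, which is one way in which several refined ranges can arise from a single query.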
Create a new tree: If Γ^(s) = Γ^(e) = ⊥ and T is empty, the server has to create a new tree: The server creates a new tree Γ with one entry τQ and indexed item IDQ. This tree Γ is added to tree list T.
Extend a tree: A tree should be extended if a new range token intersects partially with a tree, i.e., the range token intersects with the tree, but at least one limiting point of this newly queried range does not. This is started by the server sending the newly learned range token and the root node to the data owner. The data owner decrypts all ranges to reconstruct the whole range currently covered by this tree. A new range token for the gap between the range covered by the tree and the boundary points of the new range token lying outside the tree range is created and added to the tree's leaf. Furthermore, the tree's inner nodes (up to the root) are updated, that is, the indexed range of all inner nodes must be replaced by an extended version. See Algorithm 4 for a formal description. The resulting tree must be rebalanced after tree extension since at least one leaf got a new entry.
Merge two trees: Two trees should be merged if they both intersect with the newly queried range. Note that these two trees must not have a value gap between them. In more detail, the end point covered by one tree must be directly followed by the start point covered by the other tree. This can be achieved using tree extension as described before. In order to be able to merge trees in logarithmic time we integrate the tree with the lower height into the tree with greater height. So a new entry in an inner node of the taller tree is created pointing to the root of the lower tree. This newly covered range must then be propagated through the inner nodes up to the root. See Algorithm 5 for a formal description. Again, rebalancing the resulting tree is the final step.

Algorithm 4: Extending a tree with a new range.
ExtendTree
  Input: Tree Γ, extension token τQ (intersecting with at least one range in the tree).
  Output: Updated tree Γ now also covering τQ completely.
  Send root node n and token τQ to client;
  c : Given entries ⟨pi, Ri⟩ ∈ n set [r1, r2] = R = ∪i Ri;
  for i ∈ {1, 2} do
    c : Ask server for node-set Ni = {nj | ri ∈ nj};
    for nj ∈ Ni do
      c : Set τ R to token with lowest resp. greatest range R = [r^(s), r^(e)];
      if nj is not a leaf then
        c : Create new token τ Q i where
        Add new entry τ Q i , ID Q i to nj;
        Set leaf = nj;
      end
    end
    RebalanceTree(Γ, leaf);
  end
Merge multiple trees: If a range token has been queried that multiple trees fall within, we combine the steps of tree extension and tree merging. In more detail, all roots in T, Γ^(s), Γ^(e) and the newly queried range token τQ are sent to the client. The client decrypts all roots and gets ranges Ri = [r

Runtime
For simplicity we have assumed so far that a range is not queried multiple times. As a result, every token contains new information the server can use for updating index γ. Given a value domain with D elements and n indexed items, there exist [...] possible ranges have been queried and γ consists of exactly one tree containing all possible ranges.
Obviously, in this state any repeated range query can be answered in logarithmic time. However, assuming repeated queries before γ contains exactly one tree, these repeated queries may raise problems. Furthermore, these repeated queries do not contain new information, so the server is not able to update index γ. As a result, there are search patterns that result in linear search time: First, O(n) different, non-coherent ranges are queried and indexed (e.g., n/2 different queries, each of size 1). Now these ranges are repeatedly queried; on average half of all indexed queries must be checked before an answer is found.

Algorithm 5 (merging two trees), final steps:
      Set v = r^(s);
    end
    c : else
      Set v = r^(e);
    end
  c : Send back τ R and cv ← RPE-Enc(k, v);
  Set i = h and cur node to root of Γ;
  while i > h do
    Send entry ei = ⟨pi, τRi⟩ in cur node with v ∈ Ri and τ R to client;
    c : Send back token τU with U = Ri ∪ R;
    In entry ei replace τRi with τU;
    Set cur node to node pointed to by pi;
  end
  Insert entry ⟨p, τ R⟩ in cur node, where p points to tree Γ;
  RebalanceTree(Γ, cur node);
By implementing a cache for already queried ranges we can reduce the search time for such cases. In more detail, using a hash table keyed with deterministic range identifiers (e.g. we let Π2 = (Gen, Enc, Dec) be a deterministic encryption scheme whose ciphertext of the range is part of every search token), we reduce the search time for repeated range queries to constant time O(1).
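The caching idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `range_identifier` and `RangeQueryCache` are hypothetical, and an HMAC is used here as a stand-in for the deterministic encryption Π2 (unlike an encryption, it cannot be decrypted, but it yields the same identifier for the same range, which is all the hash table needs).

```python
import hashlib
import hmac


def range_identifier(key: bytes, start: int, end: int) -> bytes:
    """Deterministic identifier of the range [start, end].

    Assumption: HMAC-SHA256 replaces the deterministic encryption of the text.
    """
    message = f"{start}:{end}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class RangeQueryCache:
    """Server-side hash table mapping range identifiers to result sets."""

    def __init__(self):
        self._results = {}

    def lookup(self, identifier: bytes):
        # Expected O(1): returns None if this range was never queried before.
        return self._results.get(identifier)

    def store(self, identifier: bytes, file_ids):
        self._results[identifier] = frozenset(file_ids)


# Usage: the client derives the identifier, the server only sees opaque bytes.
key = b"client-secret-key"
cache = RangeQueryCache()
cache.store(range_identifier(key, 10, 20), {3, 7})
```

A repeated query for [10, 20] hits the cache, while a query for any other range misses and falls back to the regular search.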
The runtime for one search operation is the sum of the actual search time ts and the update time tu. The height of the tree is bounded by log(D), and the cost of an operation on one predicate-encrypted ciphertext is also O(log(D)). Hence, merging two trees, extending one tree, refining one tree or rebalancing one tree can be done in O(log²(D)). Consequently, r trees can be merged in O(r · log²(D)). Furthermore, since any update operation adds at least one new boundary element, there can be at most n trees. As a result, the expected update time is bounded by tu = O(n · log²(D)).
Search time depends on the newly queried range Q, specifically on whether Q is completely covered by exactly one tree. We denote the probability of this event by Pr[Q ⊆ Γi]. If this is the case, search can be performed in O(log²(D)), because searching one tree is sufficient for learning the result set. Otherwise, the complete point list must be scanned and potentially updated, resulting in a search time of O(n log²(D)). As a result, the expected search time is $t_s = \Pr[Q \subseteq \Gamma_i] \cdot O(\log^2(D)) + (1 - \Pr[Q \subseteq \Gamma_i]) \cdot O(n \log^2(D))$.
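To get a feeling for how strongly the coverage probability drives the expected search time, the formula can be evaluated numerically. The sketch below is an illustration only: the function name `expected_search_cost` is hypothetical, and the constants hidden by the O-notation are assumed to be 1, so the result is a unit-free cost rather than a runtime.

```python
import math


def expected_search_cost(p_covered: float, n: int, domain_size: int) -> float:
    """Expected search cost for coverage probability p_covered = Pr[Q ⊆ Γi].

    Assumption: all hidden constants are 1, so one tree search costs
    log2(D)^2 and one full scan costs n * log2(D)^2.
    """
    if not 0.0 <= p_covered <= 1.0:
        raise ValueError("p_covered must be a probability")
    tree_search = math.log2(domain_size) ** 2
    full_scan = n * tree_search
    return p_covered * tree_search + (1.0 - p_covered) * full_scan


# With D = 2^10 and n = 1000, compare a cold index with a fully built one.
cold = expected_search_cost(0.0, 1000, 2**10)
warm = expected_search_cost(1.0, 1000, 2**10)
```

Under these assumptions the cold index costs n times as much per query as the fully built one, which is the gap the amortization argument closes.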
Any time a range is not completely covered by a single tree, at least one element in D is added to a search tree. Hence, the size of the set Γi increases by at least 1. Consequently, a search complexity of O(n log²(D)) can occur at most n times. The maximum total time spent for these searches is $n \cdot n \log^2(D)$. This time can be amortized over the events Q ⊆ Γi. Let x be the total number of searches until amortization occurs. Then we have $\frac{n \cdot n \log^2(D)}{x} = \log^2(D)$, i.e., x = n². We conclude that after at most n² searches we have achieved amortized poly-logarithmic search time.
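The amortization step can be written out as a short derivation. The symbol $T_{\text{scan}}$ for the total time of all full scans is introduced here for illustration only, and hidden constants are ignored.

```latex
\begin{align*}
T_{\text{scan}} &\le n \cdot n \log^2(D)
  && \text{(at most $n$ full scans)} \\
\frac{T_{\text{scan}} + x \log^2(D)}{x}
  &\le \left(\frac{n^2}{x} + 1\right) \log^2(D)
  && \text{(average over $x$ searches)} \\
  &\le 2 \log^2(D)
  && \text{for } x \ge n^2 .
\end{align*}
```

So once x reaches n², the average cost per search is within a constant factor of a single tree search.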

Security
In this section we give a rigorous security analysis of our protocol. We can decouple the encryption of the payload from the encrypted attribute value by using an arbitrary semantically secure encryption scheme. First, the security of tokenized queries using SRQ-Token is examined. Finally, we analyze the whole protocol in a simulator-based framework as introduced in [9].
Before we give a security proof according to Definition 4, we define the leakage functions L1, L2 as follows, where RR(Q) is a q × q range relation matrix whose elements are in the set {∅, ∩, =, ⊂, ⊂₌, ⊃, ⊃₌}. Here an element in row i and column j indicates the relation of ranges Qi and Qj given in queries i and j. ∅ denotes no intersection, = denotes the equality of two ranges, and ∩ denotes an intersection where neither range is a subrange of the other. ⊂ denotes that range Qi is a subset of Qj with no limiting points in common, and ⊂₌ denotes a subset relation with one limiting point in common. The other way round, ⊃ denotes that range Qi is a superset of Qj, i.e., if ⊂ is at position (i, j) then ⊃ is at position (j, i). These range relations can be formulated as inequalities, as shown in Table 2. Note that this information can be extracted from the access pattern, namely if IDQ intersects with IDR, then Q intersects with R as well.
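The seven relations can be made concrete with a small sketch that computes the range relation matrix for plaintext ranges. It assumes closed integer ranges given as `(start, end)` tuples; the function names and the two-character labels `"⊂="` and `"⊃="` (subset or superset sharing one limiting point) are choices made for this illustration.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # closed interval [start, end]


def relation(qi: Range, qj: Range) -> str:
    """Relation of range qi to range qj, as stored at position (i, j)."""
    (s1, e1), (s2, e2) = qi, qj
    if e1 < s2 or e2 < s1:
        return "∅"  # no intersection
    if (s1, e1) == (s2, e2):
        return "="  # identical ranges
    if s2 <= s1 and e1 <= e2:
        # qi lies inside qj; check for a shared limiting point
        return "⊂=" if (s1 == s2 or e1 == e2) else "⊂"
    if s1 <= s2 and e2 <= e1:
        # qi contains qj; check for a shared limiting point
        return "⊃=" if (s1 == s2 or e1 == e2) else "⊃"
    return "∩"  # overlap, but neither range contains the other


def range_relation_matrix(queries: List[Range]) -> List[List[str]]:
    """q x q matrix with the relation of every ordered pair of queries."""
    return [[relation(a, b) for b in queries] for a in queries]


queries = [(0, 10), (2, 5), (0, 3), (8, 15), (20, 30)]
rr = range_relation_matrix(queries)
```

The matrix is mirror-symmetric in the sense described in the text: a subset entry at (i, j) always corresponds to the matching superset entry at (j, i).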
We emphasize that only encrypted values that fall within a queried range leak information, while encrypted values that have not been queried stay semantically secure according to Definition 2. Furthermore, by shuffling the encrypted borders contained in the range tokens we hide the order relation of overlapping queried ranges. As a result, we do not leak the order relation of queried values but only a bucketization of these values.
In Definition 2 of selectively secure plaintext privacy, the challenger only accepts challenges v0, v1 that both occur in the same subsets of the access pattern. In more detail, if file fi indexed under vi is in ID_{Q_j}, then f1−i indexed under v1−i must also be in ID_{Q_j}, for i ∈ {0, 1} and all token queries. Otherwise it would be trivial for attacker A to win the security game.
Table 2 (fragment): … or r^(s) > q^(s) ∧ r^(e) = q^(e); r^(e) < q^(s)

Informally, we remove these restrictions by giving the simulator access to this information in the form of the access pattern and the range relation matrix. This is needed to show security of a real run of the whole protocol, where fulfilling the restrictions of the security games cannot be guaranteed. On the other hand, given two range token sequences with the same range relation matrix (for their ranges), no attacker can distinguish between these range token sequences.

Proof. Denote by εΠ the probability of an attacker A breaking the used IND-CCA secure encryption scheme, by ε1 the probability of an attacker A winning the RPE plaintext privacy security game, and by ε2 the probability of an attacker A winning the RPE predicate privacy game. Given negligible εΠ, ε1 and ε2, it is possible to extend, shrink and move the ranges such that the probability of any attacker A distinguishing between a token τQ and a token τQ' for an extended, shrunk or moved version Q' of Q is negligible.
First, given a range token τQ = (c Q , tkQ, cQ), it is possible to extend range Q to a range Q' as long as there is no other range R for which a token τR is known, with R ∩ Q = ∅ but R ∩ Q' ≠ ∅. In a first step, assume that no such range R exists; we show later how to move such a range R. We present a series of games and show that the probability of any attacker A distinguishing two consecutive games is negligible.
In G0 the original token τQ is given.
In G1 we replace cQ with the encryption cQ' = Enc^IND-CCA(Q'). A can distinguish between G0 and G1 with probability εΠ.
In G2 we replace tkQ with the new RPE token tkQ' = RPE-Token(Q'). Note that q^(s) ∈ Q' and q^(e) ∈ Q' still hold. Hence, attacker A can distinguish between G1 and G2 with probability ε2.
In G3 we move the limiting point c … After G3 we have a valid token τQ' for the new range Q'. Putting it all together, attacker A can distinguish between these tokens with probability ε = εΠ + ε2 + ε1.
Shrinking a range Q to a range Q' can be done in a similar way, as long as there is no other range R for which a token τR is known, with R ∩ Q ≠ ∅ but R ∩ Q' = ∅. We only have to swap G3 and G2. As a result, attacker A can distinguish between a token τQ and a token τQ' for a shrunk range with probability εΠ + ε1 + ε2 = ε.
Combining these two techniques, we can move a range Q = [q^(s), q^(e)] to a new range Q'' = [q^(s) + x, q^(e) + x], as long as there is no other range R with r^(s) > q^(s) but r^(s) < (q^(s) + x) (otherwise, this range R must be moved first). First, extend Q to a range Q' = [q^(s), q^(e) + x], then shrink …

Finally, we can prove Theorem 1: w.l.o.g. first assume … i ] (using the techniques described before). Repeating this technique for all ranges in descending order of their end points, the complete range sequence Q is modified to an extended range sequence Q' with the same end points as R. Last, all ranges in the extended range sequence Q' are shrunk to be identical to range sequence R. As shown before, an attacker can distinguish each of these extending and shrinking modifications with probability ε, which is negligible. Hence, the distinguishing probability for a combination of polynomially many modifications is still negligible.
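The game-hopping argument behind these bounds is the standard triangle inequality over consecutive games, followed by a union over all modifications. In the sketch below, the security parameter λ, the number of modifications m, and the notation $\mathcal{A}(G_k) = 1$ for "attacker A outputs 1 in game $G_k$" are assumptions introduced for illustration.

```latex
\begin{align*}
\bigl|\Pr[\mathcal{A}(G_0) = 1] - \Pr[\mathcal{A}(G_3) = 1]\bigr|
  &\le \sum_{k=0}^{2} \bigl|\Pr[\mathcal{A}(G_k) = 1] - \Pr[\mathcal{A}(G_{k+1}) = 1]\bigr| \\
  &\le \varepsilon_\Pi + \varepsilon_2 + \varepsilon_1 = \varepsilon , \\
m \cdot \varepsilon &= \mathrm{negl}(\lambda)
  \quad \text{for } m = \mathrm{poly}(\lambda) \text{ modifications.}
\end{align*}
```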
Given this theorem, we are now ready to prove the security of our protocol formally, using the leakage-based Definition 4 as introduced by [9] together with the leakage functions L1, L2 defined at the beginning of this section.

Theorem 2. If the used RPE scheme has selectively secure plaintext privacy based on an RPE scheme with selectively secure predicate privacy and Π1 is an IND-CCA secure encryption scheme, then SRQ as described in Definition 4.2 is (L1, L2)-secure against non-adaptive chosen-range attacks.
Proof. We present a PPT simulator S for which the advantage of any PPT adversary A in distinguishing between the RealA and IdealA,S experiments from Definition 4 is negligible. For this, we describe how S sets up the environment and simulates range tokens TK and ciphertexts C using leakage L1 and L2.
Simulating TK: S extracts clusters of ranges that form one big coherent range using Algorithm 6.
Each cluster is a separate R-tree in the implementation presented in Section 4.2. For every cluster, S simulates ranges with the same range relation matrix as the given range relation matrix RR(Q). In more detail, for every cluster simulator S transforms the range relation matrix RR(Q) into a linear program and solves it. Every relation is formulated as inequalities according to Table 2. Doing this for all clusters, S obtains simulated ranges Q̃ with RR(Q) = RR(Q̃). Now S sets T̃K = (SRQ-Token(mk, Q̃i))_{i∈[1,q]}, which is indistinguishable from TK by Theorem 1. Note that S can restore the simulated range Q̃i given a range token SRQ-Token(mk, Q̃i), since one component consists of an ordinary IND-CCA encrypted value that can be decrypted.
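Algorithm 6 is not reproduced in this section, so the following is only one plausible way to extract clusters from the leakage: treat every pair of queries whose matrix entry differs from ∅ as connected and collect the connected components. The function name `extract_clusters` and the example matrix are hypothetical; ranges that are merely adjacent without intersecting are not connected in this sketch, because the matrix does not reveal adjacency.

```python
from typing import List


def extract_clusters(rr: List[List[str]]) -> List[List[int]]:
    """Group query indexes into clusters of transitively intersecting ranges.

    rr is a q x q range relation matrix; "∅" marks non-intersecting pairs.
    """
    q = len(rr)
    seen = [False] * q
    clusters = []
    for start in range(q):
        if seen[start]:
            continue
        # Depth-first search over the "intersects" graph
        seen[start] = True
        stack, cluster = [start], []
        while stack:
            i = stack.pop()
            cluster.append(i)
            for j in range(q):
                if not seen[j] and rr[i][j] != "∅":
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(cluster))
    return clusters


# Queries 0, 1, 2 form a chain of intersections; query 3 is isolated.
example_rr = [
    ["=", "∩", "∅", "∅"],
    ["∩", "=", "⊃", "∅"],
    ["∅", "⊂", "=", "∅"],
    ["∅", "∅", "∅", "="],
]
clusters = extract_clusters(example_rr)
```

Queries 0 and 2 do not intersect directly but end up in the same cluster through query 1, which matches the idea of one big coherent range.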
Simulating C: Simulator S creates a set of leaves L. More precisely, S divides the collection ID_Q of result sets into a set L consisting of disjoint sets, where L covers the same values as ID_Q. Two sets ID_{Q_i} and ID_{Q_j} with ID_{Q_i} ∩ ID_{Q_j} = ID_{Q_ij} are divided into ID_{Q_i} \ ID_{Q_ij}, ID_{Q_j} \ ID_{Q_ij} and ID_{Q_ij}. For every simulated leaf Li ∈ L, simulator S stores the indexes of all range queries that contain Li as part of their result set: L(i) = {j | Li ⊆ ID_{Q_j} ∧ ID_{Q_j} ∈ ID_Q}. Given the set L of simulated leaves, S can simulate the ciphertexts C̃ = (c̃1, …, c̃f) as follows. S iterates over all tuples (ID(fi), len(fi)) and:
• If there is a simulated leaf Lj ∈ L with ID(fi) ∈ Lj, S randomly chooses a simulated value point ṽi ← ∩_{k∈L(j)} Q̃k, sets c̃i,1 = RPE-Enc(k1, ṽi) and c̃i,2 = Enc^IND-CCA(k2, 0^len(fi)), and adds the tuple c̃i = (c̃i,1, c̃i,2) to C̃.
• Otherwise, there is no simulated leaf Lj ∈ L with ID(fi) ∈ Lj, i.e., the encrypted file does not match any queried range. Then S sets c̃i,1 = RPE-Enc(k1, r) with a random value r outside of all simulated ranges Q̃i. The simulator sets c̃i,2 = Enc^IND-CCA(k2, 0^len(fi)) and adds c̃i = (c̃i,1, c̃i,2) to C̃.
Due to the IND-CCA security of Π1, the selectively secure plaintext privacy of SRQ and Theorem 1, the probability for A to distinguish between C and the C̃ generated by S is negligible.
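The division of the result sets into disjoint leaves can be sketched by grouping file identifiers according to the exact set of queries that returned them. This yields the same pieces as repeatedly splitting pairs of intersecting result sets; the function name `simulate_leaves` and the dictionary representation (the key plays the role of L(i), the value the role of the leaf Li) are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def simulate_leaves(result_sets: List[Set[int]]) -> Dict[FrozenSet[int], Set[int]]:
    """Partition all returned file IDs into disjoint leaves.

    result_sets[j] is the set of file IDs returned for query j.
    Returns a mapping: set of query indexes -> file IDs matched by exactly
    those queries.
    """
    # For each file ID, collect the queries whose result set contains it.
    membership = defaultdict(set)
    for j, ids in enumerate(result_sets):
        for file_id in ids:
            membership[file_id].add(j)

    # Files with identical query membership form one leaf.
    leaves = defaultdict(set)
    for file_id, query_indexes in membership.items():
        leaves[frozenset(query_indexes)].add(file_id)
    return dict(leaves)


# Two overlapping result sets are split into three disjoint leaves.
leaves = simulate_leaves([{1, 2, 3}, {3, 4}])
```

Files that never appear in any result set are absent from the mapping; these correspond to the second case of the simulation, where a random value outside all simulated ranges is encrypted.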
Simulating update protocols: As seen before, it is possible for S to simulate range queries Q̃ from the given leakage L2(Q). Simulator S is able to simulate all update protocols on the simulated tokens T̃K. Since decrypting a range token τ_{Q̃i} is possible for the simulator, S can run all update queries on the simulated ranges Q̃. Note that these update protocols do not contain new information; all information is already covered by L1(M) and L2(Q).

EVALUATION
For evaluating our SRQ scheme we implemented a prototype in Python 3 using bindings for the PBC library (version 0.5.14) [24]. The runtime benchmarks are executed on a machine running Ubuntu 14.04 with 8GB RAM and an Intel Xeon 1230v3 CPU 3.30GHz.
We count the average number of comparisons for one range query, i.e., how often the server must run RPE-Match. All files are indexed under random value points distributed over the complete value domain. In addition, every range query is generated randomly with a size between 1 and a defined upper limit, starting from an arbitrary point in the domain. The upper limit is given as a factor of the complete domain size; for example, given a domain size of 1000 and a query factor of 10^−1, the range for one query may have a length between 1 and 100. By varying the range size we can modify the probability for two ranges to intersect, and hence influence the probability of merging and extending trees. Furthermore, we analyzed the number of trees our index consists of.
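The query workload described above could be generated along the following lines. This is a sketch under assumptions: the function name `random_range_query` is hypothetical, start points and lengths are drawn uniformly, and ranges that would run past the end of the domain are clipped, none of which is specified in the text.

```python
import random
from typing import Tuple


def random_range_query(domain_size: int, query_factor: float,
                       rng: random.Random) -> Tuple[int, int]:
    """Draw a closed range [start, end] inside the domain [0, domain_size - 1].

    The length is uniform between 1 and query_factor * domain_size.
    """
    max_length = max(1, round(domain_size * query_factor))
    length = rng.randint(1, max_length)
    start = rng.randrange(domain_size)
    # Assumption: clip at the upper end of the domain instead of wrapping.
    end = min(start + length - 1, domain_size - 1)
    return start, end


# Domain size 1000 and query factor 10^-1: lengths between 1 and 100.
rng = random.Random(0)
workload = [random_range_query(1000, 0.1, rng) for _ in range(1000)]
```

A smaller query factor produces shorter ranges, which intersect less often and therefore trigger fewer merge and extend operations.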
Next, update costs on the client side are analyzed. For this we counted the number of range decryption operations and range creations.
Finally, we present micro benchmarks for encrypting one data point by running SRQ-Enc, creating a search token by SRQ-Token and checking two generated tokens for intersection, depending on the domain size and security parameter. Since we use range predicate encryption as a black box, we can change the underlying implementation without modifying our construction. For our demonstration we implemented the schemes for secure inner-product evaluation from [27] and [3] and utilized them for RPE as described in [23].
Searching and updating trees: All measurements presented here are repeated five times and we use the mean values. For this section we assumed a domain of size D = 2^26 and 2^20 indexed files. Furthermore, we grouped 50 values into one data point, e.g., 50 successive search queries are represented by one data point in Figure 2a. By modifying the maximum size of one range query we also modify the probability of range intersections. As a result, the number of merge operations varies, and hence the number of trees stored in the search index varies as well. This trade-off is summarized in Table 3, where the number of indexed trees is given as a function of the number of already searched ranges and the query factor. With smaller query ranges, the trees index a smaller average range, hence the server must scan the complete file list more often, resulting in more RPE-Match calls, as depicted in Figure 2a.
Here it does not matter whether RPE-Match is called for comparing two range tokens or for checking whether an indexed file falls within the queried range. In the worst case, there is a huge number of trees, each covering only a small range. Given a new small range token, all these indexed trees must be searched. If no match is found, the complete point list must additionally be scanned, resulting in even more searches than a linear scan of all files would require.

… t results in more RPE-Match calls per node and consequently in a higher overall number of RPE-Match calls, as depicted in Figure 2b. On the other hand, we can decrease the probability of calling the interactive protocol RebalanceTree by increasing the number t of entries one node can hold. As a result, the server asks for help less often, hence the number of token generations can be decreased, as presented in Figure 2c.
Microbenchmarks: In our SRQ implementation we used the construction from [28] utilizing functional encryption for inner products. For the secret key setting such a scheme was presented in [27] based on pairings and already used in [23]. In addition, we implemented schemes providing such functionality that have been published recently in [3]. We denote our implementation using the scheme from [27] as SRQ_SSW and the scheme from [3] as SRQ_BJK. Note that SRQ_ABCK avoids pairings; however, this construction leaks the actual range R given a range token τR. Two parameters affect the runtime: the security parameter, benchmarked in Table 4, and the possible domain size, benchmarked in Table 5.
In SRQ-Enc we omitted the actual file encryption operation using an IND-CCA secure encryption scheme. Its runtime depends on the file size and the encryption scheme used, and is a well-studied problem.

Putting it all together: Finally, we present 5 runs of real searches. We implemented the RPE scheme using BJK with an 80-bit security parameter. Here we encrypted 2^16 files and indexed them under value points sampled randomly from the domain D = [0, 2^12 − 1]. Figure 3 shows the mean values of all runs, where five searches are aggregated in one bar. We measured the pure search time, which is spent on the server side only. Additionally, the required update time was measured; here the index is updated interactively, hence both the client and the server are involved. By adding these times we get the complete execution time for one search. Furthermore, the duration of one linear scan of all files is depicted as a dashed line. As we can see, already after 5 search operations the execution time, including the time for building the index, is lower than the linear search time, so we profit from this index construction.

CONCLUSION
In this paper we proposed a novel approach for performing range queries. The server can decrease the search time for future queries by updating a search index using the access patterns learned from past queries. We analyzed this effect on the runtime theoretically and empirically and have presented a simulation-based security proof, as is state of the art for searchable encryption. Our leakage is substantially smaller than that of previous schemes for privacy-preserving range queries with poly-logarithmic runtime. Furthermore, our construction utilizes functional encryption for inner-product evaluation as a black-box functionality, so one can exchange the underlying algorithm without modifying our scheme. As a result, our construction profits from all future improvements in this research area. By implementing our scheme we demonstrate its feasibility and point out different parameters for adjusting search time and complexity on the client side. This adjustment enables us to deploy our scheme in varying scenarios.