Thursday, November 19, 2015

Software Protection and Simulation on Hierarchical Oblivious RAM

In Divya and Nithin's blog we saw oblivious RAM, which helps the user hide the access patterns made during a program execution. The adversary cannot guess the underlying algorithm just by seeing the way memory is accessed during the execution. It is oblivious in the sense that, for any two inputs to a particular program with the same running time, the access patterns are indistinguishable. We saw the square-root algorithm, where the cost of one access was (2√m + log(m+√m)) = O(√m) (where m is the size of the RAM). A hierarchical access technique, discussed in this blog, decreases this complexity.

SIMPLE VERSION:

Before going into the actual hierarchical solution, we discuss a simpler setting: the RAM is a contiguous block of memory locations, each location will be read only once, and we need to hide the order in which the locations are accessed.

We have n elements (v1, x1), (v2, x2), …, (vn, xn). The ith element is (vi, xi), where vi is the memory location and xi is the data contained in that memory location.
We create a hash table with n "buckets", numbered from 1 to n, where each bucket contains log n words. We map virtual memory addresses to buckets of the hash table, using the random oracle F() to compute our hash function. The hash function is defined as follows:
h(v) = F(v) mod n.
  •    Allocate a memory block of n*O(log n) words, divided into n chunks of O(log n) words each, where each chunk is called a bucket.
  •    Pick a random seed s, and define the hash function on v as: H(v, s) = F(s.v) mod n.
  •    For i = 1 to n, store (vi, xi) into the first available word of bucket H(vi, s) (a small sketch of these steps is given below).
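Here is a minimal Python sketch of this pre-processing step, under a couple of stated assumptions: SHA-256 stands in for the random oracle F(), the seed is 16 random bytes, and a bucket overflow is simply raised as an error, since it is only supposed to happen with negligible probability. The names F, H and build_table are ours, not the paper's.

import hashlib
import math
import os

def F(data: bytes) -> int:
    # Stand-in for the random oracle F(); SHA-256 is only a heuristic substitute.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def H(v: int, s: bytes, n: int) -> int:
    # H(v, s) = F(s.v) mod n, hashing the seed s concatenated with the address v.
    return F(s + v.to_bytes(8, "big")) % n

def build_table(items, n):
    # n buckets of about log n words each.
    bucket_size = max(1, math.ceil(math.log2(n)))
    s = os.urandom(16)                        # the random seed s
    buckets = [[] for _ in range(n)]
    for v, x in items:                        # store (v, x) in bucket H(v, s)
        b = H(v, s, n)
        if len(buckets[b]) >= bucket_size:
            raise OverflowError("bucket overflow (negligible probability)")
        buckets[b].append((v, x))
    return buckets, s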

This pre-processing can be achieved in time O(n*log n). We will skip the oblivious hashing itself; if you want, you can read it in the original paper. Here n items are stored into a hash table with n buckets (of log n words each) according to a random oracle, hence the probability that any bucket overflows is negligible.



Any permutation p((v1, x1), (v2, x2), …, (vn, xn)) of the elements gives a permutation p(h1, h2, …, hn) of the hash values. The set of original hash values (h1, h2, …, hn) and the permuted hash values are identically distributed, and our access pattern is a permutation of the pattern obtained here. After the pre-processing step we can easily hide the access pattern, using the assumption that every virtual memory location vi is referenced once and only once. To look up vi we completely scan bucket h(vi). Since any two distinct memory locations are assigned to independent buckets by the hash function, and every virtual memory location is accessed at most once, the access pattern on the buckets is independent of the virtual memory access pattern.
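Continuing the sketch above (and reusing F, H and build_table from it), a look-up under the single-access assumption might look like the following; note that the whole bucket is scanned even after a match, so the adversary always observes exactly one full bucket scan per reference.

def lookup(buckets, s, v):
    # Scan every word of bucket H(v, s), never stopping early.
    n = len(buckets)
    result = None
    for (vv, xx) in buckets[H(v, s, n)]:
        if vv == v:
            result = xx
    return result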

HIERARCHICAL SOLUTION:

Now we allow multiple accesses to the same location and still hide the access pattern. Instead of using a single hash table, we use multiple hash tables of size 2^i, for i = 1, …, N, and call them buffers. The contents of the buffers are "obliviously hashed" at different intervals, according to their sizes: the smaller buffers are re-hashed more often than the bigger ones. Let t be the length of the input (program + data). We allocate memory for N (= 1 + log t) distinct buffers, where the ith buffer has 2^i buckets and each bucket has log t words. With the ith buffer we associate a random seed si and the hash function H(v, si) (evaluated for different values of v).
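A rough sketch of how this hierarchy could be laid out, reusing H and the helpers from the earlier sketch; the level sizes and the initial placement of the whole input into the last level follow the description above, while the dictionary layout is purely illustrative.

def build_hierarchy(items, t):
    # N = 1 + log t levels; level i has 2^i buckets, each meant to hold
    # about log t words, and its own random seed s_i.
    N = 1 + math.ceil(math.log2(t))
    levels = []
    for i in range(1, N + 1):
        levels.append({"seed": os.urandom(16),                 # s_i
                       "buckets": [[] for _ in range(2 ** i)]})
    last = levels[-1]                                          # the Nth buffer
    n_last = len(last["buckets"])
    for v, x in items:                                         # whole input goes here
        last["buckets"][H(v, last["seed"], n_last)].append((v, x))
    return levels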


ACCESS TECHNIQUE:

For every virtual memory reference (element search), we scan both buckets of level 1 and, for every other level, a single bucket chosen by the hash function. Initially, we store all the program and data in the Nth buffer according to the hash function of the Nth level. When we need to access a virtual memory location v, we first completely scan the level 1 buffer looking for it. If it is not found, we scan bucket H(v, s2) of level 2 and continue level by level, scanning one bucket at each, until we have found it. Suppose we find the element at level i: then we copy it to the worktape and keep scanning dummy buckets (obtained from H(p, sj) for each remaining level j, where p is the index of the search) for the remaining N−i levels, down to the last buffer. This is done in order to prevent the adversary from knowing at which level we retrieved the data. After the whole scan is done, we copy the data from the worktape into the first empty bucket of the first level.
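A sketch of one access under this scheme, again reusing the earlier helpers. Two details are deliberately simplified: the running search counter is used as the dummy index p, and the retrieved item is simply appended to the first bucket of level 1 rather than to the first empty word.

def access(levels, v, step):
    # 'step' is the running index of this search, used here as the dummy index p.
    found = None
    for i, level in enumerate(levels, start=1):
        n_i = len(level["buckets"])
        if i == 1:
            scan = range(n_i)                         # scan all of level 1
        elif found is None:
            scan = [H(v, level["seed"], n_i)]         # the real bucket for v
        else:
            scan = [H(step, level["seed"], n_i)]      # a dummy bucket, hiding the hit
        for b in scan:
            for j, (vv, xx) in enumerate(level["buckets"][b]):
                if vv == v and found is None:
                    found = xx                        # copy to the "worktape"
                    del level["buckets"][b][j]        # it will now live in level 1
                    break
    levels[0]["buckets"][0].append((v, found))        # write back into level 1
    return found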








[Figure: Hierarchical Solution Diagram]


After two accesses, the first level becomes full, and we need to move the data down to the second level; otherwise the first level would overflow on the next access. In general, we move data from the ith level to the (i+1)th level after every 2^i retrievals. To move the data, we obliviously hash the contents of buffer level i together with the existing contents of buffer level i+1 into buffer i+1, according to H and a newly chosen random seed s'(i+1).
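A sketch of the data movement in this re-shuffle step, with an important caveat: the actual construction performs it with an oblivious sort, so the adversary learns nothing from the merge itself; the code below only shows which entries end up where, not how to move them obliviously.

def merge_down(levels, i):
    # Re-hash level i and level i+1 (1-indexed) together into level i+1
    # under a freshly chosen seed s'_{i+1}, then empty level i.
    src, dst = levels[i - 1], levels[i]
    entries = [e for lvl in (src, dst) for bucket in lvl["buckets"] for e in bucket]
    dst["seed"] = os.urandom(16)                      # the new seed s'_{i+1}
    n_dst = len(dst["buckets"])
    dst["buckets"] = [[] for _ in range(n_dst)]
    src["buckets"] = [[] for _ in range(len(src["buckets"]))]
    for v, x in entries:
        dst["buckets"][H(v, dst["seed"], n_dst)].append((v, x))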

OBLIVIOUSNESS:

As long as a particular memory reference is made only once, the choice of the hash bucket accessed appears completely random to the adversary. We may access the same bucket twice for two different memory references, but since the buckets are assigned independently on the basis of the hash values, the choice of the bucket still appears random to the adversary. One important thing to observe is that, for the same memory reference, the same bucket is never accessed twice: after each successful access the element is moved up to a smaller buffer, so if the same element is accessed again it will be found in a bucket at a smaller level. Even if the element eventually comes back to the same level, it will be found in a different bucket, because the level is re-hashed with a new random seed in the meantime; the probability of finding it in the same bucket as before is negligible.

COST:

For every access we scan the 2 buckets of the first level, and one bucket for each subsequent level. There are O(log t) words in each bucket, so an access takes O(log t + N*log t) = O(log t + log t * log t) = O(log t * log t) steps, where N = 1 + log t. In addition, we sort and re-hash the ith buffer after every 2^(i-1) epochs (i = 2, …, N). Since the joint size of the level i-1 and level i buffers is O(2^i) buckets of size O(log t), it takes O(2^i * log t * log(2^i * log t)) steps to sort and obliviously hash them. But this is performed only once every 2^(i-1) epochs, so the amortized cost for sorting the ith buffer is O(2 * log t * log(2^i * log t)) = O(log t * (i + log log t)). Summing over the levels, the total number of steps for t memory references comes out to O(t * (log t)^3). Thus we have an amortized poly-logarithmic (O((log t)^3)) overhead for each access, whereas previously it was O(√m), so the hierarchical solution reduces the number of accesses required for each memory reference.
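The accounting above can be restated as a short calculation (a rough summary of the amortized analysis, not an exact derivation):

\begin{align*}
\text{scan per access:}\quad & O(\log t) + (N-1)\,O(\log t) = O(\log^2 t), \qquad N = 1+\log t\\
\text{re-hash of level } i:\quad & O\!\bigl(2^{i}\log t\cdot\log(2^{i}\log t)\bigr)\ \text{steps, once every } 2^{\,i-1}\ \text{accesses}\\
\text{amortized per access:}\quad & \frac{O\!\bigl(2^{i}\log t\,(i+\log\log t)\bigr)}{2^{\,i-1}} = O\!\bigl(\log t\,(i+\log\log t)\bigr)\\
\text{summed over levels:}\quad & \sum_{i=2}^{N} O\!\bigl(\log t\,(i+\log\log t)\bigr) = O(\log t\cdot N^{2}) = O(\log^{3} t)\ \text{per access},\\
& \text{i.e. } O(t\log^{3} t)\ \text{steps in total for } t\ \text{references.}
\end{align*}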










