In Divya and Nithin's blog we saw oblivious RAM, which lets a user hide the access patterns made during a program's execution: the adversary cannot guess the underlying algorithm the program is running just by watching the way memory is accessed. The RAM is oblivious in the sense that for any two inputs to a particular program with the same running time, the access patterns are indistinguishable. We saw the square-root algorithm, where the cost of one access was (2√m + log(m + √m)) = O(√m) (where m is the size of the RAM). A hierarchical access technique, discussed in this blog, decreases this overhead.
SIMPLE VERSION:
Before going into the actual hierarchical solution, we discuss a simpler setting: the RAM is a contiguous block of memory locations, each location is read only once, and we need to hide the order in which the locations are accessed.
We have n elements ((v1, x1), (v2, x2), …, (vn, xn)). The ith element is the pair (vi, xi), where vi is the memory location and xi is the data stored at that location. We create a hash table with n "buckets", numbered from 1 to n, where each bucket can hold log n words. We map virtual memory addresses to keys in the hash table, using the random oracle F() to compute our hash function, h(v) = F(v) mod n. The pre-processing proceeds as follows:
- Allocate a memory block of size n*O(log n) words, divided into n chunks of O(log n) words each; each chunk is called a bucket.
- Pick a random seed s and define the hash function on v as H(v, s) = F(s.v) mod n.
- For i = 1 to n, store (vi, xi) in the first available word of bucket H(vi, s).
This pre-processing can be done in time O(n*log n). We will skip the details of the oblivious hashing; if you want, you can read them in the original paper. Here n items are stored into a hash table with n buckets (of log n words each) according to a random oracle, so the probability that any bucket overflows is negligible.
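As a rough illustration of this pre-processing step, here is a minimal Python sketch, assuming a keyed hash stands in for the random oracle F; the names `prf`, `bucket_index`, and `build_table` are ours, not from the paper.

```python
import hashlib
import math

def prf(seed, v):
    # Stand-in for the random oracle F: hash the seed concatenated with v.
    return int.from_bytes(hashlib.sha256(f"{seed}.{v}".encode()).digest(), "big")

def bucket_index(v, seed, n):
    # H(v, s) = F(s.v) mod n, with the seed s mixed into the oracle input.
    return prf(seed, v) % n

def build_table(elements, seed):
    # elements: a list of (v_i, x_i) pairs; the table has n buckets,
    # each holding up to ~log n words.
    n = len(elements)
    capacity = max(1, math.ceil(math.log2(n)))
    buckets = [[] for _ in range(n)]
    for v, x in elements:
        b = bucket_index(v, seed, n)
        if len(buckets[b]) >= capacity:
            raise RuntimeError("bucket overflow (negligible probability; pick a new seed)")
        buckets[b].append((v, x))  # first available word of the bucket
    return buckets
```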
Any permutation p((v1, x1), (v2, x2), …, (vn, xn)) of the elements gives the corresponding permutation p(h1, h2, …, hn) of the hash values. The set of original hash values (h1, h2, …, hn) and the permuted hash values are identically distributed, and our access pattern is a permutation of the pattern obtained here. After the pre-processing step we can easily hide the access pattern, using the assumption that every virtual memory location vi is referenced once and only once. To look up vi we completely scan bucket h(vi). Since any two distinct memory locations are assigned to independent buckets based on their hash values, and every virtual memory location is accessed at most once, the access pattern over the buckets is independent of the virtual memory access pattern.
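Continuing the sketch above, a lookup simply scans one entire bucket, so the only thing the adversary observes is which bucket was read (again, the naming is our own):

```python
def lookup(buckets, seed, v):
    # Scan every word of bucket h(v); the physical access pattern depends
    # only on the bucket index, not on where v sits inside the bucket.
    n = len(buckets)
    found = None
    for (vi, xi) in buckets[bucket_index(v, seed, n)]:
        if vi == v:
            found = xi
    return found
```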
HIERARCHICAL SOLUTION:
Now we allow multiple accesses to the same location and still hide the access pattern. Instead of using a single hash table, we use multiple hash tables of 2^i buckets each, for 1 ≤ i ≤ N, and call them buffers. The contents of the buffers are "obliviously hashed" at different intervals, according to their sizes: the smaller buffers are rehashed more often than the bigger ones. Let t be the length of the input (program + data). We allocate memory for N (= 1 + log t) distinct buffers, where the ith buffer has 2^i buckets and each bucket holds log t words. With the ith buffer we associate a random seed si and a hash function H(v, si) (for the different values of v).
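To make the layout concrete, here is a minimal Python sketch of the buffer allocation, again with a keyed hash standing in for the random oracle; the function names are ours, chosen only for illustration.

```python
import hashlib
import math
import random

def level_bucket(v, seed, num_buckets):
    # H(v, s_i) = F(s_i.v) mod (number of buckets at this level).
    digest = hashlib.sha256(f"{seed}.{v}".encode()).digest()
    return int.from_bytes(digest, "big") % num_buckets

def make_levels(t):
    # N = 1 + log t levels; level i (1-indexed) has 2^i buckets of ~log t words.
    N = 1 + max(1, math.ceil(math.log2(t)))
    levels = [[[] for _ in range(2 ** i)] for i in range(1, N + 1)]
    seeds = [random.getrandbits(64) for _ in range(N)]  # one seed s_i per level
    return levels, seeds
```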
ACCESS TECHNIQUE:
For every virtual memory reference (i.e., every element search), we scan both buckets of level one, and at every other level we scan a single bucket chosen by that level's hash function. Initially, we store the whole program and data in the Nth buffer, according to the hash function of the Nth level. When we need to access a virtual memory location v, we first completely scan the level-1 buffer looking for it. If it is not found, we scan bucket H(v, s2) of level 2 and continue down the levels, scanning one bucket at each, until we have found it. Suppose we find the element at level j: we copy it to the worktape and, at each of the remaining N − j levels i, we keep scanning a dummy bucket H(p, si) (where p is the index of the search), down to the last buffer. This is done to prevent the adversary from learning at which level we retrieved the data. After the whole scan is done, we copy the data from the worktape into the first empty bucket of the first level.
[Figure: Hierarchical Solution Diagram]
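Continuing the earlier sketch, an access could look roughly like this; deriving the dummy bucket from a running access counter is our own choice for the "index of the search", and the write-back assumes the move-down step described next is performed on schedule so level 1 has room.

```python
def access(levels, seeds, v, counter):
    found = None
    # Level 1: scan both of its buckets completely; remove the element if present.
    for bucket in levels[0]:
        for entry in list(bucket):
            if entry[0] == v:
                found = entry[1]
                bucket.remove(entry)
    # Levels 2..N: scan exactly one bucket per level.
    for i in range(1, len(levels)):
        if found is None:
            b = level_bucket(v, seeds[i], len(levels[i]))                    # real lookup
        else:
            b = level_bucket(f"dummy{counter}", seeds[i], len(levels[i]))    # dummy scan
        for entry in list(levels[i][b]):
            if found is None and entry[0] == v:
                found = entry[1]
                levels[i][b].remove(entry)
    # Copy the element back into the first empty bucket of level 1.
    for bucket in levels[0]:
        if not bucket:
            bucket.append((v, found))
            break
    return found
```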
After two accesses the first level becomes full, and we then need to move its data down to the second level, otherwise the first level would overflow on the next access. In general, we move data from the ith level to the (i+1)th level after every 2^i retrievals. To move the data, we obliviously hash the contents of the level-i buffer together with the existing contents of the level-(i+1) buffer into buffer i+1, according to H and a newly chosen random seed s'(i+1).
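Here is a sketch of that move-down step, continuing the same example; a real implementation would perform the merge with an oblivious sort, which this plain in-memory version does not reproduce.

```python
def move_down(levels, seeds, i):
    # Merge level i (1-indexed) into level i+1 and rehash everything
    # under a freshly chosen random seed s'_{i+1}.
    items = [e for bucket in levels[i - 1] for e in bucket]
    items += [e for bucket in levels[i] for e in bucket]
    seeds[i] = random.getrandbits(64)                        # new seed for level i+1
    levels[i - 1] = [[] for _ in range(len(levels[i - 1]))]  # level i is now empty
    levels[i] = [[] for _ in range(len(levels[i]))]
    for (v, x) in items:
        b = level_bucket(v, seeds[i], len(levels[i]))
        levels[i][b].append((v, x))
```

In a full simulation, move_down for level i would be called after every 2^i accesses, starting from the smallest level.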
OBLIVIOUSNESS:
As long as a particular memory reference is made only once, the choice of the hash bucket accessed appears completely random to the adversary. We may access the same bucket twice for two different memory references, but since the buckets are assigned independently on the basis of the hash values, the choice of bucket still appears random to the adversary. One important thing to observe is that the same bucket is never accessed twice for the same memory reference: after each successful access the element is moved up to the smaller buffers, so if the same element is accessed again it will be found in a bucket at a smaller level. Even if the element later comes back down to the same level, we will find it in a different bucket, because the level has been rehashed with a new random seed; the probability of finding it in the same bucket as before is negligible.
COST:
For every access we scan the 2 buckets of the first level and one bucket at each subsequent level. There are O(log t) words in each bucket, so a single access takes O(log t + N*log t) = O(log t + log t*log t) = O((log t)^2) steps, where N = 1 + log t. We also sort the ith buffer after every 2^(i-1) epochs (i = 2, …, N). Since the joint size of the level i−1 and level i buffers is O(2^i) buckets of size O(log t), it takes O(2^i*log t*log(2^i*log t)) steps to sort and obliviously hash them. But this is performed only once every 2^(i-1) epochs, so the amortized cost of sorting the ith buffer is O(2*log t*log(2^i*log t)) = O(log t*(i + log log t)). The total number of steps for t memory references comes out to O(t*(log t)^3). Thus we have an amortized poly-logarithmic overhead of O((log t)^3) per access, compared to O(√m) previously, so the hierarchical solution reduces the number of accesses required for each memory reference.
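For completeness, a sketch of the arithmetic behind these bounds (with N = 1 + log t):

```latex
\begin{align*}
\text{scan cost per access} &= O(\log t + N\log t) = O\big((\log t)^2\big)\\
\text{amortized rehash cost per access} &= \sum_{i=2}^{N} O\big(\log t\,(i + \log\log t)\big)
  = O\big(\log t\,(N^2 + N\log\log t)\big) = O\big((\log t)^3\big)\\
\text{total for } t \text{ accesses} &= t\cdot O\big((\log t)^2 + (\log t)^3\big) = O\big(t(\log t)^3\big)
\end{align*}
```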
