Wednesday, November 25, 2015

Non-Interactive Secure Computation based on Cut-and-Choose

Non-interactive Secure Computation (NISC) concerns protocols which have only a single round of interaction. We will discuss a NISC protocol which is based on the cut-and-choose paradigm of Lindell and Pinkas.

To quickly recap, the protocol presented by Lindell and Pinkas for two-party computation involves the construction of 3t Yao’s garbled circuits, of which some are selected for checking correctness of construction and the remaining are used for the computation. Their protocol had a cheating probability of 2^(-t). The protocol we are going to discuss requires only t circuits to achieve the same cheating probability.

We begin with a brief overview of three flavours of NISC:
  • One-Sender NISC (OS-NISC): It involves two parties, one sender and one receiver. The receiver sends the first message and the sender replies with the second message. The receiver then outputs the result of the computation.
  • Multi-Sender NISC (MS-NISC): This is an extension where the first message can be used for running secure computation with many different senders. The receiver broadcasts its first message, and parties who wish to participate in the computation reply. The receiver then outputs the results of the computations with all the senders. This is a limitation of MS-NISC, since it requires the receiver to aggregate and output all the results together.
  • Adaptive MS-NISC: The above limitation is removed and the receiver outputs the result of each computation as soon as the message is received. 

We will discuss how this 2PC protocol modifies the cut-and-choose protocol as follows:
Step 1: Squashing interaction in cut-and-choose. This reduces the protocol to a NISC which achieves 2^(-t) cheating probability with 3t circuits.
Step 2: Reducing the number of garbled circuits from 3t to t.

Step 1: Squashing Cut-and-Choose 2PC to One Round
We begin with the most straightforward approach to cut-and-choose with 3t garbled circuits. The receiver’s first message consists of his OT queries for his input bits. The sender then garbles 3t circuits and replies with a message including (1) the 3t garbled circuits, (2) the OT answers for the receiver’s queries, i.e. the input-wire labels for the receiver’s input (note that the OT protocol used has to be a 2-message OT), and (3) input-wire labels for the sender’s own input.

The receiver can now evaluate the circuits and take the majority result to be the output. However, we have to answer the following questions of correctness and security, and come up with non-interactive solutions to them:

  • Were the garbled circuits constructed correctly?
The standard cut-and-choose requires the receiver to pick some fraction of the circuits for verification, and the sender to prove that these circuits were constructed correctly. To simulate this non-interactively, we add 3t OTs called circuit-OTs. In the i-th OT, the receiver queries for 1 if he chooses to check that circuit and 0 if he chooses to evaluate it. The sender's inputs are prepared as follows: he picks a seed seed_i as input for a PRF, used to generate all the randomness associated with the i-th garbled circuit. At the same time, instead of sending his input-wire labels in the clear, he now sends them encrypted under some key k_i. His input to the i-th circuit-OT is then (k_i, seed_i).

If the receiver chooses to check the circuit, he receives the seed used to generate all the randomness. If he chooses to evaluate, he receives the key which allows him to decrypt the sender’s input labels and evaluate. The encryption is necessary: if it were not used, the receiver could regenerate all the input labels of a check circuit himself and compare them with the labels received from the sender (which would be in the clear), thereby learning the sender’s input.

  • Did the sender use the right input-wire labels in the first set of OTs?
To ensure this, we use an OT protocol in which the second message commits the sender to specific inputs. The randomness used to generate these OT answers must also be derived from seed_i. Then, for a check circuit, the receiver can recompute the required second message of the OT and verify it.

  • Was the sender’s input consistent in all 3t circuits?
We use a commitment scheme which supports an efficient proof that two commitments are commitments to the same value. The sender is then required to use commitments to 0 and commitments to 1 as the input-wire labels. Again, we require that the randomness used in the commitments is generated from seed_i, so that for a check circuit the commitments can be regenerated by the receiver to verify the input-wire labels.

The protocol also includes input commitments, which are a set of commitments to the actual input bits, and commitment equality proofs, which prove that each input value in an evaluated circuit is equal to the value committed in the corresponding input commitment. These proofs are encrypted under k_i in order to hide them from the receiver for the check circuits.
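
To make the shape of the sender's per-circuit reply concrete, here is a minimal Python sketch of the pieces described above. Everything in it is an illustrative assumption (a toy PRF, toy one-time encryption, and a placeholder for garbling); it only shows which value goes where, not the actual construction of the paper.

    import os, hmac, hashlib

    def prf(seed: bytes, label: bytes) -> bytes:
        # Toy PRF: all randomness of the i-th circuit is derived from seed_i.
        return hmac.new(seed, label, hashlib.sha256).digest()

    def encrypt(key: bytes, msg: bytes) -> bytes:
        # Toy one-time encryption of a 32-byte label under k_i.
        pad = prf(key, b"pad")
        return bytes(a ^ b for a, b in zip(msg, pad))

    def garble_from_seed(seed: bytes, n_inputs: int):
        # Placeholder for garbling: real garbling would derive every wire label
        # and gate table from prf(seed, ...); here we only derive the sender's
        # two labels per input wire, enough to show the data flow.
        labels = [(prf(seed, b"in-%d-0" % j), prf(seed, b"in-%d-1" % j))
                  for j in range(n_inputs)]
        tables = prf(seed, b"gate-tables")      # stand-in for the garbled tables
        return tables, labels

    def sender_circuit_message(sender_input_bits):
        # What the sender prepares for one of the 3t circuits.
        seed_i = os.urandom(32)   # drives garbling, commitments and OT randomness
        k_i = os.urandom(32)      # deliberately independent of seed_i
        gc_i, labels = garble_from_seed(seed_i, len(sender_input_bits))
        # The sender's own input-wire labels are sent encrypted under k_i, so a
        # receiver who CHECKS circuit i (and learns seed_i) cannot compare them
        # against labels it regenerates itself and deduce the sender's input.
        enc_labels = [encrypt(k_i, labels[j][b])
                      for j, b in enumerate(sender_input_bits)]
        # Circuit-OT input pair: choice 0 ("evaluate") -> k_i, choice 1 ("check") -> seed_i.
        return gc_i, enc_labels, (k_i, seed_i)

    gc, enc, ot_pair = sender_circuit_message([0, 1, 1, 0])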


Step 2: Reducing the number of garbled circuits

To reduce the number of garbled circuits in the above protocol, we use only t circuits and, instead of checking a fixed fraction of them, we check a random subset. If all evaluated circuits output the same bit, then the probability of cheating is 2^(-t), since the sender has to guess exactly which circuits are chosen for checking and which for evaluation. However, if some of the evaluated circuits output different bits, this implies that the sender is malicious. In this case, we want the receiver to be able to recover the sender’s input and compute the output itself.

For simplicity we assume that the circuit the players use has only one output wire and that the sender has only one input bit. Then a high level overview is as follows:

As we discussed before, the sender’s input-wire labels are commitments to the actual values. Say we use an El Gamal commitment in a group G with generator g where DDH is hard. Then let EGCommit(h; b, r) = (g^r, h^r * g^b).
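
A minimal Python sketch of this commitment over a toy prime-order subgroup (the group parameters are far too small to be secure and are assumptions chosen purely for illustration):

    import random

    # Toy DDH-style group: the order-q subgroup of Z_p^* (NOT secure sizes).
    p, q, g = 2039, 1019, 4          # p = 2q + 1, g generates the order-q subgroup

    def eg_commit(h: int, b: int, r: int):
        # EGCommit(h; b, r) = (g^r, h^r * g^b)
        return (pow(g, r, p), (pow(h, r, p) * pow(g, b, p)) % p)

    # Example: the sender commits to the bit 1 under a public h = g^w.
    w = random.randrange(q)
    h = pow(g, w, p)
    r = random.randrange(q)
    c = eg_commit(h, 1, r)
    # Whoever knows r can open the commitment; whoever knows w can decrypt it
    # like an ElGamal ciphertext: g^b = c[1] * c[0]^(-w).
    assert (c[1] * pow(c[0], -w, p)) % p == pow(g, 1, p)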

Then in our protocol, the sender picks w, sends h = g^w to the receiver, and sets the labels of the input wires in the i-th garbled circuit to EGCommit(h; 0, r_{i,0}) and EGCommit(h; 1, r_{i,1}). Next, the sender picks random w_0, w_1 such that w = w_0 + w_1 and sends h_0 = g^(w_0) and h_1 = g^(w_1). Additionally, for the i-th circuit he sends output recovery commitments h_0 * g^(l_{i,0}) and h_1 * g^(l_{i,1}), where l_{i,0} and l_{i,1} are chosen at random. The output-wire labels of the circuit are then set to l_{i,0} and l_{i,1}, corresponding to 0 and 1 respectively.

In the cut-and-choose stage:
  •  If the receiver chooses to check the circuit, it receives seed_i and recovers the output-wire labels. It can then verify the output recovery commitments.
  • If he chooses to evaluate, he additionally receives w_0 + l_{i,0} and w_1 + l_{i,1} (from the sender, encrypted under k_i). The receiver then verifies the consistency of these values with the output recovery commitments by raising g to these values. If they turn out inconsistent, he aborts. Otherwise, he also checks that the earlier commitment to the computed output label l_{i,b} was valid. If so, he marks the circuit as semi-trusted.

At the end of the evaluation, the receiver is left with either the same output from all semi-trusted circuits, or two different outputs from at least two semi-trusted circuits.

Case 1: Same output from all – Since, except with probability 2^(-t), at least one good evaluated circuit exists (to manipulate the output, the sender would have to guess the exact evaluation set), this output is correct.

Case 2: Two circuits with different outputs – The receiver initiates the cheating-recovery process. Say one of the semi-trusted circuits outputs 0 and the other outputs 1. From the first circuit the receiver gets l_{i,0} and, from the sender’s previous message, w_0 + l_{i,0}; hence he can recover w_0. Similarly, from the second circuit he can recover w_1. Given w = w_0 + w_1, he can decrypt the input commitments and recover the sender’s input. Note that if the sender was honest, all the circuits would give the same output and the receiver would only be able to compute w_0 or w_1, not both. For more than one output wire, a different pair (w_0, w_1) is chosen for each wire, so the receiver learns only one value from each pair.
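
The recovery arithmetic can be sketched as follows (same toy group as in the sketch above; in the real protocol the two output labels come from two different semi-trusted circuits, each with its own masked pair, while w_0 and w_1 are common to all circuits):

    import random

    p, q, g = 2039, 1019, 4                 # same toy group as before

    w  = random.randrange(q)
    w0 = random.randrange(q)
    w1 = (w - w0) % q                       # w = w0 + w1 (in the exponent group)
    h, h0, h1 = pow(g, w, p), pow(g, w0, p), pow(g, w1, p)

    # Per-circuit output-wire labels and output recovery commitments:
    l0, l1 = random.randrange(q), random.randrange(q)
    orc0 = (h0 * pow(g, l0, p)) % p         # h0 * g^(l_{i,0})
    orc1 = (h1 * pow(g, l1, p)) % p         # h1 * g^(l_{i,1})
    s0, s1 = (w0 + l0) % q, (w1 + l1) % q   # sent to the receiver encrypted under k_i

    # Receiver's consistency check for an evaluation circuit:
    assert pow(g, s0, p) == orc0 and pow(g, s1, p) == orc1

    # Cheating recovery: one semi-trusted circuit revealed l0 (output 0) and
    # another revealed l1 (output 1), so the receiver recovers w0, w1 and w:
    assert ((s0 - l0) % q + (s1 - l1) % q) % q == w
    # With w, the receiver can decrypt the El Gamal input commitments under h
    # and compute the function on the sender's input by itself.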

This completes our description of non-interactive secure computation based on the cut-and-choose paradigm.

REFERENCE

Arash Afshar, Payman Mohassel, Benny Pinkas, Ben Riva. Non-Interactive Secure Computation Based on Cut-and-Choose. EUROCRYPT 2014

Saturday, November 21, 2015

Efficient Constant Round Multi-Party Computation Combining BMR and SPDZ

In this blog, we see an efficient constant round MPC that is fully secure in the presence of malicious adversaries in the dishonest-majority (t < n) case. The crux of the construction lies in combining the BMR protocol with an efficient protocol for evaluating arithmetic circuits which is secure in the presence of malicious adversaries. Before going into the details of the construction, let's analyse the components separately.

The BMR protocol of Beaver et al. is a variant of Yao's protocol that runs in a multi-party, dishonest-majority setting with security in the presence of semi-honest adversaries. The protocol requires all parties to jointly construct a garbled circuit, with the help of a constant round MPC, and then evaluate it. To achieve efficiency, BMR uses the offline/online paradigm: the garbled circuit is constructed in the offline phase, and by exchanging the values corresponding to their inputs, the parties can evaluate the circuit in the online phase with very little communication. However, in the case of malicious adversaries, security is guaranteed only with an honest majority (t < n/2) (For more information, read here).

In order to obtain the required constant round MPC, the BMR protocol is modified so that the garbling and evaluation operations are done over a finite field. In detail, instead of using XORs of bit strings (and hence a binary circuit) to instantiate the garbled gates, additions of elements in a finite field (and hence an arithmetic circuit) are used. This enables us to use an efficient non-constant-round protocol - with security for malicious adversaries - to compute the gate tables of the BMR garbled circuit (and since the computation of these tables is of constant depth, this step is constant round).
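
As a toy illustration of this one change (nothing below is the actual BMR/SPDZ construction; the field size, party count and share values are made-up assumptions), the per-party key contributions are now combined by addition in a finite field instead of by XOR, so each combination step is a gate of an arithmetic circuit that an SPDZ-style MPC can evaluate:

    import random
    from functools import reduce
    from operator import xor

    p = 2**61 - 1                 # assumed field; in the protocol it matches the MPC engine
    n_parties = 3

    xor_shares   = [random.getrandbits(61) for _ in range(n_parties)]
    field_shares = [random.randrange(p)    for _ in range(n_parties)]

    binary_key = reduce(xor, xor_shares)    # classical BMR: key shares combined by XOR
    field_key  = sum(field_shares) % p      # modified BMR: key shares combined by
                                            # field addition, i.e. an arithmetic gate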

In the general construction, the new constant-round MPC consists of two phases. In the first (offline) phase, the parties securely compute random shares of the BMR garbled circuit. To achieve efficiency, instead of doing this naively, each party locally computes the pseudorandom function as needed for every gate and uses the results as input to the secure computation. From the proof of security, we can see that if a party cheats and inputs incorrect values then no harm is done, since this can only cause the honest parties to abort. Next, in the online phase, all that the parties need to do is reconstruct the single garbled circuit, exchange garbled values on the input wires and locally compute the garbled circuit. (For a detailed look, click here.)


Friday, November 20, 2015

Fast Cut-and-Choose Based Protocol for Malicious Adversary - Protocol

In the previous blog we saw the high-level description of the fast cut-and-choose protocol. We will now see how the protocol works.

Inputs: P1 has input x ∈ {0,1}^l and P2 has input y ∈ {0,1}^l.
Specified output: Party P2 receives f(x,y) and party P1 receives no output. The length of the output is m.

1)      Input Key Choice and Circuit Preparation

a)   P1 picks random values a_1^0, a_1^1, …, a_l^0, a_l^1; r_1, …, r_s ∈_R Z_q and b_1^0, b_1^1, …, b_m^0, b_m^1 ∈_R {0,1}^n.
b)   For each garbled circuit j, P1 sets the keys for its input wires to:
k_{i,j}^0 = H(g^(a_i^0 * r_j)) and k_{i,j}^1 = H(g^(a_i^1 * r_j))
c)   The keys for output wire i are b_i^0 and b_i^1. These are the same in all garbled circuits.
d)   P1 constructs GC_1, GC_2, …, GC_s – s independent copies of the circuit C. The output-wire keys are the same in all of them.

2)      Oblivious Transfers:

a)   P1 chooses random values x_1, …, x_s ∈_R {0,1}^n.
b)   P2 chooses a random subset J ⊆ [s], where every j ∈ [s] is in J with probability exactly ½. P2 uses the set J and his input bits as his input to the OTs. J is the set of circuits P2 is going to check.
c)   P2 receives all the keys associated with its input wires in every circuit GC_j with j ∈ J, and the keys associated with its actual input y in all other circuits.
d)   P2 also receives x_j for every j ∉ J. This acts as proof that P2 chose to evaluate these circuits.

3)      Send Circuits and Commitments

a)   P1 sends P2 the following
i)     The garbled circuits.
ii)    Seed of the randomness extractor.
iii)   Commitment to the garbled values associated with P1’s input.
iv)   Output translation tables.


4)      Send Cut-And-Choose Challenge

a)   P2 sends P1 the set J along with x_j for every j ∉ J.
b)   If the values received by P1 are incorrect, P1 outputs ⊥ and aborts.

5)      P1 Sends Its Garbled Input Values in the Evaluation-Circuits

a)   P1 sends the keys associated with its inputs in the evaluation circuits.

6)      Circuit Evaluation

a)   P2 uses P1’s keys from Step 5 and the keys associated with its own input from Step 2c to evaluate all the evaluation circuits.
b)   If P2 receives a consistent output across all these garbled circuits and the next step doesn’t result in an abort, then this will be the output.
c)   Else, if P2 receives two valid outputs on one output wire (i.e. both b_i^0 and b_i^1), then it uses these in the next step.
d)   Else it will abort in Step 8b.

7)      Run Secure Computation To Detect Cheating

a)   P1 defines a circuit with the output-wire labels b_1^0, b_1^1, …, b_m^0, b_m^1 hardcoded. The circuit computes the following function (a plain sketch of this function appears after the protocol description):
i)        P1’s input is the same input x as before; P1 receives no output.
ii)      P2’s input is a pair of values b^0, b^1.
iii)    If there exists a hardcoded pair of labels such that b_i^0 = b^0 and b_i^1 = b^1, then P2 receives x. Otherwise it receives no output.
b)   The above circuit can be evaluated using any existing secure computation protocol that gives 2^(-s) security.

8)      Check Circuits for Computing f(x,y):

a)   Send all input garbled values in check circuits: for every check circuit, P1 sends the randomness used for the commitments in Step 3. P2 aborts if there is any inconsistency.
b)   Correctness of check circuits: for every j ∈ J, P2 uses the commitments it received in Step 3 and the randomness from the previous step to compute both pairs of input keys associated with P1’s input in GC_j.
c)   Using these, P2 decrypts the circuit and verifies that it is a garbled version of C. If there exists a circuit for which this check fails, then P2 aborts.
d)   If there exists an output wire for which P2 did not receive a valid value in any evaluation circuit in Step 6, then P2 aborts.

9)      Verify Consistency of P1’s Input:

a)   For every input wire, P1 proves the consistency of its input using a zero-knowledge proof of knowledge.
b)   If any of the proofs fail, then P2 aborts.

10)      Output Evaluation:

a)   If P2 received no inconsistent outputs from the evaluation circuits, then it decodes the outputs using the output translation tables and outputs the string obtained.
b)   If P2 received inconsistent outputs, then let x be the output that P2 received from the second computation (Step 7). P2 then computes f(x,y) and outputs it.
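
For reference, here is a plain (non-garbled) Python sketch of the function computed in Step 7; in the protocol this function is evaluated inside the small secure computation, and the label length and names below are illustrative assumptions:

    import os

    def cheating_detection(x, hardcoded_pairs, b0, b1):
        # P1's input is x; the output-wire label pairs (b_i^0, b_i^1) are hardcoded.
        # P2's input is the pair (b0, b1).  P2 learns x iff its pair matches some
        # hardcoded pair, i.e. iff it holds BOTH labels of one output wire --
        # which it can only do if P1 cheated in Step 6.
        for (bi0, bi1) in hardcoded_pairs:
            if b0 == bi0 and b1 == bi1:
                return x
        return None                    # honest P1: P2 never holds both labels

    pairs = [(os.urandom(16), os.urandom(16)) for _ in range(4)]   # toy m = 4 wires
    x = b"P1-input"
    assert cheating_detection(x, pairs, *pairs[2]) == x            # valid proof of cheating
    assert cheating_detection(x, pairs, os.urandom(16), os.urandom(16)) is None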

Reference: Yehuda Lindell and Ben Riva. Blazing Fast 2PC in the Offline/Online Setting with Security for Malicious Adversaries.

Multivalued Byzantine Broadcast: the t < n case

Byzantine broadcast is a distributed primitive that allows a designated party to consistently distribute a message among n parties in the presence of up to t corrupted parties. All known protocols implementing broadcast of an l-bit message over point-to-point channels and tolerating any t < n byzantine corruptions have communication complexity at least Ω(l·n^2).
The presentation showed a cryptographically secure and an information-theoretically secure protocol for t < n that communicate O(l·n) bits when l is sufficiently large. This is optimal for any protocol broadcasting an l-bit message.
Model of the Protocol
The protocol consists of n parties (players) P = {p_1, …, p_n} with a designated party called the sender, denoted p_s for some s ∈ {1, …, n}. The parties are connected by a synchronous, authenticated point-to-point network. A broadcast protocol allows the sender p_s to distribute a value v_s among the parties P such that the following conditions hold.
Termination: every correct party p_i ∈ P terminates.
Consistency: all correct parties in P decide on the same value.
Validity: if the sender p_s is correct, then every correct party p_i ∈ P decides on the value proposed by the sender, i.e. v_i = v_s.
Cryptographic secure block broadcast protocol
The protocol employs a collision-resistant hash function CRHash. At the beginning of the protocol the sender broadcasts a hash h = CRHash(v_s) of the value it holds. The goal of the protocol is to ensure that all correct players learn v_s. During the protocol run all parties are divided into two sets H and H': the set H consists of the happy players who have already learned v_s, and H' of those who have not. In each round we try to move a player from H' to H. We select a pair of players p_x, p_y such that p_x ∈ H and p_y ∈ H'. Then p_x sends the value it holds to p_y. Once p_y receives a value from p_x, it verifies whether its hash is h. In the positive case p_y is included in H, and in the negative case the pair (p_x, p_y) is put in the set D of disputed pairs. Hence in each round we either include one player in H or add a conflicting pair to D.
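
The loop above can be simulated centrally in a few lines of Python (a sketch under assumptions: the hash choice, the scheduling of pairs and the party model are illustrative, and the real protocol runs over point-to-point channels plus a single-bit broadcast primitive; a corrupt party is modelled simply as one that relays a wrong value):

    import hashlib

    def crhash(v: bytes) -> bytes:
        return hashlib.sha256(v).digest()

    def block_broadcast(v_s: bytes, sender: int, n: int, corrupt=frozenset()):
        held = {sender: v_s}
        h = crhash(v_s)                              # sender broadcasts CRHash(v_s)
        H, D = {sender}, set()                       # happy players, disputed pairs
        while True:
            pair = next(((x, y) for x in sorted(H) for y in range(n)
                         if y not in H and frozenset((x, y)) not in D), None)
            if pair is None:
                break                                # H' empty or only disputes remain
            x, y = pair
            v = b"garbage" if x in corrupt else held[x]   # p_x sends its value to p_y
            if crhash(v) == h:                       # p_y checks the hash, broadcasts 1
                held[y] = v
                H.add(y)
            else:                                    # p_y broadcasts 0: dispute recorded
                D.add(frozenset((x, y)))
        return {i: held.get(i) for i in range(n)}, D     # missing value = output ⊥

    # Correct sender: every party ends up with v_s (validity).
    print(block_broadcast(b"message block", sender=0, n=4))
    # Corrupt sender that relays garbage: the other parties all output ⊥ (consistency).
    print(block_broadcast(b"message block", sender=0, n=4, corrupt={0}))
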
Validity: a correct p_x ∈ H holds v_x with CRHash(v_x) = h and sends v_x to p_y, who verifies whether CRHash(v_x) = h. If p_y finds that the hash of v_x is h, it broadcasts 1, else it broadcasts 0. Hence if both p_x and p_y are honest, p_y will be added to H and (p_x, p_y) is not added to D. Thus if p_s is correct, all honest players will end up in H.
Termination: in each iteration of the algorithm either H grows or D grows. H is bounded by n and D is bounded by n^2. Hence the number of iterations is bounded.
Consistency: in all iterations all correct parties send correct values v and all correct parties verify h = CRHash(v). Hence if at least one honest party is in H, all correct parties will end up in H and no honest party is left in H'. Otherwise none of the correct parties will be in H and all correct parties output ⊥. A situation where all the correct parties output ⊥ is also consistent.
Complexity: in each iteration of the algorithm either H grows or D grows, so the number of iterations is upper bounded by O(n + d), where d is the number of pairs that have found a dispute; this can be at most O(n^2) for n parties. The total communication cost of the protocol is upper bounded by B(|h|) + (n + d)(|v_s| + B(1)), where B(a) denotes the number of bits communicated when a message of length a is broadcast using a standard (single-valued) broadcast protocol.

Constructing Broadcast for long messages
In the protocol described above, if we transmit the complete message at once, that is if |v_s| = l, then the worst-case complexity is O(l·n^2) and we end up no better than previous broadcast protocols. But if we use the above protocol as a sub-protocol to broadcast a long l-bit message block by block, the complexity drops. The key idea is that the set D holding the dispute pairs is kept global across all invocations of this sub-protocol. The protocol is invoked q times, where each invocation transmits a block of length l/q. Hence the total communication complexity is

q·B(|h|) + (q·n + d)·(l/q + B(1)).
At the beginning of the protocol D is initialized to the empty set. D keeps growing whenever disputes arise, and its state in one invocation is carried over to the next, where each invocation broadcasts a block of length l/q. Hence the size of D is the sum of all d_i, where d_i denotes the number of disputes newly added in the i-th block transmission; D can be of size O(n^2) at most.
It is evident that the above algorithm is optimal when q = n: the cost becomes O(l·n + n·B(|h|) + n^2·B(1)), which is O(l·n) for sufficiently large l.

Universal Hash Functions
Consider a family of functions U = {u_k}, k ∈ K, indexed by a key set K, where each function u_k maps elements of some set X to a fixed set of bins Y. The family U is called ε-universal if for any two distinct messages v_1 and v_2:

Pr_{k ← K} [ u_k(v_1) = u_k(v_2) ] ≤ ε.
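
A standard example of such a family (an assumption for illustration, not necessarily the ITHash used in the talk) is polynomial evaluation over a prime field: split the message into field elements and evaluate the resulting polynomial at the random key.

    import random

    P = 2**61 - 1                        # assumed prime field for the toy hash

    def it_hash(key: int, msg: bytes) -> int:
        # u_k(v) = sum_j v_j * k^j (mod P), over 7-byte blocks v_j of the message.
        # Two distinct messages of at most L blocks collide for at most L keys,
        # so the family is (L/P)-universal.  (In practice the message length is
        # also hashed in, to avoid padding ambiguities between different lengths.)
        blocks = [int.from_bytes(msg[i:i + 7], "big") for i in range(0, len(msg), 7)]
        acc, kpow = 0, key
        for v_j in blocks:
            acc = (acc + v_j * kpow) % P
            kpow = (kpow * key) % P
        return acc

    k = random.randrange(P)
    print(it_hash(k, b"value held by p_x"), it_hash(k, b"value held by p_y"))
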
Information theoretically secure broadcast protocol
Similar to the cryptographic case, all parties during the protocol are divided into two sets H and H'. The difference from the cryptographic case is that the set H is not monotonically increasing: a player may be added to or removed from H several times. In each iteration of the algorithm we try to move a player from H' to H. We select a pair of players p_x, p_y such that p_x ∈ H, p_y ∈ H' and {p_x, p_y} ∉ D. Then p_x sends the value it holds to p_y. Now p_y needs to verify that the value received from p_x is the value that the correct parties in H hold. In order to do so, p_y broadcasts a random key k for the ε-universal hash function ITHash, and then p_s broadcasts a hash h of its value for this key. As long as p_y honestly chooses k uniformly at random, with overwhelming probability correct players will obtain different hashes if they hold different values. Each party in H ∪ {p_y} \ {p_s} broadcasts 1 if it holds a value with hash h, and 0 otherwise. If every party broadcasts 1, then the iteration was successful and p_y is added to H; otherwise some of the parties in H ∪ {p_y} do not hold the right value and we search for disputes.
In this protocol a dispute can arise between any two parties in H. In order to find such disputes one must be able to reason about the history of how H was formed. Thus a history set T contains pairs of players (p_x, p_y) such that p_y learned the value it holds from p_x. Whenever a party p_j broadcasts zero, it means that the hash broadcast by the sender and the hash of the value held by p_j do not match. In such a case we consult the history set to find the predecessors of p_j from whom the disputed value v reached p_j. Among them we look for the parties at which the broadcast bit flips from 1 to 0, and we add such pairs to the dispute set D. If a dispute occurs, we reset H = {p_s} and T = ∅.

Time complexity
There can be at most n consecutive iterations in which no conflict is found, hence the number of rounds is at most O(n + n·d). The communication complexity is therefore O((n + n·d)·(|v_s| + B(|h|) + B(|k|) + n·B(1))).

Communication complexity of Long Messages

Splitting the l-bit message into q blocks as before, and keeping the dispute set D global across the invocations, the total communication complexity becomes

O((q·n + n·d)·(l/q + B(|h|) + B(|k|) + n·B(1))).

We get O(l·n) if we set q = n^2 (for sufficiently large l).

Fast Cut-and-Choose Based Protocol for Malicious Adversary

Yao’s garbled circuit is a well-known technique for secure two-party computation. While it is very efficient in the semi-honest model, in the case of a malicious adversary the major challenge is to ensure that the constructed garbled circuit corresponds to the correct function. The cut-and-choose technique of opening and checking s/2 circuits and using the rest as evaluation circuits achieves a cheating probability of 2^(-0.32s), where s is the number of garbled circuits. To achieve an error probability of 2^(-40), around 128 garbled circuits need to be constructed, which adds heavy communication overhead. In this talk, we shall see a more efficient cut-and-choose technique which achieves a cheating probability of 2^(-s); thereby 40 circuits suffice to achieve the desired error probability of 2^(-40).

The main idea of this protocol is that the party P1 who constructs the circuits can cheat only if all the check circuits are correct and all the evaluation circuits are incorrect. The evaluating party checks each circuit with probability ½, independently of all other circuits; thus we achieve a cheating probability of 2^(-s). This is unlike the previous case, where only a majority of the evaluation circuits needed to be correct.

It is important to recall why the criterion there is to take the majority output rather than requiring all outputs to agree. Suppose the criterion was for P2 to abort if it doesn’t receive the same output in all circuits; then it is susceptible to the following attack. A malicious P1 can construct a single incorrect circuit that computes this function: if the first bit of P2’s input is 0, output random garbage, else compute the correct function. Now if this circuit is an evaluation circuit (which happens with probability ½) and P2’s first input bit is 0, then P2 will receive an output in this circuit different from the others, and will abort since the outputs of the evaluation circuits are not consistent. In contrast, if P2’s first input bit is 1, then it receives the same output everywhere and doesn’t abort. Based on whether P2 aborts or not, P1 can learn the first bit of P2’s input. To defeat this attack, the criterion is for P2 to take the majority value as output.

In the new cut-and-choose technique, the outputs of all the evaluation circuits need to be consistent. To enable this, an additional small secure computation is run after the cut-and-choose phase, so that if P2 catches P1 cheating (namely, if P2 receives inconsistent outputs in the evaluation circuits), then in the second secure computation it learns P1's full input x. This enables P2 to locally compute the correct output. Thus it is no longer necessary for P2 to take the majority output. The protocol proceeds in two phases as outlined below.

      Phase 1 - Cut-and-Choose  
-         Parties P1 (with input x) and P2 (with input y) run the usual cut-and-choose protocol over garbled circuits, secure against malicious adversaries. P1 constructs s garbled circuits and each circuit is independently chosen as a check or evaluation circuit with probability ½.
-         If all the circuits evaluated by P2 give the same output z, then P2 stores z locally. Otherwise P2 stores a “proof” that it received two inconsistent output values in two different circuits. Such a proof can be a garbled value associated with 0 on an output wire in one circuit, together with a garbled value associated with 1 on the same output wire in a different circuit. This is a valid proof, since if P2 obtains a single consistent output, then the garbled values it receives on an output wire in the different circuits are all associated with the same bit. (A small sketch of how P2 collects such a proof follows.)
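
A sketch of how P2 can distil such a proof from its evaluation results (the data layout and names are illustrative assumptions; the output translation tables are the ones P1 sends along with the circuits):

    import os

    def find_cheating_proof(eval_outputs, translation):
        # eval_outputs[j][i]: garbled value P2 obtained on output wire i in
        # evaluation circuit j; translation[i]: {garbled value -> bit} for wire i
        # (the same two garbled values are used on wire i in every circuit, as
        # the note below explains).  A proof of cheating is a wire for which P2
        # holds the garbled values of BOTH 0 and 1.
        for i, table in enumerate(translation):
            seen = {}
            for outs in eval_outputs:
                label = outs[i]
                if label in table:
                    seen[table[label]] = label
            if 0 in seen and 1 in seen:
                return i, seen[0], seen[1]       # wire index plus both labels
        return None                              # consistent outputs: no proof

    b0, b1 = os.urandom(16), os.urandom(16)
    translation = [{b0: 0, b1: 1}]               # a single output wire
    assert find_cheating_proof([[b1], [b1]], translation) is None         # honest P1
    assert find_cheating_proof([[b0], [b1]], translation) == (0, b0, b1)  # cheating caught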

Phase 2 – Secure Evaluation of Cheating     
-         P1 and P2 run a protocol that is secure against malicious adversaries with error 2^(-s) (using the previous cut-and-choose protocol with 3s circuits).
-         P1’s input is the same input x used in phase 1. P1 proves that he uses the same x. For this, we use the same mechanism that is used in known protocols to ensure that the same input is used for all the garbled circuits.
-         If P2 had received the consistent output z from all evaluation circuits, it inputs some random value; otherwise its input consists of the “proof of inconsistent output values”.
-         If P2’s input is a valid proof of inconsistent output values, then P2 receives P1’s input x, else he receives nothing.

Note: In order to keep the circuit for Phase 2 small, for better efficiency, all the circuits of Phase 1 must be constructed so that they use the same garbled values on their output wires. Opening a circuit to check it reveals both values on an output wire, which means that this knowledge can no longer be used as a proof that P1 cheated. Therefore, the circuits are opened and checked only after Phase 2.

Output Determination
If P2 received a single output z in phase 1, then P2 outputs z and halts. Otherwise if P2 had received inconsistent outputs in phase 1, then it obtains the value of P1’s input x in phase 2. Using this input x, P2 can locally compute the function output z = f(x, y).

P1 is completely clueless as to whether z was obtained in phase 1 or locally computed. This prevents the attack discussed earlier, where P1 could learn something about P2’s input by tweaking a single circuit to compute a function dependent on P2’s input. The main problem there was that, based on whether P2 aborts or not, P1 could draw conclusions about P2’s input. Here P2 obtains the correct output z in both cases, and P1 has no indication of whether the output was obtained from Phase 1 or Phase 2, so the earlier attack does not work.

Analysis of the protocol
P1 corrupt: Suppose P1 is corrupt and does not construct all the garbled circuits correctly. The only scenario in which P2 gets the same wrong output from all the evaluation circuits in Phase 1 is when every check circuit is correct and every evaluation circuit is wrong; this happens only with probability 2^(-s). If all the evaluation circuits are correct, P2 receives the correct output in Phase 1 itself. In the remaining case, where two different circuits give two different outputs, P2 obtains a valid proof of cheating and uses it to learn x in Phase 2, thereby enabling it to locally compute the correct output.
P2 corrupt: The way P2 can cheat is by trying to obtain P1’s input in Phase 2 even though it obtained the correct output in Phase 1. However, this is not feasible, since all the evaluation circuits would have produced consistent outputs (as P1 is honest), and thus P2 is unable to formulate a valid proof of inconsistency that can be used in Phase 2.


This modified cut-and-choose protocol reduces the number of garbled circuits to be communicated in the maliciously secure Yao’s protocol. It achieves a cheating probability of 2^(-s) with s garbled circuits.

Thursday, November 19, 2015

Software Protection and Simulation on Hierarchical Oblivious RAM

In Divya and Nithin's blog we saw oblivious RAM, which helps the user hide the access pattern made during a program execution. The adversary cannot guess the underlying algorithm of the program just by seeing the way the memory is accessed during the execution. The RAM is oblivious in the sense that, for any two inputs with the same running time, the access patterns are indistinguishable. We saw the square-root algorithm, where the cost of one access was (2√m + log(m + √m)) = O(√m) (where m is the size of the RAM). A hierarchical access technique decreases this complexity, which is discussed in this blog.

SIMPLE VERSION:

Before going into the actual hierarchical solution, we discuss a simpler setting, in which we consider the memory of the RAM as a contiguous block of locations, each of which is going to be read only once, and we need to hide only the order in which they are accessed.

We have n elements ((v_1, x_1), (v_2, x_2), …, (v_n, x_n)); the i-th element is (v_i, x_i), where v_i is the memory location and x_i is the data contained in that memory location.
We create a hash table with n “buckets”, numbered from 1 to n, where each bucket can hold log n words. We map virtual memory addresses to keys in the hash table, using a random oracle F() to compute our hash function. The hash function is defined as follows:
h(v) = F(v) mod n.
  •    Allocate a memory block of size n·O(log n), divided into n chunks of O(log n) words each; each chunk is called a bucket.
  •    Pick a random s, and define the hash function on v as: H(v, s) = F(s‖v) mod n.
  •    For i = 1 to n, store (v_i, x_i) into the first available word of bucket H(v_i, s).

This preprocessing can be done in time O(n·log n). We skip the details of the oblivious hashing procedure; if you want, you can read it in the original paper. Here n items are stored into a hash table with n buckets (of log n words each) according to a random oracle, hence the probability that any bucket overflows is negligible.
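
A toy Python sketch of this preprocessing and of a single lookup (the encoding of s and v and the bucket capacity are assumptions; the real construction performs the placement with an oblivious hashing procedure rather than in the clear):

    import hashlib, math, os

    def F(data: bytes) -> int:
        # Stand-in for the random oracle.
        return int.from_bytes(hashlib.sha256(data).digest(), "big")

    def H(v: int, s: bytes, n: int) -> int:
        return F(s + v.to_bytes(8, "big")) % n

    def build_buckets(items, n, s, bucket_size):
        # Place each (v, x) into bucket H(v, s); with a random function the
        # probability that some bucket of ~log n words overflows is negligible.
        buckets = [[] for _ in range(n)]
        for v, x in items:
            b = H(v, s, n)
            assert len(buckets[b]) < bucket_size, "overflow (negligible probability)"
            buckets[b].append((v, x))
        return buckets

    def lookup(buckets, n, s, v):
        # Scan the single bucket H(v, s) in full, so the position inside the
        # bucket reveals nothing.
        found = None
        for (addr, data) in buckets[H(v, s, n)]:
            if addr == v:
                found = data
        return found

    n = 16
    s = os.urandom(16)
    buckets = build_buckets([(v, v * 10) for v in range(n)], n, s,
                            bucket_size=4 * int(math.log2(n)))
    assert lookup(buckets, n, s, 7) == 70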



Any permutation p((v_1, x_1), (v_2, x_2), …, (v_n, x_n)) of the elements gives the corresponding permutation p(h_1, h_2, …, h_n) of the hash values, and the set of original hash values (h_1, h_2, …, h_n) and the permuted hash values are identically distributed; our access pattern is such a permutation of the pattern obtained here. After the preprocessing step we can easily hide the access pattern, using the assumption that every virtual memory location v_i is referenced once and only once. To look up v_i we completely scan bucket H(v_i, s). Since any two distinct memory locations are assigned to independent buckets based on the hash values, and every virtual memory location is accessed at most once, the access pattern over the buckets is independent of the virtual memory access pattern.

HIERARCHICAL SOLUTION:

Now we allow multiple accesses to the same location and still hide the access pattern. Instead of using a single hash table, we use multiple hash tables of sizes 2^i, for 1 ≤ i ≤ N, and call them buffers. The contents of the buffers are “obliviously hashed” at different intervals, according to their sizes: the smaller buffers are re-hashed more often than the bigger ones. Let t be the length of the input (program + data). We allocate memory for N (= 1 + log t) distinct buffers, where the i-th buffer consists of 2^i buckets and each bucket has log t words. With the i-th buffer we associate a random s_i and the hash function H(v, s_i) (for different values of v).


ACCESS TECHNIQUE:

      For every virtual memory reference (element search), we scan the two buckets of level one completely, and for every other level we scan a single bucket chosen by the hash function. Initially, we store all the program and data in the N-th buffer according to the hash function of the N-th level. When we need to access a virtual memory location v, we first completely scan the level-1 buffer looking for it. If it is not found, we scan bucket H(v, s_2) of level 2, and continue in this way, scanning one bucket per level, until we have found it. Suppose we find the element at level i; then we copy it to the worktape and keep scanning random-looking buckets (obtained from H(p, s_j), where p is a dummy search index) in the remaining N − i levels, down to the last buffer. This is done in order to prevent the adversary from learning at which level we retrieved the data. After the whole scan is done, we copy the data from the worktape into the first empty bucket in the first level.
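
Here is a minimal Python sketch of one such access (the hash helper, the dummy-index convention and the data layout are assumptions made for illustration; a real ORAM would also touch every word of each visited bucket and re-encrypt whatever it writes back):

    import random

    def make_hash(seed, size):
        # Bucket-choosing hash for one level (a stand-in for H(., s_i)).
        return lambda v: hash((seed, v)) % size

    def oram_access(levels, hashes, v, counter):
        # One virtual access: scan both buckets of level 1, then exactly one
        # bucket per deeper level -- the real bucket until the item is found,
        # a dummy bucket afterwards -- and finally write the item back to level 1.
        found = None
        for bucket in levels[1]:                       # level 1: scan completely
            for (addr, data) in bucket:
                if addr == v:
                    found = data
        for i in range(2, max(levels) + 1):            # deeper levels: one bucket each
            if found is None:
                target = hashes[i](v)                  # real lookup
            else:
                target = hashes[i](("dummy", counter)) # dummy, random-looking lookup
            for (addr, data) in levels[i][target]:
                if found is None and addr == v:
                    found = data
        levels[1][0].append((v, found))                # cache the item in level 1
        return found

    # Toy instantiation with N = 4 levels below the first.
    N = 4
    levels, hashes = {1: [[], []]}, {}
    for i in range(2, N + 1):
        levels[i] = [[] for _ in range(2 ** i)]
        hashes[i] = make_hash(random.random(), 2 ** i)
    levels[N][hashes[N](42)].append((42, "data"))      # everything starts at the bottom
    print(oram_access(levels, hashes, 42, counter=0))  # -> 'data', now cached in level 1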

[Figure: Hierarchical solution diagram]


After two accesses the first level becomes full, and we then need to move the data down to the second level, otherwise the first level would overflow on the next access. In general, we move data from the i-th level to the (i+1)-th level after every 2^i retrievals. To move the data, we obliviously hash the contents of the level-i buffer together with the existing contents of the level-(i+1) buffer into buffer i+1, according to H and a newly chosen random value s'_{i+1}.
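
A sketch of this re-hashing step, continuing the toy layout from the previous sketch (again an illustrative assumption in the clear, not the paper's oblivious sorting/hashing procedure):

    import random

    def make_hash(seed, size):
        return lambda v: hash((seed, v)) % size

    def rehash_down(levels, hashes, i):
        # Move everything from level i (and what level i+1 already holds) into
        # level i+1 under a freshly chosen hash s'_{i+1}.  In the real scheme this
        # redistribution is done obliviously, so the adversary learns nothing from
        # watching it, and duplicate/stale copies are removed during the oblivious
        # sort; both points are ignored in this sketch.
        items = [e for bucket in levels[i] for e in bucket] + \
                [e for bucket in levels[i + 1] for e in bucket]
        hashes[i + 1] = make_hash(random.random(), len(levels[i + 1]))
        for bucket in levels[i] + levels[i + 1]:
            bucket.clear()
        for (addr, data) in items:
            levels[i + 1][hashes[i + 1](addr)].append((addr, data))

    # Called after every 2^i retrievals, e.g. rehash_down(levels, hashes, 1)
    # once the two buckets of level 1 have filled up.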

OBLIVIOUSNESS:

As long as a particular memory reference is made only once, the choice of the hash bucket accessed appears completely random to the adversary. We may access the same bucket twice for two different memory references, but since the buckets are assigned independently on the basis of the hash values, the choice of bucket still appears random to the adversary. One important thing to observe is that for repeated references to the same location the same bucket is not accessed twice: after each successful access the element is moved up to the smaller buffers, so if the same element is accessed again it will be found in a bucket at a smaller level. Even if the element eventually comes back to the same level, it will be placed in a different bucket, based on the freshly chosen hash function; the probability of finding it in the same bucket as before is negligible.

COST:

For every access we scan the two buckets of the first level, and one bucket in each subsequent level. There are O(log t) words in each bucket, so an access costs O(log t + N·log t) = O(log t + log t · log t) = O(log t · log t) steps, where N = 1 + log t. We also re-hash the i-th buffer after every 2^(i-1) epochs (i = 2, …, N). Since the joint size of the level-(i−1) and level-i buffers is O(2^i) buckets of size O(log t), it takes O(2^i · log t · log(2^i · log t)) steps to sort and obliviously hash them. But this is performed only after every 2^(i-1) epochs, so the amortized cost of re-hashing the i-th buffer is O(2 · log t · log(2^i · log t)) = O(log t · (i + log log t)). The total number of steps for t memory references comes out to O(t · (log t)^3). Thus we have an amortized poly-logarithmic, O((log t)^3), overhead per access. Previously it was O(√m), so the hierarchical solution reduces the number of accesses required for each memory reference.