Hi all,
We are a university research group and following is a proposal for possible
optimization on Tor's rendezvous circuit design. It's not written as a formal
Tor proposal yet because I want to have a discussion with the community first,
get some feedbacks and make sure it does not introduce new vulnerabilities.
So please feel free to comment if you have any thoughts or find any holes :)
PS: a txt version of the proposal is attached in this mail as well.
------------------------------------------------------
The changes described in this document is focused on speeding up hidden service
while at least keeping the same level of anonymity as current version provides.
Naming conventions:
HSv3:
current version of hidden service protocol
HS5h:
hidden service protocol that uses 5 hops
HS:
hidden service
client:
The onion proxy that tries to communicate with HS
HSDir:
hidden service directory
RP:
rendezvous point
1. The 5 hops rendezvous circuit
The setup on HS side is the same as HSv3. HS picks couple introduction
points and uploads the descriptor to HSDir. The main difference is how to
pick the RP. In HS5h, Instead of letting the client pick whatever node it
wants, both client and HS use negotiation protocol to pick a RP
collaboratively.
Following is a high level description for the rendezvous circuit setup
process in HS5h:
(0) client fetches descriptor for a hidden service.
(1) client connects to introduction point.
(2) since client and HS are connected via introduction point, they can
negotiate a random number using this channel.
(For more details, see [RAND_NEGO])
(3) both client and HS maps that random number to a random onion router
using the same scheme, so they end up with the same node.
This is the candidate RP.
(4) both client and HS create a 3 hops circuit using RP as last hop.
(5) RP joins the circuit originates from HS to the circuit originates
from client.
(6) now client and HS are connected. Because their original circuits
share the same endpoint(the RP), the length of the path is 5 hops.
Improvement:
(1) Reduction in circuit setup time.
Both the client and HS only build 3 hops circuit now. While In
current protocol(HSv3), HS needs to build a 4 hops circuit.
(2) A shorter circuit means less transmission delay. It uses only one
more hop compared to HSv3's Tor2Web mode.
Also the circuit length for HS5h's Tor2Web mode can be further
reduced to 3 hops.
(3) Less setup time on client side for the first connection.
In HSv3, a client has to first setup a 3 hops circuit to RP before
sending INTRODUCTION1 cell to the introduction point. While in
HS5h, this process is removed.
Possible drawback:
(1) An extra step is needed: random number negotiation.
This result ins more communication between two nodes and
Introduction point needs to remember states for the negotiation.
We think this negotiation step will not introduce a noticeable delay
to the whole process. Firstly, it reuses the connection to introduction
point for both sides so it requires no extra circuits build up.
Secondly, the bottle neck is circuit setup, cell/stream transmission
delay is actually pretty low.
2. Negotiate random numbers between two nodes [RAND_NEGO]
H(a) here means digest of message a.
(1) client generates a random number R_a and sends the digest H(R_a) to HS
(2) HS remembers H(R_a), generates a random number R_h and sends it back to
the client
(3) client receives R_h and sends back R_a to HS
(4) HS checks R_a against H(R_a) to make sure it's not generated after
client receives R_h.
(5) Now both client and HS compute a shared random number using R_a
and R_c. (for instance, simply add up these two number)
The negotiation protocol can be very simple here because only two parties
are involved and the random number does not need to be remained secret.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take
control over R_c. So to mitigate this, we can use a variant of
Diffie-Hellman handshake:
(1) client generates a public key g^x and sends the digest H(g^x) to HS
(2) HS remembers H(g^x), generates a public key g^y and sends it back
to the client
(3) client receives g^y and sends back g^x to HS
(4) HS checks g^x against H(g^x) to make sure it's not generated after
client receives g^y.
(5) Now both client and HS compute a shared random number: R_c = g^(xy)
3. Rough anonymity analysis for HSv3 and HS5h
3.1. Assumptions, attack rules and constraints
In this section, we assume 3 hops being the minimal circuit length to keep
the origin node of a circuit anonymous from the destination.
Two simple attack rules are used to remove nodes from a circuit:
1. If node A is picked by the attacker, then he can pick himself as node A.
2. If the identity of node A is known by the attacker, then he can connect
directly to it.
To defense against these attack rules, following constraints need to be
enforced for a circuit:
1. No one except the circuit originator can have total control over choice
of all nodes on the circuit. Note that it's OK even if the originator
does not have total control.
2. Identity for all nodes on the circuit, except the last hop, must
remain secret. i.e. only known by circuit originator.
3.2. Why 6 hops is the minimal circuit length for HSv3
First let's begin with an ASCII graph for HSv3 rendezvous circuit:
Client -> (a1) -> (a2) -> (RP) <- (b3) <- (b2) <- (b1) <- HS
Client chooses 3 ORs: a1, a2, RP. HS chooses the other 3 ORs: b3, b2, b1.
Now consider HS as the malicious side. In order to get closer to client,
the best it can do is to connect directly to RP, which reduces the circuit
length to 3 hops:
Client -> (a1) -> (a2) -> (RP) <- HS
If client is the malicious side, the best it can do is to choose itself as
RP, which also reduces the circuit length to 3 hops:
Client <- (b3) <- (b2) <- (b1) <- HS
Taking any node from the original 6 hops circuit will result in 2 hops
circuit in either case. As noted above, 3 hops is assumed to be the
minimal circuit length.
3.3. Why HS5h only needs 5 hops
HS5h makes use of the fact that the last hop of a circuit is different from
all previous hops. Since last hop will be in direct contact with the
destination host, its identity does not need to be remain secret. We only
have to make sure the last hop cannot be determined by any malicious host
at will. Otherwise the malicious side can just pick himself as last hop and
thus shorten the circuit.
This is where hop negotiation come into play. A negotiated hop is
guaranteed to be a random node and cannot be determined by anyone. So it
can be used as the last hop for both client and HS. Since the last hop for
both sides overlaps, it's effectively acting as a rendezvous point. In
comparison, in HSv3, the rendezvous point is picked and controlled by the
client. Therefore it can not be used as the last hop for HSs' rendezvous
circuit.
Again, following is an ASCII graph for a HS5h rendezvous circuit:
Client -> (a1) -> (a2) -> (RP) <- (b2) <- (b1) <- HS
Now if the client is malicious, it can only connects directly to the RP,
i.e. pick the second last hop as himself. Therefore HS can still maintain a
3 hops distance from the malicious client:
Client -> (RP) <- (b2) <- (b1) <- HS
While in HSv3, the client can pick himself as the RP.
And because this is a symmetric scheme, the same argument goes for
malicious HS:
Client -> (a1) -> (a2) -> (RP) <- HS
Either cases, we maintained a 3 hops circuit.
3.4. Summary
We took 3 as the minimal circuit length because that's the default setup in
Tor. The analysis should work for any number.
To generalize it, if N is the minimal circuit length, then:
1) without node negotiation, the minimal rendezvous circuit length is 2N.
2) with node negotiation, the minimal rendezvous circuit length is 2N-1.
4. Other possible speed optimization under HS5h
4.1 Optimistic rendezvous mode
Similar to optimistic_data for AP connections, we can implement
optimistic_rendezvous mode under HS5h. The basic idea is to optimistically
assume any picked candidate RP will be willing to act as a RP.
Since now we need to maintain state in introduction point to help with node
negotiation, we can use it to communicate with HS when establishing the
rendezvous circuit. So both client and HS can launch circuits to newly
negotiated RP simultaneously. If this candidate RP refuses to be a
rendezvous point to one side, the other side can be notified via
introduction point. Then client and HS can go through the negotiation
process again to pick a new candidate RP.
This mode can also be implemented under HSv3 as well. In HSv3, wait for
RENDEZVOUS_ESTABLISHED cell and sending INTRODUCE1 cell has to be a
sequential operation. With optimistic_rendezvous mode, this can be done in
parallel.
NOTE: we need to test RENDEZVOUS_ESTABLISHED rate in real tor network. If
the rate is low, then this mode will slow down the setup process instead.
To make this mode work even better, we can have all OR explicitly mark
whether it is willing to be a rendezvous point in it's descriptor. This
should greatly increase the success rate for receiving
RENDEZVOUS_ESTABLISHED.
5. Hazards
a) How to design the scheme for mapping a random number to the same node
between client and server?
This will be trivia if it's easy to synchronize node list between two
onion proxy. The problem is: how much overhead will be caused by
synchronizing node list (or consensus).
b) How much overhead will be introduced by maintaining state for node
negotiation?
Since in HS5h, both introduction point and hidden service need to
maintain state for node negotiation.
To do so, the client can send something like a negotiation identifier
in the initial introduction request. Then for introduction point, it
can mark the circuit from the client with this ID. For HS, it can hash
the negotiation state structure into a hash table with ID as the key.
Theoretically, this overhead should be low, because the "negotiation
ID" is just like rendezvous cookie.