Re: [tor-dev] [Draft Proposal] Scalable Hidden Services

30 Oct 2013

      On Mon, Oct 28, 2013 at 08:49:46PM +0000, George Kadianakis wrote:
...
Christopher Baines cbaines8@gmail.com writes:
...
On 28/10/13 13:19, Matthew Finkel wrote:
...
This is a proposal I wrote to implement scalable hidden services. It's
by no means finished (there are some slight inconsistencies which I will
be correcting later today or tomorrow) but I want to make it public in
the meantime. I'm also working on some additional security measures that
can be used, but those haven't been written yet.
Great, I will try to link this in to the earlier thread for some continuity.
It seems to me that this is a description of "Alternative 3" in Nick's
email. Multiple instances, with multiple sets of introduction points,
somehow combined in to one service descriptor? I haven't managed to
fully comprehend your proposal yet, but I though I would try and
continue the earlier discussion.
So, going back to the goals, this "alternative" can have master nodes,
but, can have you can also just have this "captain" role dynamically
self assigned. Why did you include an alternative here, do you see these
being used differently? It seems like the initial mode does not fulfil
goal 2 or 3?
One of the differences between the alternatives that keeps coming up, is
who (if anyone) can determine the number of nodes. Alternative 3 can
keep this secret to the service operator by publishing a combined
descriptor. I also discussed in the earlier thread how you could do this
in the "Alternative 4: Single hidden service descriptor, multiple
service instances per intro point." design, by having the instances
connect to each introduction point 1, or more times, and possibly only
connecting to a subset of the introduction points (possibly didn't
consider this in the earlier thread).
So far, we have avoided defining our adversaries. Here is an example
of some adversaries (wrt distinguishing multi-node HSes from
single-node HSes and finding the number of nodes and their status):

HS Client. This is a client who knows the descriptor of an
HS, and hence all its IPs.
Introduction Point. This is an introduction point of the HS. This is
a naive version of the next adversary and I will not consider it in
the attacks below.
Introduction Point + Client: This is an adversary who knows the
descriptor of an HS, hence all its IPs, and is also an IP of the
Hidden Service. This is superior to the simple 'Introduction Point'
adversary and more realistic (since not many Introduction Points
will target random HSes, but if you know an HS, you will try
targetting it by becoming its IP).

Let's see how these adversaries are doing if they want to determine
the number of HS-nodes:

Alternative 3: Single hidden service descriptor, one service instance per intro point:

-- From PoV of a client:
   If a client suspects that an HS is multi-node, then its number of
   nodes is simply the number of its introduction points.
-- Same thing applies for IP+Client.
Based on the proposal it could be even easier than that, but it is
definitely easy.
...

Alternative 4: Single hidden service descriptor, multiple service instances per intro point.

-- From PoV of an IP+Client:
   It's trivial for an IP+Client to distinguish a multi-node HS from a
   single-node HS, by looking at the number of introduction circuits
   to it. Single-node HSes only have a single IP circuit (IIRC).
Yup, but I think that with the current design, we will need to
decide how much information leakage we can accept. Is it acceptable for
a client or IP to know that we are scaling without knowing the number
of nodes or does the mere fact that we are scaling reveal enough
information to an attacker? I completely understand why we want to
limit the amount of information we reveal, but maybe we should also
categorize these properties into "We can not leak this information" and
"This would be a nice property to have in a design just because the
probability that an attack is successful is proportional to the amount
of information they have about the target."
...
Also, depending on how we assign HS-nodes to IPs it might be
   possible to find the number of HS-nodes too (or at least a lower or
   upper limit of them).
I don't see a way for a client to get the number of nodes of an HS in
Alternative 4. However, an IP+Client is able to do so in both
alternatives.
BTW, as Paul said, if we try to hide the number of nodes (from an
IP+Client adversary) by establishing multiple circuits from a single
HS-node to the IP, we should be careful because multiple "same source
same destination" circuits might lead to nasty attacks.
Finally, if we go with the "Alternative 4: Single hidden service
descriptor, multiple service instances per intro point." (which
currently seems as the best idea to me), we should think of how many
IPs each HS-node will connect to. There are at least three ways:
a) An HS-node establishes circuits to all the IPs.
b) An HS-node establishes circuits to a k-subset of the IPs.
c) An HS-node establishes circuits to a random number of the IPs.
From the above, a) trivially reveals the number of nodes to all IPs
and also establishes too many circuits which is bad for the network. I
think b) and c) our best options here. We should think of how various
values of 'k' change our security and availability here, and we should
think whether randomization actually adds any useful obfuscation wrt
the number/uptime of HS-nodes.
We should also think of how we assign HS-nodes to IPs. Lars Luthman
started doing so in
https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We
should think more!
We should also think about how many IPs are selected, because as far as
I can tell, each node is blind to how many peers it has. So, if the
initial hidden service chooses three IPs, and then 5 more nodes are
added, then each of those 5 nodes connect to k out of 3 IPs. How does a
node decide that it should add an IP to the set?
...
...
Another recurring point for comparison, is can anyone determine if a
particular service instance is down. Alternative 4 can get around this
by hiding the instances behind the introduction points, and to keep the
information from the introduction points, each instance (as described
above) can keep multiple connections open, occasionally dropping some to
keep the introduction point guessing. I think this would work, providing
that the introduction point cannot work out what connections correspond
with what instances. If each instance has a disjoint set of introduction
points, of which some subset (possibly total) is listed in the
descriptor, it would be possible to work out both if a instance goes
down, and what introduction points correspond to that instance, just by
repeatedly trying to connect through all the introduction points? If you
start failing to connect for a particular subset of the introduction
points, this could suggest a instance failure. Correlating this with
power or network outages could give away the location of that instance?
Indeed.
It also seems hard to me to obfuscate the number and status of
HS-nodes by randomly disconnecting introduction circuits from
IPs. Especially so if we want to do it without influencing the
performance of the HS (i.e. avoiding disconnecting circuits when
clients are using them). Naive solutions will probably allow IPs to
distinguish random decoy failure from an actual permanent
power-outage-like failure of an HS-node.
I'll ignore the random-disconnects idea for now and analyze our
alternatives with respect to recognizing the status (uptime) of
HS-nodes:

Alternative 4: Single hidden service descriptor, multiple service instances per intro point.

-- From the PoV of IP+Client:
   An IP+Client will be able to detect changes in the status of
   HS-nodes by monitoring its introduction circuits.

Alternative 3: Single hidden service descriptor, one service instance per intro point.

-- From the PoV of a client:
   Clients can distinguish uptime of HS peers, since they know that
   each peer has one IP (and they know all the IPs of a hidden
   service).
-- Same thing applies for IP+Client.
It seems to me that an IP+Client adversary is always able to find the
number and status of HS-nodes. The proposed ways to fix this is to add
measures like random-circuit-disconnects and connecting to IPs
multiple times from a single HS-node. Both of these solutions seems
easy to get wrong and hard to prove secure. We should think more about
them!
Perhaps we should look at hybrid solutions. None of them seem
worthwhile at this point, but they are worth considering.
- Shared Introduction Points, Multiple Descriptors
-- From POV of IP+Client:
   The client can retrieve all descriptors from the responsible HSDirs
   thus revealing all of the Introduction Points. The IP will receive
   connections from a subset of the hidden service nodes. This reveals
   the fact that it is a multi-node hidden service, but does not
   directly reveal how many nodes it contains.
Another, similar, option is for a HS to publish different descriptors
   for each period which include different IPs. Initially this will
   limit the likelihood of correlation and therefore prevent leaking the
   HS is multi-node and how many nodes. However, over time, this will be
   broken, as well, so this is only acceptable for short lived (possibly
   single period) hidden services.
Similarly, an HS publishes multiple descriptors to the same HSDir and
   the HSDir randomly chooses one of the descriptors everytime it is
   queried. However, this suffers the same fate.
- Shared Introduction Points, Combined descriptors
-- This provides the HSDir with too much power but for the sake of
   completeness we can see that this will improve the simplicity of
   setting up and configuring the hidden services but it is no more
   secure than the Shared Introduction Point design.
The more I think about this problem the more I convince myself we need
another layer of abstraction between the client and the IP, perhaps
proxy the INTRODUCE1 cell via the HSDir that is responible for the HS's
descriptor and the client never need know the IPs. This requires a lot
more thought (more than I can give it right now, at least).
Thanks for the analysis!

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [tor-dev] [Draft Proposal] Scalable Hidden Services