-------- Forwarded Message --------
Subject: Re: [tor-dev] Onion Service - Intropoint DoS Defenses
Date: Thu, 4 Jul 2019 20:38:48 +0200
From: juanjo <juanjo@avanix.es>
To: David Goulet <dgoulet@torproject.org>


These experiments and the final note confirm what I thought about this rate limiting feature from the start: it is missing important parts. OK, it protects the network and the HS a little, but general availability is not improved, so it does not actually help with that.

I want to write a proposal covering many of these things at once, but I don't have much time to follow the guidelines for an official proposal. Maybe in a few weeks?

Again, here are the things I think should be done now:

- Authenticated rend signature. I think this would help a lot.

- Mid-term: client PoW when the prop 305 limit is reached, instead of denying access? I don't know; all of it always configurable.

- Deprecate old clients, or let the hidden service configure whether its IPs allow access to old-version clients (those not supporting the new anti-DoS features). If we allow old versions without protections, all the security measures are useless.

And just a new idea: what about making IP rotation dynamic, based on these prop 305 values, on top of time-based rotation?
One of the goals of rotation was defending against correlation attacks: if we set a low limit we have a potential DoS (as right now); if we set it high we have a potential correlation attack (a bigger surface).
What if we combine time-based rotation (e.g. 24 hours) with rotation once a limit based on the prop 305 values is reached?
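A minimal sketch of that combined policy, where an intro point rotates when either trigger fires. The 24-hour lifetime, the class, and its field names are hypothetical illustrations, and how the count limit would be derived from the prop 305 values is left open:

```python
from dataclasses import dataclass

# Hypothetical value for illustration: rotate at least once a day.
MAX_LIFETIME_SECS = 24 * 60 * 60


@dataclass
class IntroPointState:
    """Hypothetical per-intro-point state used only for the rotation decision."""
    created_at: float        # when this intro point was established (seconds)
    max_introductions: int   # count limit, e.g. derived from the prop 305 values
    introductions: int = 0   # introductions seen so far

    def record_introduction(self) -> None:
        self.introductions += 1

    def should_rotate(self, now: float) -> bool:
        # Rotate when either trigger fires: the time limit or the count limit.
        too_old = now - self.created_at >= MAX_LIFETIME_SECS
        too_busy = self.introductions >= self.max_introductions
        return too_old or too_busy
```

Under a flood, the count limit fires long before the time limit; on a quiet service, the time limit still bounds how long one IP stays in use.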



On 3/7/19 20:37, David Goulet wrote:
On 30 May (09:49:26), David Goulet wrote:
Greetings!
[snip]

Hi everyone,

I'm writing to give an update on where we are with the introduction rate
limiting feature at the intro point.

The branch of #15516 (https://trac.torproject.org/15516), which implements a
simple rate/burst combination for controlling the number of INTRODUCE2 cells
relayed to the service, is ready to be merged upstream.

As previously detailed in this thread, the default values are a rate of 25
introductions per second and a burst of 200 per second. These values are
controlled by consensus parameters, meaning they can be changed network-wide.
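A rate/burst combination like this is commonly modeled as a token bucket. Below is a minimal sketch of that idea, assuming a standard token bucket that starts full and treating the burst value as the bucket capacity; the class and its names are illustrative, not tor's actual implementation:

```python
class TokenBucket:
    """Token-bucket limiter for INTRODUCE2 cells (illustrative sketch).

    `rate` tokens are added per second, up to a maximum of `burst`;
    each cell relayed to the service consumes one token.
    """

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.last = now      # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # relay the INTRODUCE2 cell to the service
        return False      # over the limit: the intro point would NACK instead
```

With the defaults (`TokenBucket(rate=25, burst=200)`), a full bucket lets up to 200 cells through at once, then settles to about 25 per second for as long as the flood lasts.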

First, we asked big service operators. I'm not going to detail the values
they provided us in private, but from what they gave us, these defaults look
large enough to sustain heavy traffic.

The second thing we did was experimental testing to see how CPU usage and
availability are affected. We tested this with 3 _fast_ introduction points
and then with 3 rate-limited introduction points.

The good news is that once the attack stops, the flood of introduction requests
to the service stops very quickly.

With the default rate/burst values, on an Intel(R) Xeon(R) CPU E5-2650 v4 @
2.20GHz (8 cores), the tor service's CPU usage doesn't go above ~60% (of a
single core), and it drops almost to 0 as soon as the attack ends.

The bad news is that availability is _not_ improved. One of the big reasons
is that the rate limit defenses, once engaged at the intro point, send a NACK
back to the client. A vanilla tor client stops using an introduction point
for 120 seconds if it gets 3 NACKs from it. This leads tor to give up on
connecting quickly and to tell the client that the .onion cannot be reached.
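The client behavior above can be modeled as a small failure cache. A minimal sketch: the 3-NACK and 120-second values come from the text, while the class, its names, and restarting the count after an exclusion are illustrative assumptions, not tor's code:

```python
from collections import defaultdict
from typing import Dict, Iterable

NACK_LIMIT = 3        # NACKs before an intro point is set aside (from the text)
EXCLUDE_SECS = 120.0  # how long it is set aside (from the text)


class IntroPointFailureCache:
    """Illustrative model of a client setting aside NACKing intro points."""

    def __init__(self) -> None:
        self.nacks: Dict[str, int] = defaultdict(int)  # intro point -> NACKs
        self.excluded_until: Dict[str, float] = {}     # intro point -> time

    def record_nack(self, ip: str, now: float) -> None:
        self.nacks[ip] += 1
        if self.nacks[ip] >= NACK_LIMIT:
            self.excluded_until[ip] = now + EXCLUDE_SECS
            self.nacks[ip] = 0  # assumption: count restarts after exclusion

    def usable(self, ip: str, now: float) -> bool:
        return now >= self.excluded_until.get(ip, 0.0)

    def any_usable(self, ips: Iterable[str], now: float) -> bool:
        # When no intro point is usable, the client gives up on the .onion.
        return any(self.usable(ip, now) for ip in ips)
```

Under attack, every intro point NACKs quickly, so all of them get set aside within seconds and `any_usable` turns false long before any SOCKS timeout.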

We hacked a tor client to play along and keep retrying despite the NACKs, to
see how long it would take to reach the service. On average, a client needed
roughly 70 seconds and more than 40 NACKs.

However, this varied a _lot_ during our experiments, with many outliers
ranging from 8 seconds with 1 NACK up to 160 seconds with 88 NACKs. (For this,
the SocksTimeout had to be raised quite a bit.)

One avenue for improvement here is to make the intro point send a specific
NACK reason (like "Under heavy load" or ...), which the client would
interpret as "I should retry soon-ish", possibly allowing it to connect after
many seconds (or until the SocksTimeout expires).
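A minimal sketch of that "retry soon-ish" behavior: keep retrying with jittered exponential backoff while the reply is a "heavy load" NACK, and stop before the SOCKS timeout would pass. The reply strings `"ack"` and `"heavy_load"`, the function, and the delay values are hypothetical stand-ins, not tor code or protocol values:

```python
import random
from typing import Callable


def connect_with_retries(
    try_introduce: Callable[[], str],
    socks_timeout: float,
    clock: Callable[[], float],
    sleep: Callable[[float], None],
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> bool:
    """Retry while the intro point answers with a hypothetical "heavy_load"
    NACK, backing off, until the next attempt would miss the SOCKS timeout."""
    deadline = clock() + socks_timeout
    delay = base_delay
    while True:
        reply = try_introduce()
        if reply == "ack":
            return True
        if reply != "heavy_load":
            return False  # any other NACK: keep today's give-up behavior
        pause = delay * random.uniform(0.5, 1.0)  # jitter spreads out retries
        if clock() + pause >= deadline:
            return False  # the next attempt would start after the timeout
        sleep(pause)
        delay = min(delay * 2, max_delay)
```

In real use, `clock` and `sleep` would be `time.monotonic` and `time.sleep`; passing them in keeps the sketch testable with a fake clock. The jitter keeps many clients that were NACKed at the same moment from retrying in lockstep.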

More bad news there! We can't do that anytime soon because of a bug that
basically crashes clients if an unknown status code (that is, a new NACK
value) is sent back: https://trac.torproject.org/30454. So yeah... quite
unfortunate, but also a superb reason for everyone out there to upgrade :).
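The general lesson from a bug like #30454 is that a client should tolerate status codes it does not recognize. A minimal sketch of that defensive pattern; both numeric values below are assumptions for illustration (the "under heavy load" code in particular is hypothetical, not from the spec):

```python
from enum import Enum


class IntroAckOutcome(Enum):
    SUCCESS = "success"
    RETRY_SOON = "retry_soon"
    FAILURE = "failure"


STATUS_SUCCESS = 0x0000     # assumed "success" value
STATUS_HEAVY_LOAD = 0x0004  # hypothetical new "under heavy load" reason


def classify_intro_ack(status: int) -> IntroAckOutcome:
    """Map an INTRODUCE_ACK status to a client action without ever failing."""
    if status == STATUS_SUCCESS:
        return IntroAckOutcome.SUCCESS
    if status == STATUS_HEAVY_LOAD:
        return IntroAckOutcome.RETRY_SOON
    # Known failure codes and codes this client has never seen both land here,
    # instead of tripping an assertion on an "impossible" value.
    return IntroAckOutcome.FAILURE
```

A client written this way degrades to the old behavior when it sees a new NACK value, so a new reason code can be deployed without breaking it.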

One piece of good news is that having fast intro points instead of slow ones
doesn't seem to change the overall load on the service much, so for now, our
experiment shows it doesn't matter.

Overall, this rate limit feature does two things:

1. Reduce the overall network load.

   Soaking up the introduction requests at the intro point keeps the service
   from creating pointless rendezvous circuits, which makes it "less" of an
   amplification attack.

2. Keep the service usable.

   The tor daemon doesn't go into massive CPU load and can thus actually be
   used properly during the attack.
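Point (1) can be made concrete with a worst-case bound: however many requests an attacker sends, the number of INTRODUCE2 cells reaching the service (and so the rendezvous circuits it might launch) is capped by the intro points' limits. A minimal sketch, assuming each intro point runs an independent token bucket that starts full:

```python
def max_intro2_relayed(n_intro_points: int, rate: float, burst: float,
                       seconds: float) -> float:
    """Upper bound on INTRODUCE2 cells relayed to the service during an attack
    lasting `seconds`, assuming one independent, initially full token bucket
    per intro point: each bucket can pass at most burst + rate * seconds."""
    return n_intro_points * (burst + rate * seconds)
```

With the defaults and 3 intro points, a one-minute attack lets through at most 3 × (200 + 25 × 60) = 5100 introductions, regardless of how many the attacker sends.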

The problem with (2) is availability: for a legit client running vanilla tor,
reaching the service is close to impossible without lots of luck. However,
if, say, the tor daemon were configured with 2 .onion addresses, one public
and the other private with client authorization, then the second .onion would
be totally usable because the tor daemon is not CPU overloaded.

Third, in order to make this feature a bit more "malleable", we are working
on https://trac.torproject.org/30924, which is proposal 305.

In short, torrc options are added so an operator can change the rate/burst
that the intro points will use. We can do that using the ESTABLISH_INTRO cell,
which will have an extension to define the DoS defense parameters (proposal
305).

That way, a service operator can disable this feature, or turn the knobs on
the rate/burst to adjust the defenses.
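To illustrate how a service could carry its rate/burst to the intro point inside an ESTABLISH_INTRO extension, here is a minimal encode/decode sketch. The layout (a 1-byte parameter count, then a 1-byte type and an 8-byte big-endian value per parameter) and the type codes are assumptions modeled loosely on proposal 305; check the proposal for the actual encoding:

```python
import struct
from typing import Dict

# Assumed parameter type codes; verify against proposal 305.
PARAM_INTRO2_RATE_PER_SEC = 0x01
PARAM_INTRO2_BURST_PER_SEC = 0x02

PARAM_FMT = "!BQ"  # type: u8, value: u64, network byte order, no padding


def encode_dos_params(params: Dict[int, int]) -> bytes:
    """Serialize N_PARAMS followed by (type, value) pairs."""
    if len(params) > 255:
        raise ValueError("too many parameters for a 1-byte count")
    body = struct.pack("!B", len(params))
    for ptype, value in params.items():
        body += struct.pack(PARAM_FMT, ptype, value)
    return body


def decode_dos_params(data: bytes) -> Dict[int, int]:
    """Parse the layout produced by encode_dos_params."""
    (count,) = struct.unpack_from("!B", data, 0)
    params: Dict[int, int] = {}
    offset = 1
    for _ in range(count):
        ptype, value = struct.unpack_from(PARAM_FMT, data, offset)
        params[ptype] = value
        offset += struct.calcsize(PARAM_FMT)
    return params
```

Encoding the default rate and burst this way gives 1 + 2 × 9 = 19 bytes. Letting each service send its own values is what makes the defenses adjustable per service, while the consensus parameters keep providing the network-wide defaults.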

At this point, we don't have a good grasp of what happens to CPU usage if the
rate or the burst is raised, or of how availability is affected. During our
experiments, we did observe a "sort of" linear relationship between CPU usage
and rate. But we barely scratched the surface, since we only tried rates of
25, 50, and 75.

We would need much more experimentation, which is something we want to avoid
doing on the real network as much as possible.

Finally, many more changes are in the works. One in particular is
https://trac.torproject.org/projects/tor/ticket/26294, which will make tor
rotate its intro points only after a random number of introduction requests
between 150k and 300k, instead of the current 16k to 32k. See the ticket for
the benefits here, which mostly help with (1).
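A minimal sketch of the randomized rotation threshold, plus a rough estimate of how much less often a flood forces rotation. The ranges come from the text; uniform sampling and the helper names are assumptions:

```python
import random

# Per-intro-point introduction limits: the #26294 range and the current one.
NEW_RANGE = (150_000, 300_000)
OLD_RANGE = (16_000, 32_000)


def pick_rotation_threshold(lo: int, hi: int, rng: random.Random) -> int:
    """Draw how many introductions an intro point serves before rotating.
    Uniform over [lo, hi] is an assumption; the text only says "random"."""
    return rng.randint(lo, hi)  # randint includes both endpoints


def expected_rotations(total_introductions: int, lo: int, hi: int) -> float:
    """Rough number of rotations a flood causes, using the mean threshold."""
    return total_introductions / ((lo + hi) / 2)
```

For 900,000 introductions through one intro point slot, this estimate gives about 4 rotations with the new range versus 37.5 with the current one.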

There has been much talk about a client PoW (see the proposal 305 thread on
this list), which in theory would help with service availability.
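To show the basic asymmetry a client PoW relies on (the client burns CPU to find a nonce, the verifier checks it with one hash), here is a textbook hashcash-style sketch using SHA-256. The challenge format and difficulty are arbitrary, and this is not a design taken from the proposal 305 thread:

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits


def pow_hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    nonce = 0
    while leading_zero_bits(pow_hash(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Verifier side: a single hash checks the client's work."""
    return leading_zero_bits(pow_hash(challenge, nonce)) >= difficulty
```

Each extra bit of difficulty doubles the client's expected work while the verifier's cost stays at one hash, which is what would let a service under load make flooding expensive.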

We will also soon merge upstream https://trac.torproject.org/24962, which
goes one step further in denying single-hop connections to the HSDir/Intro,
to shut down Tor2web connections as much as possible (along with any attacker
that speeds things up on their side by single-hopping).

We are making progress here... This is really a non-trivial problem, and
solutions for service availability are not that simple. Our priority is to
protect the network as much as possible first, and then move on to possible
solutions for availability.

I'll stop for now. Huge thanks to everyone who provided service logs, ideas,
code review and future testers :).

Cheers!
David


_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev