On 29 Jan 2018, at 08:00, Florentin Rochet <florentin.rochet@uclouvain.be> wrote:

Hello,

On 28/01/18 11:52, teor wrote:
Hi,

I have some more questions:

Nice, thanks! I still have to answer your previous email and push an
update to the proposal. I should do it this week, sorry for the late answers :)

See inline a few answers to your questions:


On 18 Jan 2018, at 11:03, teor <teor2345@gmail.com> wrote:

Unanswered questions:
The Tor network has been experiencing excessive load on guards and
middles since December 2017.

If I have correctly followed what was happening: around 1M tor2web
clients appeared at OVH

Not just OVH, at least 3 different providers.

And not just Tor2web, either.
There are onion services which are overloading the network
as well, probably in response to these clients. The onion services
are mostly overloading guard-weighted nodes.

and started to overload the network with circuit
creation requests using the old and costly TAP handshake.

Not just TAP. The sheer number of entry connections, extend requests,
and destroy cells is also creating overloads on some relays.

Tor2web
clients make direct connections to the intro point and to the rendezvous
point, right?

Yes.

And, looking into the code right now, it does not look
like Tor2web clients make any distinction between flags. So, basically, the
Tor2web load is only weighted by consensus weight (bandwidth-weights have no
impact) on the overall network (exits too).

This only applies if Tor2webRendezvousPoints is set.
Otherwise, the nodes are middle-weighted.
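
To illustrate the difference between weighting by consensus weight alone
and middle-weighting, here is a minimal sketch. The relay names, consensus
weights, and position-weight values are all made up for illustration, and
the function is hypothetical, not Tor's path selection code. It only
assumes that a position weight scales a relay's consensus weight by
(weight / 10000) before relays are picked in proportion to the result.

```python
# Hypothetical relays: (name, consensus_weight, flag). All values are made up.
EXAMPLE_RELAYS = [
    ("guard_a", 6000, "guard"),
    ("exit_b", 3000, "exit"),
    ("middle_c", 1000, "middle"),
]

# Hypothetical middle-position weights per flag, on an assumed scale of 10000.
EXAMPLE_MIDDLE_WEIGHTS = {"guard": 2000, "exit": 0, "middle": 10000}
WEIGHT_SCALE = 10000


def selection_probabilities(relays, position_weights=None):
    """Return a dict mapping relay name to selection probability.

    With position_weights=None, only the consensus weight counts.
    Otherwise each consensus weight is first scaled by the position
    weight that matches the relay's flag.
    """
    scaled = {}
    for name, consensus_weight, flag in relays:
        if position_weights is None:
            factor = 1.0
        else:
            factor = position_weights[flag] / WEIGHT_SCALE
        scaled[name] = consensus_weight * factor
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Consensus weight only: exits receive their full share of the load.
unweighted = selection_probabilities(EXAMPLE_RELAYS)

# Middle-weighted: with these made-up weights, exits receive none of it.
middle_weighted = selection_probabilities(EXAMPLE_RELAYS, EXAMPLE_MIDDLE_WEIGHTS)
```

With these example numbers, the exit's share drops from 30% to zero once
the middle weights are applied, and the guard's share also shrinks.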

Guess: could that be the reason why all exit logs are flooded with the
message "[warn] Tried to establish rendezvous on non-OR circuit with
purpose Acting as rendevous (pending)"? Those messages would be caused
by tor2web clients picking exit relays as rendezvous nodes :/ I started
to see them increasing more and more since August 2017.

No, this is a different issue.
Exit relays are allowed as rendezvous nodes.

So basically, I *think* we can drop the questions below because
bandwidth-weights do not play any role in the excessive load that the
network is handling with those tor2webs.

Guard weights are used by overloading onion services, and middle
weights are used by overloading Tor2web clients.

Does the waterfilling proposal make excessive load on guards worse, by
allocating more guard weight to lower capacity relays?
Is the extra security worth the increased risk of failure?

We want to design a network that can handle different kinds of extra load.
So these questions are important, even if they don't apply right now.

Does the waterfilling proposal make excessive load on middles better, by
allocating more middle weight to higher capacity relays?
Is there a cascading failure mode, where excess middle weight overwhelms
our top relays one by one? (It seems unlikely.)

I'm going to re-ask these questions, in light of the extra middle load from
Tor2web clients:

Does the waterfilling proposal make excessive load on middles worse, by
allocating more middle weight to higher capacity relays?

In particular, connections are limited by file descriptors, and file descriptor
limits typically don't scale with the bandwidth of the relay. As far as I can tell,
waterfilling would have directed additional Tor2web traffic to large guards.
It would have brought down my guards faster, and made it much harder for me
to keep them up.
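
A rough sketch of the file descriptor concern, under loud assumptions:
connections spread across relays in proportion to weight, each connection
costs one file descriptor, and every relay has the same fixed limit no
matter how much bandwidth it has. The relay names, weights, and limits are
made up, and the helper functions are hypothetical.

```python
# Hypothetical relays: (name, weight, fd_limit). All values are made up.
# The fd limit is the same for every relay, whatever its bandwidth.
FD_EXAMPLE_RELAYS = [
    ("big", 8000, 10000),
    ("medium", 1500, 10000),
    ("small", 500, 10000),
]


def expected_connections(relays, total_connections):
    """Spread connections across relays in proportion to their weight."""
    total_weight = sum(weight for _, weight, _ in relays)
    return {
        name: total_connections * weight / total_weight
        for name, weight, _ in relays
    }


def overloaded_relays(relays, total_connections):
    """Return names of relays whose expected connections exceed their fd limit."""
    load = expected_connections(relays, total_connections)
    return [name for name, _, fd_limit in relays if load[name] > fd_limit]


# At moderate load every relay stays under its limit.
moderate = overloaded_relays(FD_EXAMPLE_RELAYS, 12000)

# At higher load the highest-weighted relay runs out of descriptors first.
heavy = overloaded_relays(FD_EXAMPLE_RELAYS, 20000)
```

In this toy model the relay with the most weight is the first to hit its
limit, so shifting more weight onto it lowers the load at which it fails.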

If we had implemented waterfilling before this attack, would it have led to
cascading failures on our top guards? They would have been carrying
significantly more middle load, and mine barely managed to cope.

Can you redesign the proposal so there is some limit on the extra middle load
assigned to a guard? Or does this ruin the security properties?

Is there a compelling argument for security over network robustness?

I also have another practical question:

We struggle to have time to maintain the current bandwidth authority
system.

Is it a good idea to make it more complicated?

Hm, I don't see how Waterfilling plays any role with torflow or
bwscanner? I mean, there is still this feedback loop thing but it has no
impact on the design of the current torflow or bwscanner?

I can't really say.
I look forward to your explanation of the feedback loop. 

Could you be
more specific about your concerns with the bandwidth authorities and
this proposal?

It takes time and effort from Tor people to integrate and maintain the
code and monitoring for a new proposal like this one.

We will need to take extra time on this proposal, because we already need
more monitoring for the current bandwidth authority system. And only then
would we have time to build monitoring specific to this proposal.

Also, when we change bandwidth measurement or allocation, we need to
change one thing at a time, and then monitor the change. So depending
on our priorities, this proposal may need to wait until after we implement
and monitor other urgent fixes.

Who will maintain the new code we add to Tor to implement waterfilling?

I would volunteer for that.

Typically, experienced Core Tor team members review and maintain code.

And there's still a lot of development and testing work to be done before
the code is ready to merge. Are you able to do this development?

How much help will you need to write a new consensus method?
How much help will you need to write unit tests?
(This help will come from existing team members.)

Does your current code pass:
* make check
* make test-network-all
  * in particular, any new consensus method must pass the "mixed" network,
    with an unpatched Tor version in your path as "tor-stable"

Who will build the analysis tools to show that waterfilling benefits the
network?

Volunteers or master's students. I can definitely suggest this topic at my
university.

Typically, experienced Tor Metrics team members write, review, and maintain
monitoring systems. And they don't have a lot of extra capacity right now.

Even if students do this task, they would need help from existing team
members.

Do the benefits of waterfilling justify this extra effort?

Question for the other Tor devs :) I am definitely biased towards the "yes"

It seems plausible, but I don't feel I have seen a compelling enough argument
to prioritise it above fixing bandwidth authorities.

At the moment, reasonably fast guards in Eastern North America and Western
Europe are overloaded with client traffic. And guards in the rest of the world
are under-loaded. Reducing this bias is something we need to do.

And this proposal gets us better security if we fix this geographical bias first.
Otherwise, adversaries can simply pick a location that massively increases
their consensus weight, and get lots of client traffic.

And even if they do, should we focus on getting the bandwidth authorities
in a maintainable state, before adding new features?
(I just gave similar advice to another developer who has some great ideas
about improving bandwidth measurement.)


Bandwidth-weights and measurements (consensus weights) are two different
things that solve 2 different problems. So, we can work independently on
improving measurements (like what is currently done with bwscanner) and
improving Tor's balancing (bandwidth-weights) with this proposal.

I don't think this is realistic.
There is always contention for shared resources.

Integrating and testing new code, and monitoring its effects, will take effort from
the teams I mentioned above. This takes away from the urgent work of fixing the
bandwidth authority system. Which also takes effort from the Core Tor and
Metrics teams.

What about the feedback loop between this new allocation system
and the bandwidth authorities?

I am sorry, I don't really understand why a feedback loop is needed.
Measuring bandwidth and producing bandwidth-weights seems orthogonal to me.

You do not need to add a feedback loop, one already exists:
1. Consensus weights on guards and middles change
2. Client use of guards and middles changes
3. Bandwidth authority measurements of guards and middles change
4. Repeat from 1
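
The four steps above can be sketched as a toy iteration. This is not how
torflow or bwscanner measure relays. It assumes a made-up measurement model
in which a relay with capacity c carrying client load l is measured at
c / (1 + l / c), and in which the next round's weights are simply the
measurements. The capacities, the demand, and the function name are all
hypothetical.

```python
def simulate_feedback(capacities, demand, initial_weights,
                      max_rounds=100, tolerance=1e-9):
    """Iterate the weight -> client use -> measurement loop.

    Returns (final_shares, rounds_used). rounds_used is None when the
    shares did not settle within max_rounds.
    """
    weights = list(initial_weights)
    total = sum(weights)
    shares = [w / total for w in weights]
    for round_number in range(1, max_rounds + 1):
        # Steps 1 and 2: clients follow the current weights.
        loads = [demand * share for share in shares]
        # Step 3: toy measurement model, an assumption and not Tor's.
        measured = [c / (1 + load / c) for c, load in zip(capacities, loads)]
        measured_total = sum(measured)
        new_shares = [m / measured_total for m in measured]
        # Step 4: the measurements become the next round's weights.
        change = max(abs(n - s) for n, s in zip(new_shares, shares))
        shares = new_shares
        if change < tolerance:
            return shares, round_number
    return shares, None


# Made-up capacities and demand, starting from equal weights.
final_shares, rounds_used = simulate_feedback(
    capacities=[100.0, 50.0, 10.0],
    demand=80.0,
    initial_weights=[1.0, 1.0, 1.0],
)
```

In this particular toy model the shares oscillate around the
capacity-proportional split and settle on it. That says nothing about
whether the real loop converges, which would need real measurement
behaviour or a network simulation.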

My question is:

How does this existing feedback loop affect your proposal?
Does it increase or reduce the size of the guard and middle weight changes?

I have added those questions to the proposal. This looks difficult to know.

Can Shadow simulate this?

I am still interested in this feedback loop.
If it fails to converge, the system will break down.

Yup. Going to answer this on your previous email.

T