Hello everyone!
DISCLAIMER: The following is enormous and tries to describe, in some detail,
the situation in tor with connection<->channel<->scheduler. This comes after
we merged the KIST scheduler and realized that many things weren't what they
were supposed to be or meant for. In the end, I'm asking questions so we can
move forward with development and fixing things.
One last thing before you start your journey into the depths of Tor: the 3
subsystems I'm going to talk about, and how they interact, are quite
complicated, so it is very possible that I have gotten things wrong or missed
some details. Please point them out so we can document better, be better
informed and make good decisions. I plan to document as much as I can from
this process in a new file in the torguts.git repository.
Many things are problematic currently.
== Part Four - The Conclusion ==
Through this epic journey, we've discovered some issues as well as design
problems. Now the question is: what should we, and what can we, do about it?
In a nutshell, there are a couple of questions we should ask ourselves and
try to answer so we can move forward:
* I now believe that we should seriously discuss the relevance of channels.
Originally, the idea was good: provide an abstraction layer for the
relay-to-relay handshake and for sending/processing cells related to the
protocol. But, as of now, they only half do it.
There is an important cost in code and maintenance for something that is not
properly implemented/finished (the channel abstraction) and also for
something that is unused. An abstraction implemented for only one thing is
not really useful, except maybe to offer an example for others? But we aren't
providing a good example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it into a nicer interface and fixing all the issues I've described above
(and I suspect there might be more) so the cell scheduler can play nicely with
channels. Or we could rip them out, eliminating lots of code and reducing our
technical debt. I would like us to think seriously about what we want, because
that channel subsystem is _complicated_ and very few of us fully understand
it afaict.
Ripping channels out would bring us back to the following (which is btw
basically what we have now, considering the channel queues are useless):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are nontrivial. For
starters, we have to decide whether we want to keep the channel queue or not
and, if yes, we need to almost start from square 1 in terms of testing because
we would basically be introducing a new layer of cell queuing.
* Dealing with the DESTROY cell design issue will require somewhat trickier
work, I think. We must not starve a circuit that has a DESTROY cell pending,
or else the other side keeps sending data. But we should also not starve
all the other circuits, because if we ever need to send a gazillion DESTROY
cells with priority, we'll make the relay useless (DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick the
circuit in a reasonable amount of time so we can flush the DESTROY cell from
the circuit queue? Or do we really need to bypass or somehow prioritize that
cell so it is sent asap, to avoid load on the network while the other side of
the circuit is still sending?
* In the short term, we should get rid of the Vanilla scheduler because it
complicates the scheduler code a lot by adding unneeded things to channel_t,
and it has also bloated the scheduler interface with pointless function
pointers, for instance. And in my opinion, it is not helping performance the
way it is done right now.
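
To make the DESTROY question above a bit more concrete, here is a minimal
sketch of one possible middle ground: DESTROY cells go first, but only up to a
small burst before one ordinary cell is let through. This is NOT how tor works
today and not a proposal for the actual code. Every name here (sched_model_t,
sched_model_next, destroy_burst) is hypothetical, and the two counters are a
stand-in for the real circuit queues and the EWMA policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these types exist in tor. */
typedef enum { CELL_NONE = 0, CELL_DATA, CELL_DESTROY } cell_kind_t;

typedef struct {
  size_t pending_destroy; /* DESTROY cells waiting to be sent */
  size_t pending_data;    /* ordinary cells waiting in circuit queues */
  unsigned destroy_burst; /* assumed knob: max DESTROY cells in a row */
  unsigned burst_used;    /* DESTROY cells sent since the last data cell */
} sched_model_t;

static void
sched_model_init(sched_model_t *s, unsigned destroy_burst)
{
  assert(s != NULL);
  assert(destroy_burst > 0);
  s->pending_destroy = 0;
  s->pending_data = 0;
  s->destroy_burst = destroy_burst;
  s->burst_used = 0;
}

/* Pick the kind of cell to flush next. DESTROY cells are preferred, but
 * once destroy_burst of them went out back to back, one waiting data
 * cell is sent first, so a flood of DESTROY cells cannot starve every
 * other circuit. With no data waiting, DESTROY cells flow freely. */
static cell_kind_t
sched_model_next(sched_model_t *s)
{
  int destroy_allowed =
    s->burst_used < s->destroy_burst || s->pending_data == 0;

  if (s->pending_destroy > 0 && destroy_allowed) {
    s->pending_destroy--;
    if (s->burst_used < s->destroy_burst)
      s->burst_used++; /* saturate so the counter never wraps */
    return CELL_DESTROY;
  }
  if (s->pending_data > 0) {
    s->pending_data--;
    s->burst_used = 0; /* a data cell went out: reset the burst */
    return CELL_DATA;
  }
  return CELL_NONE;
}
```

With destroy_burst set to 2, five pending DESTROY cells and two pending data
cells, this model sends DESTROY, DESTROY, DATA, DESTROY, DESTROY, DATA,
DESTROY. Whether a fixed burst like this is better than trusting EWMA to reach
the circuit quickly is exactly the open question.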