On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
- DESTROY cells handling
ยท Within a circuitmux object, there is a "destroy cell queue" on which a DESTROY cell is put in for one of the circuit on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cell on that circuit, dump them and only send the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought to finish flushing the currently queued cells first. If it discards the queued cells and only sends the destroy cell, then we end up with missing data.
Second, because of this concept of different queue for the DESTROY cell, tor will back and forth between that queue and the normal queue of a circuit. Remember, that the destroy queue is *not* per circuit but per circuitmux. This is used by the "cmux->last_cell_was_destroy" which is set to 1 if the last cell the cmux handled was a DESTROY or 0 if not.
Yes, this part is definitely a mess.
I think we need to invent and design a better way to handle destroys -- getting the abstraction layer right between the "control" cells and the "data" cells is definitely a hard problem, especially when both kinds of cells end up being sent through the same mechanism.
- I believe now that we should seriously discuss the relevance of channels. Originally, the idea was good that is providing an abstraction layer for the relay to relay handshake and send/process cells related to the protocol. But, as of now, they are half doing it.
I am not opposed to ripping out the channel abstraction.
You make a good case that it was never completed, and now it will be harder to complete since it's been so long (and the person who designed and built it won't be completing it). Also, if we are hoping to modularize the Tor architecture to prepare it for component-by-component transitions to Rust or some other language, then simplifying first is a good move.
I guess the other option is to keep it limping along and make a plan to fix it and move to the right abstraction levels. That option would be best if we have a particular new channel in mind that we want to add -- such as the switch to udp transport that various research papers have proposed.
At least last I checked though, udp transport implies user-space tcp which is not a practical thing at our scale.
All of this said though, Nick is the Chief Architect, so I will defer to his judgment on which approach will get us a great fast stable Tor in the long term.
- In the short term, we should get rid of Vanilla scheduler because it complefixies a lot the scheduler code by adding uneeded things to channel_t but also bloated the scheduler interface with pointless function pointers for instance. And in my opinion, it is not helping performance the way it is done right now.
Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist plan was to leave the Vanilla scheduler in place so there's something to fall back to in case of bugs or design surprises. It might be wisest to leave it in during 0.3.2, so Kist can stabilize, and plan to take it out during 0.3.3 or 0.3.4 if Kist continues looking good.
--Roger