On 12 Nov (14:56:57), Roger Dingledine wrote:
On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
- DESTROY cells handling
· Within a circuitmux object, there is a "destroy cell queue" on which a DESTROY cell is put in for one of the circuit on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cell on that circuit, dump them and only send the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought to finish flushing the currently queued cells first. If it discards the queued cells and only sends the destroy cell, then we end up with missing data.
This is *exactly* that Tor does right now, it clears the circuit queue immediately when it is ready to send the DESTROY cell.
You (others) might want to double check that but hints (notice the clear cell queue):
circuit_about_to_free(): if (circ->n_chan) { circuit_clear_cell_queue(circ, circ->n_chan); /* Only send destroy if the channel isn't closing anyway */ if (!CHANNEL_CONDEMNED(circ->n_chan)) { channel_send_destroy(circ->n_circ_id, circ->n_chan, reason); }
connection_edge_process_relay_cell(): Look for channel_send_destroy(), you'll notice a clear queue before.
Still bad... ?
Second, because of this concept of different queue for the DESTROY cell, tor will back and forth between that queue and the normal queue of a circuit. Remember, that the destroy queue is *not* per circuit but per circuitmux. This is used by the "cmux->last_cell_was_destroy" which is set to 1 if the last cell the cmux handled was a DESTROY or 0 if not.
Yes, this part is definitely a mess.
I think we need to invent and design a better way to handle destroys -- getting the abstraction layer right between the "control" cells and the "data" cells is definitely a hard problem, especially when both kinds of cells end up being sent through the same mechanism.
Right. It is also unclear to me how much it will affect tor if we simply put the DESTROY cell on the circuit queue and let the scheduler take care of it... I would say we could have some cases where it will take a bit more time than right now to send the DESTROY.
(Right now, it is sent right away bypassing the scheduler.)
- I believe now that we should seriously discuss the relevance of channels. Originally, the idea was good that is providing an abstraction layer for the relay to relay handshake and send/process cells related to the protocol. But, as of now, they are half doing it.
I am not opposed to ripping out the channel abstraction.
You make a good case that it was never completed, and now it will be harder to complete since it's been so long (and the person who designed and built it won't be completing it). Also, if we are hoping to modularize the Tor architecture to prepare it for component-by-component transitions to Rust or some other language, then simplifying first is a good move.
I guess the other option is to keep it limping along and make a plan to fix it and move to the right abstraction levels. That option would be best if we have a particular new channel in mind that we want to add -- such as the switch to udp transport that various research papers have proposed.
Right so UDP, see earlier post in the thread with Ian, has nothing to do with channels :). It has to do with TLS cells.
At least last I checked though, udp transport implies user-space tcp which is not a practical thing at our scale.
All of this said though, Nick is the Chief Architect, so I will defer to his judgment on which approach will get us a great fast stable Tor in the long term.
- In the short term, we should get rid of Vanilla scheduler because it complefixies a lot the scheduler code by adding uneeded things to channel_t but also bloated the scheduler interface with pointless function pointers for instance. And in my opinion, it is not helping performance the way it is done right now.
Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist plan was to leave the Vanilla scheduler in place so there's something to fall back to in case of bugs or design surprises. It might be wisest to leave it in during 0.3.2, so Kist can stabilize, and plan to take it out during 0.3.3 or 0.3.4 if Kist continues looking good.
Agree.
Thanks! David
--Roger
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev