Hello everyone!
DISCLAIMER: The following is enormous and tries to describe in some detail the situation in tor with connection<->channel<->scheduler. It comes after we merged the KIST scheduler and realized that many things aren't what they were supposed to be or meant for. In the end, I'm asking questions so we can move forward with development and fixing things.
One last thing before you start your journey into the depths of Tor: the 3 subsystems I'm going to talk about, and how they interact, are quite complicated, so it is very possible that I've gotten things wrong or missed some details. Please point them out so we can document better, be better informed and make good decisions. I plan to document as much as I can from this process in a new file in the torguts.git repository.
This is part of the work from ticket #23993.
Enjoy!
== Part One - The Distant Past ==
Once upon a time, Tor was at version 0.2.3. It was before the "big change", and it was using connections and circuits in a simplistic yet clever way. I'll briefly describe here how it worked and why we wanted to change it.
Roughly, once a cell comes in, it is put in the inbound buffer, called "inbuf", of a connection_t and it is processed. Then, if it needs to be relayed (that is, sent along the circuit it came in on), it is put in that circuit's queue. Each connection_t had a priority queue of "active circuits", prioritized using the EWMA policy. The first circuit would be flushed of a certain number of cells onto the connection_t outbound buffer, or "outbuf", where the libevent main loop takes over to actually write the bytes from the outbuf onto the network. Then the circuit pqueue is updated and we move on. The following is a high-level diagram for a relayed cell:
  +--------------+      +-----------+      +---------------+
  |connection_t A|      | circuit_t |      |connection_t B |
  |   (inbuf)    | ---> |  (pqueue) | ---> |   (outbuf)    | ---> Network
  +--------------+      +-----------+      +---------------+
That had at least one big problem. Because the entire set of circuits was never considered, one connection (with some circuits on it) could bloat the relay's network link. It is actually worse than that on a relay with a bandwidth limitation: there, you really want to pick the connection carrying the highest-priority circuit and flush that one.
All in all, tor needed to move to a logic where _all_ circuits are considered using the EWMA policy, flushing the cell(s) of the picked circuit onto the connection outbuf so it is prioritized on the network.
Then in 0.2.4, everything changed!
== Part Two - The Big Change ==
Two new abstractions were added in Tor 0.2.4 (#6465): channel and circuitmux (short for circuit multiplexer or multiplexing; history is unclear ;). To this day, this is what tor uses.
The channel_t object was intended to provide a "transport" abstraction layer for relay-to-relay connections, which are represented by connection_t objects. Each channel has a circuitmux_t object that encapsulates the queuing logic and circuit selection policy of that channel. In other words, the circuits that go through that connection are listed in that object so we can do cell prioritization.
To summarize, this roughly looks like this now:
  +------------+
  |connection_t|
  +------------+
        |
        v 1
   +---------+
   |channel_t|
   +---------+
        |
        v 0..n
   +---------+
   |circuit_t|
   +---------+
Channels are supposed to work as a protocol abstraction on top of that connection. The original intent was to make it easier to implement different protocols, for instance to help researchers experiment with other things more easily. TLS was to be decoupled from connection_t and put into a subclass of channel_t, and thus channel_tls_t was born. It implements the channel_t interface and is, in theory, designed to handle the TLS layer of relay-to-relay connections (handshake, cells, etc). So far, tor has only that one channel class.
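For illustration only, here is a minimal sketch (in C) of the kind of interface/subclass split the channel abstraction aims for. The field and method names below are hypothetical, not tor's actual definitions; the point is just the shape: a base object carrying a circuitmux and a set of method pointers, plus a TLS-flavoured subclass that implements them.

  /* Hypothetical sketch of the channel abstraction: a base "interface"
   * with method pointers, and a TLS-specific subclass implementing it.
   * Names are illustrative, not tor's real definitions. */
  struct circuitmux_s;                    /* opaque, defined elsewhere */
  typedef struct cell_t cell_t;           /* opaque cell type */
  typedef struct or_connection_t or_connection_t;

  typedef struct channel_s channel_t;
  struct channel_s {
    struct circuitmux_s *cmux;            /* per-channel circuit selection */
    /* "virtual methods" a concrete channel class must provide: */
    int  (*write_cell)(channel_t *chan, const cell_t *cell);
    void (*close)(channel_t *chan);
  };

  typedef struct channel_tls_t {
    channel_t base;                       /* embeds the interface */
    or_connection_t *conn;                /* the underlying TLS connection */
  } channel_tls_t;

In the real code, the channel_tls_t methods end up calling down into connection_or anyway, which is exactly where the abstraction gets blurry (more on that in Part Three).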
Now that we had this layer and a way to multiplex circuits on a single connection through the use of channel, we needed one last piece, a way to prioritize cells on the network considering all circuits. Along came the "cell scheduler" (scheduler.c) in Tor 0.2.6 and now called the Vanilla scheduler.
In a nutshell, the libevent main loop calls the scheduler at a constant rate, and the subsystem compares all channels against each other using the circuits' EWMA priorities (an interface was created so more policies could be implemented, but I won't get into that in this email).
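Conceptually, one pass of that scheduler looks something like the sketch below. This is a hypothetical simplification (the helper names are made up); the real logic lives in scheduler.c and the circuitmux EWMA code.

  /* Hypothetical sketch of one scheduling pass: order the channels that
   * have cells pending by the priority of their best circuit (EWMA),
   * then flush from the best channel first. */
  static void
  scheduler_run_once_sketch(smartlist_t *pending_channels)
  {
    /* Channels whose best circuit has the lowest EWMA cell count
     * (i.e. the highest priority) sort first. */
    smartlist_sort(pending_channels, compare_channels_by_best_circuit);

    SMARTLIST_FOREACH_BEGIN(pending_channels, channel_t *, chan) {
      circuit_t *circ = pick_best_circuit_on(chan);  /* EWMA policy */
      if (!circ)
        continue;
      flush_some_cells(circ, chan);   /* bounded flush toward the outbuf */
    } SMARTLIST_FOREACH_END(chan);
  }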
So now, tor finally had a good infrastructure to prioritize cells on the network by looking at all circuits at once. TCP connections would be established between relays, the channel layer would handle the TLS business, and the scheduler would flush cells onto the network by considering all circuits on all channels. It is a really nice, modular design that should improve tor's performance.
But, that's not entirely the reality of things right now.
== Part Three - The Truth ==
In 2014, a paper came out titled "Never Been KIST: Tor's Congestion Management Blossoms with Kernel-Informed Socket Transport" (http://www.robgjansen.com/publications/kist-sec2014.pdf). It describes an improved way for tor to schedule data transmission: ask the kernel about the state of the TCP buffers so tor can flush data onto the network without bloating the local link.
Then, from 2016 up to now, Matthew Traudt implemented KIST in tor, which required a significant refactoring of the scheduler subsystem in order to make it work with multiple scheduling types. It is today the Scheduler option in torrc, and tor 0.3.2 uses KIST by default. Matt discovered many things during that work, for instance the very important bug #20459, where we were comparing cmux policies incorrectly and thus ultimately prioritizing our circuits wrongly.
But fundamental issues in the design of channels and the scheduler were also found. I'll list some of them here; they ultimately lead to Part Four of this email about our next steps moving forward.
1. Channels implemented cell queues that weren't really queues.
If you look at the channel_t object, you'll notice "incoming_queue" and "outgoing_queue", which basically handle the transition of a cell from a connection_t inbuf to the circuit queue and then to the connection_t outbuf. It looks like this for a relayed cell:
conn inbuf -> chan in_queue -> circ queue -> chan out_queue -> conn outbuf
The idea behind this is to quickly empty the connection inbuf into an intermediary object (the channel), leaving as much room as we can in the kernel inbound TCP queue. In other words, offload the kernel buffer to user space: we need to run some processing on that data anyway, and there is no need to leave it in the kernel buffer during that time.
Then, from the channel, a cell would be put on the circuit queue and in turn be processed by the scheduler. The scheduler would select a circuit (EWMA policy), flush cells from its queue onto the channel outgoing queue, and from there they would be flushed to the connection outbuf, which libevent takes care of writing to the socket (to the network).
One could argue that in terms of design we have too many queues, but that would only matter if the channel queues actually acted like queues. It turns out that the incoming and outgoing queues of a channel are always empty (#23709), because the code queues a cell and then processes it immediately, meaning it is put either on the circuit queue or on the connection outbuf.
Each cell is memcpy()ed onto a channel queue and then processed right away, only to be copied again into the circuit queue or the outbuf. This has a performance cost over time, as a relay sees thousands of cells a second. It also makes the channel subsystem much more complicated in terms of code, since those queues have to be handled at every step of the way in and out of the channel subsystem. But the worst part is that the scheduler subsystem based some decisions on the outgoing queue size, and when that size is always 0, issues arise, like #23676.
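To make the pattern concrete, here is a hypothetical sketch of why the queue length observed by other subsystems is always zero (queued_cell_t and the helper names are made up; the TOR_SIMPLEQ macros are tor's list helpers):

  /* Hypothetical sketch of the "queue that never queues": the cell is
   * copied into the channel queue and then immediately drained again in
   * the same call path, so the queue is empty by the time anyone looks
   * at it, and every cell has been copied twice. */
  static void
  channel_queue_cell_sketch(channel_t *chan, const cell_t *cell)
  {
    queued_cell_t *q = queued_cell_copy(cell);             /* first copy */
    TOR_SIMPLEQ_INSERT_TAIL(&chan->incoming_queue, q, next);

    /* ...and immediately drain it before returning: */
    while ((q = TOR_SIMPLEQ_FIRST(&chan->incoming_queue))) {
      TOR_SIMPLEQ_REMOVE_HEAD(&chan->incoming_queue, next);
      deliver_cell_to_circuit_or_outbuf(chan, &q->cell);   /* second copy */
      tor_free(q);
    }
  }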
2. DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" onto which a DESTROY cell is put for one of the circuits on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cells on that circuit, dump them, and send only the DESTROY cell.
Many things are problematic currently. For starters, sending a DESTROY cell bypasses the scheduler (#23712). Tor will immediately try to flush the cell onto the network *if* the channel doesn't have queued *writes*, that is, if the connection outbuf is empty. If not, once the outbuf is sufficiently drained, the channel is scheduled in and the DESTROY cell is finally sent. I've observed this process taking from 0.5 seconds up to 10 seconds on a normal relay when the outbuf has data in it.
Second, because of this concept of a separate queue for DESTROY cells, tor goes back and forth between that queue and the normal queue of a circuit. Remember that the destroy queue is *not* per circuit but per circuitmux. This is driven by "cmux->last_cell_was_destroy", which is set to 1 if the last cell the cmux handled was a DESTROY, or 0 if not.
This has a side effect. Before sending a DESTROY cell, the circuit queue is cleared out without changing the circuit's EWMA priority, in order to make sure we don't send any more cells from that point on. However, because of this back and forth, the EWMA policy will pick the right channel based on the overall state of the circuits on it (see scheduler_compare_channels()), but it will not use that policy to prioritize the circuit carrying the DESTROY cell. Furthermore, the destroy cell queue is a FIFO, so if 10 DESTROY cells are queued but the 10th one is on the circuit with the highest priority, we won't process it until the 9 that came in before it are flushed.
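Roughly, the selection then alternates as in this hypothetical sketch (a simplification with made-up helper names; the real logic lives around circuitmux_get_first_active_circuit()):

  /* Hypothetical sketch of the alternation between the per-cmux DESTROY
   * FIFO and the normal per-circuit queues. */
  static next_cell_t
  cmux_pick_next_sketch(circuitmux_t *cmux)
  {
    int destroys_pending = destroy_queue_len(cmux) > 0;   /* plain FIFO */

    if (destroys_pending && !cmux->last_cell_was_destroy) {
      cmux->last_cell_was_destroy = 1;
      return oldest_destroy_cell(cmux);   /* regardless of how busy or
                                           * high-priority its circuit was */
    }
    cmux->last_cell_was_destroy = 0;
    return cell_from_best_circuit_by_ewma(cmux);  /* normal EWMA selection */
  }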
3. Connection and Channel decoupling is kind of a lie
The connection_t object is bound to be a TCP connection, except for DNSPort where it is UDP. That object also has a "TLS connection state" (tor_tls_t) and a channel_tls_t object (chan).
First, it turns out that the connection subsystem establishes the TLS session, not the channel TLS layer. When opening a circuit, it looks like this:

  channel_connect_for_circuit()
    |- channel_connect()
      |- channel_tls_connect()
        |- connection_or_connect()
          |- connection_tls_start_handshake()
Second, when processing a cell from the connection inbuf in connection_or_process_cells_from_inbuf(), we'll directly call channel_tls_handle_cell(). The channel_tls_t layer handles PADDING, VERSIONS, CERTS, AUTH_CHALLENGE, AUTHENTICATE and NETINFO cells. The other cell types will be queued on the corresponding circuit queue.
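In code terms, that dispatch is roughly a switch on the cell command, something like the sketch below (the two handler names are made up; the CELL_* constants are tor's cell command values):

  /* Hypothetical sketch of channel_tls_handle_cell() dispatch:
   * link-protocol cells are handled inside the channel/TLS layer,
   * everything else is passed up toward the circuit layer. */
  switch (cell->command) {
    case CELL_PADDING:
    case CELL_VERSIONS:
    case CELL_CERTS:
    case CELL_AUTH_CHALLENGE:
    case CELL_AUTHENTICATE:
    case CELL_NETINFO:
      handle_link_protocol_cell(chan, cell);           /* channel_tls_t's job */
      break;
    default:
      queue_cell_on_corresponding_circuit(chan, cell); /* circuit queue */
      break;
  }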
Third, once the TLS session has been established, a VERSIONS cell can be sent, which is done by the connection subsystem with connection_or_send_versions(). But that bypasses the channel_tls_t layer, which processes those cells and should be writing them through channel_tls_write_cell_method(). Furthermore, the channel_tls_t layer uses the connection layer to send such cells, with connection_or_send_certs_cell() and company.
My point here is that the channel abstraction is violated in many ways, by either calling directly into the TLS channel layer or not using it to send the specific cells that belong to the TLS handshake. Also, the TLS state is baked into the connection subsystem. This means that anyone wanting to implement a new channel will need to do a major refactoring first, which unfortunately leaves this abstraction kind of pointless. And do not make the mistake of labeling the channel subsystem a "transport layer"; it is at best a protocol layer: connection_t handles transport, not channels.
4. Vanilla scheduler is not really a scheduler.
It turns out that the Vanilla scheduler does not really schedule things; rather, it shoves as much as it can onto the connection outbuf. This ain't ideal, because tor selects a circuit based on a policy (EWMA), and once the circuit is selected, we push onto the connection outbuf either as much as we can write or up to 1000 cells (see MAX_FLUSH_CELLS), which is 512 * 1000 bytes == 0.5MB. One can argue whether writing 0.5MB per scheduler tick is too little or actually too much.
So imagine tor flushing 1000 cells every time it picks a circuit before moving on to the next one. I'm not sure this plays well with a relay under a bandwidth limitation, for instance. If you have 100 circuits and 100KB to spend, you might not want to spend it all on one single circuit.
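As a back-of-the-envelope sketch of that concern (names and numbers made up, not tor's actual algorithm): with a 100KB budget and 100 active circuits, a fairer pass would cap what a single circuit may take per tick.

  /* Hypothetical sketch: split a per-tick byte budget across the active
   * circuits instead of letting the first picked circuit eat it all.
   * 100KB / 100 circuits ~= 1KB, i.e. about 2 cells per circuit per tick,
   * versus 1000 cells (0.5MB) flushed from a single circuit today. */
  #include <stddef.h>

  #define CELL_WIRE_SIZE 512   /* matches the arithmetic above */

  static unsigned
  per_circuit_cell_budget(size_t budget_bytes, unsigned n_active_circuits)
  {
    size_t cells;
    if (n_active_circuits == 0)
      return 0;
    cells = (budget_bytes / n_active_circuits) / CELL_WIRE_SIZE;
    return cells ? (unsigned)cells : 1;   /* always allow at least one cell */
  }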
Also, because this scheduler runs as often as possible, only one channel is considered each time; Matt's experiments showed that this is what happens in practice. A scheduler that only ever considers one channel can't make good priority decisions, because there is no choice to make in the first place.
Finally, by bloating the outbuf as much as possible (and I'm not sure exactly how libevent operates here), we leave the decision of when and what to flush onto the network to the code that handles the outbuf. In other words, the scheduler doesn't actually control when the data is written to the socket, so the scheduling decisions aren't necessarily followed through at the transport layer. This is an area where the scheduler and connection subsystems should be tightly coupled.
== Part Four - The Conclusion ==
Through this epic journey, we've discovered some issues as well as design problems. Now the question is: what should and can we do about it?
In a nutshell, there are a couple of questions we should ask ourselves and try to answer so we can move forward:
* I believe now that we should seriously discuss the relevance of channels. Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
Which would bring us back to (which is btw basically what we have now considering the channel queues are useless):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are non-trivial. For starters, we have to decide whether we want to keep the channel queues, and if yes, we almost need to start from square one in terms of testing, because we would basically be introducing a new layer of cell queuing.
* Dealing with the DESTROY cell design issue will require trickier work, I think. We must not starve a circuit with a DESTROY cell pending to be sent, or else the other side keeps sending data. But we should also not starve all the other circuits: if we ever need to send a gazillion DESTROY cells in priority, we'll make the relay useless (a DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick the circuit within a reasonable amount of time so we can flush the DESTROY cell from the circuit queue? Or do we really need to bypass or somehow prioritize that cell so it is sent asap, to avoid load on the network while the other side of the circuit is still sending?
* In the short term, we should get rid of the Vanilla scheduler, because it complexifies the scheduler code a lot, adding unneeded things to channel_t and bloating the scheduler interface with pointless function pointers, for instance. And in my opinion, it is not helping performance the way it is done right now.
Cheers! David
On 31 Oct 2017, at 06:57, David Goulet dgoulet@ev0ke.net wrote:
- I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
Even if we are serious about non-TLS connections, do we want to rip out this code now, and write something better when we know what we actually need?
Is the channel layer the right place to hide the differences between TLS-over-IPv4 and TLS-over-IPv6? (I don't think it is, but it's worth thinking about how much work it was to add IPv6 support, and using that as a guide for how much work it would be to add address/port/keys/etc. for another protocol.)
T
On Wed, Nov 01, 2017 at 02:28:03PM +1100, teor wrote:
On 31 Oct 2017, at 06:57, David Goulet dgoulet@ev0ke.net wrote:
- I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for experiments with other transports, such as UDP-based ones, QUIC-Tor, etc. I would be a little sad to see it disappear completely.
On 01 Nov (07:31:50), Ian Goldberg wrote:
On Wed, Nov 01, 2017 at 02:28:03PM +1100, teor wrote:
On 31 Oct 2017, at 06:57, David Goulet dgoulet@ev0ke.net wrote:
- I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for experiments with other transports, such as UDP-based ones, QUIC-Tor, etc. I would be a little sad to see it disappear completely.
So after the Montreal meeting, I got access to the QUIC-Tor code. And, contrary to a common misconception about channels, they aren't about "transport" but about "protocol".
Thus the QUIC-Tor code didn't even *touch* channels ;). Everything they did had to be done mostly at the connection layer.
For research to experiment with channels, it would basically have to be research about _removing_ TLS between relays. I'm not aware of such a thing right now, but I'm sure someone has poked at it!
Cheers! David
--
Ian Goldberg
Professor and University Research Chair
Cheriton School of Computer Science
University of Waterloo
On Wed, Nov 01, 2017 at 09:09:36AM -0400, David Goulet wrote:
On 01 Nov (07:31:50), Ian Goldberg wrote:
On Wed, Nov 01, 2017 at 02:28:03PM +1100, teor wrote:
On 31 Oct 2017, at 06:57, David Goulet dgoulet@ev0ke.net wrote:
- I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for experiments with other transports, such as UDP-based ones, QUIC-Tor, etc. I would be a little sad to see it disappear completely.
So after the Montreal meeting, I got access to the QUIC-Tor code. And, contrary to a common misconception about channels, they aren't about "transport" but about "protocol".
Thus the QUIC-Tor code didn't even *touch* channels ;). Everything they did had to be done mostly at the connection layer.
For research to experiment with channels, it would basically have to be research about _removing_ TLS between relays. I'm not aware of such a thing right now, but I'm sure someone has poked at it!
Ah, bad example I guess. But some of my work in the past certainly has used channels for this purpose.
On 01 Nov (14:28:03), teor wrote:
On 31 Oct 2017, at 06:57, David Goulet dgoulet@ev0ke.net wrote:
- I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
Even if we are serious about non-TLS connections, do we want to rip out this code now, and write something better when we know what we actually need?
Right, that is basically a very good question to try to answer. I think it is totally conceivable that researchers would want to experiment there and go with a Tor without TLS. At that point, channels could be useful, but only in a state that is actually a proper abstraction (not the case right now). It would take work to make that happen.
However, I do *NOT* see Tor moving away from TLS anytime soon. The network and the clients out there all use it; moving to something else would mean a dual-stack protocol and years of ramping up to network maturity (ntor vs TAP is a good example, I think). And at that point, we'd have to do a massive amount of work in the channel/connection subsystems.
All in all, my answer is: "No, we aren't serious about moving away from TLS, but it's always possible".
Thus, in the end, this channel subsystem is really about letting researchers play with it, not about helping us developers do our job. There could be a gain in fixing it, but it would be one-sided for now, imo.
Is the channel layer the right place to hide the differences between TLS-over-IPv4 and TLS-over-IPv6?
That would be the connection layer, handling one-to-one socket connections and talking to the kernel.
SCTP-Tor, for instance, would also be at the connection layer. That subsystem, in my opinion, would benefit *greatly* from a nice interface, but that is a huge amount of work.
Thus, I want to reiterate that if we care about providing a nice abstraction for researchers to do a better job, we have a broken one at the channel layer and none at the connection layer, which is actually the one that would be most useful (QUIC, SCTP, UDP, ...). So even today we do *not* offer anything useful to researchers, imo.
Cheers! David
(I don't think it is, but it's worth thinking about how much work it was to add IPv6 support, and using that as a guide for how much work it would be to add address/port/keys/etc. for another protocol.)
T
-- Tim / teor
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n
On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
- DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" onto which a DESTROY cell is put for one of the circuits on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cells on that circuit, dump them, and send only the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought to finish flushing the currently queued cells first. If it discards the queued cells and only sends the destroy cell, then we end up with missing data.
Second, because of this concept of a separate queue for DESTROY cells, tor goes back and forth between that queue and the normal queue of a circuit. Remember that the destroy queue is *not* per circuit but per circuitmux. This is driven by "cmux->last_cell_was_destroy", which is set to 1 if the last cell the cmux handled was a DESTROY, or 0 if not.
Yes, this part is definitely a mess.
I think we need to invent and design a better way to handle destroys -- getting the abstraction layer right between the "control" cells and the "data" cells is definitely a hard problem, especially when both kinds of cells end up being sent through the same mechanism.
- I believe now that we should seriously discuss the relevance of channels. Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
I am not opposed to ripping out the channel abstraction.
You make a good case that it was never completed, and now it will be harder to complete since it's been so long (and the person who designed and built it won't be completing it). Also, if we are hoping to modularize the Tor architecture to prepare it for component-by-component transitions to Rust or some other language, then simplifying first is a good move.
I guess the other option is to keep it limping along and make a plan to fix it and move to the right abstraction levels. That option would be best if we have a particular new channel in mind that we want to add -- such as the switch to udp transport that various research papers have proposed.
At least last I checked though, udp transport implies user-space tcp which is not a practical thing at our scale.
All of this said though, Nick is the Chief Architect, so I will defer to his judgment on which approach will get us a great fast stable Tor in the long term.
- In the short term, we should get rid of the Vanilla scheduler, because it complexifies the scheduler code a lot, adding unneeded things to channel_t and bloating the scheduler interface with pointless function pointers, for instance. And in my opinion, it is not helping performance the way it is done right now.
Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist plan was to leave the Vanilla scheduler in place so there's something to fall back to in case of bugs or design surprises. It might be wisest to leave it in during 0.3.2, so Kist can stabilize, and plan to take it out during 0.3.3 or 0.3.4 if Kist continues looking good.
--Roger
On 13 Nov 2017, at 06:56, Roger Dingledine arma@mit.edu wrote:
On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
2. DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" onto which a DESTROY cell is put for one of the circuits on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cells on that circuit, dump them, and send only the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought to finish flushing the currently queued cells first. If it discards the queued cells and only sends the destroy cell, then we end up with missing data.
Sending a DESTROY cell after dropping data still tears down a circuit, but (depending on the sender's position in the circuit) it tears it down with a digest error. Which is probably not what we want.
That said, there may be no way to tell if the application-level data is complete or not, so an error teardown may be appropriate.
T
On 12 Nov (14:56:57), Roger Dingledine wrote:
On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
- DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" onto which a DESTROY cell is put for one of the circuits on the cmux. An important thing for tor is that when it needs to send a DESTROY, it needs to _stop_ sending any queued cells on that circuit, dump them, and send only the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought to finish flushing the currently queued cells first. If it discards the queued cells and only sends the destroy cell, then we end up with missing data.
This is *exactly* what Tor does right now: it clears the circuit queue immediately when it is ready to send the DESTROY cell.
You (or others) might want to double-check that, but here are hints (notice the cell queue being cleared):
circuit_about_to_free():

  if (circ->n_chan) {
    circuit_clear_cell_queue(circ, circ->n_chan);
    /* Only send destroy if the channel isn't closing anyway */
    if (!CHANNEL_CONDEMNED(circ->n_chan)) {
      channel_send_destroy(circ->n_circ_id, circ->n_chan, reason);
    }
connection_edge_process_relay_cell(): look for channel_send_destroy(); you'll notice the queue being cleared right before.
Still bad... ?
Second, because of this concept of a separate queue for DESTROY cells, tor goes back and forth between that queue and the normal queue of a circuit. Remember that the destroy queue is *not* per circuit but per circuitmux. This is driven by "cmux->last_cell_was_destroy", which is set to 1 if the last cell the cmux handled was a DESTROY, or 0 if not.
Yes, this part is definitely a mess.
I think we need to invent and design a better way to handle destroys -- getting the abstraction layer right between the "control" cells and the "data" cells is definitely a hard problem, especially when both kinds of cells end up being sent through the same mechanism.
Right. It is also unclear to me how much it will affect tor if we simply put the DESTROY cell on the circuit queue and let the scheduler take care of it... I would say we could have some cases where it will take a bit more time than right now to send the DESTROY.
(Right now, it is sent right away bypassing the scheduler.)
- I believe now that we should seriously discuss the relevance of channels. Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
I am not opposed to ripping out the channel abstraction.
You make a good case that it was never completed, and now it will be harder to complete since it's been so long (and the person who designed and built it won't be completing it). Also, if we are hoping to modularize the Tor architecture to prepare it for component-by-component transitions to Rust or some other language, then simplifying first is a good move.
I guess the other option is to keep it limping along and make a plan to fix it and move to the right abstraction levels. That option would be best if we have a particular new channel in mind that we want to add -- such as the switch to udp transport that various research papers have proposed.
Right, so UDP (see the earlier post in the thread with Ian) has nothing to do with channels :). Channels have to do with TLS cells.
At least last I checked though, udp transport implies user-space tcp which is not a practical thing at our scale.
All of this said though, Nick is the Chief Architect, so I will defer to his judgment on which approach will get us a great fast stable Tor in the long term.
- In the short term, we should get rid of the Vanilla scheduler, because it complexifies the scheduler code a lot, adding unneeded things to channel_t and bloating the scheduler interface with pointless function pointers, for instance. And in my opinion, it is not helping performance the way it is done right now.
Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist plan was to leave the Vanilla scheduler in place so there's something to fall back to in case of bugs or design surprises. It might be wisest to leave it in during 0.3.2, so Kist can stabilize, and plan to take it out during 0.3.3 or 0.3.4 if Kist continues looking good.
Agree.
Thanks! David
--Roger
On Mon, Oct 30, 2017 at 3:57 PM, David Goulet dgoulet@ev0ke.net wrote:
Hello everyone!
DISCLAIMER: The following is enormous and tries to describe in some detail the situation in tor with connection<->channel<->scheduler. It comes after we merged the KIST scheduler and realized that many things aren't what they were supposed to be or meant for. In the end, I'm asking questions so we can move forward with development and fixing things.
One last thing before you start your journey into the depths of Tor: the 3 subsystems I'm going to talk about, and how they interact, are quite complicated, so it is very possible that I've gotten things wrong or missed some details. Please point them out so we can document better, be better informed and make good decisions. I plan to document as much as I can from this process in a new file in the torguts.git repository.
Snipping the analysis, and going straight to the conclusions. I'll leave one sentence in the analysis because it's such a great summary:
Many things are problematic currently
They sure are. :)
== Part Four - The Conclusion ==
Through this epic journey, we've discovered some issues as well as design problems. Now the question is: what should and can we do about it?
In a nutshell, there are a couple of questions we should ask ourselves and try to answer so we can move forward:
- I believe now that we should seriously discuss the relevance of channels. Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
Which would bring us back to (which is btw basically what we have now considering the channel queues are useless):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are non-trivial. For starters, we have to decide whether we want to keep the channel queues, and if yes, we almost need to start from square one in terms of testing, because we would basically be introducing a new layer of cell queuing.
So, this is the question I'm least sure about. Please take the following as tentative.
I think that the two choices ("refactor channels" and "rip out channels") may be less different than we think. Neither one is going to be trivial to do, and we shouldn't assume that sticking everything together into one big type will actually make the code _simpler_.
The way I think about the code right now, "channel" is an interface which "connection_or" implements, and there is no meaningful barrier between connection_or and channeltls. I _do_ like the idea of keeping some kind of abstraction barrier, though: a "channel" is "whatever we can send and receive cells from", whereas an "or_connection" has a lot of other baggage that comes with it.
From my POV, we *should* definitely abolish the channels' queues, and minimize the amount of logic that channels do on their own. I'm not sure if we should rip them out entirely, or just simplify them a lot. I don't think either is necessarily simpler or less bug-prone than the other.
Perhaps we should sketch out what the new interface would look like? Or maybe do an hour or two worth of exploratory hacking on each approach?
(This reminds me of another change I want someday, which is splitting edge_connection_t into an "edge_connection" type that implements a "stream" interface: right now, we have quite a few streams that aren't actually edge connections, but which use the type anyway.)
* Dealing with the DESTROY cell design issue will require trickier work, I think. We must not starve a circuit with a DESTROY cell pending to be sent, or else the other side keeps sending data. But we should also not starve all the other circuits: if we ever need to send a gazillion DESTROY cells in priority, we'll make the relay useless (a DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick the circuit within a reasonable amount of time so we can flush the DESTROY cell from the circuit queue? Or do we really need to bypass or somehow prioritize that cell so it is sent asap, to avoid load on the network while the other side of the circuit is still sending?
So, elsewhere in the thread, folks have been discussing whether a circuit that's going to send a DESTROY cell should flush its pending cells first.
The answer is sadly, "it depends".
Case 1: Suppose Alice is downloading a webpage. Suppose we are the middle relay and we lose our connection to the exit. It would be nice to keep flushing the data we have towards Alice -- maybe. If she can use partial data. But any data that Alice sends to us would be lost, so it would be good if we had some way to tell Alice "stop sending please".
Case 2: Suppose Alice is uploading something to a webserver. Suppose we are the middle relay and we lose our connection from Alice. In this case, there's no point in sending any more data towards the webserver before we send it a DESTROY cell. (Even if Alice was in the middle of a big upload, she'll need to repeat any part of it that wasn't ACKed, since she won't know what was received and what wasn't.)
Case 3: Suppose we hit our OOM killer. In this case, we had better discard all the data on the circuit we're killing, or we're vulnerable to "sniper attacks" again.
So it's clear that sometimes we should dump the data, and sometimes we shouldn't. I think this is an independent question from what we're asking here. (My own take is that solving case 1 right requires "RELAY_TRUNCATED" cells, which I believe we don't implement today.)
What we're asking here is: how can we reintegrate DESTROY cells with the rest of the scheduler logic?
I think that, from a priority POV, DESTROY cells are in a certain sense the _opposite_ of traffic, and we might actually want to treat them differently from data cells. Consider that if we have a choice between DESTROYing a busy circuit or a quiet one, we will save more bandwidth by destroying the busy circuit first, so that no more data is sent to us over it.
On the other hand, this doesn't mean that the FIFO structure we have today is a good idea at all. It probably makes sense to use the same priority queue-based scheduler thing that we use everywhere else, but possibly with a different (inverted??) priority parameter for destroyed circuits.
One more thing to be aware of: the destroy_cell_queue exists in part because we tear down circuits at the same time that we queue their destroy cells. If we changed Tor so that "destroyed" circuits were kept around somehow until their cells could be sent, then we'd be introducing a new state to our state machine, to represent circuits that were schedulable but not actually usable for traffic. We'd need to be careful to handle that correctly: this kind of "unusable object that still exists" has caused us problems before. (The solution I like best for avoiding this confusion is to make it so the scheduler can schedule two types of "schedule-able" things: circuits, and "pending destroy cells".)
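To make that concrete, a single comparator over both kinds of schedule-able entries might look like the sketch below. This is purely illustrative: the struct and field names are made up, and putting destroys strictly ahead of ordinary traffic is just one possible policy choice (the starvation discussion later in the thread applies).

  /* Hypothetical sketch: one comparator covering both ordinary
   * schedulable circuits and pending-destroy entries.  For normal
   * traffic, a lower EWMA value means higher priority; for destroys the
   * sense is inverted so the *busiest* torn-down circuit goes first,
   * since that saves the most inbound bandwidth. */
  typedef struct sched_entry_t {
    int is_destroy;      /* 1 = pending DESTROY cell, 0 = live circuit */
    double ewma_value;   /* EWMA cell count (frozen at teardown for destroys) */
  } sched_entry_t;

  static int
  sched_entry_compare(const sched_entry_t *a, const sched_entry_t *b)
  {
    if (a->is_destroy != b->is_destroy) {
      /* One possible policy: destroys ahead of ordinary traffic. */
      return a->is_destroy ? -1 : 1;
    }
    if (a->ewma_value == b->ewma_value)
      return 0;
    if (a->is_destroy) {
      /* Inverted: higher EWMA (busier circuit) wins. */
      return (a->ewma_value > b->ewma_value) ? -1 : 1;
    } else {
      /* Normal: lower EWMA wins. */
      return (a->ewma_value < b->ewma_value) ? -1 : 1;
    }
  }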
- In the short term, we should get rid of the Vanilla scheduler, because it complexifies the scheduler code a lot, adding unneeded things to channel_t and bloating the scheduler interface with pointless function pointers, for instance. And in my opinion, it is not helping performance the way it is done right now.
I agree with Roger here: it's fine to throw away the vanilla scheduler, but we should wait until KIST has been running unproblematically in a stable release for a while. 0.3.4 seems like a good time for this.
On 15 Nov (13:49:54), Nick Mathewson wrote:
On Mon, Oct 30, 2017 at 3:57 PM, David Goulet dgoulet@ev0ke.net wrote:
[snip]
Through this epic journey, we've discovered some issues as well as design problems. Now the question is: what should and can we do about it?
In a nutshell, there are a couple of questions we should ask ourselves and try to answer so we can move forward:
- I believe now that we should seriously discuss the relevance of channels. Originally the idea was good: provide an abstraction layer for the relay-to-relay handshake and for sending/processing cells related to that protocol. But, as of now, they are only half doing it.
There is an important cost in code and maintenance for something that is not properly implemented/finished (the channel abstraction) and also largely unused. An abstraction implemented for only one thing is not really useful, except maybe to offer an example for others? But we aren't providing a good example right now, imo...
That being said, we can spend time fixing the channel subsystem, trying to turn it into a nicer interface, fixing all the issues I've described above (and I suspect there might be more) so the cell scheduler can play nicely with channels. Or we could rip them out, eliminating lots of code and reducing our technical debt. I would like us to think seriously about what we want, because that channel subsystem is _complicated_ and very few of us fully understand it, afaict.
Which would bring us back to (which is btw basically what we have now considering the channel queues are useless):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are non-trivial. For starters, we have to decide whether we want to keep the channel queues, and if yes, we almost need to start from square one in terms of testing, because we would basically be introducing a new layer of cell queuing.
So, this is the question I'm least sure about. Please take the following as tentative.
I think that the two choices ("refactor channels" and "rip out channels") may be less different than we think. Neither one is going to be trivial to do, and we shouldn't assume that sticking everything together into one big type will actually make the code _simpler_.
The way I think about the code right now, "channel" is an interface which "connection_or" implements, and there is no meaningful barrier between connection_or and channeltls. I _do_ like the idea of keeping some kind of abstraction barrier, though: a "channel" is "whatever we can send and receive cells from", whereas an "or_connection" has a lot of other baggage that comes with it.
From my POV, we *should* definitely abolish the channels' queues, and minimize the amount of logic that channels do on their own. I'm not sure if we should rip them out entirely, or just simplify them a lot. I don't think either is necessarily simpler or less bug-prone than the other.
Perhaps we should sketch out what the new interface would look like? Or maybe do an hour or two worth of exploratory hacking on each approach?
I tend to agree here with you. I did the exercise already to remove the channel queues and I ran a relay for weeks on it without any apparent problems. It removed 1000+ lines and made everything simpler.
So I do think having the "channel" abstraction is good, both in terms of future uses of channels and in terms of maintainability.
That being said, I think the next move for me will be to act on removing the channel queues and to try to decouple channeltls and connection_or so we can end up with a real abstraction. Achieving this will give us more encapsulation between subsystems and thus let us provide stronger guarantees between them, i.e. between channel and scheduler.
I do think the current channel interface is "ok" for now but as we do this exercise and refactoring, we'll discover more things which will just improve the subsystem overall.
(This reminds me of another change I want someday, which is splitting edge_connection_t into an "edge_connection" type that implements a "stream" interface: right now, we have quite a few streams that aren't actually edge connections, but which use the type anyway.)
- Dealing with the DESTROY cell design issue will require trickier work, I think. We must not starve a circuit with a DESTROY cell pending to be sent, or else the other side keeps sending data. But we should also not starve all the other circuits: if we ever need to send a gazillion DESTROY cells in priority, we'll make the relay useless (a DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick the circuit within a reasonable amount of time so we can flush the DESTROY cell from the circuit queue? Or do we really need to bypass or somehow prioritize that cell so it is sent asap, to avoid load on the network while the other side of the circuit is still sending?
So, elsewhere in the thread, folks have been discussing whether a circuit that's going to send a DESTROY cell should flush its pending cells first.
The answer is sadly, "it depends".
Case 1: Suppose Alice is downloading a webpage. Suppose we are the middle relay and we lose our connection to the exit. It would be nice to keep flushing the data we have towards Alice -- maybe. If she can use partial data. But any data that Alice sends to us would be lost, so it would be good if we had some way to tell Alice "stop sending please".
Case 2: Suppose Alice is uploading something to a webserver. Suppose we are the middle relay and we lose our connection from Alice. In this case, there's no point in sending any more data towards the webserver before we send it a DESTROY cell. (Even if Alice was in the middle of a big upload, she'll need to repeat any part of it that wasn't ACKed, since she won't know what was received and what wasn't.)
Case 3: Suppose we hit our OOM killer. In this case, we had better discard all the data on the circuit we're killing, or we're vulnerable to "sniper attacks" again.
So it's clear that sometimes we should dump the data, and sometimes we shouldn't. I think this is an independent question from what we're asking here. (My own take is that solving case 1 right requires "RELAY_TRUNCATED" cells, which I believe we don't implement today.)
For the record, right now in tor, whether in case (1) or (2), when the middle relay decides it needs to send a DESTROY, the circuit queue is cleared (nothing else is sent) and the DESTROY is sent very fast.
What we're asking here is: how can we reintegrate DESTROY cells with the rest of the scheduler logic?
I think that, from a priority POV, DESTROY cells are in a certain sense the _opposite_ of traffic, and we might actually want to treat them differently from data cells. Consider that if we have a choice between DESTROYing a busy circuit or a quiet one, we will save more bandwidth by destroying the busy circuit first, so that no more data is sent to us over it.
On the other hand, this doesn't mean that the FIFO structure we have today is a good idea at all. It probably makes sense to use the same priority queue-based scheduler thing that we use everywhere else, but possibly with a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict because of the parent relationship between cells with their digest (à la git). And that is of course per circuit.)
One more thing to be aware of: the destroy_cell_queue exists in part because we tear down circuits at the same time that we queue their destroy cells. If we changed Tor so that "destroyed" circuits were kept around somehow until their cells could be sent, then we'd be introducing a new state to our state machine, to represent circuits that were schedulable but not actually usable for traffic. We'd need to be careful to handle that correctly: this kind of "unusable object that still exists" has caused us problems before. (The solution I like best for avoiding this confusion is to make it so the scheduler can schedule two types of "schedule-able" things: circuits, and "pending destroy cells".)
After some thought, yes, I do think you are right here, for the two reasons you mentioned:
1. DESTROY cells need to be considered differently in terms of priority, depending on the scheduler. With KIST, for instance, we might want to bypass the kernel limit and just dump them asap. But we would also need to select the active circuit using a different (inverted) priority policy, as you suggested, to prioritize busy circuits.
2. We can't keep circuit objects alive while we wait for the destroy cell to be sent; the logic inside tor might become complicated, and it also opens the way to a memory DoS if we ever have a DESTROY cell starvation problem in tor.
Then the problem becomes how those two queues (normal and DESTROY cells) interact with each other. The issue at hand becomes possible starvation.
Right now, Tor alternates between sending a DESTROY cell and a normal cell (regardless of the circuit; actually never for the same circuit). The destroy cell queue is a FIFO, which is not ideal given what we've discussed, but over time it does avoid starvation, precisely because DESTROY cells aren't picked by priority.
Thus, if the DESTROY queue becomes a priority queue, starvation becomes a problem. An extra difficulty: the priority in the normal cell queue can't be compared to the one in the destroy cell queue, because the former evolves over time while the latter is frozen in time (the circuit simply doesn't exist anymore). In other words, we can't consider the two queues' priorities within the same policy.
One avenue of solution is some sort of "throttling" or "threshold" up to which the scheduler allows a queue to be emptied before the other queue gets its turn, so neither starves. Chances are the threshold in our case would be a maximum number of cells before the destroy cell queue yields its time back to the normal cell queue, which would also need a threshold of its own. And to be efficient, that threshold would likely have to adapt to the load of the relay.
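To make the idea concrete, a hypothetical sketch (all names and the per-turn limits are made up; in practice the limits would presumably adapt to relay load and queue depths):

  /* Hypothetical sketch: alternate between the destroy queue and the
   * normal queue, each bounded by a per-turn cell budget, so that
   * neither can starve the other. */
  #define DESTROY_BURST_MAX 8    /* made-up per-turn limits */
  #define NORMAL_BURST_MAX 32

  static void
  flush_with_thresholds_sketch(destroy_queue_t *dq, normal_queue_t *nq)
  {
    int sent;

    /* Drain up to DESTROY_BURST_MAX destroy cells... */
    for (sent = 0; sent < DESTROY_BURST_MAX && !destroy_queue_empty(dq); sent++)
      send_destroy_cell(destroy_queue_pop(dq));

    /* ...then yield to at most NORMAL_BURST_MAX normal cells. */
    for (sent = 0; sent < NORMAL_BURST_MAX && !normal_queue_empty(nq); sent++)
      send_normal_cell(normal_queue_pop_by_priority(nq));
  }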
All in all, this ain't an easy problem so simply switching the DESTROY cell queue to a priority one will need important changes.
We are getting into proposal territory, so I'll stop for now, do some more homework on the starvation problem in scheduling (vis-à-vis routing), and write something up in a proposal so we can go from there. But at least now we've come to some agreement, with more data points on how DESTROY cells need to be handled in Tor.
- In the short term, we should get rid of the Vanilla scheduler, because it complexifies the scheduler code a lot, adding unneeded things to channel_t and bloating the scheduler interface with pointless function pointers, for instance. And in my opinion, it is not helping performance the way it is done right now.
I agree with Roger here: it's fine to throw away the vanilla scheduler, but we should wait until KIST has been running unproblematically in a stable release for a while. 0.3.4 seems like a good time for this.
Agree.
Thanks! David
-- Nick
On Thu, Nov 16, 2017 at 8:56 AM, David Goulet dgoulet@torproject.org wrote:
On 15 Nov (13:49:54), Nick Mathewson wrote:
[...]
On the other hand, this doesn't mean that the FIFO structure we have today is a good idea at all. It probably makes sense to use the same priority queue-based scheduler thing that we use everywhere else, but possibly with a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict because of the parent relationship between cells with their digest (à la git). And that is of course per circuit.)
Are you sure? DESTROY cells aren't relay cells; they don't have relay crypto done to them, and I think it's okay to re-order them with respect to other cells. I don't think they have a digest on them, do they?
peace,
On 16 Nov (09:06:03), Nick Mathewson wrote:
On Thu, Nov 16, 2017 at 8:56 AM, David Goulet dgoulet@torproject.org wrote:
On 15 Nov (13:49:54), Nick Mathewson wrote:
[...]
On the other hand, this doesn't mean that the FIFO structure we have today is a good idea at all. It probably makes sense to use the same priority queue-based scheduler thing that we use everywhere else, but possibly with a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict because of the parent relationship between cells with their digest (à la git). And that is of course per circuit.)
Are you sure? DESTROY cells aren't relay cells; they don't have relay crypto done to them, and I think it's okay to re-order them with respect to other cells. I don't think they have a digest on them, do they?
Oh sorry, I thought you were talking about the normal circuit queue here... I misread.
But yes, as I mentioned later in this email, moving to a priority queue, for instance, has starvation implications.
Sorry! David
peace,
Nick