[Nick, can you merge my torspec proposal218 branch, please? Thanks!]
On 2/28/13 2:08 AM, Rob Jansen wrote:
On Mon, Feb 25, 2013 at 10:28 AM, Karsten Loesing karsten@torproject.org wrote:
On 2/23/13 11:20 PM, Rob Jansen wrote:
On Feb 23, 2013 4:22 PM, "Karsten Loesing" karsten@torproject.org
wrote:
Your understanding of n_circ_id and p_circ_id matches mine, but are you sure there's a UID for circuits other than origin circuits? I think you mean origin_circuit_t->global_identifier. But there's no such field for or_circuit_t or circuit_t. Or do you mean something else?
Ok. Though I thought my original patch moved the global Id to the base circuit struct. Perhaps I didn't. Anyway, I'm not sure it matters...
Your original patch did move the ID to circuit_t, but I thought we wanted to avoid numbering non-origin circuits (mostly because that affects relays in non-TestingTorNetwork mode and could lead to busy relays running out of IDs at some point), which is why I took this change out.
Right, I remember this now. I could imagine ways to get around the 'running out of ids' problem, like resetting the id counter after a given timeframe. In fact, we could just let the counter overflow and loop back to 0 on its own (assuming its a unsigned int type) since we really only need them to be unique approximately for the life of a circuit. If we see the ID come up again after an hour, we can assume its a new circuit.
I'd rather not want to change behavior in non-simulation mode. If we really need a UID for non-origin circuits, we could add a separate uint64_t global_identifier to circuit_t and only use that in simulating mode. I'm not sure that we need it though.
Also, even if we moved the field to circuit_t, the new CELL_STATS events would be the only ones using these UIDs, because all other events are for origin circuits only. I don't see how these IDs would help us. We never learn what UID the next or previous node in a circuit picks for a given circuit.
Not currently, but couldn't we implement the functionality where relays log or export their circuit UIDs and next/prev IDs over the control port? Though I'm not sure if you'd want this in mainline Tor if its only useful in simulation mode...
We could have relays log local circuit UIDs together with p_circ_id, p_conn_id, n_circ_id, and n_conn_id in simulation mode. But I don't see how that would facilitate circuit tracking compared to the current approach using ORCONN and CELL_STATS events.
Here's an example:
- fileclient creates a circuit with - UID 14, - n_conn_id 15, and - n_circ_id 19403.
- tokenglobal is the first hop in this circuit and reports - (new) UID 12345, - p_conn_id 32, - p_circ_id 19403, - n_conn_id 18, and - n_circ_id 6710.
- tokenrelay is the second hop and reports - (new) UID 23456, - p_conn_id 17, - p_circ_id 6710, - n_conn_id 16, - n_circ_id 34402.
- exit2 is the third and final hop and reports - (new) UID 34567, - p_conn_id 15, and - p_circ_id 34402.
We need ORCONN events to know that fileclient's n_conn_id 15 is the same connection as tokenglobal's p_conn_id 32. How would the new UIDs make this easier?
tl;dr: I _think_ it's possible to reconstruct circuits from ORCONN and CELL_STATS events as they are currently specified in proposal 218.
Great, but do we really expect every Tor controller parser to get this right? It seems complicated enough that there should be an easier way. Maybe it's just wishful thinking on my part.
I agree that reconstructing circuits from ORCONN and CELL_STATS events is far from trivial. I don't really see how to make it simpler though.
What about linking all of the IDs and UIDs as described above?
(See above.)
From an earlier mail in this thread:
Finally, Rob, should I look into CIRC_BW events you suggested a while ago? If yes, what did you have in mind how that event would look like, and when/by whom would it be emitted?
If we want to do this, it would likely be an aggregation of all STREAM_BW events for a circuit, but only during the time when those streams
belonged
to that circuit. I don't think it makes sense to emit it for every STREAM_BW event though, so what if we aggregate and emit it once per second? A format similar to the STREAM_BW format should work fine.
Done. I specified and implemented such a CIRC_BW event.
Great!
Here's the updated proposal 218 (Nick, please don't merge this yet):
https://gitweb.torproject.org/user/karsten/torspec.git/blob/refs/heads/propo...
In section 5.3/5.4/5.5, these events are emitted in the second_elapsed_callback(), right? I wanted to verify that a relay who hasn't sent anything in a few seconds and then starts sending again will emit the event at the end of the second after which it resumed sending, rather than the first bytes after it resumed sending.
Yes, all these events are emitted in second_elapsed_callback().
The last word in 5.4 should be 'reading' instead of 'read'.
Fixed.
Is the specification of 5.5 and 5.6 complex enough to warrant including example outputs?
Sure, can't hurt. Added a few examples.
In 5.6, were we planning on explaining how buckets can go negative and how that affects the reporting of the TB_EMPTY events?
I tried a better explanation:
""" ReadBucketEmpty (WriteBucketEmpty) is the time in millis that the read (write) bucket was empty since the last refill. LastRefill is the time in millis since the last refill.
If a bucket went negative and if refilling tokens didn't make it go positive again, there will be multiple consecutive TB_EMPTY events for each refill interval during which the bucket contained zero tokens or less. In such a case, ReadBucketEmpty or WriteBucketEmpty are capped at LastRefill in order not to report empty times more than once. """
Here's the tor branch:
https://gitweb.torproject.org/karsten/tor.git/shortlog/refs/heads/morestats3
Here's a Shadow log file containing all new events:
https://people.torproject.org/~karsten/volatile/scallion.log.gz
Here's the Java program that I used to parse the Shadow log file:
https://people.torproject.org/~karsten/volatile/ParseProposal218Events.java
Ugh. I really don't look forward to writing parsers for this. I guess there may be few projects that actually require this information, and those that do can use your code:)
I hope that some of the complexity goes away once we use a parsing library like Stem. And I think we should provide parsing code to other people looking into this.
And finally, here's the output, which should be easier to understand now:
https://people.torproject.org/~karsten/volatile/scallion.txt
Search for "Circuit [fileclient-60.1.0.0]:14" to find the circuit I mentioned earlier in this thread.
Can you review the proposal changes and tell me if they make sense to you?
Best, Karsten
See comments above. Overall, it looks good. I'm still wondering about the UIDs / IDs issue. It may be that we don't want to include that for other reasons, in which case the current implementation is fine.
Okay, in that case let's consider proposal 218 done for the moment, unless we come up with a better idea to solve circuit-tracking thing.
I asked Nick above to merge my changes, so that I can put proposal 218 on the sponsor F year 3 wiki page as one result of the February 28 milestone. (That doesn't mean we cannot make it even better after February.)
Thanks! Karsten