Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
- My main goal was to understand the prop224 sections [TIME-PERIODS] and [TIME-OVERLAP].
Those sections specify a system where hidden services decide in a probabilistic manner _when_ to publish their descriptor so that not all hidden services publish their descriptors at the same moment and cause a thundering herd that stampedes the network.
For this to work, time is split into time periods of k hours each. A few hours before each time period, there is an overlap period where hidden services start publishing their _next_ descriptors to HSDirs, so that when the upcoming time period starts, all the HSDirs have already received the descriptors and are ready to serve them.
Consider the overlap period at the end of time period #N. During that overlap period, hidden services publish their descriptors for future time period #N+1. In this case, hidden services also need to know the shared random value that will be active during time period #N+1, since it is needed to find the responsible HSDirs. This means that the shared random value for time period #N+1 needs to be published _before_ the overlap period starts.
This is not the case in current proposal 224, since time is split into time periods of 25 hours, which means that each day the start time shifts by one hour forward. Since the start/end times of the time periods keep on shifting, there will be cases where the right shared random value will not be accessible when the overlap period starts.
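To make the drift concrete, here is a small Python sketch that lists the start times of consecutive time periods for a 25-hour and a 24-hour period length. The epoch date and the `period_starts` helper are assumptions made up for illustration; they are not taken from prop224.

```python
from datetime import datetime, timedelta, timezone

def period_starts(first_start, period_hours, count):
    """Return the start times of `count` consecutive time periods."""
    return [first_start + timedelta(hours=period_hours * i) for i in range(count)]

# Hypothetical epoch: the first period starts at 12:00 UTC on an arbitrary day.
epoch = datetime(2016, 4, 1, 12, 0, tzinfo=timezone.utc)

for hours in (25, 24):
    starts = period_starts(epoch, hours, 4)
    # With 25-hour periods the start hour moves forward by one each period;
    # with 24-hour periods it stays fixed.
    print(hours, [s.strftime("%H:%M") for s in starts])
```

With 25-hour periods the start time walks around the clock, so it cannot stay aligned with an SRV that is published at a fixed hour every day.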
So what to do?
To fix this, I suggest we change the time period length to a day (24 hours).
I also suggest we start time periods every day at 12:00 and end them 24 hours later at the same time, so that this works well with the current shared randomness schedule (where the new shared random value gets published at 00:00 every day). [It might actually be wiser to reverse those schedules: create the SRV at 12:00 and start the time period at 00:00]
In any case, this is what it might look like:
+--------------------------------------------------------------------+
|                                                                    |
|  00:00       12:00       00:00       12:00       00:00       12:00 |
|  SRV#1       TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                    |
|  $           |-----------$-----======|-----------$-----======|     |
|                               overlap12               overlap23    |
|                                                                    |
+--------------------------------------------------------------------+
Legend: [TP#1 = Time Period #1] [SRV#1 = Shared Random Value #1]
So, this basically gives a space of 12 hours between the SRV generation and the start of the next time period. We can then easily fit an overlap period of 6 hours before the next time period starts. In the above diagram, the "equal sign" segments are the overlap periods. 'overlap12' is the overlap period from TP#1 to TP#2.
Do you think that's reasonable? And do you see any problems with changing the time period length from 25 hours to 24 hours?
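As a rough illustration of the schedule suggested above (24-hour time periods starting at 12:00, with a 6-hour overlap period before the next one starts), the following Python sketch finds the start of the current time period and checks whether a given moment is inside the overlap period. The function and constant names are hypothetical, and UTC timestamps are assumed.

```python
from datetime import datetime, timedelta, timezone

TP_START_HOUR = 12     # time periods start at 12:00 (suggested above)
TP_LENGTH_HOURS = 24   # suggested time period length
OVERLAP_HOURS = 6      # overlap period before the next time period

def current_period_start(now):
    """Return the start of the time period that contains `now`."""
    start = now.replace(hour=TP_START_HOUR, minute=0, second=0, microsecond=0)
    if now < start:
        # Before 12:00 we are still in the period that began yesterday.
        start -= timedelta(hours=TP_LENGTH_HOURS)
    return start

def in_overlap_period(now):
    """Return True if `now` is in the overlap period before the next period."""
    next_start = current_period_start(now) + timedelta(hours=TP_LENGTH_HOURS)
    return now >= next_start - timedelta(hours=OVERLAP_HOURS)

# 09:00 falls in the 06:00-12:00 overlap window; 14:00 and 03:00 do not.
print(in_overlap_period(datetime(2016, 4, 2, 9, 0, tzinfo=timezone.utc)))
print(in_overlap_period(datetime(2016, 4, 1, 14, 0, tzinfo=timezone.utc)))
```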
- So now that we have ironed out the time period stuff slightly, let's discuss the behavior that hidden services, clients and HSDirs should inherit.
This email is quite long already so I'm going to go with examples, instead of formal specification. However, this stuff needs to go formally in the proposal IMO, so any help in formalizing it would be great.
+ Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
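One possible way to write down the three examples is sketched below in Python. It is only a starting point for discussion, not spec text. It assumes the schedule suggested earlier (SRV at 00:00, time period start at 12:00, 6-hour overlap), takes the hour of day as input, and treats "has the overlap OFFSET passed?" as a boolean computed elsewhere as described in [TIME-OVERLAP]. The function names are hypothetical.

```python
def srv_for_current_period(hour):
    """Name the consensus field holding the SRV for the current time period."""
    # From 12:00 to 23:59 the SRV generated at the last 00:00 is still the
    # current one. From 00:00 to 11:59 a newer SRV has been generated, so the
    # one for our time period has moved to the "previous" slot.
    if hour >= 12:
        return "shared-rand-current-value"
    return "shared-rand-previous-value"

def descriptors_to_publish(hour, overlap_offset_passed):
    """Return (time period, SRV field) pairs the service should publish for."""
    plan = [("current", srv_for_current_period(hour))]
    in_overlap = 6 <= hour < 12
    if in_overlap and overlap_offset_passed:
        # The next period's SRV was generated at 00:00, so during the overlap
        # period it is the consensus "current" value.
        plan.append(("next", "shared-rand-current-value"))
    return plan

print(descriptors_to_publish(14, False))  # Example 1
print(descriptors_to_publish(3, False))   # Example 2
print(descriptors_to_publish(9, True))    # Example 3, OFFSET has passed
```

Under these assumptions the rule reduces to: before 12:00 use the previous SRV for the current time period, from 12:00 on use the current SRV, and during the overlap period the next time period always uses the current SRV.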
+ Client behavior
My current intuition with regard to client behavior is that they should always fetch descriptors from the HSDirs of the _current_ time period. They should not concern themselves with the overlap stuff _at all_. The overlap system is there so that by the time the new time period starts, all the HSDirs have received the descriptors and are ready to help the clients. Clients should never notice the overlap stuff happening.
For this reason I think we can remove this paragraph from the spec:
When a client is looking for a service, it must calculate its key both for the current and for the subsequent period, to decide whether the next period's key is valid yet.
What do you think?
+ HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
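To compare the candidate retention windows mentioned above (1 hour, 12 hours, 48 hours), here is a minimal Python sketch of the retention check an HSDir might apply. The `should_retain` helper and the example timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

def should_retain(now, period_end, retention):
    """Return True while a descriptor is still within its retention window."""
    return now < period_end + retention

period_end = datetime(2016, 4, 2, 12, 0, tzinfo=timezone.utc)
# A client whose clock is 90 minutes slow asks 90 minutes after the period ended.
late_request = period_end + timedelta(minutes=90)

for hours in (1, 12, 48):
    served = should_retain(late_request, period_end, timedelta(hours=hours))
    print(hours, served)
```

In this example only the 1-hour window fails the skewed client, which suggests the choice mostly depends on how much clock skew we want to tolerate.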
And this half-assedly sums up the behaviors of clients, HSes and HSDirs with regard to descriptor uploads and downloads. What is missing, and do you agree that parts of this should be in the proposal?
- We should revert the torspec commit: "prop224: avoid replicas with the same blinded key" https://gitweb.torproject.org/torspec.git/commit/?id=8df8c0584392240aa8fecbc...
It adds a whole lot of complexity to prop224 with no clear security benefit against realistic adversaries. Furthermore, the time period and descriptor download/upload logic of Tor gets very complicated with it.
I discussed this with teor and special and found it reasonable.
- The randomized revision-counter logic should also be simplified or even removed: https://gitweb.torproject.org/torspec.git/commit/?id=01119bf1291a40aa309dfb7...
I haven't looked much into this yet. If someone has thoughts please let me know.
- We should use fresh salt every time we rebuild the descriptor, but not for every replica: https://gitweb.torproject.org/torspec.git/commit/?id=01e865d592ffcbb67a0e663...
- teor says we should revert the double hashing here, and just use tor's random API: https://gitweb.torproject.org/torspec.git/commit/?id=93f47f4f4e7614d4b3debfe...
peace