Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
- My main goal was to understand the prop224 sections [TIME-PERIODS] and [TIME-OVERLAP].
Those sections specify a system where hidden services decide in a probabilistic manner _when_ to publish their descriptor so that not all hidden services publish their descriptors at the same moment and cause a thundering herd that stampedes the network.
For this to work, time is split into time periods of k hours each. A few hours before each time period, there is an overlap period where hidden services start publishing their _next_ descriptors to HSDirs, so that when the upcoming time period starts, all the HSDirs have already received the descriptors and are ready to serve them.
Consider the overlap period at the end of time period #N. During that overlap period, hidden services publish their descriptors for future time period #N+1. In this case, hidden services also need to know the shared random value that will be active during time period #N+1, since it needs to be used to find the responsible HSDirs. This means that the shared random value for time period #N+1 needs to be published _before_ the overlap period starts.
This is not the case in current proposal 224, since time is split into time periods of 25 hours, which means that each day the start time shifts by one hour forward. Since the start/end times of the time periods keep on shifting, there will be cases where the right shared random value will not be accessible when the overlap period starts.
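To make the drift concrete, here is a small sketch comparing the start hour of consecutive time periods for 25-hour and 24-hour lengths. The starting date and the helper name period_starts are arbitrary choices for illustration, not anything taken from the proposal.

```python
from datetime import datetime, timedelta, timezone

def period_starts(first_start, period_hours, count):
    """Return the start times of `count` consecutive time periods."""
    step = timedelta(hours=period_hours)
    return [first_start + i * step for i in range(count)]

# Arbitrary example start (assumption): a period that begins at 12:00 UTC.
first = datetime(2016, 4, 1, 12, 0, tzinfo=timezone.utc)

# Hour of day at which each of the first five periods begins.
drifting = [t.hour for t in period_starts(first, 25, 5)]
fixed = [t.hour for t in period_starts(first, 24, 5)]

print(drifting)  # [12, 13, 14, 15, 16]
print(fixed)     # [12, 12, 12, 12, 12]
```

With 25-hour periods the start slides one hour later each day relative to the daily 00:00 SRV publication, whereas 24-hour periods keep a fixed distance from it.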
So what to do?
To fix this, I suggest we change the time period length to a day (24 hours).
I also suggest we start time periods every day at 12:00 and end them 24 hours later at the same time, so that this works well with the current shared randomness schedule (where the new shared random value gets published at 00:00 every day). [It might actually be wiser to reverse those schedules: create the SRV at 12:00 and start the time period at 00:00.]
In any case, this is how it might look:
+-------------------------------------------------------------------+
|                                                                   |
| 00:00       12:00       00:00       12:00       00:00       12:00 |
| SRV#1       TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                   |
| $           |-----------$-----======|-----------$-----======|     |
|                             overlap12               overlap23     |
|                                                                   |
+-------------------------------------------------------------------+
Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
So, this gives a gap of 12 hours between the SRV generation and the start of the next time period. We can then easily fit an overlap period of 6 hours before the next time period starts. In the above diagram, the "equal sign" segments are the overlap periods. 'overlap12' is the overlap period from TP#1 to TP#2.
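The schedule in the diagram can be sketched as follows, assuming 24-hour time periods that start at 12:00 UTC and a 6-hour overlap period at the end of each one. The constant and function names are made up for this illustration; they are not from the proposal or from tor's code.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the suggestion above; the names are hypothetical.
TIME_PERIOD_LENGTH = timedelta(hours=24)
TIME_PERIOD_START_HOUR = 12          # time periods begin at 12:00 UTC
OVERLAP_LENGTH = timedelta(hours=6)  # overlap covers the last 6 hours

def current_period_start(now):
    """Return the 12:00 UTC instant that began the time period containing `now`."""
    start = now.replace(hour=TIME_PERIOD_START_HOUR, minute=0,
                        second=0, microsecond=0)
    if now < start:
        # Before noon we are still in the period that began yesterday.
        start -= TIME_PERIOD_LENGTH
    return start

def in_overlap_period(now):
    """True if `now` falls within the final OVERLAP_LENGTH of its time period."""
    next_start = current_period_start(now) + TIME_PERIOD_LENGTH
    return now >= next_start - OVERLAP_LENGTH
```

For example, 09:00 UTC falls inside the overlap period (06:00 to 12:00), while 03:00 and 14:00 do not.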
Do you think that's reasonable? And do you see any problems with changing the time period length from 25 hours to 24 hours?
- So now that we have ironed out the time period stuff slightly, let's discuss the behavior that hidden services, clients and HSDirs should inherit.
This email is quite long already so I'm going to go with examples, instead of formal specification. However, this stuff needs to go formally in the proposal IMO, so any help in formalizing it would be great.
+ Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
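As a starting point for formalizing this, the three examples can be condensed into one function. This is only a sketch under the schedule assumed above (SRV rotation at 00:00 UTC, time period start at 12:00 UTC, overlap period from 06:00 to 12:00). The function name and the "current"/"next" labels are hypothetical, and whether the overlap OFFSET has passed is supplied by the caller, since its calculation is specified in [TIME-OVERLAP].

```python
def descriptors_to_publish(hour_utc, overlap_offset_passed=False):
    """Return (time period, consensus SRV field) pairs for a service booting at `hour_utc`."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be in 0..23")

    if hour_utc >= 12:
        # Example 1: the SRV for this time period was generated at the last
        # midnight and has not been rotated since, so it is still "current".
        return [("current", "shared-rand-current-value")]

    # Examples 2 and 3: the SRV was rotated at midnight, so the SRV for the
    # running time period is now listed as "previous".
    uploads = [("current", "shared-rand-previous-value")]

    in_overlap = hour_utc >= 6
    if in_overlap and overlap_offset_passed:
        # Example 3, OFFSET passed: also publish the next period's descriptor
        # using the freshly generated SRV.
        uploads.append(("next", "shared-rand-current-value"))
    return uploads
```

A boot at 09:00 with the OFFSET already passed yields two uploads, and every other case yields one.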
+ Client behavior
My current intuition with regards to client behavior is that they should always fetch descriptors from the HSDirs of the _current_ time period. They should not concern themselves with the overlap stuff _at all_. The overlap system is there so that by the time the new time period starts, all the HSDirs have received the descriptors and are ready to help the clients. Clients should never notice the overlap stuff happening.
For this reason I think we can remove this paragraph from the spec:
When a client is looking for a service, it must calculate its key both for the current and for the subsequent period, to decide whether the next period's key is valid yet.
What do you think?
+ HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
And this half-assedly sums up the behaviors of clients/HSes and HSDirs with regards to descriptor uploads and downloads. What is missing, and do you agree that parts of this should be in the proposal?
- We should revert the torspec commit: "prop224: avoid replicas with the same blinded key" https://gitweb.torproject.org/torspec.git/commit/?id=8df8c0584392240aa8fecbc...
It adds a whole lot of complexity to prop224 with no clear security benefit against realistic adversaries. Furthermore, the time period and descriptor download/upload logic of Tor gets very complicated with it.
I discussed this with teor and special and found it reasonable.
- The randomized revision-counter logic should also be simplified or even removed: https://gitweb.torproject.org/torspec.git/commit/?id=01119bf1291a40aa309dfb7...
I haven't looked much into this yet. If someone has thoughts please let me know.
- We should use fresh salt every time we rebuild the descriptor, but not for every replica: https://gitweb.torproject.org/torspec.git/commit/?id=01e865d592ffcbb67a0e663...
- teor says we should revert the double hashing here, and just use tor's random API: https://gitweb.torproject.org/torspec.git/commit/?id=93f47f4f4e7614d4b3debfe...
peace
On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
My main goal was to understand the prop224 sections [TIME-PERIODS] and [TIME-OVERLAP].
Those sections specify a system where hidden services decide in a probabilistic manner _when_ to publish their descriptor so that not all hidden services publish their descriptors at the same moment and cause a thundering herd that stampedes the network.
For this to work, time is split into time periods of k hours each. A few hours before each time period, there is an overlap period where hidden services start publishing their _next_ descriptors to HSDirs, so that when the upcoming time period starts, all the HSDirs have already received the descriptors and are ready to serve them.
Consider the overlap period at the end of time period #N. During that overlap period, hidden services publish their descriptors for future time period #N+1. In this case, hidden services also need to know the shared random value that will be active during time period #N+1, since it needs to be used to find the responsible HSDirs. This means that the shared random value for time period #N+1 needs to be published _before_ the overlap period starts.
This is not the case in current proposal 224, since time is split into time periods of 25 hours, which means that each day the start time shifts by one hour forward. Since the start/end times of the time periods keep on shifting, there will be cases where the right shared random value will not be accessible when the overlap period starts.
So what to do?
To fix this, I suggest we change the time period length to a day (24 hours).
I also suggest we start time periods every day at 12:00 and end them 24 hours later at the same time, so that this works well with the current shared randomness schedule (where the new shared random value gets published at 00:00 every day). [It might actually be wiser to reverse those schedules: create the SRV at 12:00 and start the time period at 00:00.]
In any case, this is how it might look:
+-------------------------------------------------------------------+
|                                                                   |
| 00:00       12:00       00:00       12:00       00:00       12:00 |
| SRV#1       TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                   |
| $           |-----------$-----======|-----------$-----======|     |
|                             overlap12               overlap23     |
|                                                                   |
+-------------------------------------------------------------------+
Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
So, this gives a gap of 12 hours between the SRV generation and the start of the next time period. We can then easily fit an overlap period of 6 hours before the next time period starts. In the above diagram, the "equal sign" segments are the overlap periods. 'overlap12' is the overlap period from TP#1 to TP#2.
Do you think that's reasonable? And do you see any problems with changing the time period length from 25 hours to 24 hours?
I don't see the reason why 25 hours would be helpful _especially_ considering how the shared random value is generated (every 24h). This all looks good!
So now that we have ironed out the time period stuff slightly, let's discuss the behavior that hidden services, clients and HSDirs should inherit.
This email is quite long already so I'm going to go with examples, instead of formal specification. However, this stuff needs to go formally in the proposal IMO, so any help in formalizing it would be great.
Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
It sounds as simple as:
"If we are before the overlap period, use the time period's shared random value (TP1 == SRV1). If we are in the overlap period, upload two descriptors using _both_ SRVs."
Plausible?
Client behavior
My current intuition with regards to client behavior is that they should always fetch descriptors from the HSDirs of the _current_ time period. They should not concern themselves with the overlap stuff _at all_. The overlap system is there so that by the time the new time period starts, all the HSDirs have received the descriptors and are ready to help the clients. Clients should never notice the overlap stuff happening.
100% agreed.
Clock skew, though, might cause a reachability issue where the client tries descriptor #1 an hour after descriptor #2 was supposed to come into use (TP2). But we can probably solve that by having the HS keep its IPs open for descriptor #1 for a period of X hours to accommodate those confused clients.
(I bet X could be between 4 and 6 hours at best. Although, I have no clue how well a client can function with that big of a skew.)
Anyway, the point is that it's not the client's job to adjust imo.
For this reason I think we can remove this paragraph from the spec:
When a client is looking for a service, it must calculate its key both for the current and for the subsequent period, to decide whether the next period's key is valid yet.
What do you think?
Rip it off :).
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36-hour lifetime in the cache. Let's work with that as a baseline :).
And this half-assedly sums up the behaviors of clients/HSes and HSDirs with regards to descriptor uploads and downloads. What is missing, and do you agree that parts of this should be in the proposal?
We should revert the torspec commit: "prop224: avoid replicas with the same blinded key" https://gitweb.torproject.org/torspec.git/commit/?id=8df8c0584392240aa8fecbc...
It adds a whole lot of complexity to prop224 with no clear security benefit against realistic adversaries. Furthermore, the time period and descriptor download/upload logic of Tor gets very complicated with it.
I discussed this with teor and special and found it reasonable.
As a malicious HSDir, your only way of being able to identify a descriptor is by knowing its .onion (or you magically have the HS keys, but let's not consider that). Thus, technically, you can already know who the other HSDirs are.
So I'm guessing here that was only for an attacker that runs X (> 1) HSDirs and wanted to know which descriptor is from the same HS. Not sure how that can be relevant to an attacker _without_ knowing the .onion.
I'm fine with removing it.
The randomized revision-counter logic should also be simplified or even removed: https://gitweb.torproject.org/torspec.git/commit/?id=01119bf1291a40aa309dfb7...
I haven't looked much into this yet. If someone has thoughts please let me know.
We should use fresh salt every time we rebuild the descriptor, but not for every replica: https://gitweb.torproject.org/torspec.git/commit/?id=01e865d592ffcbb67a0e663...
teor says we should revert the double hashing here, and just use tor's random API: https://gitweb.torproject.org/torspec.git/commit/?id=93f47f4f4e7614d4b3debfe...
Hrm... wait. This proposal will be turned into a specification so the salt is 16 _random_ bytes. However, the tor code does hash all the random bytes requested by default but that is implementation specific to tor.
So, in this case I think H(SALT) is ok because we have a comment explaining why it should be done like this (to not expose raw bytes from our PRNG). We should leave it that way so anyone implementing it in, let's say, Java does the same, that is, hashes the raw random bytes.
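A rough illustration of what "hash the raw random bytes" means for an independent implementation. SHA3-256 is only a stand-in for H here (an assumption, the thread does not name the hash), and fresh_salt is a hypothetical helper name.

```python
import hashlib
import os

SALT_LEN = 16  # the salt is 16 random bytes, per the discussion above

def fresh_salt():
    """Draw SALT_LEN random bytes and return H(SALT) instead of the raw bytes."""
    raw = os.urandom(SALT_LEN)
    # Hashing means the raw PRNG output never appears in the descriptor.
    # SHA3-256 is an assumed choice of H for this sketch.
    return hashlib.sha3_256(raw).digest()
```

With this stand-in hash the returned value is 32 bytes long; how it is then encoded in the descriptor is left to the spec.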
Cheers! David
peace _______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On 5 Apr 2016, at 02:54, David Goulet dgoulet@ev0ke.net wrote:
So, this basically gives a space of 12 hours between the SRV generation and the start of the next time period. We can then easily fit an overlap period of 6 hours before the next time period starts.
You've implicitly adjusted hsdir-overlap-begins to 75 here. I think that's ok, but it does need to be modified in the spec.
Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
It sounds as simple as:
"If we are before the overlap period, use the time period's shared random value (TP1 == SRV1). If we are in the overlap period, upload two descriptors using _both_ SRVs."
Plausible?
Almost: it needs to say "overlap offset for the next blinded key" (the overlap varies based on the specific key).
Client behavior
My current intuition with regards to client behavior is that they should always fetch descriptors from the HSDirs of the _current_ time period. They should not concern themselves with the overlap stuff _at all_. The overlap system is there so that by the time the new time period starts, all the HSDirs have received the descriptors and are ready to help the clients. Clients should never notice the overlap stuff happening.
100% agreed.
Clock skew, though, might cause a reachability issue where the client tries descriptor #1 an hour after descriptor #2 was supposed to come into use (TP2). But we can probably solve that by having the HS keep its IPs open for descriptor #1 for a period of X hours to accommodate those confused clients.
(I bet X could be between 4 and 6 hours at best. Although, I have no clue how well a client can function with that big of a skew.)
Anyway, the point is that it's not the client's job to adjust imo.
Clients can use a consensus and HS descriptors that are 24 hours out of date:
NETWORKSTATUS_ALLOW_SKEW
REND_CACHE_MAX_SKEW
So our skew should be at least that much.
For this reason I think we can remove this paragraph from the spec:
When a client is looking for a service, it must calculate its key both for the current and for the subsequent period, to decide whether the next period's key is valid yet.
What do you think?
Rip it off :).
It seems like an extra complication. I can't see how it helps clients to have 12 HSDirs to choose from for some random time between 0 and 6 hours each period.
(If we decide it does later, we can add the feature in a client update. We just need to make sure that HSDirs will answer queries for descriptors that aren't valid yet, which makes sense to do for client skew anyway.)
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36-hour lifetime in the cache. Let's work with that as a baseline :).
Let's make it 24 + 24 + 6 = 54 hours instead, based on the 24 hour skew allowed for current clients. (See above.)
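Both retention proposals are the same sum with a different clock skew allowance. A tiny sketch, with a made-up function name and all values in hours:

```python
def cache_lifetime_hours(descriptor_lifetime, overlap, max_clock_skew):
    """HSDir retention time: descriptor lifetime plus overlap plus tolerated skew."""
    return descriptor_lifetime + overlap + max_clock_skew

# David's baseline: 6 hours of acceptable clock skew.
print(cache_lifetime_hours(24, 6, 6))   # 36

# teor's variant: the 24-hour skew that current clients are allowed.
print(cache_lifetime_hours(24, 6, 24))  # 54
```

The only open parameter is the maximum clock skew, which is the part still under discussion.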
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
On 05 Apr (12:59:36), Tim Wilson-Brown - teor wrote:
On 5 Apr 2016, at 02:54, David Goulet dgoulet@ev0ke.net wrote:
So, this basically gives a space of 12 hours between the SRV generation and the start of the next time period. We can then easily fit an overlap period of 6 hours before the next time period starts.
You've implicitly adjusted hsdir-overlap-begins to 75 here. I think that's ok, but it does need to be modified in the spec.
Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
It sounds as simple as:
"If we are before the overlap period, use the time period's shared random value (TP1 == SRV1). If we are in the overlap period, upload two descriptors using _both_ SRVs."
Plausible?
Almost: it needs to say "overlap offset for the next blinded key" (the overlap varies based on the specific key).
Client behavior
My current intuition with regards to client behavior is that they should always fetch descriptors from the HSDirs of the _current_ time period. They should not concern themselves with the overlap stuff _at all_. The overlap system is there so that by the time the new time period starts, all the HSDirs have received the descriptors and are ready to help the clients. Clients should never notice the overlap stuff happening.
100% agreed.
Clock skew, though, might cause a reachability issue where the client tries descriptor #1 an hour after descriptor #2 was supposed to come into use (TP2). But we can probably solve that by having the HS keep its IPs open for descriptor #1 for a period of X hours to accommodate those confused clients.
(I bet X could be between 4 and 6 hours at best. Although, I have no clue how well a client can function with that big of a skew.)
Anyway, the point is that it's not the client's job to adjust imo.
Clients can use a consensus and HS descriptors that are 24 hours out of date: NETWORKSTATUS_ALLOW_SKEW
This doesn't seem to be used anywhere.
REND_CACHE_MAX_SKEW
This is imo way too big. https://trac.torproject.org/projects/tor/ticket/13207
We should take the opportunity of the 224 implementation to come up with something that makes sense imo. Could be 24h but I doubt it right now.
So our skew should be at least that much.
For this reason I think we can remove this paragraph from the spec:
When a client is looking for a service, it must calculate its key both for the current and for the subsequent period, to decide whether the next period's key is valid yet.
What do you think?
Rip it off :).
It seems like an extra complication. I can't see how it helps clients to have 12 HSDirs to choose from for some random time between 0 and 6 hours each period.
(If we decide it does later, we can add the feature in a client update. We just need to make sure that HSDirs will answer queries for descriptors that aren't valid yet, which makes sense to do for client skew anyway.)
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36-hour lifetime in the cache. Let's work with that as a baseline :).
Let's make it 24 + 24 + 6 = 54 hours instead, based on the 24 hour skew allowed for current clients. (See above.)
I'm still doubtful :).
David
David Goulet dgoulet@ev0ke.net writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
<snip>
In any case, this is how it might look:
+-------------------------------------------------------------------+
|                                                                   |
| 00:00       12:00       00:00       12:00       00:00       12:00 |
| SRV#1       TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                   |
| $           |-----------$-----======|-----------$-----======|     |
|                             overlap12               overlap23     |
|                                                                   |
+-------------------------------------------------------------------+
Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
<snip>
So now that we have ironed out the time period stuff slightly, let's discuss the behavior that hidden services, clients and HSDirs should inherit.
This email is quite long already so I'm going to go with examples, instead of formal specification. However, this stuff needs to go formally in the proposal IMO, so any help in formalizing it would be great.
Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
It sounds as simple as:
"If we are before the overlap period, use the time period's shared random value (TP1 == SRV1). If we are in the overlap period, upload two descriptors using _both_ SRVs."
Plausible?
I'm not sure it's so simple. As it is now, there is no indicator connecting time periods with shared random values, so "TP1 == SRV1" might make sense to us but it's not something that can be implemented. How does the client know whether to use "shared-rand-previous-value" or "shared-rand-current-value"?
Here is an idea:
"A hidden service uploading its normal descriptor using a consensus with valid-after between 12:00UTC (inclusive) and 00:00UTC (exclusive), uses the _current_ SRV. A hidden service uploading its normal descriptor using a consensus with valid-after between 00:00UTC (inclusive) and 12:00UTC (exclusive), uses the _previous_ SRV.
A hidden service uploading its overlap descriptor always uses the current SRV (assuming that the HS descriptor overlap period starts after midnight UTC)."
And the client equivalent:
"A client fetching a hidden service descriptor using a consensus with valid-after between 12:00UTC (inclusive) and 00:00UTC (exclusive), uses the _current_ SRV. A client fetching a hidden service descriptor using a consensus with valid-after between 00:00UTC (inclusive) and 12:00UTC (exclusive), uses the _previous_ SRV."
In both sections above, if the right SRV is missing from the consensus, entities are supposed to use a fallback SRV value generated as specified in section 2.3.1 of prop224.
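The quoted rules map directly onto the valid-after hour of the consensus. The sketch below assumes the consensus is available as a plain dict keyed by the two SRV field names; the function names are hypothetical, and the fallback from section 2.3.1 is deliberately left to the caller.

```python
from datetime import datetime, timezone

TIME_PERIOD_ROTATION_HOUR = 12  # 12:00 UTC, hardcoded as in the rule text

CURRENT = "shared-rand-current-value"
PREVIOUS = "shared-rand-previous-value"


def srv_field_for_normal_descriptor(valid_after):
    """Services (normal descriptor) and clients: pick the SRV field by hour."""
    # [12:00, 00:00) uses the current SRV, [00:00, 12:00) the previous one.
    if valid_after.hour >= TIME_PERIOD_ROTATION_HOUR:
        return CURRENT
    return PREVIOUS


def srv_field_for_overlap_descriptor(valid_after):
    """Overlap descriptors always use the current SRV under this rule."""
    return CURRENT


def lookup_srv(consensus, field):
    """Return the SRV value, or None so the caller can apply the fallback."""
    return consensus.get(field)


consensus = {CURRENT: "srv2-bytes", PREVIOUS: "srv1-bytes"}  # made-up values
at_0300 = datetime(2016, 4, 12, 3, 0, tzinfo=timezone.utc)
print(lookup_srv(consensus, srv_field_for_normal_descriptor(at_0300)))
```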
FWIW, I don't like how I had to use hardcoded time values in the above sections. That's because 12:00UTC is the $TIME_PERIOD_ROTATION_TIME and 00:00UTC is the $SHARED_RANDOM_VALUE_GENERATION_TIME. Maybe we could do this without hardcoding $SHARED_RANDOM_VALUE_GENERATION_TIME, by adding expiration times to the SRVs in the consensus and using those to choose the right SRV.
How else could we simplify this logic?
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
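The 36 hour baseline is just the sum of the three values listed above. A tiny sketch of that arithmetic, with hypothetical constant names, makes it easy to see how the cache lifetime moves if any of the inputs changes:

```python
from datetime import timedelta

# Values as proposed in this thread; the constant names are made up here.
DESCRIPTOR_MAX_LIFETIME = timedelta(hours=24)
OVERLAP_PERIOD_SPAN = timedelta(hours=6)
MAX_CLOCK_SKEW = timedelta(hours=6)


def hsdir_cache_lifetime(lifetime=DESCRIPTOR_MAX_LIFETIME,
                         overlap=OVERLAP_PERIOD_SPAN,
                         skew=MAX_CLOCK_SKEW):
    """How long an HSDir would keep a descriptor under this baseline."""
    return lifetime + overlap + skew


print(hsdir_cache_lifetime())  # 1 day, 12:00:00 (i.e. 36 hours)
```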
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
Also consider that whatever "maximum acceptable clock skew" we choose, the hidden service needs to keep its introduction circuits up for that time as well, otherwise the descriptor will be useless to the clock skewed clients.
---
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services. I see the following two negatives here:
- Hidden services need to retain their old intro circuits for the duration of the acceptable clock skew.
- HSDirs need to cache hidden service descriptors for the duration of the acceptable clock skew.
Is there anything else I'm missing?
On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
<snip>
In any case, this is how this might look like:
+------------------------------------------------------------------+
|                                                                  |
| 00:00      12:00       00:00       12:00       00:00       12:00 |
| SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                  |
|  $          |-----------$-----======|-----------$-----======|    |
|                             overlap12               overlap23    |
|                                                                  |
+------------------------------------------------------------------+

Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
<snip>
So now that we have ironed out the time period stuff slightly, let's discuss the behavior that hidden services, clients and HSDirs should inherit.
This email is quite long already so I'm going to go with examples, instead of formal specification. However, this stuff needs to go formally in the proposal IMO, so any help in formalizing it would be great.
Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we are nowhere close to the overlap period, so the hidden service should just publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as specified in [TIME-OVERLAP]) and schedule a time callback for publishing its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of the overlap period again, but this time the hidden service needs to use the SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the overlap period, so the hidden service should calculate its overlap OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act as Example 2, and _also_ publish its TP#2 descriptors to a second set of HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to formalize this in a way that can be written in the spec. Particularly, I'm not sure how to formalize which SRV to pick at a given time point.
It sounds as simple as:
"If we are before the overlap period, use the time period's shared random value (TP1 == SRV1). If we are in the overlap period, upload two descriptors using _both_ SRVs."
Plausible?
I'm not sure it's so simple. As it is now, there is no indicator connecting time periods with shared random values, so "TP1 == SRV1" might make sense to us but it's not something that can be implemented. How does the client know whether to use "shared-rand-previous-value" or "shared-rand-current-value"?
Well, that's not entirely true. We know that TP and SRV have a 12h difference. You also know, with the consensus valid-after time, when the SRV was created. For instance, take the 03:00 valid-after consensus time, I can compute:
shared-rand-current-value: created 3 hours ago.
shared-rand-previous-value: created 27 hours ago.
With the 12h shift between TP and SRV, an SRV has a "lifetime" of 36 hours. Here is how: TP1 starts using SRV1 12h after SRV1 is created and stops using it 24h later, thus 36h.
As a client, I get my 03:00 consensus and I want to know which SRV I should use. I know that the previous SRV is 27 hours old, which is < 36h, so I should use it.
New example: as a client, I get the 12:00 consensus and can compute the following:
shared-rand-current-value: created 12 hours ago.
shared-rand-previous-value: created 36 hours ago.
This time the check fails: the previous SRV is 36 hours old, which is not < the 36h lifetime needed for the TP. So I use the current SRV (in our example SRV2 for TP2).
(For the HS, you would simply need to take into account the overlap period and use both SRV).
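The age-based check described above can be sketched as follows. It assumes SRVs are created at 00:00 UTC every 24 hours and that time periods rotate at 12:00 UTC and last 24 hours, which gives the 36 hour window; `pick_srv_by_age` and the constant names are hypothetical.

```python
from datetime import datetime, timezone

SRV_CREATION_HOUR = 0     # SRVs are generated at 00:00 UTC
SRV_INTERVAL_HOURS = 24   # a new SRV every 24 hours
TP_ROTATION_HOUR = 12     # time periods rotate at 12:00 UTC
TP_LENGTH_HOURS = 24

# An SRV starts being used 12h after creation and is used for 24h: 36h total.
SRV_USABLE_HOURS = (TP_ROTATION_HOUR - SRV_CREATION_HOUR) % 24 + TP_LENGTH_HOURS


def pick_srv_by_age(valid_after):
    """Pick the SRV field for a client from the consensus valid-after time."""
    hours = valid_after.hour + valid_after.minute / 60
    current_age = (hours - SRV_CREATION_HOUR) % SRV_INTERVAL_HOURS
    previous_age = current_age + SRV_INTERVAL_HOURS
    # Use the previous SRV while it is still inside its 36h window.
    if previous_age < SRV_USABLE_HOURS:
        return "shared-rand-previous-value"
    return "shared-rand-current-value"


print(pick_srv_by_age(datetime(2016, 4, 12, 3, 0, tzinfo=timezone.utc)))   # previous (27h old)
print(pick_srv_by_age(datetime(2016, 4, 12, 12, 0, tzinfo=timezone.utc)))  # current (previous is 36h old)
```

For every hour of the day this yields the same answer as the hardcoded 12:00/00:00 rule, which matches the observation that both approaches give the same end result.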
Here is an idea:
"A hidden service uploading its normal descriptor using a consensus with valid-after between 12:00UTC (inclusive) and 00:00UTC (exclusive), uses the _current_ SRV. A hidden service uploading its normal descriptor using a consensus with valid-after between 00:00UTC (inclusive) and 12:00UTC (exclusive), uses the _previous_ SRV.
A hidden service uploading its overlap descriptor always uses the current SRV (assuming that the HS descriptor overlap period starts after midnight UTC)."
And the client equivalent:
"A client fetching a hidden service descriptor using a consensus with valid-after between 12:00UTC (inclusive) and 00:00UTC (exclusive), uses the _current_ SRV. A client fetching a hidden service descriptor using a consensus with valid-after between 00:00UTC (inclusive) and 12:00UTC (exclusive), uses the _previous_ SRV."
In both sections above, if the right SRV is missing from the consensus, entities are supposed to use a fallback SRV value generated as specified in section 2.3.1 of prop224.
FWIW, I don't like how I had to use hardcoded time values in the above sections. That's because 12:00UTC is the $TIME_PERIOD_ROTATION_TIME and 00:00UTC is the $SHARED_RANDOM_VALUE_GENERATION_TIME. Maybe we could do this without hardcoding $SHARED_RANDOM_VALUE_GENERATION_TIME, by adding expiration times to the SRVs in the consensus and using those to choose the right SRV.
How else could we simplify this logic?
It seems simple enough. Maybe the algorithm I sketched out above makes it simpler? Maybe not!... It's basically the _same_ end results as you.
The logic I sketched out above means we would need parameters like these (taken from the consensus, or hardcoded):
- TIME_PERIOD_ROTATION_TIME (currently 12:00)
- TIME_PERIOD_[LIFETIME | SPAN | DURATION] (currently 24h)
- SHARED_RANDOM_VALUE_[CREATION | ROTATION]_TIME (currently 00:00)
- SHARED_RANDOM_VALUE_[LIFETIME | SPAN | DURATION] (currently 24h)
I doubt we can go simpler than that. Both algorithms have one single check ending in two outcomes that is either use previous or current.
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed, that is, on key rotation, IP rotation, etc... No need to do that every hour (maybe).
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. An operator could literally have set it to 7 hours, meaning we might not see any new revision counter for that period and would still be unable to tell whether the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
Also consider that whatever "maximum acceptable clock skew" we choose, the hidden service needs to keep its introduction circuits up for that time as well, otherwise the descriptor will be useless to the clock skewed clients.
Yup! This is why I think that with more than 6 hours of clock skew you won't be able to do much as a client... maybe even less!
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services. I see the following two negatives here:
- Hidden services need to retain their old intro circuits for the duration of the acceptable clock skew.
I'm pretty sure we don't do that currently. However, we could start doing that and collect stats on how frequent it is and with how much skew! That would be very useful information to have imo.
- HSDirs need to cache hidden service descriptors for the duration of the acceptable clock skew.
Is there anything else I'm missing?
Cheers! David
David Goulet <dgoulet@ev0ke.net> writes:
On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
<snip>
In any case, this is how this might look like:
+------------------------------------------------------------------+
|                                                                  |
| 00:00      12:00       00:00       12:00       00:00       12:00 |
| SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                  |
|  $          |-----------$-----======|-----------$-----======|    |
|                             overlap12               overlap23    |
|                                                                  |
+------------------------------------------------------------------+

Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
<snip>
How else could we simplify this logic?
It seems simple enough. Maybe the algorithm I sketched out above makes it simpler? Maybe not!... It's basically the _same_ end results as you.
Yes, both approaches seem equivalent.
The logic I sketched out above means we would need parameters like these (taken from the consensus, or hardcoded):
TIME_PERIOD_ROTATION_TIME (currently 12:00)
TIME_PERIOD_[LIFETIME | SPAN | DURATION] (currently 24h)
SHARED_RANDOM_VALUE_[CREATION | ROTATION]_TIME (currently 00:00)
SHARED_RANDOM_VALUE_[LIFETIME | SPAN | DURATION] (currently 24h)
I doubt we can go simpler than that. Both algorithms have one single check ending in two outcomes that is either use previous or current.
So, should we update prop250 and add SHARED_RANDOM_VALUE_CREATION_TIME and SHARED_RANDOM_VALUE_LIFETIME as consensus parameters?
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed, that is, on key rotation, IP rotation, etc... No need to do that every hour (maybe).
Sounds good to me.
I wonder if there are any negatives to this behavior.
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. An operator could literally have set it to 7 hours, meaning we might not see any new revision counter for that period and would still be unable to tell whether the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
Yes, I also thought of adding a valid-until to the cleartext part of the descriptor so that its lifetime can be tweaked by the hidden service itself. Of course, HSDirs would also have a maximum desc lifetime that they would enforce.
I wonder if we should do this or maybe it's overengineering and a global non-configurable default lifetime is OK.
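To make the trade-off concrete, here is one way an HSDir could combine a service-supplied valid-until with its own enforced maximum. This is a sketch, not a proposed spec: the field handling, the 6 hour skew delta and the 36 hour cap are the values floated in this thread, and every name below is hypothetical.

```python
from datetime import datetime, timedelta, timezone

HSDIR_MAX_DESC_LIFETIME = timedelta(hours=36)  # baseline cap from this thread
CLOCK_SKEW_DELTA = timedelta(hours=6)          # assumed acceptable skew


def expiry_time(received_at, valid_until=None):
    """When the HSDir would drop a descriptor it received at received_at."""
    cap = received_at + HSDIR_MAX_DESC_LIFETIME
    if valid_until is None:
        # No valid-until field: fall back to the global default lifetime.
        return cap
    # Honor the service's valid-until plus skew, but never beyond the cap.
    return min(valid_until + CLOCK_SKEW_DELTA, cap)


def is_expired(now, received_at, valid_until=None):
    return now >= expiry_time(received_at, valid_until)


received = datetime(2016, 4, 12, 6, 0, tzinfo=timezone.utc)
print(expiry_time(received, received + timedelta(hours=24)))
```

Dropping the `valid_until` argument entirely gives the global non-configurable lifetime, which is the simpler alternative raised above.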
Also consider that whatever "maximum acceptable clock skew" we choose, the hidden service needs to keep its introduction circuits up for that time as well, otherwise the descriptor will be useless to the clock skewed clients.
Yup! This is why I think that with more than 6 hours of clock skew you won't be able to do much as a client... maybe even less!
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services. I see the following two negatives here:
- Hidden services need to retain their old intro circuits for the duration of the acceptable clock skew.
I'm pretty sure we don't do that currently. However, we could start doing that and collect stats on how frequent it is and with how much skew! That would be very useful information to have imo.
Yes sounds useful, although we should assume that skewed clients exist in general.
Collecting these statistics on the intro point side requires us to write a proper statistics patch and do the corresponding security analysis. Collecting these statistics on the hidden service side requires us to write a non-trivial patch that implements this feature and also to find volunteers with busy hidden services to run it. I wonder if it's worth it.
On 12 Apr 2016, at 20:47, George Kadianakis <desnacked@riseup.net> wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
<snip>
The logic I sketched out above means we would need parameters like these (taken from the consensus, or hardcoded):
TIME_PERIOD_ROTATION_TIME (currently 12:00)
TIME_PERIOD_[LIFETIME | SPAN | DURATION] (currently 24h)
SHARED_RANDOM_VALUE_[CREATION | ROTATION]_TIME (currently 00:00)
SHARED_RANDOM_VALUE_[LIFETIME | SPAN | DURATION] (currently 24h)
I doubt we can go simpler than that. Both algorithms have one single check ending in two outcomes that is either use previous or current.
So, should we update prop250 and add SHARED_RANDOM_VALUE_CREATION_TIME and SHARED_RANDOM_VALUE_LIFETIME as consensus parameters?
Do we intend to change the shared random value schedule if we change these parameters, or do we just have them for hidden services?
If we just have them for hidden services, then it's ok to hard-code them.
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed, that is, on key rotation, IP rotation, etc... No need to do that every hour (maybe).
Sounds good to me.
I wonder if there are any negatives to this behaviour.
Fingerprintability.
See which services upload their descriptors when an intro point dies.
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. An operator could literally have set it to 7 hours, meaning we might not see any new revision counter for that period and would still be unable to tell whether the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
Yes, I also thought of adding a valid-until to the cleartext part of the descriptor so that its lifetime can be tweaked by the hidden service itself. Of course, HSDirs would also have a maximum desc lifetime that they would enforce.
I wonder if we should do this or maybe it's overengineering and a global non-configurable default lifetime is OK.
Again, fingerprintability. If a service has a non-default lifetime, then it's easy to find.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
On 12 Apr (20:58:43), Tim Wilson-Brown - teor wrote:
On 12 Apr 2016, at 20:47, George Kadianakis <desnacked@riseup.net> wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
<snip>
The logic I sketched out above means we would need parameters like these (taken from the consensus, or hardcoded):
TIME_PERIOD_ROTATION_TIME (currently 12:00)
TIME_PERIOD_[LIFETIME | SPAN | DURATION] (currently 24h)
SHARED_RANDOM_VALUE_[CREATION | ROTATION]_TIME (currently 00:00)
SHARED_RANDOM_VALUE_[LIFETIME | SPAN | DURATION] (currently 24h)
I doubt we can go simpler than that. Both algorithms have one single check ending in two outcomes that is either use previous or current.
So, should we update prop250 and add SHARED_RANDOM_VALUE_CREATION_TIME and SHARED_RANDOM_VALUE_LIFETIME as consensus parameters?
Do we intend to change the shared random value schedule if we change these parameters, or do we just have them for hidden services?
If we just have them for hidden services, then it's ok to hard-code them.
They would affect client and service if they change.
To be honest, I can't really see why we would want to change those values in the future, apart from design bugs we don't know about yet... It's cheap to give ourselves the ability to do so, though, since those shared random values affect the whole HS subsystem and I'm sure they will also be used by third parties for whatever use case.
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed, that is, on key rotation, IP rotation, etc... No need to do that every hour (maybe).
Sounds good to me.
I wonder if there are any negatives to this behaviour.
Fingerprintability.
See which services upload their descriptors when an intro point dies.
Yeah... actually making RendPostPeriod configurable by operators in the first place seems to me a bad idea now that I think of it... (see my last email about this).
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. An operator could literally have set it to 7 hours, meaning we might not see any new revision counter for that period and would still be unable to tell whether the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
Yes, I also thought of adding a valid-until to the cleartext part of the descriptor so that its lifetime can be tweaked by the hidden service itself. Of course, HSDirs would also have a maximum desc lifetime that they would enforce.
I wonder if we should do this or maybe it's overengineering and a global non-configurable default lifetime is OK.
Again, fingerprintability. If a service has a non-default lifetime, then it's easy to find.
It leaks information about the HS... again, see my previous email. I might not be such a fan of adding this after all.
Cheers! David
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
David Goulet <dgoulet@ev0ke.net> writes:
On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet <dgoulet@ev0ke.net> writes:
On 04 Apr (19:13:39), George Kadianakis wrote:
<snip>
In any case, this is how this might look like:
+------------------------------------------------------------------+
|                                                                  |
| 00:00      12:00       00:00       12:00       00:00       12:00 |
| SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
|                                                                  |
|  $          |-----------$-----======|-----------$-----======|    |
|                             overlap12               overlap23    |
|                                                                  |
+------------------------------------------------------------------+

Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
<snip>
The logic I sketched out above means we would need parameters like these (taken from the consensus, or hardcoded):
- TIME_PERIOD_ROTATION_TIME (currently 12:00)
[Second email with some more thoughts]
BTW, currently in prop224 the TIME_PERIOD_ROTATION_TIME is at 00:00 because of the following paragraph:
Time periods start with the Unix epoch (Jan 1, 1970), and are computed by taking the number of whole minutes since the epoch and dividing by the time period. So if the current time is 2013-11-12 13:44:32 UTC, making the seconds since the epoch 1384281872, the number of minutes since the epoch is 23071364. If the current time period length is 1500 (the default), then the current time period number is 15380. It began 15380*1500*60 seconds after the epoch at 2013-11-11 20:00:00 UTC, and will end at (15380+1)*1500*60 seconds after the epoch at 2013-11-12 21:00:00 UTC.
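The computation in the quoted paragraph is plain integer division, which the following sketch reproduces from the seconds-since-epoch value given there. The function names are hypothetical; the period length is in minutes, as in the proposal text.

```python
def time_period_number(unix_seconds, period_length_minutes=1500):
    """Number of whole time periods elapsed since the Unix epoch."""
    minutes_since_epoch = unix_seconds // 60
    return minutes_since_epoch // period_length_minutes


def time_period_bounds(period_number, period_length_minutes=1500):
    """Start and end of a time period, in seconds since the epoch."""
    period_seconds = period_length_minutes * 60
    return (period_number * period_seconds,
            (period_number + 1) * period_seconds)


now = 1384281872                       # value from the quoted example
print(now // 60)                       # 23071364 whole minutes
print(time_period_number(now))         # 15380
print(time_period_bounds(15380))       # (1384200000, 1384290000)
```

Because the epoch itself falls at 00:00 UTC, a 24 hour period length (1440 minutes) makes every boundary computed this way land on 00:00 UTC, which is why the rotation time is currently at midnight.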
I wonder what's the best way to change this to start at 12:00.
We could in theory compute the "number of whole minutes since the epoch plus 12 hours" and use that in the division, but that would be a bit ugly... Is there a more elegant thing to do?
We could also in theory change the shared random value generation to happen at 12:00, and then have TIME_PERIOD_ROTATION_TIME naturally start at 00:00, but this requires changing prop250. Could it be worth it? :/
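To make the arithmetic in the quoted paragraph concrete, here is a minimal Python sketch of the time period computation. The function names and the `offset_minutes` parameter are hypothetical, and subtracting the offset (rather than adding it) is an assumption; with a 24-hour period either sign puts the period boundaries at 12:00 UTC and only the numbering differs. The sketch relies only on the epoch seconds from the example; note that 1384281872 corresponds to 18:44:32 UTC rather than the 13:44:32 given in the quoted text, while the quoted period boundaries do agree with the seconds value.

```python
def time_period_number(unix_seconds, period_minutes=1500, offset_minutes=0):
    """Return the time period number for a Unix timestamp.

    offset_minutes is a hypothetical "rotation offset" that is subtracted
    from the whole minutes since the epoch before dividing.
    """
    minutes_since_epoch = unix_seconds // 60
    return (minutes_since_epoch - offset_minutes) // period_minutes


def time_period_start(period_number, period_minutes=1500, offset_minutes=0):
    """Return the Unix timestamp at which the given time period begins."""
    return (period_number * period_minutes + offset_minutes) * 60


now = 1384281872  # epoch seconds taken from the quoted example

# Current scheme: 1500-minute (25 hour) periods, no offset.
tp = time_period_number(now)
print(tp, time_period_start(tp), time_period_start(tp + 1))

# Discussed alternative: 1440-minute (24 hour) periods rotating at 12:00 UTC.
tp24 = time_period_number(now, period_minutes=1440, offset_minutes=720)
start24 = time_period_start(tp24, period_minutes=1440, offset_minutes=720)
print(tp24, start24, start24 % 86400 == 12 * 3600)
```

With the default arguments this reproduces period number 15380 and the start and end offsets from the quoted paragraph. With a 1440-minute period and a 720-minute offset, every period starts at 12:00 UTC.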
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed that is key rotation, IP rotation, etc... No need to do that every hour (maybe).
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. So an operator could literally have set that to 7 hours, meaning we might not see any new revision counter for that period and still be unable to tell if the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
So if HSDirs always keep descriptors for 36 hours, what happens if an HS rotates its intro points and publishes a new descriptor just an hour before the time period changes? Then the HSDir needs to keep and serve that descriptor for 36 hours, even if it will expire in an hour? Is this OK?
Could this be another point for doing the valid-until thing?
Currently the proposal says:
Hidden service directories should accept descriptors [...] and retain them for at least [TODO: how much?] minutes after the end of the period.
but that means that HSDirs need to keep track of when the period ends, and whether a descriptor was uploaded for the current time period or for the overlap period...
Also consider that whatever "maximum acceptable clock skew" we choose, the hidden service needs to keep its introduction circuits up for that time as well, otherwise the descriptor will be useless to the clock skewed clients.
Yup! This is why I think that with more than 6 hours of clock skew you won't do much as a client... maybe even less!
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services. I see the following two negatives here:
- Hidden services need to retain their old intro circuits for the duration of the acceptable clock skew.
I'm pretty sure we don't do that currently. However, we could start doing that and collect stats on how frequent it is and with how much skew! That would be very useful information to have imo.
Could this somehow cause #16702? Probably not...
On 12 Apr (16:01:32), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
<snip>
In any case, this is how this might look like:
+------------------------------------------------------------------+
|                                                                  |
| 00:00       12:00       00:00       12:00       00:00      12:00 |
| SRV#1       TP#1        SRV#2       TP#2        SRV#3      TP#3  |
|                                                                  |
|   $           |-----------$-----======|-----------$-----======|  |
|                               overlap12               overlap23  |
|                                                                  |
+------------------------------------------------------------------+

Legend: [TP#1 = Time Period #1]
        [SRV#1 = Shared Random Value #1]
<snip>
The logic I sketched out above makes it that we would need parameters (from the consensus) like so (or hardcode them):
- TIME_PERIOD_ROTATION_TIME (currently 12:00)
[Second email with some more thoughts]
BTW, currently in prop224 the TIME_PERIOD_ROTATION_TIME is at 00:00 because of the following paragraph:
Time periods start with the Unix epoch (Jan 1, 1970), and are computed by taking the number of whole minutes since the epoch and dividing by the time period. So if the current time is 2013-11-12 13:44:32 UTC, making the seconds since the epoch 1384281872, the number of minutes since the epoch is 23071364. If the current time period length is 1500 (the default), then the current time period number is 15380. It began 15380*1500*60 seconds after the epoch at 2013-11-11 20:00:00 UTC, and will end at (15380+1)*1500*60 seconds after the epoch at 2013-11-12 21:00:00 UTC.
I wonder what's the best way to change this to start at 12:00.
We could in theory compute the "number of whole minutes since the epoch plus 12 hours" and use that in the division, but that would be a bit ugly... Is there a more elegant thing to do?
We could also in theory change the shared random value generation to happen at 12:00, and then have TIME_PERIOD_ROTATION_TIME naturally start at 00:00, but this requires changing prop250. Could it be worth it? :/
I think it's fine that we keep the start time at 12:00 here. It's just an offset from the start of the epoch. Furthermore, adding a "rotation time" lets us control where everything starts, which doesn't have to be the epoch time at 00:00.
We can find the start of the TP with those two (rotation time and lifetime) and then divide that time value by the lifetime to get the nth time period.
Also, controlling the rotation time is good to have for chutney testing with much smaller timings.
<snip>
HSDir behavior
Currently the spec says the following:
Hidden service directories should accept descriptors at least [TODO: how much?] minutes before they would become valid, and retain them for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of that paragraph and not imposing any such weak restrictions for accepting descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is how long descriptors should be retained after the end of the period. We currently think clock skew is the only thing that can bring clients to the wrong HSDir after the end of the period. Maybe an hour is OK? David suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
It should at least be 24 hours (maximum possible) with an adjustment of at the _very_ least the overlap period. If the overlap period is 6 hours, we can then add the "maximum clock skew" we think is reasonable and we would end up with an OK value imo.
Descriptor maximum lifetime: 24 hours
Overlap period span: 6 hours (taken from your diagram)
Maximum acceptable clock skew: 6 hours (dgoulet opinion!)
Thus we are talking of a 36 hours lifetime in the cache. Let's work with that as a baseline :).
Hm, I see you are calculating the total lifetime here. How often do hidden services refresh (reupload) their descriptor in this case? I think in the current system, hidden services do so every hour. Do we keep this feature?
I think we can re-upload only when needed that is key rotation, IP rotation, etc... No need to do that every hour (maybe).
Let's consider a hidden service that uploads a single descriptor during its overlap period and then disappears completely: should the HSDir keep and serve that descriptor for 36 hours? It's unlikely that the HS is still up and maintaining its intro circuits if it can't keep on refreshing its descriptor.
The issue here is for the HSDir to notice that the HS might be gone? And we can't rely on the RendPostPeriod value since it's set on the service side. So an operator could literally have set that to 7 hours, meaning we might not see any new revision counter for that period and still be unable to tell if the HS is gone or not.
This is why our best bet is to compute a "maximum crazy time" that a descriptor could be valid.
Another option is to add a valid-until field in the cleartext part of the descriptor; the HSDir could use that, plus a clock skew delta, to expire entries.
So if HSDirs always keep descriptors for 36 hours, what happens if an HS rotates its intro points and publishes a new descriptor just an hour before the time period changes? Then the HSDir needs to keep and serve that descriptor for 36 hours, even if it will expire in an hour? Is this OK?
Yup basically. That's a bit of a downside of what we have right now also. Uploading at the last hour with new content makes it that we'll expire the descriptor in 48h...
Could this be another point for doing the valid-until thing?
We should think about what that leaks to the network. It would leak the HS clock time (well, partially; we can always add some randomness to it).
I don't think it would leak *when* the HS creates new IP connections, since the overlap period means that by the time the valid-until time is reached, the new IPs have already been established for a while. However, it would leak the teardown of the old ones. For a guard, it is useful information to know _when_ an HS will do certain specific network actions (here, killing 3 circuits).
Currently the proposal says:
Hidden service directories should accept descriptors [...] and retain them for at least [TODO: how much?] minutes after the end of the period.
but that means that HSDirs need to keep track of when the period ends, and whether a descriptor was uploaded for the current time period or for the overlap period...
I think this will make things much more complicated. IMO, the HSDir should _only_ rely on the revision counter and an expiry time, and not try to guess the lifetime of a descriptor from the service's perspective.
However, here is an idea. Considering teor's argument about HS fingerprinting, we should make the upload happen regularly, so having RendPostPeriod customizable by an operator is probably a bad idea. We kind of need _all_ HSes to behave the same in normal circumstances with their HS desc uploads. Thus, if the RendPostPeriod became hardcoded (or a consensus param), the HSDir would know that the HS is "gone" or has rotated HSDirs after, let's say, 4 * RendPostPeriod (4 here is arbitrary) of not seeing a new revision counter. This makes a cache entry lifetime much smaller!
In the end, I see two options: either we use an expiry time that is the _maximum_ lifetime a descriptor can have, or an expiry time based on the maximum time frame within which we expect to have received a new revision counter. The latter is fun because we don't need to consider client clock skew: if we happen to purge the descriptor from the cache, it's because we think the service is gone or has rotated HSDirs, so any client coming our way would fail to connect to the service anyway, since the descriptor it gets should have been updated X hours ago.
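The second option above (expire an entry once no new revision counter has been seen for some multiple of the upload interval) can be sketched in Python as follows. This is only an illustration under stated assumptions: `REND_POST_PERIOD` is assumed to be a fixed, network-wide value of one hour, `STALE_FACTOR = 4` is the arbitrary multiplier mentioned above, and the `DescriptorCache` class and its method names are hypothetical, not taken from the tor code.

```python
from dataclasses import dataclass

REND_POST_PERIOD = 60 * 60  # assumed fixed upload interval, in seconds
STALE_FACTOR = 4            # arbitrary multiplier, as in the discussion


@dataclass
class CacheEntry:
    revision_counter: int
    last_update: int  # time at which we last accepted a newer revision


class DescriptorCache:
    """Hypothetical HSDir cache keyed by an opaque descriptor identifier."""

    def __init__(self):
        self.entries = {}

    def store(self, key, revision_counter, now):
        """Accept a descriptor only if its revision counter is newer."""
        entry = self.entries.get(key)
        if entry is not None and revision_counter <= entry.revision_counter:
            return False
        self.entries[key] = CacheEntry(revision_counter, now)
        return True

    def purge_stale(self, now):
        """Drop entries with no new revision for too long; return their keys."""
        limit = STALE_FACTOR * REND_POST_PERIOD
        stale = [key for key, entry in self.entries.items()
                 if now - entry.last_update > limit]
        for key in stale:
            del self.entries[key]
        return stale


cache = DescriptorCache()
cache.store("desc-id", revision_counter=1, now=0)
print(cache.purge_stale(now=4 * REND_POST_PERIOD))      # still within the limit
print(cache.purge_stale(now=4 * REND_POST_PERIOD + 1))  # now considered gone
```

Note that this scheme needs no clock skew term on the HSDir side, which matches the argument made above.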
Also consider that whatever "maximum acceptable clock skew" we choose, the hidden service needs to keep its introduction circuits up for that time as well, otherwise the descriptor will be useless to the clock skewed clients.
Yup! This is why I think that with more than 6 hours of clock skew you won't do much as a client... maybe even less!
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services. I see the following two negatives here:
- Hidden services need to retain their old intro circuits for the duration of the acceptable clock skew.
I'm pretty sure we don't do that currently. However, we could start doing that and collect stats on how frequent it is and with how much skew! That would be very useful information to have imo.
Could this somehow cause #16702? Probably not...
Hrm... I think it's more an issue of a race between changing IPs and in-flight clients rather than a client being confused about the time to arrive at the party. But who knows!
Cheers! David
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 12 Apr (16:01:32), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
<snip>
The logic I sketched out above makes it that we would need parameters (from the consensus) like so (or hardcode them):
- TIME_PERIOD_ROTATION_TIME (currently 12:00)
[Second email with some more thoughts]
BTW, currently in prop224 the TIME_PERIOD_ROTATION_TIME is at 00:00 because of the following paragraph:
Time periods start with the Unix epoch (Jan 1, 1970), and are computed by taking the number of whole minutes since the epoch and dividing by the time period. So if the current time is 2013-11-12 13:44:32 UTC, making the seconds since the epoch 1384281872, the number of minutes since the epoch is 23071364. If the current time period length is 1500 (the default), then the current time period number is 15380. It began 15380*1500*60 seconds after the epoch at 2013-11-11 20:00:00 UTC, and will end at (15380+1)*1500*60 seconds after the epoch at 2013-11-12 21:00:00 UTC.
I wonder what's the best way to change this to start at 12:00.
We could in theory compute the "number of whole minutes since the epoch plus 12 hours" and use that in the division, but that would be a bit ugly... Is there a more elegant thing to do?
We could also in theory change the shared random value generation to happen at 12:00, and then have TIME_PERIOD_ROTATION_TIME naturally start at 00:00, but this requires changing prop250. Could it be worth it? :/
I think it's fine that we keep the start time at 12:00 here. It's just an offset from the start of the epoch. Furthermore, adding a "rotation time" lets us control where everything starts, which doesn't have to be the epoch time at 00:00.
We can find the start of the TP with those two (rotation time and lifetime) and then divide that time value by the lifetime to get the nth time period.
Also, controlling the rotation time is good to have for chutney testing with much smaller timings.
OK, I posted a torspec branch with some initial changes based on the discussions of this thread at 'prop224-timeperiods-1': https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-timeperiod...
Specifically, wrt time periods and the start time, I introduced a "rotation time offset" of 12 hours to the epoch calculation. You can see it here: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
Please let me know if the technique can be simplified or improved or needs better wording. We need this mechanic to be clear and easy to understand and implement!
<snip>
Currently the proposal says:
Hidden service directories should accept descriptors [...] and retain them for at least [TODO: how much?] minutes after the end of the period.
but that means that HSDirs need to keep track of when the period ends, and whether a descriptor was uploaded for the current time period or for the overlap period...
I think this will make things much more complicated. IMO, the HSDir should _only_ rely on the revision counter and an expiry time, and not try to guess the lifetime of a descriptor from the service's perspective.
However, here is an idea. Considering teor's argument about HS fingerprinting, we should make the upload happen regularly, so having RendPostPeriod customizable by an operator is probably a bad idea. We kind of need _all_ HSes to behave the same in normal circumstances with their HS desc uploads. Thus, if the RendPostPeriod became hardcoded (or a consensus param), the HSDir would know that the HS is "gone" or has rotated HSDirs after, let's say, 4 * RendPostPeriod (4 here is arbitrary) of not seeing a new revision counter. This makes a cache entry lifetime much smaller!
In the end, I see two options: either we use an expiry time that is the _maximum_ lifetime a descriptor can have, or an expiry time based on the maximum time frame within which we expect to have received a new revision counter. The latter is fun because we don't need to consider client clock skew: if we happen to purge the descriptor from the cache, it's because we think the service is gone or has rotated HSDirs, so any client coming our way would fail to connect to the service anyway, since the descriptor it gets should have been updated X hours ago.
Here is another important commit that specifies the overlap period functionality: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
It also adds the following section:
+ HSDirs MUST retain hidden service descriptors for 33 hours before expiring
+ them. That's 24 hours for the time period duration, plus 6 hours for the
+ maximum overlap period span, plus 3 hours for the maximum acceptable client
+ clock skew.
+
+ Hidden services should keep their old introduction circuits open for at
+ least 3 hours after descriptor expiration, so that clients with skewed
+ clocks can still visit them through outdated descriptors.
This implements the naive cache lifetime mechanic we discussed in this thread. That's an improvement over the empty TODO section of the current prop224 but maybe we can do better: we should think whether we want to do more advanced HSDir heuristics like "If I'm an HSDir and I don't receive an HS descriptor for N hours, consider that HS dead". Or maybe we should add valid-until fields to hidden service descriptors. Thoughts?
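For reference, the naive fixed retention rule from the added section comes down to the following arithmetic. This is a small Python sketch; the constant and function names are hypothetical, and measuring the retention from the time the HSDir stored the descriptor is an assumption, since the quoted text does not say where the 33 hours start.

```python
HOUR = 3600

TIME_PERIOD = 24 * HOUR     # time period duration
MAX_OVERLAP = 6 * HOUR      # maximum overlap period span
MAX_CLOCK_SKEW = 3 * HOUR   # maximum acceptable client clock skew

# 24 + 6 + 3 = 33 hours of retention on the HSDir.
DESCRIPTOR_RETENTION = TIME_PERIOD + MAX_OVERLAP + MAX_CLOCK_SKEW


def is_expired(stored_at, now):
    """Return True once a descriptor has been retained for the full window.

    Assumes retention is counted from the moment the descriptor was stored.
    """
    return now - stored_at >= DESCRIPTOR_RETENTION


print(DESCRIPTOR_RETENTION // HOUR)
print(is_expired(stored_at=0, now=32 * HOUR), is_expired(stored_at=0, now=33 * HOUR))
```

Replacing the 3-hour skew with the 6 hours suggested earlier in the thread gives the 36-hour baseline instead.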
Here are some things left to be done:
- Specify *when* hidden services upload descriptors. Do they do it hourly, or only when a change has happened. Both approaches leak information to the HSDir (the former leaks uptime, the latter leaks intro point changes).
- Specify behavior of hidden services and clients with regards to time periods and the use of SRVs as discussed in https://lists.torproject.org/pipermail/tor-dev/2016-April/010757.html
- Further specify descriptor caching behavior of HSDirs.
Did I forget anything?
On 13 Apr (15:34:54), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 12 Apr (16:01:32), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
<snip>
The logic I sketched out above makes it that we would need parameters (from the consensus) like so (or hardcode them):
- TIME_PERIOD_ROTATION_TIME (currently 12:00)
[Second email with some more thoughts]
BTW, currently in prop224 the TIME_PERIOD_ROTATION_TIME is at 00:00 because of the following paragraph:
Time periods start with the Unix epoch (Jan 1, 1970), and are computed by taking the number of whole minutes since the epoch and dividing by the time period. So if the current time is 2013-11-12 13:44:32 UTC, making the seconds since the epoch 1384281872, the number of minutes since the epoch is 23071364. If the current time period length is 1500 (the default), then the current time period number is 15380. It began 15380*1500*60 seconds after the epoch at 2013-11-11 20:00:00 UTC, and will end at (15380+1)*1500*60 seconds after the epoch at 2013-11-12 21:00:00 UTC.
I wonder what's the best way to change this to start at 12:00.
We could in theory compute the "number of whole minutes since the epoch plus 12 hours" and use that in the division, but that would be a bit ugly... Is there a more elegant thing to do?
We could also in theory change the shared random value generation to happen at 12:00, and then have TIME_PERIOD_ROTATION_TIME naturally start at 00:00, but this requires changing prop250. Could it be worth it? :/
I think it's fine that we keep the start time at 12:00 here. It's just an offset from the start of the epoch. Furthermore, adding a "rotation time" lets us control where everything starts, which doesn't have to be the epoch time at 00:00.
We can find the start of the TP with those two (rotation time and lifetime) and then divide that time value by the lifetime to get the nth time period.
Also, controlling the rotation time is good to have for chutney testing with much smaller timings.
OK, I posted a torspec branch with some initial changes based on the discussions of this thread at 'prop224-timeperiods-1': https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-timeperiod...
Specifically, wrt time periods and the start time, I introduced a "rotation time offset" of 12 hours to the epoch calculation. You can see it here: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
Looks good!
Please let me know if the technique can be simplified or improved or needs better wording. We need this mechanic to be clear and easy to understand and implement!
I find it pretty straightforward. Bottom line, we add an offset to accommodate our time period.
<snip>
Currently the proposal says:
Hidden service directories should accept descriptors [...] and retain them for at least [TODO: how much?] minutes after the end of the period.
but that means that HSDirs need to keep track of when the period ends, and whether a descriptor was uploaded for the current time period or for the overlap period...
I think this will make things much more complicated. IMO, the HSDir should _only_ rely on the revision counter and an expiry time, and not try to guess the lifetime of a descriptor from the service's perspective.
However, here is an idea. Considering teor's argument about HS fingerprinting, we should make the upload happen regularly, so having RendPostPeriod customizable by an operator is probably a bad idea. We kind of need _all_ HSes to behave the same in normal circumstances with their HS desc uploads. Thus, if the RendPostPeriod became hardcoded (or a consensus param), the HSDir would know that the HS is "gone" or has rotated HSDirs after, let's say, 4 * RendPostPeriod (4 here is arbitrary) of not seeing a new revision counter. This makes a cache entry lifetime much smaller!
In the end, I see two options: either we use an expiry time that is the _maximum_ lifetime a descriptor can have, or an expiry time based on the maximum time frame within which we expect to have received a new revision counter. The latter is fun because we don't need to consider client clock skew: if we happen to purge the descriptor from the cache, it's because we think the service is gone or has rotated HSDirs, so any client coming our way would fail to connect to the service anyway, since the descriptor it gets should have been updated X hours ago.
Here is another important commit that specifies the overlap period functionality: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
It also adds the following section:
HSDirs MUST retain hidden service descriptors for 33 hours before expiring
them. That's 24 hours for the time period duration, plus 6 hours for the
maximum overlap period span, plus 3 hours for the maximum acceptable client
clock skew.
Hidden services should keep their old introduction circuits open for at
least 3 hours after descriptor expiration, so that clients with skewed
clocks can still visit them through outdated descriptors.
This implements the naive cache lifetime mechanic we discussed in this thread. That's an improvement over the empty TODO section of the current prop224 but maybe we can do better: we should think whether we want to do more advanced HSDir heuristics like "If I'm an HSDir and I don't receive an HS descriptor for N hours, consider that HS dead". Or maybe we should add valid-until fields to hidden service descriptors. Thoughts?
I'm good with this. Let's start with a baseline (and also almost same behavior we have now) and we can improve as we go. In my wildest dream, we get the HSDir side merged in 029 so we basically have until Sept. to figure that out.
Here are some things left to be done:
- Specify *when* hidden services upload descriptors. Do they do it hourly, or only when a change has happened. Both approaches leak information to the HSDir (the former leaks uptime, the latter leaks intro point changes).
I'm more comfortable right now with the HS uploading every RendPostPeriod (default: 1 hour). Even if the descriptor content doesn't change, it should increment the revision-counter.
I'm more and more convinced that making RendPostPeriod _not_ configurable is also something we should do and thus allowing us to have the HSDir use that value instead (maybe).
- Specify behavior of hidden services and clients with regards to time periods and the use of SRVs as discussed in https://lists.torproject.org/pipermail/tor-dev/2016-April/010757.html
I'm happy with your approach if you find mine a bit more complicated. They both result in the _same_ behavior anyway. Although, in terms of code, for each SRV value, we need to keep the valid-after and the valid-until time in our SRV data structure (which we don't right now with the current prop250 code).
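As an illustration of the data structure change mentioned above, here is a minimal Python sketch of an SRV record that carries its validity window. The `SharedRandomValue` class, its field names, the `srv_for_time` helper and the half-open interval convention are all assumptions made for this example; none of it reflects the actual prop250 code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

DAY = 24 * 3600


@dataclass(frozen=True)
class SharedRandomValue:
    value: bytes
    valid_after: int  # Unix time from which the SRV is usable (inclusive)
    valid_until: int  # Unix time at which it stops being usable (exclusive)

    def covers(self, when: int) -> bool:
        """Return True if `when` falls inside this SRV's validity window."""
        return self.valid_after <= when < self.valid_until


def srv_for_time(srvs: Iterable[SharedRandomValue],
                 when: int) -> Optional[SharedRandomValue]:
    """Pick the SRV whose validity window contains `when`, if any."""
    for srv in srvs:
        if srv.covers(when):
            return srv
    return None


# Two consecutive daily SRVs with placeholder values.
srvs = [
    SharedRandomValue(b"srv-1", valid_after=0, valid_until=DAY),
    SharedRandomValue(b"srv-2", valid_after=DAY, valid_until=2 * DAY),
]
print(srv_for_time(srvs, DAY - 1).value)
print(srv_for_time(srvs, DAY).value)
print(srv_for_time(srvs, 2 * DAY))
```

Keeping both bounds on each value lets services and clients select the SRV for a given moment without recomputing the schedule.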
- Further specify descriptor caching behavior of HSDirs.
Can you elaborate here? Detailing expiry time? Conditions to replace a cache entry? ...?
Did I forget anything?
I think all discussions have been covered! Great work! Thanks!
David
On 15 Apr 2016, at 01:18, David Goulet dgoulet@ev0ke.net wrote:
Here are some things left to be done:
- Specify *when* hidden services upload descriptors. Do they do it hourly, or only when a change has happened. Both approaches leak information to the HSDir (the former leaks uptime, the latter leaks intro point changes).
I'm more comfortable right now with the HS uploading every RendPostPeriod (default: 1 hour). Even if the descriptor content doesn't change, it should increment the revision-counter.
I'm more and more convinced that making RendPostPeriod _not_ configurable is also something we should do and thus allowing us to have the HSDir use that value instead (maybe).
Alec and others have talked about posting descriptors more frequently to enable rapid onion service fail-over. Perhaps we should have a default, and then allow the period to be changed (like in the current code).
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 13 Apr (15:34:54), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 12 Apr (16:01:32), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
<snip>
Here are some things left to be done:
- Specify *when* hidden services upload descriptors. Do they do it hourly, or only when a change has happened. Both approaches leak information to the HSDir (the former leaks uptime, the latter leaks intro point changes).
I'm more comfortable right now with the HS uploading every RendPostPeriod (default: 1 hour). Even if the descriptor content doesn't change, it should increment the revision-counter.
I'm more and more convinced that making RendPostPeriod _not_ configurable is also something we should do and thus allowing us to have the HSDir use that value instead (maybe).
Hello,
I pushed some more changes to my `prop224-timeperiods-1` branch.
As discussed above, I specified that HSes should upload their descriptors periodically here: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
- Specify behavior of hidden services and clients with regards to time periods and the use of SRVs as discussed in https://lists.torproject.org/pipermail/tor-dev/2016-April/010757.html
I'm happy with your approach if you find mine a bit more complicated. They both result in the _same_ behavior anyway. Although, in terms of code, for each SRV value, we need to keep the valid-after and the valid-until time in our SRV data structure (which we don't right now with the current prop250 code).
I also specified the behavior of hidden services and clients in this commit: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224-timeper...
Please let me know how that section can become cleaner if you have any ideas.
I think this covers the time period related changes for now. If you people think that my `prop224-timeperiods-1` branch looks good and that I didn't forget anything else, I will ask Nick to review it and then merge it to torspec.
Cheers!
George Kadianakis desnacked@riseup.net writes:
[ text/plain ] David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 13 Apr (15:34:54), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 12 Apr (16:01:32), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 11 Apr (14:42:02), George Kadianakis wrote:
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 04 Apr (19:13:39), George Kadianakis wrote:
Hello,
during March we discussed the cell formats of prop224: https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html
The prop224 topic for this month has to do with the way descriptors get uploaded and downloaded, how this is scheduled using time periods and how the shared randomness subsystem interacts with all that.
<snip>
Here are some things left to be done:
- Specify *when* hidden services upload descriptors. Do they do it hourly, or only when a change has happened? Both approaches leak information to the HSDir (the former leaks uptime, the latter leaks intro point changes).
I'm more comfortable right now with the HS uploading every RendPostPeriod (default: 1 hour). Even if the descriptor content doesn't change, it should increment the revision-counter.
I'm more and more convinced that we should also make RendPostPeriod _not_ configurable, which would allow us to have the HSDir use that value instead (maybe).
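To make the "upload every RendPostPeriod and always bump the revision-counter" behavior concrete, here is a minimal sketch. The struct, the function name and the constant are hypothetical and are not taken from little-t-tor; only the 1 hour default comes from the text above, and the random jitter that Tor adds to the period is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Default RendPostPeriod mentioned above: 1 hour. */
#define REND_POST_PERIOD_SECS (60 * 60)

/* Hypothetical per-service upload state (not a Tor structure). */
typedef struct {
  uint64_t revision_counter; /* incremented on every upload */
  time_t next_upload;        /* earliest time of the next upload */
} hs_upload_state_t;

/* Return true if an upload is due at `now`. When it is, bump the
 * revision counter (even if the descriptor content is unchanged) and
 * schedule the next upload one period later. */
static bool
hs_maybe_republish(hs_upload_state_t *st, time_t now)
{
  assert(st);
  if (now < st->next_upload)
    return false;
  st->revision_counter++;
  st->next_upload = now + REND_POST_PERIOD_SECS;
  return true;
}
```

Because the counter moves on every upload, an HSDir can always tell the newest descriptor apart from an older copy, at the cost of leaking uptime as noted above.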
Hello,
I pushed some more changes to my `prop224-timeperiods-1` branch.
Merged the `prop224-timeperiods-1` branch to torspec master.
Thanks for the review and comments!
On 11/04/16 12:42, George Kadianakis wrote:
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services.
Sorry for the late reply. I just wanted to mention that mobile clients often have very skewed clocks because some users adjust the time rather than the timezone when travelling. Disapprove if you like, I share your disapproval, but that's what the kids are doing these days.
Also, if the hidden service and the client are both running on mobile devices, the relative skew can be twice as much.
Cheers, Michael
On 22 Apr 2016, at 00:46, Michael Rogers michael@briarproject.org wrote:
On 11/04/16 12:42, George Kadianakis wrote:
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services.
Sorry for the late reply. I just wanted to mention that mobile clients often have very skewed clocks because some users adjust the time rather than the timezone when travelling. Disapprove if you like, I share your disapproval, but that's what the kids are doing these days.
Also, if the hidden service and the client are both running on mobile devices, the relative skew can be twice as much.
On 13 Apr (15:34:54), George Kadianakis wrote:
It also adds the following section:
HSDirs MUST retain hidden service descriptors for 33 hours before expiring
them. That's 24 hours for the time period duration, plus 6 hours for the
maximum overlap period span, plus 3 hours for the maximum acceptable client
clock skew.
Hidden services should keep their old introduction circuits open for at
least 3 hours after descriptor expiration, so that clients with skewed
clocks can still visit them through outdated descriptors.
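The 33 hour figure is just the sum of the three values named in the quoted section. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical and do not come from the Tor source, only the hour values are from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names; the hour values are the ones quoted above. */
#define HS_TIME_PERIOD_HOURS    24 /* time period duration */
#define HS_MAX_OVERLAP_HOURS     6 /* maximum overlap period span */
#define HS_MAX_CLOCK_SKEW_HOURS  3 /* maximum acceptable client clock skew */

#define HS_DESC_RETENTION_SECS \
  ((HS_TIME_PERIOD_HOURS + HS_MAX_OVERLAP_HOURS + \
    HS_MAX_CLOCK_SKEW_HOURS) * 60 * 60)

/* Return true if an HSDir that stored a descriptor at `received_at`
 * may expire it at `now`. Both times come from the HSDir's own clock. */
static bool
hs_desc_is_expired(time_t received_at, time_t now)
{
  assert(now >= received_at);
  return now - received_at >= HS_DESC_RETENTION_SECS;
}
```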
This implements the naive cache lifetime mechanic we discussed in this thread. That's an improvement over the empty TODO section of the current prop224 but maybe we can do better: we should think whether we want to do more advanced HSDir heuristics like "If I'm an HSDir and I don't receive an HS descriptor for N hours, consider that HS dead". Or maybe we should add valid-until fields to hidden service descriptors. Thoughts?
On 21 Apr 2016, at 19:55, George Kadianakis desnacked@riseup.net wrote:
Before doing this we should make sure that nothing will break if we bring the 72 hours v2 lifetime down to 33 hours. Personally, I can't think of anything breaking right now.
The proposal is to set:
REND_CACHE_MAX_SKEW to 3 hours (down from 24 hours)
REND_CACHE_MAX_AGE to 30 hours (down from 48 hours)
(24 hours for the period, and 6 hours for the maximum overlap)
The maximum time zone difference is 26 hours.[0] So I think we should make REND_CACHE_MAX_SKEW 28 hours to accommodate users who change their clock, rather than their timezone. (This allows for an additional timezone change East or West, or a user off-by-one-hour. However, it doesn't allow for the user getting the day wrong in the wrong direction. I think this is ok.)
Regardless of client clock issues, the directory authorities only drop relays from the consensus when their skew reaches 12 hours in the future, or 24 hours in the past. This is an argument for making the REND_CACHE_MAX_SKEW at least 24 hours, to accommodate HSDirs with bad clocks. Otherwise, a HSDir with a clock skewed 3 hours into the past would start rejecting valid descriptors.
It's also worth noting that HSDirs believe the date in the signed descriptor, even if it's badly skewed. This means a forward-dated descriptor is preferred under memory pressure, which is a property I don't like. I suggest we evict cache entries when either:
* they have spent a lot of time in the cache since the time they were received, or
* their signed date is old.
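The two eviction conditions suggested above could look roughly like this. It is only a sketch: the struct is a hypothetical stand-in for a cache entry, not the layout of Tor's rend cache, and both thresholds are left as parameters because the thread has not settled on values.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry (not a Tor structure). */
typedef struct {
  time_t received_at; /* HSDir's local time when the entry was stored */
  time_t signed_at;   /* date claimed inside the signed descriptor */
} cache_entry_t;

/* Evict when the entry has sat in the cache too long, OR when its
 * signed date is too old. Checking the time in cache first means a
 * forward-dated descriptor cannot outlive honest ones. */
static bool
cache_entry_should_evict(const cache_entry_t *e, time_t now,
                         time_t max_time_in_cache, time_t max_signed_age)
{
  assert(e);
  if (now - e->received_at > max_time_in_cache)
    return true;
  if (now - e->signed_at > max_signed_age)
    return true;
  return false;
}
```

With this rule, a descriptor dated far in the future is still dropped once it has been held for longer than `max_time_in_cache`, which removes the preference for forward-dated descriptors under memory pressure.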
Other factors:
My understanding is that the consensus can be downloaded before the clock is changed, and successfully used afterwards. (At least if the skew is less than 24 hours in the past or 27 hours in the future.)
But I haven't looked into the details of whether a client could connect to a hidden service with a skew this high. How much skew clients and hidden services can have before TLS or Tor-specific crypto fails? Does anyone want to spin up a VM and work this out?
In the interim, let's assume the crypto will work, and modify the proposal with a larger clock skew.
Tim
[0]: https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
On 28 Apr (13:24:32), Tim Wilson-Brown - teor wrote:
On 22 Apr 2016, at 00:46, Michael Rogers michael@briarproject.org wrote:
On 11/04/16 12:42, George Kadianakis wrote:
FWIW, I'm personally not sure how to choose the best "maximum acceptable clock skew" value here. My intuition tells me to choose a big number so that even very skewed clients can visit hidden services.
Sorry for the late reply. I just wanted to mention that mobile clients often have very skewed clocks because some users adjust the time rather than the timezone when travelling. Disapprove if you like, I share your disapproval, but that's what the kids are doing these days.
Also, if the hidden service and the client are both running on mobile devices, the relative skew can be twice as much.
On 13 Apr (15:34:54), George Kadianakis wrote:
It also adds the following section:
HSDirs MUST retain hidden service descriptors for 33 hours before expiring
them. That's 24 hours for the time period duration, plus 6 hours for the
maximum overlap period span, plus 3 hours for the maximum acceptable client
clock skew.
Hidden services should keep their old introduction circuits open for at
least 3 hours after descriptor expiration, so that clients with skewed
clocks can still visit them through outdated descriptors.
This implements the naive cache lifetime mechanic we discussed in this thread. That's an improvement over the empty TODO section of the current prop224 but maybe we can do better: we should think whether we want to do more advanced HSDir heuristics like "If I'm an HSDir and I don't receive an HS descriptor for N hours, consider that HS dead". Or maybe we should add valid-until fields to hidden service descriptors. Thoughts?
On 21 Apr 2016, at 19:55, George Kadianakis desnacked@riseup.net wrote:
Before doing this we should make sure that nothing will break if we bring the 72 hours v2 lifetime down to 33 hours. Personally, I can't think of anything breaking right now.
The proposal is to set:
REND_CACHE_MAX_SKEW to 3 hours (down from 24 hours)
REND_CACHE_MAX_AGE to 30 hours (down from 48 hours)
(24 hours for the period, and 6 hours for the maximum overlap)
The maximum time zone difference is 26 hours.[0] So I think we should make REND_CACHE_MAX_SKEW 28 hours to accommodate users who change their clock, rather than their timezone. (This allows for an additional timezone change East or West, or a user off-by-one-hour. However, it doesn't allow for the user getting the day wrong in the wrong direction. I think this is ok.)
Regardless of client clock issues, the directory authorities only drop relays from the consensus when their skew reaches 12 hours in the future, or 24 hours in the past. This is an argument for making the REND_CACHE_MAX_SKEW at least 24 hours, to accommodate HSDirs with bad clocks. Otherwise, a HSDir with a clock skewed 3 hours into the past would start rejecting valid descriptors.
It's also worth noting that HSDirs believe the date in the signed descriptor, even if it's badly skewed. This means a forward-dated descriptor is preferred under memory pressure, which is a property I don't like.
HSDir in 224 don't care anymore about any timestamp. Actually, no timestamp is present in the descriptor anymore, it's only about the revision-counter.
This clock skew is useful on the service side: the service keeps its intro points open for an extra amount of time for skewed clients. Thus the cache entry adds a clock skew delta to its lifetime for those clients, on the assumption that the service will still hold the intro points open for that extra time as well.
But all this is not an exact science since a service can upload a new descriptor an hour before it rotates its keys then keep the old intro points open for the 3 hours clock skew but then the descriptor will be in the HSDir cache for 30+ hours so a very confused client with an old shared random value will fail anyway...
(Heck, I'm not even sure we keep intro points open to accommodate the clock skew on the service side right now.)
The best argument I heard for having a clock skew delta is mobile, where clients can often have a badly skewed clock (number unknown...)
David
I suggest we evict cache entries when either:
- they have spent a lot of time in the cache since the time they were received, or
- their signed date is old.
Other factors:
My understanding is that the consensus can be downloaded before the clock is changed, and successfully used afterwards. (At least if the skew is less than 24 hours in the past or 27 hours in the future.)
But I haven't looked into the details of whether a client could connect to a hidden service with a skew this high. How much skew clients and hidden services can have before TLS or Tor-specific crypto fails? Does anyone want to spin up a VM and work this out?
In the interim, let's assume the crypto will work, and modify the proposal with a larger clock skew.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hello people,
I invite you to check out another round of time period-related prop224 spec changes, based on our discussions in Montreal. These new changes simplify the overlap descriptor publishing logic, and improve the caching lifetime of descriptors in HSDirs.
You can find them in my branch `prop224-montreal-timeperiods` or here: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-montreal-t...
The main issue for me right now is that I can't recall how this helps with clock skewed clients, even though that was a big part of our discussion in Montreal.
Specifically, I think that clients (and HSes) should determine the set of responsible HSDirs (i.e. the current time period) based on the "valid-after" of their latest consensus, instead of using their local clock. This way, as long as the client's skewed clock is good enough to verify the latest consensus, the client will have a consistent view of the network and SRV (assuming an honest/updated dirguard). I tried to clarify this a bit in commit 465156d, so please let me know if it's not a good idea.
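As a rough illustration of deriving the time period from the consensus rather than from the local clock, consider the sketch below. The function name is hypothetical, and the 24 hour period starting at 12:00 UTC is only the schedule suggested earlier in this thread, not necessarily what the merged spec says.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumed schedule: 24 hour periods that start at 12:00 UTC. */
#define TIME_PERIOD_LENGTH_SECS       (24 * 60 * 60)
#define TIME_PERIOD_START_OFFSET_SECS (12 * 60 * 60)

/* Compute the time period number from the valid-after time of the
 * latest consensus. The local clock is deliberately not an input. */
static uint64_t
time_period_num_from_consensus(time_t valid_after)
{
  assert(valid_after >= TIME_PERIOD_START_OFFSET_SECS);
  return (uint64_t)(valid_after - TIME_PERIOD_START_OFFSET_SECS)
         / TIME_PERIOD_LENGTH_SECS;
}
```

Since the only input is valid-after, two parties holding the same consensus compute the same period number no matter how far apart their local clocks are.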
Am I missing something wrt clock skewed clients here? If yes, can someone demonstrate the effects of these changes with an example, so that I can clarify the proposal further?
Feedback is welcome! If I receive positive feedback, I will merge this in torspec.git ASAP.
Thanks!
On Mon, Jun 13, 2016 at 03:48:39PM +0300, George Kadianakis wrote:
The main issue for me right now is that I can't recall how this helps with clock skewed clients, even though that was a big part of our discussion in Montreal.
Specifically, I think that clients (and HSes) should determine the set of responsible HSDirs (i.e. the current time period) based on the "valid-after" of their latest consensus, instead of using their local clock. This way, as long as the client's skewed clock is good enough to verify the latest consensus, the client will have a consistent view of the network and SRV (assuming an honest/updated dirguard). I tried to clarify this a bit in commit 465156d, so please let me know if it's not a good idea.
Interesting idea! I think I like it. You're right that in Montreal we were thinking in terms of client clocks, and we might be able to reduce the problem (both in frequency and in magnitude) by considering the time in the last consensus we have.
Another argument in favor of using the last consensus is that we will be picking the "relays that are closest to the right location in the hash ring" out of our last consensus already. (That is not a strong argument in favor though, I think, since in theory there won't be so much churn in a day that all of the relays in our last consensus will become wrong.)
All of this said, it seems like you are basing your arguments on some expectations about how clients handle consensuses that have surprising dates in them (surprising either because the client's clock is skewed, or because their directory guard gave them the wrong consensus). How *do* clients handle these situations? If we could get the intended / expected behavior written down, then we would have a better chance of identifying bugs in it that we can then fix.
For example, do I as a client just ignore and discard a consensus from 6 hours in the future? I don't remember the answer, so I can't do a good job at analyzing your proposed change.
Thanks! --Roger
Roger Dingledine arma@mit.edu writes:
[ text/plain ] On Mon, Jun 13, 2016 at 03:48:39PM +0300, George Kadianakis wrote:
The main issue for me right now is that I can't recall how this helps with clock skewed clients, even though that was a big part of our discussion in Montreal.
Specifically, I think that clients (and HSes) should determine the set of responsible HSDirs (i.e. the current time period) based on the "valid-after" of their latest consensus, instead of using their local clock. This way, as long as the client's skewed clock is good enough to verify the latest consensus, the client will have a consistent view of the network and SRV (assuming an honest/updated dirguard). I tried to clarify this a bit in commit 465156d, so please let me know if it's not a good idea.
Interesting idea! I think I like it. You're right that in Montreal we were thinking in terms of client clocks, and we might be able to reduce the problem (both in frequency and in magnitude) by considering the time in the last consensus we have.
Another argument in favor of using the last consensus is that we will be picking the "relays that are closest to the right location in the hash ring" out of our last consensus already. (That is not a strong argument in favor though, I think, since in theory there won't be so much churn in a day that all of the relays in our last consensus will become wrong.)
All of this said, it seems like you are basing your arguments on some expectations about how clients handle consensuses that have surprising dates in them (surprising either because the client's clock is skewed, or because their directory guard gave them the wrong consensus). How *do* clients handle these situations? If we could get the intended / expected behavior written down, then we would have a better chance of identifying bugs in it that we can then fix.
I agree that we should get the intended/expected behavior written down!
Here is an initial attempt at figuring out the current Tor behavior when handling consensuses with surprising dates. More work is required here.
For example, do I as a client just ignore and discard a consensus from 6 hours in the future? I don't remember the answer, so I can't do a good job at analyzing your proposed change.
In general, the relevant time checks seem to happen at networkstatus_get_reasonably_live_consensus() and not during consensus parsing. That function is then called by router_have_minimum_dir_info() during bootstrapping. If networkstatus_get_reasonably_live_consensus() returns NULL, then Tor will get stuck at "Bootstrapped 25%: Loading networkstatus consensus".
Here is the basic logic of networkstatus_get_reasonably_live_consensus():
------------------------------------------------------
#define REASONABLY_LIVE_TIME (24*60*60)
if (consensus &&
    consensus->valid_after <= now &&
    now <= consensus->valid_until+REASONABLY_LIVE_TIME)
  return consensus;
else
  return NULL;
------------------------------------------------------
And here are the scenarios:
Case #1: Handling consensuses with old dates
If a client receives a consensus with an old date (i.e. the client's clock is skewed forward), the consensus will get verified just fine and Tor won't even log about the skew (XXX maybe we should fix this?)
However when networkstatus_get_reasonably_live_consensus() gets reached, Tor will refuse to handle any consensuses whose valid_until date has expired by more than 24 hours.
Case #2: Handling consensuses with future dates
If a client receives a consensus with a valid_after in the future (i.e. the client's clock is skewed backwards), the consensus will get verified fine and a log will appear about the skew ("Our clock is N hours behind the time published in the consensus yada yada...")
However, when networkstatus_get_reasonably_live_consensus() gets reached, Tor will refuse to handle any consensuses whose valid_after date is in the future.
We see that while Tor consensus handling is quite flexible towards forward skewed clocks (case #1), it's actually quite strict towards backward skewed clocks (case #2). We might want to rethink how this should work, if we are serious about supporting clock skewed clients. After all, handling consensuses with future dates is safer than handling consensuses with older dates (which are replayable).
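The asymmetry between the two cases can be checked with a standalone version of the condition quoted above. The struct is a simplified stand-in that carries only the two fields the check reads, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define REASONABLY_LIVE_TIME (24*60*60)

/* Simplified stand-in for the consensus object (not Tor's type). */
typedef struct {
  time_t valid_after;
  time_t valid_until;
} consensus_times_t;

/* Same condition as in the logic quoted above, returning a boolean
 * instead of the consensus pointer. */
static bool
consensus_is_reasonably_live(const consensus_times_t *c, time_t now)
{
  assert(c);
  return c->valid_after <= now &&
         now <= c->valid_until + REASONABLY_LIVE_TIME;
}
```

A client whose clock runs ahead still accepts the consensus until 24 hours past valid_until (case #1), while a client whose clock is even one second behind valid_after rejects it (case #2).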
I also wonder if we can consider the above problem orthogonal wrt prop224. After all the problem here is on the consensus handling layer, and affects all current clients and not just HS clients. We should first figure out exactly how well the current Tor behavior works with the suggested prop224 changes.
BTW, the analysis above does not consider situations where the dirguard gives us the wrong consensus (by caching accident or malice), or when the clock gets skewed in the middle of Tor's runtime. Or any other weird scenarios I didn't think about.
I will try to think more about this RSN. Till then, feedback is welcome :)
George Kadianakis desnacked@riseup.net writes:
[ text/plain ] Roger Dingledine arma@mit.edu writes:
[ text/plain ] On Mon, Jun 13, 2016 at 03:48:39PM +0300, George Kadianakis wrote:
The main issue for me right now is that I can't recall how this helps with clock skewed clients, even though that was a big part of our discussion in Montreal.
Specifically, I think that clients (and HSes) should determine the set of responsible HSDirs (i.e. the current time period) based on the "valid-after" of their latest consensus, instead of using their local clock. This way, as long as the client's skewed clock is good enough to verify the latest consensus, the client will have a consistent view of the network and SRV (assuming an honest/updated dirguard). I tried to clarify this a bit in commit 465156d, so please let me know if it's not a good idea.
Interesting idea! I think I like it. You're right that in Montreal we were thinking in terms of client clocks, and we might be able to reduce the problem (both in frequency and in magnitude) by considering the time in the last consensus we have.
Another argument in favor of using the last consensus is that we will be picking the "relays that are closest to the right location in the hash ring" out of our last consensus already. (That is not a strong argument in favor though, I think, since in theory there won't be so much churn in a day that all of the relays in our last consensus will become wrong.)
All of this said, it seems like you are basing your arguments on some expectations about how clients handle consensuses that have surprising dates in them (surprising either because the client's clock is skewed, or because their directory guard gave them the wrong consensus). How *do* clients handle these situations? If we could get the intended / expected behavior written down, then we would have a better chance of identifying bugs in it that we can then fix.
I agree that we should get the intended/expected behavior written down!
A few days have passed and I still feel that using the latest consensus valid_after time is a more robust way for taking decisions on how to perform the HS protocol, than using the local client clock. After all, the whole Tor protocol relies on having a good consensus (for HSDirs, SRV, etc.), so you can't go very far with a bad consensus anyway.
On the subject of clock skewed clients, I opened ticket #19460 with a few suggestions for improving the handling of consensuses with surprising dates. In general, I feel that with the #19460 suggestion implemented, the system can be accommodating towards slightly clock skewed clients both in the forward and backwards directions.
Also, we have logs in place to warn people that their clocks are ultra skewed (based on the received consensus date). We also have mechanisms in place to ensure that we refetch a consensus when the current consensus date is too far off (see update_consensus_networkstatus_downloads()). Now whether all these mechanisms and logs work properly in all cases is something we need to test extensively.
Of course, using the consensus valid_after time is not bulletproof either: there are various edge cases where this can have bad results. For example, imagine a world where the real time is 07:00UTC, but Alice is a 10 hours backwards-skewed client whose local time is 21:00UTC. Imagine that Alice starts up Tor with an old consensus with valid_after 20:00UTC (because her dirguard lied, or because Alice had that consensus cached). In this case, Alice will not realize that the consensus is hella old, and will try to use it. She will then compute the wrong set of HSDirs, and fail the HS protocol. This case is plausible in theory but also quite hard to protect against, since both Alice and her consensus had wrong but convenient times.
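The Alice example can be put into numbers with a small sketch. The function name is hypothetical, and the 23:00UTC valid_until is an assumption (a consensus that stays valid for three hours after valid_after); the thread only gives the valid_after time.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define HOUR (60 * 60)

/* True if `now` falls inside the validity interval of the consensus,
 * i.e. the holder has no reason to think the consensus is stale. */
static bool
consensus_looks_current(time_t valid_after, time_t valid_until, time_t now)
{
  assert(valid_after <= valid_until);
  return valid_after <= now && now <= valid_until;
}
```

Counting seconds from midnight of the previous day, the consensus covers 20 * HOUR to 23 * HOUR, Alice's clock reads 21 * HOUR and the real time is 31 * HOUR. With Alice's clock the consensus looks current; with the real time it does not, which is why she never notices that it is old.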
All in all, I feel that using the consensus valid_after time for time period related calculations seems reasonable at this point, but we should do more testing (ideally automated) as we implement the relevant parts.
Here is an initial attempt at figuring out the current Tor behavior when handling consensuses with surprising dates. More work is required here.
For example, do I as a client just ignore and discard a consensus from 6 hours in the future? I don't remember the answer, so I can't do a good job at analyzing your proposed change.
In general, the relevant time checks seem to happen at networkstatus_get_reasonably_live_consensus() and not during consensus parsing. That function is then called by router_have_minimum_dir_info() during bootstrapping. If networkstatus_get_reasonably_live_consensus() returns NULL, then Tor will get stuck at "Bootstrapped 25%: Loading networkstatus consensus".
Here is the basic logic of networkstatus_get_reasonably_live_consensus():
#define REASONABLY_LIVE_TIME (24*60*60)
if (consensus &&
    consensus->valid_after <= now &&
    now <= consensus->valid_until+REASONABLY_LIVE_TIME)
  return consensus;
else
  return NULL;
And here are the scenarios:
Case #1: Handling consensuses with old dates
If a client receives a consensus with an old date (i.e. the client's clock is skewed forward), the consensus will get verified just fine and Tor won't even log about the skew (XXX maybe we should fix this?) However when networkstatus_get_reasonably_live_consensus() gets reached, Tor will refuse to handle any consensuses whose valid_until date has expired by more than 24 hours.
Case #2: Handling consensuses with future dates
If a client receives a consensus with a valid_after in the future (i.e. the client's clock is skewed backwards), the consensus will get verified fine and a log will appear about the skew ("Our clock is N hours behind the time published in the consensus yada yada...") However, when networkstatus_get_reasonably_live_consensus() gets reached, Tor will refuse to handle any consensuses whose valid_after date is in the future.
We see that while Tor consensus handling is quite flexible towards forward skewed clocks (case #1), it's actually quite strict towards backward skewed clocks (case #2). We might want to rethink how this should work, if we are serious about supporting clock skewed clients. After all, handling consensuses with future dates is safer than handling consensuses with older dates (which are replayable).
I also wonder if we can consider the above problem orthogonal wrt prop224. After all the problem here is on the consensus handling layer, and affects all current clients and not just HS clients. We should first figure out exactly how well the current Tor behavior works with the suggested prop224 changes.
BTW, the analysis above does not consider situations where the dirguard gives us the wrong consensus (by caching accident or malice), or when the clock gets skewed in the middle of Tor's runtime. Or any other weird scenarios I didn't think about.
I will try to think more about this RSN. Till then, feedback is welcome :)
On 13 Jun (15:48:39), George Kadianakis wrote:
Hello people,
I invite you to check out another round of time period-related prop224 spec changes, based on our discussions in Montreal. These new changes simplify the overlap descriptor publishing logic, and improve the caching lifetime of descriptors in HSDirs.
You can find them in my branch `prop224-montreal-timeperiods` or here: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-montreal-t...
Couple of things.
Section 2.2.2., about this TODO:
[TODO: Control republish period using a consensus parameter?]
Right now, we have RendPostPeriod for such a thing, with some randomness added to it. As we discussed, a service changing that value will make it different from all others and thus more noticeable. But we worked out some use cases where it could be useful, such as a service load balancing and republishing a new descriptor often to change its intro points or keys.
Making this a consensus parameter is a good idea imo but we should also provide an option to override it. Maybe it could make sense to _only_ have the option to change it if you are a NonAnonymous service for instance?
Section 2.2.2.1:
[TODO: What to do when we run multiple hidden services in a single host?]
This could be quite "obvious" at the Guard. Building at least 12 short-lived circuits is a giveaway here that I'm running HSes. Apart from adding some random offset for each HS (which even then...), I'm not sure how to address this. Even now, we just upload all descriptors at the same RendPostPeriod. Maybe it's not too big of a problem?
Section 2.2.5:
"Hidden services MUST also keep their introduction circuits alive..."
Does that mean the service keeps them open (and the keys) until the descriptor expires on the HSDir? That is, a service uploads a desc at 23:00 but then at 00:00 it creates a new descriptor using the new SRV, so it should keep the intro points open until 02:00 (23:00 + 3 hours lifetime)?
(In this example, I assume that 00:00 is the start of the overlap period so the previous SRV is discarded in favor of the "old current" and "current".)
The main issue for me right now is that I can't recall how this helps with clock skewed clients, even though that was a big part of our discussion in Montreal.
Specifically, I think that clients (and HSes) should determine the set of responsible HSDirs (i.e. the current time period) based on the "valid-after" of their latest consensus, instead of using their local clock. This way, as long as the client's skewed clock is good enough to verify the latest consensus, the client will have a consistent view of the network and SRV (assuming an honest/updated dirguard). I tried to clarify this a bit in commit 465156d, so please let me know if it's not a good idea.
Yes I agree. The service gets into "overlap mode" as soon as it can, that is, as soon as it gets a consensus with valid-after at 00:00+. The "as soon as possible" part is important because it's a best effort to be available using its knowledge (here the consensus, not local clock).
As for the client, in the mobile world for instance, clocks are often quite off, whereas the consensus valid-after has to be "accurate". Chances are that clients and services will have a "similar" consensus (or not that far off from each other, like 03:00 for client and 04:00 for service), thus using that time makes more sense than the local clock. With local clocks, client and service are more likely to be much further apart. (I do not have strong evidence of this but it's my intuition.)
So, I agree here that clients should use valid-after instead of their local clock. And if they can't validate the consensus time, well #yolo anyway.
Cheers! David
Am I missing something wrt clock skewed clients here? If yes, can someone demonstrate the effects of these changes with an example, so that I can clarify the proposal further?
Feedback is welcome! If I receive positive feedback, I will merge this in torspec.git ASAP.
Thanks!
David Goulet dgoulet@ev0ke.net writes:
[ text/plain ] On 13 Jun (15:48:39), George Kadianakis wrote:
Hello people,
I invite you to check out another round of time period-related prop224 spec changes, based on our discussions in Montreal. These new changes simplify the overlap descriptor publishing logic, and improve the caching lifetime of descriptors in HSDirs.
You can find them in my branch `prop224-montreal-timeperiods` or here: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-montreal-t...
Couple of things.
Section 2.2.2., about this TODO:
[TODO: Control republish period using a consensus parameter?]
Right now, we have RendPostPeriod for such a thing, with some randomness added to it. As we discussed, a service changing that value will make it different from all others and thus more noticeable. But we worked out some use cases where it could be useful, such as a service load balancing and republishing a new descriptor often to change its intro points or keys.
I think HSes rotating intro points or keys should publish a new descriptor regardless of the value of RendPostPeriod. This is not mentioned in prop224 tbh (maybe it should), but this is also what little-t-tor does currently (it marks the descriptor as dirty when rotating intro points).
Making this a consensus parameter is a good idea imo but we should also provide an option to override it. Maybe it could make sense to _only_ have the option to change it if you are a NonAnonymous service for instance?
Section 2.2.2.1:
[TODO: What to do when we run multiple hidden services in a single host?]
This could be quite "obvious" at the Guard. Building at least 12 short-lived circuits is a giveaway here that I'm running HSes. Apart from adding some random offset for each HS (which even then...), I'm not sure how to address this. Even now, we just upload all descriptors at the same RendPostPeriod. Maybe it's not too big of a problem?
Indeed, I'm also unsure on how to handle this properly.
Section 2.2.5:
"Hidden services MUST also keep their introduction circuits alive..."
Does that mean the service keeps them open (and the keys) until the descriptor expires on the HSDir? That is, a service uploads a desc at 23:00 but then at 00:00 it creates a new descriptor using the new SRV, so it should keep the intro points open until 02:00 (23:00 + 3 hours lifetime)?
Yes, that's what I mean. I will try to add an example to the proposal to make it more clear.
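Until such an example lands in the proposal, the 23:00 case above can be sketched as follows. The names are hypothetical, and the rule "last upload of the old descriptor plus 3 hours" is simply the reading of the question that was just confirmed, not spec text.

```c
#include <assert.h>
#include <time.h>

/* The 3 hours used in the example above. */
#define HS_INTRO_KEEPALIVE_SECS (3 * 60 * 60)

/* Return the time until which the service should keep the intro
 * points (and keys) of the old descriptor alive, given the time of
 * the last upload of that descriptor. */
static time_t
intro_keepalive_deadline(time_t last_upload_of_old_desc)
{
  assert(last_upload_of_old_desc >= 0);
  return last_upload_of_old_desc + HS_INTRO_KEEPALIVE_SECS;
}
```

Counting seconds from midnight, an upload at 23:00 (23 * 3600) gives a deadline of 26 * 3600, which is 02:00 on the following day, even though the service already switched to a new descriptor at 00:00.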
George Kadianakis desnacked@riseup.net writes:
[ text/plain ] Hello people,
I invite you to check out another round of time period-related prop224 spec changes, based on our discussions in Montreal. These new changes simplify the overlap descriptor publishing logic, and improve the caching lifetime of descriptors in HSDirs.
You can find them in my branch `prop224-montreal-timeperiods` or here: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224-montreal-t...
Hello,
I just merged that branch to torspec.git. It's still not too late for feedback though. If you find something rotten in the changes, please reply to this thread and we can revise it.