Hello everyone,
Last week I introduced myself [0] on this list, shortly after being accepted into GSoC to work on Consensus Diffs. My GSoC proposal is heavily based on the Tor proposal #140 [1], which is close to being six years old now.
This is why, after some discussion with Nick, Sebastian and Weasel (the original author of the proposal), it became obvious that it needs some revising. Here are the improvements we discussed on IRC:
* Microdescriptors didn't exist back then, so the proposal makes no mention of microdescriptor consensus diffs. We should support these too.
* Weasel added the 's' command just so that a line with just one dot could be produced. Since consensuses should never have such a line, I think it would be best to drop 's' and not support such a line as an input when generating the diffs.
* In response to my introduction mail, Ian mentioned that fetching a diff leaks information about when you last used Tor. Nick proposed a time limit on these diffs, e.g. 24 or 36 hours, which should mitigate this problem.
* Regarding their size, #140 suggests that they are not useful past 16 hours. I thought we could compare the compressed size of the diffs when creating them, since they may be of use for a longer time. We could do this relative size limit first as well as the time limit mentioned above.
That is all that we came up with for now, what do you think? Ideas about what might be missing or needing an update are welcome, of course :)
Regards.
[0] https://lists.torproject.org/pipermail/tor-dev/2014-April/006744.html
[1] https://gitweb.torproject.org/torspec.git/blob_plain/refs/heads/master:/prop...
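The relative size check from the last bullet, combined with the time limit Nick suggested, could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the 24-hour default and the 0.5 ratio are all hypothetical, and a unified diff from Python's difflib stands in for the ed-style diffs of proposal #140.

```python
import difflib
import zlib


def compressed_size(text: str) -> int:
    """Return the zlib-compressed size of text, in bytes."""
    return len(zlib.compress(text.encode("utf-8")))


def make_diff(old: str, new: str) -> str:
    """Build a line-based diff from old to new.

    Assumption: unified format with no context lines is used here only as a
    stand-in for the ed-style format described in proposal #140.
    """
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        n=0,
    ))


def worth_serving(old: str, new: str, age_hours: float,
                  max_age_hours: float = 24.0,
                  max_ratio: float = 0.5) -> bool:
    """Decide whether a diff from old to new is worth offering.

    Both thresholds are hypothetical: max_age_hours is the time limit,
    max_ratio is the largest acceptable compressed-diff size relative to
    the compressed full consensus.
    """
    if age_hours > max_age_hours:
        return False
    diff_size = compressed_size(make_diff(old, new))
    return diff_size <= max_ratio * compressed_size(new)
```

With a check like this, the directory only needs the time limit as a hard cap; within that window, each diff is kept or dropped based on how much it actually saves.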
On Thu, May 01, 2014 at 10:02:30AM +0200, Daniel Martí wrote:
> - Regarding their size, #140 suggests that they are not useful past 16 hours. I thought we could compare the compressed size of the diffs when creating them, since they may be of use for a longer time. We could do this relative size limit first as well as the time limit mentioned above.
> That is all that we came up with for now, what do you think? Ideas about what might be missing or needing an update are welcome, of course :)
Hypothesis: a good chunk of the new lines are lines that you've already seen in some recent consensus. That is, they're relays that lost the Running flag but now they have it again.
Step zero, see to what extent this theory is true in practice.
If it's true a lot, is there something clients can do to take advantage of this fact? Like, keep several recent consensuses so they already have the lines they're told to add back in? Or do the numbers work out poorly here, since the bytes we spend saying "yes, those lines from back then, the ones that hash to H" eat too much into our savings to warrant the extra complexity?
--Roger
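Roger's "step zero" could be approximated with a small measurement like the one below. It is a sketch, not a finished tool: the function names are made up, consensuses are assumed to be available as plain strings, and lines are compared exactly, so a relay line that comes back with any field changed counts as new.

```python
from typing import List, Set


def added_lines(previous: str, current: str) -> Set[str]:
    """Lines present in current but not in previous."""
    return set(current.splitlines()) - set(previous.splitlines())


def fraction_seen_before(history: List[str], current: str) -> float:
    """Fraction of newly added lines that appeared in an older consensus.

    history holds recent consensuses, oldest first; its last element is
    the consensus the client would diff against.
    """
    if not history:
        raise ValueError("history must contain at least one consensus")
    added = added_lines(history[-1], current)
    if not added:
        return 0.0
    seen: Set[str] = set()
    for doc in history[:-1]:
        seen.update(doc.splitlines())
    return len(added & seen) / len(added)
```

For example, `fraction_seen_before(["a\nb\nc\n", "a\nc\n"], "a\nb\nc\nd\n")` gives 0.5: of the two added lines, `b` was in the older consensus and `d` was not.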
On Thu, May 1, 2014 at 4:17 AM, Roger Dingledine <arma@mit.edu> wrote:
> On Thu, May 01, 2014 at 10:02:30AM +0200, Daniel Martí wrote:
> > - Regarding their size, #140 suggests that they are not useful past 16 hours. I thought we could compare the compressed size of the diffs when creating them, since they may be of use for a longer time. We could do this relative size limit first as well as the time limit mentioned above.
> > That is all that we came up with for now, what do you think? Ideas about what might be missing or needing an update are welcome, of course :)
> Hypothesis: a good chunk of the new lines are lines that you've already seen in some recent consensus. That is, they're relays that lost the Running flag but now they have it again.
> Step zero, see to what extent this theory is true in practice.
> If it's true a lot, is there something clients can do to take advantage of this fact? Like, keep several recent consensuses so they already have the lines they're told to add back in? Or do the numbers work out poorly here, since the bytes we spend saying "yes, those lines from back then, the ones that hash to H" eat too much into our savings to warrant the extra complexity?
One alternative in this case would be to include non-running relays in the consensus. This would make each individual consensus longer, but (if your guess is right) might make compressed diffs shorter.
On Thu, May 01, 2014 at 09:27:38 -0400, Nick Mathewson wrote:
> One alternative in this case would be to include non-running relays in the consensus. This would make each individual consensus longer, but (if your guess is right) might make compressed diffs shorter.
That's an interesting idea. We could include non-running relays in the consensus for, say, half a day. I don't know how often relays temporarily stop running, or how long they generally stay down - I shall investigate.
On Thu, May 1, 2014 at 4:02 AM, Daniel Martí <mvdan@mvdan.cc> wrote:
> Hello everyone,
> Last week I introduced myself [0] on this list, shortly after being accepted into GSoC to work on Consensus Diffs. My GSoC proposal is heavily based on the Tor proposal #140 [1], which is close to being six years old now.
> This is why, after some discussion with Nick, Sebastian and Weasel (the original author of the proposal), it became obvious that it needs some revising. Here are the improvements we discussed on IRC:
> - Microdescriptors didn't exist back then, so the proposal makes no mention of microdescriptor consensus diffs. We should support these too.
I think the only change we'll need for this case is to add URLs for the microdescriptor consensus diffs.
One more thing to clarify:
*
One more idea I thought of:
- What if instead of having a special URL for downloading consensus diffs, we have a special header that tells the directory that the client is willing to accept diffs?
According to the proposal, the client is supposed to ask for the resource with:
HTTP/1.0 GET /tor/status-vote/current/consensus/diff/<HASH>/<FPRLIST>
where HASH is the hash of the descriptor it has, and FPRLIST is a list of fingerprints for authority identity keys.
But what if instead the client is told to do this request with:
HTTP/1.0 GET /tor/status-vote/current/consensus/<FPRLIST>.z
X-Or-Diff-From-Consensus: HASH1 HASH2...
where HASH1, HASH2... are digests of one or more consensuses that the client has, and the directory cache is allowed to return a diff from any of those?
The nice thing about this approach is that the client doesn't need to know whether the directory supports consensus diffs. If it does, great: it will send a diff. If not, the directory will ignore the X-Or-Diff-From-Consensus header and just send the consensus as before.
Clever idea? Silly idea?
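On the directory side, the header-based negotiation Nick describes might look something like the sketch below. Only the header name comes from the message above; the function, its arguments and the lookup table keyed by digest are hypothetical, and real directory code would of course not be written in Python.

```python
from typing import Mapping, Tuple

# Header name as proposed in the thread.
DIFF_HEADER = "X-Or-Diff-From-Consensus"


def choose_response(headers: Mapping[str, str],
                    full_consensus: bytes,
                    diffs_by_digest: Mapping[str, bytes]) -> Tuple[str, bytes]:
    """Pick what to send for a consensus request.

    diffs_by_digest (hypothetical) maps the digest of an older consensus
    to a precomputed diff from it to the current one. Returns a
    ("diff", body) or ("full", body) pair.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    offered = lowered.get(DIFF_HEADER.lower(), "").split()
    for digest in offered:
        diff = diffs_by_digest.get(digest)
        if diff is not None:
            return ("diff", diff)
    # No header, or no diff for any offered digest: send the full
    # consensus, which is also what a directory without diff support
    # would do after ignoring the header.
    return ("full", full_consensus)
```

The fallback branch is what makes the scheme backward compatible: a client that sends the header gets a usable answer either way.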
On Thu, May 01, 2014 at 09:37:12 -0400, Nick Mathewson wrote:
> I think the only change we'll need for this case is to add URLs for the microdescriptor consensus diffs.
Cool.
> The nice thing about this approach is that the client doesn't need to know whether the directory supports consensus diffs. If it does, great: it will send a diff. If not, the directory will ignore the X-Or-Diff-From-Consensus header and just send the consensus as before.
> Clever idea? Silly idea?
I think it's a neat idea - one less thing to worry about on the client side. We don't know yet whether keeping multiple consensuses will be of practical use, but we can always roll back to providing only one available consensus hash.