Hello everyone,
My name is Daniel and this summer I'll be working on consensus diffs [0], heavily based on proposal 140 [1]. This should allow for quicker and more scalable consensus updates, which will matter more as the consensus grows larger. I've never gotten involved in Tor before, so I'm really looking forward to this summer.
My intention is to use a simplified ed format as described in proposal 140, but I am open to alternatives and suggestions. As for the diff creation algorithm, I plan on using a dynamic programming algorithm for the Longest Common Substring problem. As before, comments are very welcome.
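To make the plan above a bit more concrete, here is a minimal sketch in Python of a dynamic programming line diff that emits ed-style commands. Note that it uses the longest common subsequence over lines, which is the usual basis for line diffs, rather than the substring variant named above; that swap is an assumption on my part. Commands are emitted last hunk first, so that the line numbers of earlier hunks stay valid while the script is applied. The function names and the exact command layout are illustrative and are not taken from proposal 140.

```python
def lcs_table(old, new):
    """table[i][j] is the LCS length of old[i:] and new[j:]."""
    n, m = len(old), len(new)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def ed_diff(old, new):
    """Return ed-style commands that turn old into new (lists of lines).

    Assumes no line consists of a single ".", which ed would read as
    the end of an input block.
    """
    table = lcs_table(old, new)
    n, m = len(old), len(new)
    hunks = []  # (index of first changed old line, lines deleted, lines added)
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and old[i] == new[j]:
            i, j = i + 1, j + 1  # common line, no hunk needed
            continue
        start, added = i, []
        # Consume differing lines until the next common line (or the end).
        while (i < n or j < m) and not (i < n and j < m and old[i] == new[j]):
            if j < m and (i == n or table[i][j + 1] >= table[i + 1][j]):
                added.append(new[j])
                j += 1
            else:
                i += 1
        hunks.append((start, i - start, added))

    commands = []
    for start, deleted, added in reversed(hunks):  # last hunk first
        first, last = start + 1, start + deleted   # 1-based old line range
        span = str(first) if first == last else f"{first},{last}"
        if deleted == 0:
            commands.append(f"{start}a")   # append after old line `start`
        elif not added:
            commands.append(f"{span}d")    # pure deletion, no text block
            continue
        else:
            commands.append(f"{span}c")    # replace the old range
        commands.extend(added)
        commands.append(".")
    return commands


old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
print(ed_diff(old, new))  # ['4a', 'e', '.', '2c', 'x', '.']
```

The table needs memory proportional to the product of the two line counts, so this is only meant to show the idea, not to be used as-is on full consensus documents.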
I will spend the next couple of weeks looking at alternative diff formats and algorithms rather than start coding right away. I would appreciate it if any ideas regarding either of these two aspects were raised during this time, so that I can start on the implementation afterward.
I will always be lurking on #tor-dev, #tor-project and #tor under the nick 'mvdan'. I am also subscribed to the tor-dev and tor-talk mailing lists. And lastly, my PGP fingerprint is below - encrypted mail is welcome :)
Regards.
[0] https://www.torproject.org/getinvolved/volunteer.html.en#consensusDiffs
[1] https://gitweb.torproject.org/torspec.git/blob_plain/refs/heads/master:/prop...
On Tue, Apr 22, 2014 at 05:02:23PM +0200, Daniel Martí wrote:
> Hello everyone,
> My name is Daniel and this summer I'll be working on consensus diffs [0], heavily based on proposal 140 [1]. This should allow for quicker and more scalable consensus updates, which will matter more as the consensus grows larger. I've never gotten involved in Tor before, so I'm really looking forward to this summer.
> My intention is to use a simplified ed format as described in proposal 140, but I am open to alternatives and suggestions. As for the diff creation algorithm, I plan on using a dynamic programming algorithm for the Longest Common Substring problem. As before, comments are very welcome.
The proposal (140) doesn't appear to discuss the client fingerprintability aspect of this: clients reveal the last time they used Tor (if recentish). Say you're a mobile client that gets a dynamic IP address. With this, you reveal that you probably aren't, or maybe are, the same person who was last seen over there at that particular time.
What are the implications here?
On Tue, Apr 22, 2014 at 11:10:27 -0400, Ian Goldberg wrote:
> The proposal (140) doesn't appear to discuss the client fingerprintability aspect of this: clients reveal the last time they used Tor (if recentish). Say you're a mobile client that gets a dynamic IP address. With this, you reveal that you probably aren't, or maybe are, the same person who was last seen over there at that particular time.
> What are the implications here?
As far as I understand, Tor clients fetch the consensus documents from a random authority at first, and then from caches at somewhat random times, according to section 5.1 of [0].
Since a client starts using caches and building circuits after fetching its first consensus from an authority, I don't see how anyone could identify a client.
Sure, a cache will know how long a client has been disconnected when it asks for a diff starting at e.g. yesterday. But was it that same cache that gave it the previous diff? Or are you talking about regular traffic too?
I might not have understood you well; if that's the case, please explain in a bit more detail.
Anyway, downloading the entire consensus file from either an authority or a cache will always be possible, if that's what you are concerned about. But we want diffs to be usable in a secure manner just like entire consensuses are.
[0] https://gitweb.torproject.org/torspec.git/blob/refs/heads/master:/dir-spec.t...
On Tue, Apr 22, 2014 at 11:10 AM, Ian Goldberg <iang@cs.uwaterloo.ca> wrote:
> On Tue, Apr 22, 2014 at 05:02:23PM +0200, Daniel Martí wrote:
> > Hello everyone,
> > My name is Daniel and this summer I'll be working on consensus diffs [0], heavily based on proposal 140 [1]. This should allow for quicker and more scalable consensus updates, which will matter more as the consensus grows larger. I've never gotten involved in Tor before, so I'm really looking forward to this summer.
> > My intention is to use a simplified ed format as described in proposal 140, but I am open to alternatives and suggestions. As for the diff creation algorithm, I plan on using a dynamic programming algorithm for the Longest Common Substring problem. As before, comments are very welcome.
> The proposal (140) doesn't appear to discuss the client fingerprintability aspect of this: clients reveal the last time they used Tor (if recentish). Say you're a mobile client that gets a dynamic IP address. With this, you reveal that you probably aren't, or maybe are, the same person who was last seen over there at that particular time.
If there's a problem here, then it's a problem that already exists because of Tor's use of the If-Modified-Since header during consensus downloads.
The problem should be less severe in 0.2.4.x, with the addition of directory guards: the source of one's directory info is now one's guards, and they have a pretty good idea of when you were last online anyway. Proposal 236 would narrow the additional value of this information even more. Though we might want to amend proposal 236 so that you get one guard but several directory guards; having a single source for directory information is less robust than having a few. (I'll follow up on another thread about that.)
Additionally, it might be a good idea to close this information flow a bit harder. Our best bet seems to be limiting the age of the consensus that we're willing to ask for a diff from. Any other ideas?
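As a rough illustration of the age limit idea, here is a minimal sketch in Python of how a client might decide between asking for a diff and asking for the full consensus. Everything named here is hypothetical: the function name, the return values, and in particular the 24 hour limit are assumptions for illustration only, not values from proposal 140 or from Tor itself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit: never name a base consensus older than this.
MAX_DIFF_BASE_AGE = timedelta(hours=24)


def choose_request(held_valid_after, now):
    """Return "diff" or "full" for the next consensus fetch.

    held_valid_after is the valid-after time of the consensus we hold
    (an aware datetime), or None if we hold none.
    """
    if held_valid_after is None:
        return "full"
    age = now - held_valid_after
    # A too-old (or future-dated) base falls back to a full download, so
    # the request does not name a base consensus outside the window.
    if age < timedelta(0) or age > MAX_DIFF_BASE_AGE:
        return "full"
    return "diff"


now = datetime(2014, 4, 22, 15, 0, tzinfo=timezone.utc)
print(choose_request(now - timedelta(hours=3), now))  # diff
print(choose_request(now - timedelta(days=5), now))   # full
```

With a limit like this, a request for a diff only ever tells the cache about absences shorter than the window; anything longer looks the same as a client with no consensus at all.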