Thanks isis. We worked on PIR-Tor a while ago, so my recollection could be a bit rusty, but here are some thoughts.
1) From a performance perspective, the best results are obtained by using the information-theoretic variant of PIR-Tor (ITPIR) (as opposed to the computational variant). As you correctly point out, we considered the three guard relays to be the PIR directory servers, with the assumption that not all three collude. (Our argument for the non-colluding assumption was that if all three guards are malicious, then the security offered by Tor is terrible already.) Note than ITPIR necessarily requires more than one server. Given Tor's move to a single guard design, the first challenge is to figure out alternative PIR servers for the client.
(I havn't thought much about this challenge, but it seems that the single guard could be one server, and two other relays (maybe with guard flags) could play the role of PIR directory servers. I'll have to think more about this, but it is possible that this design continues to provide similar security properties to Tor)
2) Even outside the context of PIR, there are several advantages to having some structure in the database, with fine grained signatures on individual relay/micro descriptors (relays could also be grouped into blocks, with signatures at the block level). For example, if the Tor network were to use 2 hop circuits, then clients would have close to 0 (or at least significantly less than 1%) overhead for maintaining network state.
Why? Because the first hop (guard) is fixed, fetching the directory to select the guard is only a *one-time* operation; note that much of the directory overhead in Mike Perry's analysis comes from *periodic fetches*. Secondly, clients do not need to use PIR to fetch the exit relay in this setting, since guards know the identity of exits in a two hop design anyway (bandwidth overhead of fetching a single exit is simply the size of the descriptor/micro-descriptor -- resulting in *several orders of magnitude* bandwidth savings). (This optimization was referred to in the paper in the context of fetching middle relays)
(Of course there are anonymity implications of moving to two hop designs. Perhaps the most significant concern is easy guard identification, but perhaps an additional 100 million clients + move to a single guard reduces these concerns. The bandwidth savings could be very significant, specially in the context of mobile clients, which otherwise might use hundreds of MB per month just to maintain network state. Anyway, I digress.)
Please find some comments about your previous mail inline (below).
On Tue, Sep 30, 2014 at 1:04 AM, isis isis@torproject.org wrote:
- The authors assume that a client's three [sic] Guard Nodes will be the directory mirrors for that client, accepting PIR queries for
descriptors.
Right, please see (1) above.
- There is zero handling of colluding directory mirrors.
This seems incorrect? We used the Percy open source library to perform PIR, which does handle collusion among mirrors (to the best of my recollection). The parameters can be set such that one honest server is sufficient for security.
- The *increase* to descriptor size is not well factored in to the
analysis.
4.b. All of the Directory Authorities would need to sign each and every descriptor by itself. This would result in the current microdescriptors, which are sized from roughly 900 bytes to some
of the larger ones which are 2.5 KB, increasing in size by roughly 4 KB (that's the size of the DirAuth signatures on the current consensus).
I am not sure which signature scheme you are considering. In the paper, we talk about threshold BLS signatures, which have only 22 byte overhead. (This might increase for better security guarantees, but 4KB overhead seems very conservative.)
While one of the benefits of PIR would be that clients would no longer need to request the descriptors for all relays listed in the consensus, this benefit actually doesn't help as much as it would seem, because switching to a PIR scheme would require a client to make three of these roughly O(146449)-bit requests, every ten minutes or so, when the client wants a new circuit.
Right, clients would roughly need 18 middle/exit relays (using PIR queries) over a three hour interval to build a Tor circuit every 10 minutes. My recollection is that even in 2011, PIR seemed more efficient (we estimated a 12KB overhead per PIR query) than a full network fetch every three hours (12*18 = 216KB with PIR vs several MBs without PIR). Though these numbers might change depending on implementation details (including the use of micro-descriptors), the benefits would only improve with growing network size.
Thanks, Prateek