Hello,
we've reached the point in prop224 development where we need to pin down the precise cell formats, so that we can start implementing them. HS client authorization has been one of those areas that are not yet finalized and are still influencing cell format.
Here are some topics based on special's old notes, plus some further recent discussion with David and Yawning.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
b) Do we want a NACK from the HS for when client authorization fails? Currently the only way for a client to learn that they are not authorized (or that their keys changed or got revoked) is that they never complete the rendezvous.
To achieve that we would need a whole new cell (INTRODUCE_SERVICE_NACK) from the hidden service all the way to the client. Worth it?
c) Another suggestion here by special, is to introduce an additional layer of access control at the HSDir request level, such that HSDirs don't even serve descriptors to clients that do not prove knowledge of a pre-shared fetch key.
This way unauthorized clients cannot even learn presence information of hidden services. This might be quite useful for applications like Ricochet, which want to hide their presence from revoked clients.
Of course this assumes that the HSDir is honest and will honor the fetch key protocol. However, even if the HSDir is dishonest we are just back to the current security level.
We started sketching out a solution in the bottom of these notes: https://people.torproject.org/~asn/hs_notes/client_auth.jpg but the solution is not trivial to implement and we are not sure whether it's worth complicating the protocol further (e.g. we need to design a way for apps like Ricochet to get access to the fetch key).
d) It might be worthwhile padding the encrypted part of INTRODUCE1 to obscure whether client authorization is in place.
As you can see I have mainly worked on point (a) which I consider the most urgent. I welcome feedback on all points, so that we move forward with the design here.
Thanks :)
On 12 Oct 2016, at 07:58, George Kadianakis desnacked@riseup.net wrote:
Hello,
we've reached the point in prop224 development where we need to pin down the precise cell formats, so that we can start implementing them. HS client authorization has been one of those areas that are not yet finalized and are still influencing cell format.
Here are some topics based on special's old notes, plus some further recent discussion with David and Yawning.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
Double-encryption, because it minimises leakage. But then the size of the descriptor probabilistically leaks client auth anyway, because 330 bytes is not that much smaller than the padding to a multiple of 512 bytes.
So do we need all HSs to include either padding or a client auth section?
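To put a rough number on "probabilistically leaks": a minimal sketch, assuming descriptors are padded to a multiple of 512 bytes (as mentioned above) and that unpadded sizes are spread evenly within a padding block. Both the function names and the uniform-size assumption are illustrative, not taken from prop224.

```python
def padded(size: int, block: int = 512) -> int:
    """Round size up to the next multiple of block."""
    return -(-size // block) * block


def fraction_revealed(extra: int = 330, block: int = 512) -> float:
    """Fraction of base sizes for which adding `extra` bytes changes the padded size.

    Assumes base sizes are uniformly distributed over one padding block.
    """
    changed = sum(
        padded(base + extra, block) != padded(base, block)
        for base in range(1, block + 1)
    )
    return changed / block


print(fraction_revealed())  # 330/512, i.e. roughly 64% of the time
```

Under these assumptions the extra 330 bytes push the descriptor into the next 512-byte bucket about two times out of three, which is why padding alone does not hide the blob.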
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
ASCII? If it's not too hard? Better to make it readable for people who have to parse and debug it.
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
Why not? Re-using IVs is generally bad practice, even if it works in this case.
b) Do we want a NACK from the HS for when client authorization fails? Currently the only way for a client to learn that they are not authorized (or that their keys changed or got revoked) is that they never complete the rendezvous.
To achieve that we would need a whole new cell (INTRODUCE_SERVICE_NACK) from the hidden service all the way to the client. Worth it?
This is better UI for the client, at the cost of leaking some information about why authentication failed.
In general, authentication systems try to avoid providing specific information about why authentication failed. But I'm not sure if that applies in this specific case.
c) Another suggestion here by special, is to introduce an additional layer of access control at the HSDir request level, such that HSDirs don't even serve descriptors to clients that do not prove knowledge of a pre-shared fetch key.
This way unauthorized clients cannot even learn presence information of hidden services. This might be quite useful for applications like Ricochet, which want to hide their presence from revoked clients.
Of course this assumes that the HSDir is honest and will honor the fetch key protocol. However, even if the HSDir is dishonest we are just back to the current security level.
We started sketching out a solution in the bottom of these notes: https://people.torproject.org/~asn/hs_notes/client_auth.jpg but the solution is not trivial to implement and we are not sure whether it's worth complicating the protocol further (e.g. we need to design a way for apps like Ricochet to get access to the fetch key).
Why do we need both a fetch key and a client auth key? Isn't proving that you have a fetch key and a client auth key redundant?
d) It might be worthwhile padding the encrypted part of INTRODUCE1 to obscure whether client authorization is in place.
Yes.
T
-- Tim Wilson-Brown (teor)
Hi,
George Kadianakis wrote: [SNIP]
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
I suggest faking the client-auth-desc-key blob. 330 bytes should be worth it for making both auth and non-auth hidden services look alike (in size). The double-encryption, aside from possibly being too annoying to implement, will grow the size of the descriptor too, right? Except then it will be a guess whether the service uses authentication or has more introduction points (since we raised the max limit for this, and one HS can have 3 and another 20, for example). I think just increasing both auth and non-auth HS descriptors by 330 bytes is cleaner and simpler.
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
If there are no indications that this can cause problems, yes of course. It's not weird if it's not "possibly problematic".
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
There's little cost for using a fresh IV for each ENCRYPTED_DESC_COOKIE so we should do it.
b) Do we want a NACK from the HS for when client authorization fails? Currently the only way for a client to learn that they are not authorized (or that their keys changed or got revoked) is that they never complete the rendezvous.
To achieve that we would need a whole new cell (INTRODUCE_SERVICE_NACK) from the hidden service all the way to the client. Worth it?
No. I think it complicates stuff, adds extra code, adds a whole new cell and could be a potential DoS surface for a hidden service with auth enabled (you can trivially make an HS send thousands of INTRODUCE_SERVICE_NACK cells and add extra load / bandwidth usage). I think failing with 'invalid authorization data or too many unsuccessful retries to connect to [scrubbed].onion.' if we can't complete rendezvous is ok. This way we also don't learn whether an HS uses auth, doesn't exist any more or never existed, if we don't know anything about that HS and just found its address somewhere.
c) Another suggestion here by special, is to introduce an additional layer of access control at the HSDir request level, such that HSDirs don't even serve descriptors to clients that do not prove knowledge of a pre-shared fetch key.
This way unauthorized clients cannot even learn presence information of hidden services. This might be quite useful for applications like Ricochet, which want to hide their presence from revoked clients.
Of course this assumes that the HSDir is honest and will honor the fetch key protocol. However, even if the HSDir is dishonest we are just back to the current security level.
Interesting, this doesn't sound bad. But then the protocol to fetch the descriptor from the HSDir will have additional steps, so padding at this level will be needed to ensure a HSDir cannot distinguish if it just served a descriptor for which a pre-shared fetch key was provided or not. If presence hiding is very important and we have important use cases for it, we should think about this because it sounds like a good solution. My 2 cents are that presence hiding is considerably less important than the ability to allow only who I want to reach (connect) to me, and 100% presence hiding is very hard to achieve for, in my view, little gains.
We started sketching out a solution in the bottom of these notes: https://people.torproject.org/~asn/hs_notes/client_auth.jpg but the solution is not trivial to implement and we are not sure whether it's worth complicating the protocol further (e.g. we need to design a way for apps like Ricochet to get access to the fetch key).
d) It might be worthwhile padding the encrypted part of INTRODUCE1 to obscure whether client authorization is in place.
I don't understand how exactly this helps. This will hide the existence of client authorization on the introduction point side, correct? The introduction point doesn't know the hidden service, and it's rotated after a random number of introductions, so the worst it can do is collect (highly inaccurate) stats about how many auth-enabled HSes and non-auth HSes are out there? This should be fine and not enough of a threat to add padding (thinking about mobile clients trying to connect often here).
As you can see I have mainly worked on point (a) which I consider the most urgent. I welcome feedback on all points, so that we move forward with the design here.
Thanks :)
Thanks for your great work on this and everything else!
On 11 Oct (16:58:43), George Kadianakis wrote:
Hello,
Hi!
Thanks for this George!
we've reached the point in prop224 development where we need to pin down the precise cell formats, so that we can start implementing them. HS client authorization has been one of those areas that are not yet finalized and are still influencing cell format.
Here are some topics based on special's old notes, plus some further recent discussion with David and Yawning.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
Personally, I would really like to be able to hide the fact that a service is using client authorization. Both could hide that but not that trivially.
I would pick double encryption for the simple fact that I really want us to expose as little as we can in the plaintext section. Also, it seems to me that we could increase our chances of a fuck-up with this "fake client-auth-desc-key blob" situation for services without client auth?..
Another thing here, I have this feeling that client authorization might always leak unless we set a maximum number of clients and pad up to that value. I mean, how much do we pad in for a non-client auth service descriptor? I guess our ultimate goal would be to make all descriptors look alike in terms of size? (Except for extra IPs...)
The other thing I wish we could fix if we pick double encryption is "if I know the .onion then I can know how many clients are authorized" but that's a tougher one and might need us to fake client auth in the encrypted section as well...
So we should try to answer this question: Do we care about considerably increasing the size of the descriptor from what we have now? I could see a concern on mobile or even the horrible world of IoT (which uses cellular networks a lot in the end).
To give an idea, with how we have implemented 224 so far, a default descriptor (with 3 non-legacy IPs and no client auth) is 3153 bytes. Every new IP adds ~870 bytes (this can vary a bit depending on IPv6). Adding a legacy IP (RSA keys), we are at ~1215 extra bytes (again varying because of IPv6).
I tend to say yes to increasing descriptor size if it offers more security. I mean, <10k bytes is pretty small also on today's networks I guess?...
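The figures above can be turned into a quick estimator. This is only a sketch built from the numbers quoted in this message; it assumes each legacy intro point costs the quoted ~1215 bytes in total, and it ignores the IPv6 variation mentioned above. The function and constant names are made up for illustration.

```python
BASE_DESCRIPTOR_BYTES = 3153      # 3 non-legacy intro points, no client auth
INTRO_POINT_BYTES = 870           # approximate cost of each additional intro point
LEGACY_INTRO_POINT_BYTES = 1215   # approximate cost of a legacy (RSA) intro point


def estimate_descriptor_size(extra_intro_points: int = 0,
                             legacy_intro_points: int = 0) -> int:
    """Rough descriptor size in bytes, without client auth or padding."""
    return (BASE_DESCRIPTOR_BYTES
            + extra_intro_points * INTRO_POINT_BYTES
            + legacy_intro_points * LEGACY_INTRO_POINT_BYTES)


for extra in (0, 2, 7):
    print(3 + extra, "intro points:", estimate_descriptor_size(extra), "bytes")
```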
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
b) Do we want a NACK from the HS for when client authorization fails? Currently the only way for a client to learn that they are not authorized (or that their keys changed or got revoked) is that they never complete the rendezvous.
To achieve that we would need a whole new cell (INTRODUCE_SERVICE_NACK) from the hidden service all the way to the client. Worth it?
I would definitely not NACK with a error code that says "authorization failure" as this gives out a new "oracle" to anyone out there to poke the service.
However, I would assume that most of the NACK would come from bad authorization because establishing an INTRO with the service is usually not something that fails often...
So I'm hesitant here to do so as it would be very useful for only a specific use case of HS that is client auth. And when I say specific is that I assume blindly that most HS out there don't use client auth much... I might be very wrong!
c) Another suggestion here by special, is to introduce an additional layer of access control at the HSDir request level, such that HSDirs don't even serve descriptors to clients that do not prove knowledge of a pre-shared fetch key.
This way unauthorized clients cannot even learn presence information of hidden services. This might be quite useful for applications like Ricochet, which want to hide their presence from revoked clients.
In this case, revoking a client would mean changing the fetch key for _all_ clients as we can't really make one per-client else the descriptor size goes ++! but maybe we should think about it! Would be kind of a list of "fetch key" that are acceptable instead of a single one.
Of course this assumes that the HSDir is honest and will honor the fetch key protocol. However, even if the HSDir is dishonest we are just back to the current security level.
We started sketching out a solution in the bottom of these notes: https://people.torproject.org/~asn/hs_notes/client_auth.jpg but the solution is not trivial to implement and we are not sure whether it's worth complicating the protocol further (e.g. we need to design a way for apps like Ricochet to get access to the fetch key).
Right... it would be something extra you put in the .onion address I guess? Hey look, _bigger_ addresses :D. Or an extra string next to the .onion that Ricochet would have to generate as part of the Ricochet ID.
Globally, I'm not against this idea except if it's one single fetch key for everyone, it's kind of annoying and a compromised client makes it pointless as the service can't revoke it :S ...
d) It might be worthwhile padding the encrypted part of INTRODUCE1 to obscure whether client authorization is in place.
Yes, I would really like to hide it as much as we can. HS circuits already have distinctive patterns, so it would be good to avoid leaking information on which HS is being used on that HS circuit... :)
Thanks! David
As you can see I have mainly worked on point (a) which I consider the most urgent. I welcome feedback on all points, so that we move forward with the design here.
Thanks :)
Hi David,
David Goulet wrote:
Personally, I would really like to be able to hide the fact that a service is using client authorization. Both could hide that but not that trivially.
I would pick double encryption for the simple fact that I really want us to expose as little as we can in the plaintext section. Also, it seems to me that we could increase our chances of a fuck-up with this "fake client-auth-desc-key blob" situation for services without client auth?..
Another thing here, I have this feeling that client authorization might always leak unless we set a maximum number of clients and pad up to that value. I mean, how much do we pad in for a non-client auth service descriptor? I guess our ultimate goal would be to make all descriptors look alike in terms of size? (Except for extra IPs...)
The other thing I wish we could fix if we pick double encryption is "if I know the .onion then I can know how many clients are authorized" but that's a tougher one and might need us to fake client auth in the encrypted section as well...
So we should try to answer this question: Do we care about considerably increasing the size of the descriptor from what we have now? I could see a concern on mobile or even the horrible world of IoT (which uses cellular networks a lot in the end).
To give an idea, with how we have implemented 224 so far, a default descriptor (with 3 non-legacy IPs and no client auth) is 3153 bytes. Every new IP adds ~870 bytes (this can vary a bit depending on IPv6). Adding a legacy IP (RSA keys), we are at ~1215 extra bytes (again varying because of IPv6).
I tend to say yes to increasing descriptor size if it offers more security. I mean, <10k bytes is pretty small also on today's networks I guess?...
Hey, very clever how you thought about this. I wanted to say no, don't limit the max number of clients that can be authorized on a HS, but a descriptor has to have some hard limit anyway (I guess it has one, if not it should). So, bottom line, the theoretically possible number of authorized clients is already limited by the max descriptor size.
I think it's simpler if we think towards making the majority of descriptors look alike (in terms of size) as opposed to making each and every one look alike (in terms of size), because the latter means we need to pad up to the hard limit in all cases. Considering the introduction points, which can be from 3 to 20, and adding to that how many clients we have authorized, this creates very different possible descriptor sizes.
What we should do is come up with a fair number of authorized clients which is in most cases not exceeded (hard to come up with accurate stats for this), and pad up to that with dummy authorized clients for both auth enabled hidden services with less authorized clients than the so-called 'default' number and non-auth hidden services.
This could also partially solve "if I know the .onion I know how many clients are authorized". Only the auth-enabled HSes that have more authorized clients than the so-called 'default' number will leak this popularity info, but it should be no worse than the current state anyway. This is better because the majority of auth-enabled hidden services are assumed to be under this limit, so you can't tell their exact number of authorized clients because you don't know how many of them are dummy clients.
Hardest part here is picking the right numbers:
- MAX ALLOWED DESCRIPTOR SIZE (descriptor version will be useful in case we decide to change this limit in the future)
- DEFAULT NUMBER OF AUTHORIZED CLIENTS - let's say 100 (this is not based on any analysis, just what came into my mind - probably not a good number). I am inclined to make this number as high as we can considering mobile clients and/or small bandwidth clients.
In this case a HS with 1 authorized client will have in its descriptor:
- 1 real authorized client
- 99 dummy authorized clients, which do not exist but nobody can tell
A HS with 2 authorized clients will have the real:dummy client ratio 2:98, and so on.
When new authorized clients are added, the dummy slots will be occupied first, if previous authorized clients weren't revoked/deleted.
A non-auth HS will have 100 dummy authorized clients (??? or 99 and global non-auth anyclient ???).
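The dummy-slot idea can be sketched as follows. This is a hedged illustration, not a proposed implementation: it assumes each client entry is an opaque 16-byte blob that looks random to outsiders, uses the untested default of 100 slots floated above, and the function name `fill_client_slots` is hypothetical.

```python
import os
import random

DEFAULT_CLIENT_SLOTS = 100  # the "let's say 100" figure above, not an analysed value
ENTRY_LEN = 16              # assumed size of one opaque client entry


def fill_client_slots(real_entries, slots=DEFAULT_CLIENT_SLOTS):
    """Pad the client list with random dummy entries up to `slots`.

    Returns (entries, number_of_dummies). Services with more real clients
    than `slots` are not padded and therefore still leak their client count.
    """
    entries = list(real_entries)
    n_dummy = max(0, slots - len(entries))
    entries += [os.urandom(ENTRY_LEN) for _ in range(n_dummy)]
    # Shuffle so the position of an entry says nothing about whether it is real.
    random.SystemRandom().shuffle(entries)
    return entries, n_dummy


entries, dummies = fill_client_slots([os.urandom(ENTRY_LEN)])
print(len(entries), dummies)  # 100 99
```

A non-auth service would call this with an empty list and publish 100 dummies, which is the first of the two options raised above.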
If we can't make all descriptors perfectly equal in terms of size (not because it's technically impossible, but because of the costs) at least making the very vast majority of them look alike is one step forward.
George Kadianakis desnacked@riseup.net writes:
Hello,
we've reached the point in prop224 development where we need to pin down the precise cell formats, so that we can start implementing them. HS client authorization has been one of those areas that are not yet finalized and are still influencing cell format.
Here are some topics based on special's old notes, plus some further recent discussion with David and Yawning.
Hello again,
I read the feedback on the thread and thought some more about this. Here are some thoughts based on received feedback. A torspec branch coming soon if people agree with my points below.
I'd also like to introduce a new topic of discussion here:
d) Should we introduce the concept of stealth auth again?
IIUC the current prop224 client auth solutions are not providing all the security properties that stealth auth did. Specifically, if Alice is an ex-authorized-client of a hidden service and she got revoked, she can still fetch the descriptor of a hidden service and hence learn the uptime/presence of the HS. IIUC, with stealth auth this was not previously possible.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
Seems like people preferred the double-encryption idea here, so that we reveal the least amount of information possible in the plaintext part of the desc.
I think this is a reasonable point since if we put the auth keys in the plaintext part of the descriptor, and we always pad (or fake clients) up to N authorized clients, it will be obvious to an HSDir if a hidden service has more than N authorized clients (since we will need to fake 2*N clients then).
---
WRT protocol, I guess the idea here is that if client auth is enabled, then we add some client authorization fields in the top of the encrypted section of the descriptor, that can be used to find the client-auth descriptor encryption key. Then we add another client-auth-encrypted blob inside the encrypted part, which contains the intro points etc. and is encrypted using the descriptor encryption key found above.
So the first layer is encrypted using the onion address, and the second layer is encrypted using the client auth descriptor key. This won't be too hard to implement, but it's also different from what's currently coded in #17238.
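A toy model of the two layers may help when discussing this. Everything below is an assumption for illustration: the "cipher" is an XOR with a SHAKE-256 keystream standing in for whatever stream cipher the spec ends up using, there is no MAC, the `client_id` derivation is hypothetical, and `onion_key` stands in for a key derived from the onion address / subcredential.

```python
import hashlib
import os

NONCE_LEN, KEY_LEN, ID_LEN = 16, 32, 4


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHAKE-256 keystream. Illustration only."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)  # fresh nonce for every encryption
    return nonce + xor_stream(key, nonce, plaintext)


def unseal(key: bytes, blob: bytes) -> bytes:
    return xor_stream(key, blob[:NONCE_LEN], blob[NONCE_LEN:])


def client_id(cookie: bytes) -> bytes:
    # Hypothetical: lets a client locate its own entry in the list.
    return hashlib.sha256(b"client-id" + cookie).digest()[:ID_LEN]


def build_encrypted_section(onion_key, client_cookies, intro_points):
    desc_key = os.urandom(KEY_LEN)  # client-auth descriptor encryption key
    # Client authorization fields: desc_key wrapped once per client.
    entries = b"".join(client_id(c) + seal(c, desc_key) for c in client_cookies)
    inner = seal(desc_key, intro_points)              # second layer
    outer_plaintext = bytes([len(client_cookies)]) + entries + inner
    return seal(onion_key, outer_plaintext)           # first layer


def read_encrypted_section(onion_key, cookie, blob):
    outer = unseal(onion_key, blob)
    count, body = outer[0], outer[1:]
    entry_len = ID_LEN + NONCE_LEN + KEY_LEN
    for i in range(count):
        entry = body[i * entry_len:(i + 1) * entry_len]
        if entry[:ID_LEN] == client_id(cookie):
            desc_key = unseal(cookie, entry[ID_LEN:])
            return unseal(desc_key, body[count * entry_len:])
    return None  # no entry for this cookie: not an authorized client
```

Anyone who knows the onion address can peel the first layer and see the list of client entries, but only a holder of a listed cookie can recover `desc_key` and read the intro points.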
Do people feel OK with this?
Also, what should happen if client auth is not used? Should we fall back to the current descriptor format, or should we fake authorized clients and add a fake client-auth-encrypted-blob for uniformity? Feedback is welcome here, and I think the main issue here is engineering time and reuse of the current code.
---
Now WRT security, even if we do the double-encryption thing, and we consider an HSDir adversary that knows the onion address but is not an authorized client, we still need to add fake clients, otherwise that adversary will know the exact number of authorized clients. So fake clients will probably need to be introduced anyhow.
As David pointed out, this all boils down to how much we pad the encrypted part of the descriptor, otherwise we always leak info. If we are hoping for a leakless strategy here, we should be generous with our padding.
Let's see how much padding we need:
- Each intro point adds about 1.1k bytes to the descriptor (according to david).
- Each block of 16 authorized clients adds about 1k bytes to the descriptor (according to the format described below).
- Apart from intro points and authorized clients, the rest of the descriptor is not that heavy: less than 1k bytes (right?)
To get an average size here, let's consider a normal descriptor with 5 intro points and 16 authorized clients. With the above values, the overhead on the encrypted part of the descriptor is about 7k bytes.
To get a maximum size here, let's consider a phat descriptor that contains 20 intro points and 160 authorized clients. With the above values, the overhead on the encrypted part of the descriptor will be 32k bytes.
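The two estimates can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the approximate per-item sizes listed above; since the rest of the descriptor is only bounded ("less than 1k"), the function returns a range rather than a single value.

```python
INTRO_POINT_BYTES = 1100    # "about 1.1k bytes" per intro point
CLIENT_BLOCK_BYTES = 1000   # "about 1k bytes" per block of 16 authorized clients
CLIENTS_PER_BLOCK = 16
REST_MAX_BYTES = 1000       # everything else: "less than 1k bytes"


def encrypted_size_range(intro_points: int, auth_clients: int):
    """Return (low, high) byte estimates for the encrypted section."""
    blocks = -(-auth_clients // CLIENTS_PER_BLOCK)  # ceiling division
    core = intro_points * INTRO_POINT_BYTES + blocks * CLIENT_BLOCK_BYTES
    return core, core + REST_MAX_BYTES


print(encrypted_size_range(5, 16))     # (6500, 7500): the "about 7k" case
print(encrypted_size_range(20, 160))   # (32000, 33000): the "32k" case
```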
Hence, here are some suggestions (read: magic numbers):
- We always pad the encrypted section of the descriptor to the nearest multiple of 10k bytes (read: we pad the plaintext before we encrypt).
This should be enough to obfuscate the number of IPs and authorized clients on most hidden services out there.
- If client auth is enabled, we always include a multiple of 16 authorized clients (and fake the extra if needed) in the encrypted portion.
- We set the maximum allowed size of descriptors on HSDirs to 40k bytes. This should be enough to accommodate the fat descriptor described above.
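As a sketch of the padding rule in the suggestions above: the snippet below assumes "10k" means 10,000 bytes, pads with NUL bytes (the padding content is not specified here), and for simplicity checks the 40k limit against the padded encrypted section alone, although the limit above is on the whole descriptor.

```python
PAD_MULTIPLE = 10_000         # assumed meaning of "10k bytes"
MAX_DESCRIPTOR_BYTES = 40_000  # assumed meaning of "40k bytes"


def padded_length(n: int) -> int:
    """Smallest non-zero multiple of PAD_MULTIPLE that is >= n."""
    return max(1, -(-n // PAD_MULTIPLE)) * PAD_MULTIPLE


def pad_plaintext(plaintext: bytes) -> bytes:
    """Pad the plaintext of the encrypted section before encrypting it."""
    target = padded_length(len(plaintext))
    if target > MAX_DESCRIPTOR_BYTES:
        raise ValueError("padded section would exceed the maximum descriptor size")
    return plaintext + b"\x00" * (target - len(plaintext))


print(padded_length(6_500))    # 10000
print(padded_length(32_000))   # 40000
```

With these numbers, every service between the "normal" and the "phat" descriptor falls into one of only four size buckets.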
As said, I was quite generous with the max size here. I doubt any actual hidden services will have such enormous descriptors, but I guess allowing those might prove to be a good idea in the future.
I don't think 40k is that much in terms of size, especially when compared to things like the microdesc-consensus which is like 1.4MB, and is required for Tor to run.
The main issue with big max sizes here, are assholes using our DHT as cloud storage. I don't think 40k is that bad in this regard, but I'm not sure how to evaluate this properly.
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
People said this is a good idea and I agree.
Here is a suggested informal format, that gets placed in the beginning of the encrypted section of the descriptor:
desc-auth-type <auth-type>
desc-auth-nonce <8-byte-nonce-base64>
auth-client <client-id> <iv> <encrypted-cookie>
and we always include a multiple of 16 clients. Here is what it would look like in real life:
=======================================================================
desc-auth-type cookie
desc-auth-nonce JMk/8BTbhB4
auth-client dkW2nw OTqqSv29icTL5TSZ5TVQ3A +PIt0D9oWlDfbpGtRxGmeA
auth-client z0/MMQ dw+pwJcLk9LB/FPfxFBL3g rFX9f6WUVZVUPEwFet428Q
auth-client tH/BEQ zFWL1T9H/1fyV6bYW5Ol/Q /hjW1SgF0S3BANJhZZZ/OQ
auth-client 2lxnoQ ggm/IraIMQ+L56V3R0OyHQ gI9Lh5azwxcunYwyFXxJSg
auth-client S88yFw S4072NBKCwbwGep7/bJv+Q j3GdtDLAiZWI2jv0z6wfNw
auth-client T7KbqA zhj5vu+HghqcMBRYpsGE0Q nQQtScbK91xx1G5l5gUWYg
auth-client xPROzQ /OAH9FwXOufKGmFlBkqEJQ sqzeo6n4uMnqyghv3Vj3ZA
auth-client l7lqEQ iZrRNH1Lg636j32tg7XfLQ HXeqg6nViGb7H4T1dYMK9Q
auth-client +9ZUZw FReeAD5/mQD03J+YiffTKw oK1q7l/4JX+P08dLKYOmlw
auth-client 0L9rXg xp9hvTWcWSmLBcyLN96Msg THWHP2nLlHBWWrwECOIg+A
auth-client +kJcyQ nl7dkTOA9r10jk3Bo6I5WQ sGqMNtLMOiLDVDOr9YxJAw
auth-client sa5PQQ oGqjP0Ko72fopFw2aAm2QA f+enrvjiDSXGJ3t77vDfAQ
auth-client m87zTQ Pl5ITgw/6nb5zJPXjl9GPA X0lIhGNjXZqhGf+oHDX/wQ
auth-client t8Ki0g GOPiP3WM+FQlDXLK1vUEOg 8bBZRrlxj6Ca392exkNuog
auth-client 1D9wbQ 0Y5FZJGg30M2WPWu+xahbQ aXwcRLMS5MFAYcBrGEibVA
auth-client UoLbLw jwM4/d5BUfch4FLpGogouQ r9P/aNX3pWseC7tlXx1I5Q
=======================================================================
with a total size of 1090 bytes.
I think this looks much nicer than the binary format and easier to parse with the routerparse API as well.
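To illustrate how little parsing the ASCII format needs, here is a minimal Python sketch (not the routerparse API). The function names and the returned dictionary layout are hypothetical; the only things taken from the format above are the three keywords and the unpadded base64 fields.

```python
import base64

def b64decode_nopad(text: str) -> bytes:
    """Decode base64 that was written without trailing '=' padding."""
    return base64.b64decode(text + "=" * (-len(text) % 4))

def parse_auth_section(lines):
    """Parse desc-auth-type / desc-auth-nonce / auth-client lines (hypothetical helper)."""
    parsed = {"type": None, "nonce": None, "clients": []}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        keyword, args = fields[0], fields[1:]
        if keyword == "desc-auth-type" and len(args) == 1:
            parsed["type"] = args[0]
        elif keyword == "desc-auth-nonce" and len(args) == 1:
            parsed["nonce"] = b64decode_nopad(args[0])
        elif keyword == "auth-client" and len(args) == 3:
            client_id, iv, cookie = (b64decode_nopad(a) for a in args)
            parsed["clients"].append((client_id, iv, cookie))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return parsed

example = [
    "desc-auth-type cookie",
    "desc-auth-nonce JMk/8BTbhB4",
    "auth-client dkW2nw OTqqSv29icTL5TSZ5TVQ3A +PIt0D9oWlDfbpGtRxGmeA",
]
result = parse_auth_section(example)
```

With the sample values, the nonce decodes to 8 bytes, the client-id to 4 bytes, and the IV and encrypted cookie to 16 bytes each. A real parser would also want to check that the number of auth-client lines is a multiple of 16.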
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
People said this is a good idea, so in the example above I did it.
My main counter argument is the size increase, but perhaps being stingy here is stupid.
In any case, the size overhead comes to 23 bytes of base64 for every IV, so it's not that bad.
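The 23-byte figure can be checked with a few lines of Python. This is only a sanity check; the 16-byte IV length is an assumption inferred from the 22-character IV fields in the example above.

```python
import base64
import os

IV_LEN = 16  # bytes; assumed from the 22-character IV fields in the example

iv = os.urandom(IV_LEN)
# Unpadded base64, as used in the auth-client lines.
encoded = base64.b64encode(iv).rstrip(b"=").decode("ascii")

per_client = len(encoded) + 1   # 22 base64 characters plus the separating space
per_block = 16 * per_client     # overhead for one padded block of 16 clients
```

So a fresh IV costs 23 bytes per auth-client line, or 368 bytes per block of 16 clients.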
On 17 Oct (13:35:24), George Kadianakis wrote:
Hello again,
I read the feedback on the thread and thought some more about this. Here are some thoughts based on received feedback. A torspec branch coming soon if people agree with my points below.
I'd also like to introduce a new topic of discussion here:
d) Should we introduce the concept of stealth auth again?
IIUC the current prop224 client auth solutions are not providing all the security properties that stealth auth did. Specifically, if Alice is an ex-authorized-client of a hidden service and she got revoked, she can still fetch the descriptor of a hidden service and hence learn the uptime/presence of the HS. IIUC, with stealth auth this was not previously possible.
I think this has value if client revocation is something that actually happens and the operator wants that revoked client to NEVER know anything about the service anymore.
My gut tells me that it might be a very small portion of operators that do that and have concerns about hiding the service. I could be wrong so we can try to ask around on our public channels and see what's the response.
I can see this feature being added _after_ deployment as well.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
Seems like people preferred the double-encryption idea here, so that we reveal the least amount of information possible in the plaintext part of the desc.
I think this is a reasonable point since if we put the auth keys in the plaintext part of the descriptor, and we always pad (or fake clients) up to N authorized clients, it will be obvious to an HSDir if a hidden service has more than N authorized clients (since we will need to fake 2*N clients then).
WRT protocol, I guess the idea here is that if client auth is enabled, then we add some client authorization fields in the top of the encrypted section of the descriptor, that can be used to find the client-auth descriptor encryption key. Then we add another client-auth-encrypted blob inside the encrypted part, which contains the intro points etc. and is encrypted using the descriptor encryption key found above.
Well, I would only encrypt the key that was used to encrypt the introduction points but I'm sure this is what you meant!
So the first layer is encrypted using the onion address, and the second layer is encrypted using the client auth descriptor key. This won't be too hard to implement, but it's also different from what's currently coded in #17238.
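A rough Python sketch of the two-layer idea, to make the data flow concrete. Everything here is a stand-in: the SHAKE-256 keystream replaces whatever cipher the spec ends up using, the 4-byte length prefix is an invented framing, there is no MAC, and all function names are hypothetical. The point is only that the first layer (keyed from the onion address, here called `subcredential`) exposes the auth lines, and the second layer (keyed with the descriptor cookie) hides the intro points.

```python
import hashlib
import os

def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHAKE-256(key || iv). Illustration only."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(subcredential, descriptor_cookie, auth_lines, intro_points):
    # Second (inner) layer: intro points encrypted with the descriptor cookie.
    iv2 = os.urandom(16)
    inner = iv2 + keystream_xor(descriptor_cookie, iv2, intro_points)
    # First (outer) layer: auth lines plus inner blob, keyed from the onion address.
    first_plain = len(auth_lines).to_bytes(4, "big") + auth_lines + inner
    iv1 = os.urandom(16)
    return iv1 + keystream_xor(subcredential, iv1, first_plain)

def open_first_layer(subcredential, blob):
    """Anyone who knows the onion address can do this step."""
    iv1, body = blob[:16], blob[16:]
    plain = keystream_xor(subcredential, iv1, body)
    n = int.from_bytes(plain[:4], "big")
    return plain[4:4 + n], plain[4 + n:]   # (auth lines, still-encrypted inner blob)

def open_second_layer(descriptor_cookie, inner):
    """Only clients that recovered the descriptor cookie can do this step."""
    iv2, body = inner[:16], inner[16:]
    return keystream_xor(descriptor_cookie, iv2, body)

subcred, cookie = os.urandom(32), os.urandom(16)
blob = seal(subcred, cookie, b"desc-auth-type cookie\n", b"intro points go here")
auth, inner = open_first_layer(subcred, blob)
intro = open_second_layer(cookie, inner)
```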
Indeed, we need to change stuff but I think it's fine. We can get #17238 merged and then simply apply those changes after. I'm not too concerned about the engineering logistics personally.
Do people feel OK with this?
Also, what should happen if client auth is not used? Should we fall back to the current descriptor format, or should we fake authorized clients and add a fake client-auth-encrypted-blob for uniformity? Feedback is welcome here, and I think the main issue here is engineering time and reuse of the current code.
Right, the fake client might be the big question mark. Discussing more below.
Now WRT security, even if we do the double-encryption thing, and we consider an HSDir adversary that knows the onion address but is not an authorized client, we still need to add fake clients, otherwise that adversary will know the exact number of authorized clients. So fake clients will probably need to be introduced anyhow.
As David pointed out, this all boils down to how much we pad the encrypted part of the descriptor, otherwise we always leak info. If we are hoping for a leakless strategy here, we should be generous with our padding.
Let's see how much padding we need:
Each intro point adds about 1.1k bytes to the descriptor (according to David).
Each block of 16 authorized clients adds about 1k bytes to the descriptor (according to the format described below).
Apart from intro points and authorized clients, the rest of the descriptor is not that heavy: less than 1k bytes (right?)
To get an average size here, let's consider a normal descriptor with 5 intro points and 16 authorized clients. With the above values, the overhead on the encrypted part of the descriptor is about 7k bytes.
To get a maximum size here, let's consider a phat descriptor that contains 20 intro points and 160 authorized clients. With the above values, the overhead on the encrypted part of the descriptor will be 32k bytes.
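The arithmetic behind the two estimates can be written out as a small Python sketch. The constants are the rough per-item sizes quoted above; the function name is hypothetical, and the "rest of the descriptor" (under 1k) is left out.

```python
INTRO_POINT_BYTES = 1100     # ~1.1k bytes per intro point
CLIENT_BLOCK_BYTES = 1000    # ~1k bytes per block of 16 authorized clients
CLIENTS_PER_BLOCK = 16

def encrypted_overhead(intro_points: int, clients: int) -> int:
    """Approximate bytes added by intro points and (padded) client blocks."""
    blocks = -(-clients // CLIENTS_PER_BLOCK)   # ceiling division
    return intro_points * INTRO_POINT_BYTES + blocks * CLIENT_BLOCK_BYTES

average = encrypted_overhead(5, 16)     # 5 * 1100 + 1 * 1000
maximum = encrypted_overhead(20, 160)   # 20 * 1100 + 10 * 1000
```

This gives 6500 bytes for the normal case (about 7k once the rest of the descriptor is added) and 32000 bytes for the fat one.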
Hence, here are some suggestions (read: magic numbers):
We always pad the encrypted section of the descriptor to the nearest multiple of 10k bytes (read: we pad the plaintext before we encrypt).
This should be enough to obfuscate the number of IPs and authorized clients on most hidden services out there.
If client auth is enabled, we always include a multiple of 16 authorized clients (and fake the extra if needed) in the encrypted portion.
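A small Python sketch of the two padding rules. Assumptions, none of which are in the proposal text: "nearest multiple" is read as "next multiple up" (we cannot pad down), a service with zero real clients still emits one full block of 16, and the padding byte is NUL. The function names are hypothetical.

```python
PAD_MULTIPLE = 10_000   # pad the plaintext of the encrypted section to 10k multiples
CLIENT_BLOCK = 16       # always emit a multiple of 16 auth-client entries

def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple, with at least one full multiple."""
    return max(multiple, -(-value // multiple) * multiple)

def padded_client_count(real_clients: int) -> int:
    """Total auth-client entries to emit (real plus fake)."""
    return round_up(real_clients, CLIENT_BLOCK)

def pad_plaintext(plaintext: bytes) -> bytes:
    """Pad the plaintext before encryption (NUL padding is an assumption)."""
    target = round_up(len(plaintext), PAD_MULTIPLE)
    return plaintext + b"\x00" * (target - len(plaintext))

fake_needed = padded_client_count(42) - 42
```

For example, 42 real clients become 48 entries (6 fake ones), a 6500-byte plaintext is padded to 10000 bytes, and the 32000-byte fat descriptor is padded to 40000 bytes.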
Interesting, so if I have 3 IPs (default) and 42 clients, the descriptor will be much bigger than the "normal" 3 IPs and 16 clients but yet the HSDir observer won't be able to know for sure if it's actually a shit ton of IPs or clients.
20 IPs == ~22k (considering 1.1k, a mix of legacy and normal ones)
160 clients == ~10k (10 blocks of 16 clients at ~1k each)
This is good (and supports the increase to 20 IPs): in this case adding IPs or clients doesn't really tell anyone which one is being used. And padding makes it even more confusing...
Would be fun to run a table of the IP/client combinations and see how much the sizes differ, but for that we would need the _real_ numbers instead of estimates +/- 200 bytes.
- We set the maximum allowed size of descriptors on HSDirs to 40k bytes. This should be enough to accommodate the fat descriptor described above.
The current hardcoded maximum directory object size is 10MB:
#define MAX_DIRECTORY_OBJECT_SIZE (10*(1<<20))
So we are good on that front. If the maximum is really 32k, then 40k sounds good to me! Worst case, we go to 50k.
As said, I was quite generous with the max size here, even though I doubt any actual hidden services will have such enormous descriptors, but I guess allowing those might prove to be a good idea in the future.
I don't think 40k is that much in terms of size, especially when compared to things like the microdesc-consensus which is like 1.4MB, and is required for Tor to run.
The main issue with big max sizes here is assholes using our DHT as cloud storage. I don't think 40k is that bad in this regard, but I'm not sure how to evaluate this properly.
Ok, let me do a crazy calculation:
Considering ~50k unique .onion in the network (and we assume they all move to prop224 at some point), it means 40KB * 50000 * 6 HSDirs == 12000000 KB or 12000MB. We have, what, ~3k HSDirs, thus 12000/3000 == 4MB per relay.
That's around what I saw on my relay last time I checked: I'm at ~130 descriptors at all times, thus ~5.2MB with 40KB descriptors.
If we scale that by a factor of 10 (assuming we go to 500k .onion), relays will end up with ~40MB in RAM, which is not that crazy nowadays. And we have an OOM handler as well.
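The back-of-the-envelope numbers above can be reproduced with a short Python sketch. The defaults are the figures from this mail (40KB worst-case descriptors, 6 HSDirs per service, ~3k HSDirs, 1MB == 1000KB); the function name is hypothetical.

```python
def per_relay_mb(onions: int, desc_kb: int = 40, replicas: int = 6,
                 hsdirs: int = 3000) -> float:
    """Average descriptor storage per HSDir in MB, if every descriptor is max size."""
    total_kb = onions * desc_kb * replicas
    return total_kb / 1000 / hsdirs

today = per_relay_mb(50_000)          # 12000 MB spread over 3000 HSDirs
tenfold = per_relay_mb(10 * 50_000)   # the same estimate scaled by a factor of 10
observed = 130 * 40 / 1000            # ~130 descriptors of 40KB on one relay, in MB
```

This yields 4 MB per relay today, 40 MB after a tenfold increase in services, and 5.2 MB for the single-relay observation.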
It's still a pretty big maximum but most HS will have a much smaller size anyway.
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
People said this is a good idea and I agree.
Here is a suggested informal format, that gets placed in the beginning of the encrypted section of the descriptor:
desc-auth-type <auth-type>
desc-auth-nonce <8-byte-nonce-base64>
auth-client <client-id> <iv> <encrypted-cookie>
I think this looks much nicer than the binary format and easier to parse with the routerparse API as well.
+1
... but hrm, I thought that auth-client line should also provide the key to decrypt the introduction point section no?
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
People said this is a good idea, so in the example above I did it.
My main counter argument is the size increase, but perhaps being stingy here is stupid.
In any case, the size overhead comes to 23 bytes of base64 for every IV, so it's not that bad.
I think it's fine.
Thanks! David
Hello,
I worked some more on the descriptor part of client authorization and prepared a torspec patch. You can find it at `prop224_client_auth_2` in my repo: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224_client_...
Based on received feedback, I went with the double-layer encryption style, where the first layer is encrypted using the HS pubkey and the second layer is encrypted using the descriptor cookie. Inside the first layer plaintext is the information for authorized clients to derive the descriptor cookie.
I also included the padding suggestions from my previous post that should help with hiding the number of client auths.
I'd like feedback on the following:
a) Do you like the descriptor format and logic? Can we make it nicer or easier to implement?
b) Do you like the proposal format? Is it messy and/or hard to understand? Ideas on how it can be improved?
I think we have reached the point where every subsystem of prop224 is complex enough to warrant its own proposal, but I'm resisting the urge to dig into this rabbithole right now.
c) Is the descriptor cookie encryption format good enough? Namely: encrypted_cookie = STREAM(iv, client_auth_cookie) XOR descriptor_cookie
d) Current changes: I changed "authentication-required" to "intro-auth-required" in the descriptor, to make it more clear it's about introduction-layer authentication.
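To make the formula in c) concrete, here is a Python sketch. The STREAM() function below is a stand-in built from SHAKE-256 purely so the example runs with the standard library; it is not the cipher from the proposal, and the function names are hypothetical. The sketch also shows why a fresh IV per encryption matters once a client key is reused.

```python
import hashlib
import os

def stream(iv: bytes, key: bytes, n: int) -> bytes:
    """Stand-in keystream for STREAM(iv, key); NOT the proposal's actual cipher."""
    return hashlib.shake_256(key + iv).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_cookie(iv, client_auth_cookie, descriptor_cookie):
    # encrypted_cookie = STREAM(iv, client_auth_cookie) XOR descriptor_cookie
    return xor(stream(iv, client_auth_cookie, len(descriptor_cookie)), descriptor_cookie)

decrypt_cookie = encrypt_cookie   # XOR with the same keystream undoes itself

client_key = os.urandom(16)
cookie_a, cookie_b = os.urandom(16), os.urandom(16)
iv1, iv2 = b"\x01" * 16, b"\x02" * 16

enc_a = encrypt_cookie(iv1, client_key, cookie_a)
recovered = decrypt_cookie(iv1, client_key, enc_a)

# Same client key and same IV for two descriptor cookies: the XOR of the two
# ciphertexts equals the XOR of the two cookies, so the keystream cancels out.
leak = xor(enc_a, encrypt_cookie(iv1, client_key, cookie_b))
# With a fresh IV the keystreams differ and nothing cancels.
no_leak = xor(enc_a, encrypt_cookie(iv2, client_key, cookie_b))
```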
Feedback on the patch and the above points is very much welcome!
BTW, I'm not done with this thread yet, there are still some more points that need to be handled wrt client authorization. But this spec patch is the most important and lengthiest of them all, so let's get it out of the way first.
Thanks!
On 01 Nov (13:32:13), George Kadianakis wrote:
a) Do you like the descriptor format and logic? Can we make it nicer or easier to implement?
This line in the superencrypted section:
"auth-client" SP iv SP client-id SP encrypted-cookie
... all three values will be raw bytes as far as I can tell. I'm fine with bytes in the descriptor (like we do now for v2) but mixing "text param" with "binary param" might not be ideal. If we want a human readable representation, I would go with base64 encoding those or if not, full binary.
I like the human readable one personally as it's so much easier to debug. As we stated before, size is not that much of an issue and going from binary to base64 won't create huge strings at all... Downside, we have to parse more strings! :P but we do that _HEAVILY_ anyway with descriptors so...
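For a feel of what the base64 variant costs, here is a Python sketch that formats one line in the field order quoted above and adds up a block of 16. The field lengths (16-byte IV, 4-byte client-id, 16-byte cookie) are assumptions taken from the earlier example in this thread, and the helper names are hypothetical.

```python
import base64
import os

def b64(raw: bytes) -> str:
    """Unpadded base64, as in the earlier example lines."""
    return base64.b64encode(raw).rstrip(b"=").decode("ascii")

def format_auth_client(iv: bytes, client_id: bytes, encrypted_cookie: bytes) -> str:
    # "auth-client" SP iv SP client-id SP encrypted-cookie
    return "auth-client " + " ".join(b64(f) for f in (iv, client_id, encrypted_cookie))

line = format_auth_client(os.urandom(16), os.urandom(4), os.urandom(16))

header = "desc-auth-type cookie\n" + "desc-auth-nonce " + b64(os.urandom(8)) + "\n"
block_size = len(header) + 16 * (len(line) + 1)   # +1 for each trailing newline
```

Each line comes out at 64 characters, and a full block of 16 clients plus the two header lines is 1090 bytes, which matches the size quoted earlier in the thread.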
b) Do you like the proposal format? Is it messy and/or hard to understand? Ideas on how it can be improved?
I think we have reached the point where every subsystem of prop224 is complex enough to warrant its own proposal, but I'm resisting the urge to dig into this rabbithole right now.
We could indeed split but I think it's fine for now. Not sure it's worth the extra work for now unless anyone thinks so?
Extra stuff:
- I think "superencrypted" -> "super-encrypted" would be nicer as everything in the descriptor has that separation of words. Or even "client-encrypted" if we want to add extra semantic. No strong opinion apart from the "-" :).
- [XXX consider randomization of the value 16]
If it's fixed, we basically create buckets so a client can know that there are 0-16 clients or 16-32 clients and so on.
If we randomize that value and let's say it's 7 then we have buckets of 7. If that value is randomized for _every_ new descriptor, we create multiple sizes of buckets but over time someone could deduce (maybe) the low bound of clients by observing all random values and thus assume there are 0-<low bound>.
I'm uncertain here what's best but seems that in any case, bucketing is happening as we pad with fake "auth-client". So I would assume here, out of my head to be safe, that we might want _all_ services to kind of look the same thus a fixed value would make sense following that train of thought.
I'm liking the rest here! We'll also have to think about some padding in the INTRODUCE1 cell to avoid leaking that client auth is being used.
Cheers! David
On 3 Nov. 2016, at 04:45, David Goulet dgoulet@ev0ke.net wrote:
- I think "superencrypted" -> "super-encrypted" would be nicer as everything in the descriptor has that separation of words. Or even "client-encrypted" if we want to add extra semantic. No strong opinion apart from the "-" :).
client-encrypted could be very confusing. It sounds like the client has encrypted it.
- [XXX consider randomization of the value 16]
If it's fixed, we basically create buckets so a client can know that there are 0-16 clients or 16-32 clients and so on.
If we randomize that value and let's say it's 7 then we have buckets of 7. If that value is randomized for _every_ new descriptor, we create multiple sizes of buckets but over time someone could deduce (maybe) the low bound of clients by observing all random values and thus assume there are 0-<low bound>.
Yes, this is true. And it would be quite easy over time, as hidden services don't change their client auth that often. So you would just need to download a descriptor every hour.
I'm uncertain here what's best but seems that in any case, bucketing is happening as we pad with fake "auth-client". So I would assume here, out of my head to be safe, that we might want _all_ services to kind of look the same thus a fixed value would make sense following that train of thought.
Yes, buckets are the best.
State of the art is to add random noise and then bucket, but I don't think that's needed here. And the noise would have to be large to hide an unchanging value.
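The difference between a fixed bucket and a per-descriptor random one can be shown with a small Python sketch. It assumes the observer can count auth-client entries, that padding always goes up to the next multiple of the bucket size, and (for the random case) that over time every bucket size from 8 to 32 gets used; the function names are hypothetical.

```python
def padded(real: int, bucket: int) -> int:
    """Entries emitted when padding to a multiple of `bucket` (at least one bucket)."""
    return max(bucket, -(-real // bucket) * bucket)

def fixed_bucket_range(observed: int, bucket: int = 16):
    """Real client counts consistent with a single observation under a fixed bucket."""
    low = 0 if observed == bucket else observed - bucket + 1
    return (low, observed)

def upper_bound_after_watching(real: int, buckets=range(8, 33)) -> int:
    """Smallest padded count an observer sees if the bucket changes every descriptor."""
    return min(padded(real, b) for b in buckets)

fixed_view = fixed_bucket_range(padded(37, 16))   # what one descriptor reveals
random_view = upper_bound_after_watching(37)      # what many descriptors reveal
```

For 37 real clients, a fixed bucket of 16 only reveals "between 33 and 48", while watching randomized descriptors narrows the upper bound down to 38.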
T
I am very happy with the torspec patch.
Not quoting entirely, only want to add something wrt randomizing the value for fake clients based on David's and teor's comments:
David Goulet wrote: [SNIP]
- I think "superencrypted" -> "super-encrypted" would be nicer as everything in the descriptor has that separation of words. Or even "client-encrypted" if we want to add extra semantic. No strong opinion apart from the "-" :).
I prefer super-encrypted vs. client-encrypted.
[XXX consider randomization of the value 16]
If it's fixed, we basically create buckets so a client can know that there are 0-16 clients or 16-32 clients and so on.
If we randomize that value and let's say it's 7 then we have buckets of 7. If that value is randomized for _every_ new descriptor, we create multiple sizes of buckets but over time someone could deduce (maybe) the low bound of clients by observing all random values and thus assume there are 0-<low bound>.
I'm uncertain here what's best but seems that in any case, bucketing is happening as we pad with fake "auth-client". So I would assume here, out of my head to be safe, that we might want _all_ services to kind of look the same thus a fixed value would make sense following that train of thought.
I'm liking the rest here! We'll have to think also on some padding in the INTRODUCE1 cell to avoid leaking client auth is being used.
This is true, we create buckets no matter what, but I think it's better if one has to watch a hidden service for a lot more time to determine the probable number rather than being able to tell from the first descriptor that there are 0-16 clients, 16-32 clients and so on.
I fully agree that randomizing _every_ new descriptor does not help and probably in short time someone could deduce a possible number, but I am slightly uncomfortable with a global fixed value for this. One more idea, if it's not helpful we can just go ahead with a fixed value of 16.
I think it's better if we pick a random number between 8 and 32 fake clients and remember the picked value so it will be used for every new descriptor until something in our setup changes or enough time has passed. In order to know when to reset it, we save it (in our state) along with:
1. The number of real authorized clients when the random value was picked.
2. Timestamp when the random value was picked + an end of life for the random value.
We reset the random value of fake authorized clients and also its end of life when:
a) number of real authorized clients in torrc changes from what we have in our state.
b) end of life for the random value is reached. End of life will be timestamp + a random period between 30 and 90 days.
c) obvious case when Tor is re-installed and old state is lost.
We call this function on every HUP and (re)start. We can tune the numbers 8 - 32 and period 30 - 90 days as you like.
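A minimal Python sketch of this state handling, under the assumptions spelled out in the comments. The class and function names are hypothetical, the state is kept in memory rather than in Tor's state file, and a real implementation would use a cryptographically secure RNG.

```python
import random
from dataclasses import dataclass
from typing import Optional

DAY = 86400  # seconds

@dataclass
class FakeClientState:
    value: int           # the remembered random value, between 8 and 32
    real_clients: int    # number of real authorized clients when it was picked
    expires_at: float    # end of life: pick time + random 30..90 days (unix time)

def refresh_state(state: Optional[FakeClientState], real_clients: int,
                  now: float, rng=random) -> FakeClientState:
    """Called on every HUP and (re)start; re-rolls only when a reset rule fires."""
    lost = state is None                                           # c) state lost
    changed = not lost and state.real_clients != real_clients     # a) torrc changed
    expired = not lost and now >= state.expires_at                 # b) end of life
    if lost or changed or expired:
        return FakeClientState(
            value=rng.randint(8, 32),
            real_clients=real_clients,
            expires_at=now + rng.uniform(30, 90) * DAY,
        )
    return state   # otherwise keep using the remembered value

first = refresh_state(None, real_clients=5, now=0.0)
same = refresh_state(first, real_clients=5, now=1000.0)
rerolled = refresh_state(first, real_clients=6, now=1000.0)
```

Here `same` is the very same state object as `first`, while `rerolled` is a new one because the number of real clients changed.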
This way there are a lot of buckets and significantly more time needed for an observer to deduce a probable number. It is quite possible one can never deduce a "probable enough" number.
We combine this with faking extra if needed in the encrypted portion to the next multiple of 10k bytes.
It's true that it won't help if the hidden service operator changes the number of authorized clients every hour for a long period but in practice this doesn't happen - number of authorized clients changes rarely. And even in this scenario it still makes things a lot more confusing.
Compared to other parts of prop 224, this is easy to code and should be worth the effort. What do you think?
Thanks a lot for considering all my previous points.
On 3 Nov. 2016, at 10:37, s7r s7r@sky-ip.org wrote:
If you want to do it this way, with noise and buckets, ask someone who is good at differential privacy to do the numbers for you, rather than guessing.
You'll need to know the level of activity you want to hide.
T
teor wrote:
On 3 Nov. 2016, at 10:37, s7r s7r@sky-ip.org wrote:
I am very happy with the torspec patch.
Not quoting entirely, only want to add something wrt randomizing the value for fake clients based on David's and teor's comments:
David Goulet wrote: [SNIP]
- I think "superencrypted" -> "super-encrypted" would be nicer as everything
in the descriptor as that separation of word. Or even "client-encrypted" if we want to add extra semantic. No strong opinion apart from the "-" :).
I prefer super-encrypted vs. client-encrypted.
- [XXX consider randomization of the value 16]
If it's fixed, we basically create buckets, so a client can know that there are 0-16 clients or 16-32 clients and so on.
If we randomize that value and let's say it's 7, then we have buckets of 7. If that value is randomized for _every_ new descriptor, we create multiple bucket sizes, but over time someone could (maybe) deduce the lower bound on the number of clients by observing all the random values, and thus assume there are 0-<low bound>.
I'm uncertain here what's best, but it seems that in any case bucketing is happening as we pad with fake "auth-client" entries. So I would assume here, off the top of my head and to be safe, that we might want _all_ services to kind of look the same, thus a fixed value would make sense following that train of thought.
I'm liking the rest here! We'll have to think also on some padding in the INTRODUCE1 cell to avoid leaking client auth is being used.
This is true, we create buckets no matter what, but I think it's better if one has to watch a hidden service for a lot more time to determine the probable number rather than being able to tell from the first descriptor that there are 0-16 clients, 16-32 clients and so on.
I fully agree that randomizing _every_ new descriptor does not help and probably in short time someone could deduce a possible number, but I am slightly uncomfortable with a global fixed value for this. One more idea, if it's not helpful we can just go ahead with a fixed value of 16.
I think it's better if we pick a random number between 8 and 32 fake clients and remember the picked value so it will be used for every new descriptor until something in our setup changes or enough time has passed. In order to know when to reset it, we save it (in our state) along with:
- The number of real authorized clients when the random value was picked.
- Timestamp when the random value was picked + an end of life for the
random value.
We reset the random value of fake authorized clients and also its end of life when:
a) number of real authorized clients in torrc changes from what we have in our state.
b) end of life for the random value is reached. End of life will be timestamp + a random period between 30 and 90 days.
c) obvious case when Tor is re-installed and old state is lost.
We call this function on every HUP and (re)start. We can tune the numbers 8 - 32 and period 30 - 90 days as you like.
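To make the reset rules above concrete, here is a minimal sketch of that state handling. All names (FakeClientState, pick_state, refresh_state) are hypothetical and do not come from tor or torspec, and the 8 - 32 and 30 - 90 day ranges are just the example numbers from this mail.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60  # seconds


@dataclass
class FakeClientState:
    fake_clients: int   # random value picked for fake authorized clients
    real_clients: int   # number of real authorized clients when it was picked
    expires_at: float   # pick timestamp + random end of life


def pick_state(real_clients: int, now: float, rng) -> FakeClientState:
    """Pick a fresh random value (8-32) and a fresh end of life (30-90 days)."""
    return FakeClientState(
        fake_clients=rng.randint(8, 32),
        real_clients=real_clients,
        expires_at=now + rng.randint(30, 90) * DAY,
    )


def refresh_state(state: Optional[FakeClientState], real_clients: int,
                  now: float, rng) -> FakeClientState:
    """Run on every HUP and (re)start; keep the old value unless a reset rule fires."""
    lost = state is None                                             # case (c)
    changed = not lost and state.real_clients != real_clients        # case (a)
    expired = not lost and now >= state.expires_at                   # case (b)
    if lost or changed or expired:
        return pick_state(real_clients, now, rng)
    return state


rng = random.SystemRandom()  # OS-backed randomness rather than the default PRNG
state = refresh_state(None, real_clients=3, now=time.time(), rng=rng)
same = refresh_state(state, real_clients=3, now=time.time(), rng=rng)
```

Here `same` is the very object returned by the first call, because none of the three reset conditions applies yet.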
This way there are a lot of buckets and significantly more time needed for an observer to deduce a probable number. It is quite possible one can never deduce a "probable enough" number.
We combine this with faking extra if needed in the encrypted portion to the next multiple of 10k bytes.
It's true that it won't help if the hidden service operator changes the number of authorized clients every hour for a long period but in practice this doesn't happen - number of authorized clients changes rarely. And even in this scenario it still makes things a lot more confusing.
Compared to other parts of prop 224, this is easy to code and should be worth the effort. What do you think?
If you want to do it this way, with noise and buckets, ask someone who is good at differential privacy to do the numbers for you, rather than guessing.
You'll need to know the level of activity you want to hide.
T
As I said, the numbers can be changed - I was illustrating an example. I guessed some numbers that seemed reasonable to me so I could give an example, and also because it's not a critical part. We only try to hide the number of real authorized clients, or make it as hard as possible for an observer to deduce a number close to the realistic number of authorized clients, that's all.
Simply using the numbers that were guessed without deep knowledge in differential privacy is a lot better than using a global fixed value of 16, but as I said this doesn't need to be a debate because I am not against the fixed value, only saying it's better to randomize, if the solution exists.
On Tue, Nov 1, 2016 at 1:32 PM, George Kadianakis desnacked@riseup.net wrote:
I worked some more on the descriptor part of client authorization and prepared a torspec patch. You can find it at `prop224_client_auth_2` in my repo: https://gitweb.torproject.org/user/asn/torspec.git/commit/?h=prop224_client_...
Based on received feedback, I went with the double-layer encryption style, where the first layer is encrypted using the HS pubkey and the second layer is encrypted using the descriptor cookie. Inside the first layer plaintext is the information for authorized clients to derive the descriptor cookie.
I also included the padding suggestions from my previous post that should help with hiding the number of client auths.
Hi, George! This looks like solid stuff. I'll try to answer your questions and
I'd like feedback on the following:
a) Do you like the descriptor format and logic? Can we make it nicer or easier to implement?
No objections.
b) Do you like the proposal format? Is it messy and/or hard to understand? Ideas on how can it be improved?
I think it's fine.
I think we have reached the point where every subsystem of prop224 is complex enough to warrant its own proposal, but I'm resisting the urge to dig into this rabbithole right now.
Maybe we can clean it all up when we turn it from "prop224" to "rend-spec-v2.txt" ? :)
c) Is the descriptor cookie encryption format good enough? Namely:
    encrypted_cookie = STREAM(iv, client_auth_cookie) XOR descriptor_cookie
I don't much care for the lack of a MAC here. I haven't found any actual vulnerability here, but every time in my life that I have omitted a MAC from a malleable ciphertext, I have turned out to regret it.
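To illustrate what "malleable" means for the construction in (c), here is a small sketch. The stream() function below is a stand-in keystream built from SHAKE-256 and is an assumption made only for this example; it is not the STREAM primitive from prop224. The 16-byte lengths are also assumptions.

```python
import hashlib
import os


def stream(iv: bytes, key: bytes, n: int) -> bytes:
    """Stand-in keystream (SHAKE-256 over key || iv), NOT prop224's STREAM."""
    return hashlib.shake_256(key + iv).digest(n)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_cookie(iv: bytes, client_auth_cookie: bytes,
                   descriptor_cookie: bytes) -> bytes:
    """encrypted_cookie = STREAM(iv, client_auth_cookie) XOR descriptor_cookie"""
    keystream = stream(iv, client_auth_cookie, len(descriptor_cookie))
    return xor(keystream, descriptor_cookie)


# XOR with the same keystream is its own inverse, so decryption is identical.
decrypt_cookie = encrypt_cookie

iv = os.urandom(16)                  # fresh random IV per client entry
client_auth_cookie = os.urandom(16)
descriptor_cookie = os.urandom(16)

encrypted = encrypt_cookie(iv, client_auth_cookie, descriptor_cookie)

# Without a MAC, anyone can flip a ciphertext bit and the same bit flips
# in the recovered descriptor_cookie, with no way for the client to notice.
tampered = bytes([encrypted[0] ^ 0x01]) + encrypted[1:]
recovered = decrypt_cookie(iv, client_auth_cookie, tampered)
```

The client ends up with a descriptor_cookie that differs from the real one in exactly the flipped bit. Whether that is exploitable here is a separate question; the sketch only shows the property itself.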
d) Current changes: I changed "authentication-required" to "intro-auth-required" in the descriptor, to make it more clear its about introduction-layer authentication.
Feedback on the patch and the above points is very much welcome!
BTW, I'm not done with this thread yet, there are still some more points that need to be handled wrt client authorization. But this spec patch is the most important and lengthiest of them all, so let's get it out of the way first.
So, here's my feedback on the branch itself.
wrt 3da540606a85e6 "Make subcredential actually change every time period."
This change is safe, I think, but not necessary: Note that the blinded_public_key input already changes every time period because of the nonce value N used in blinding the public key.
wrt a85ffa341cc6c4 "Use per-client desc auth keys"
What is the format of "IV" and "client-id" and "encrypted-cookie"? Base64? How long are they? I would guess "base64-encoded, 32 bytes each"?
The IV has to be random, right? The spec ought to say so, but I didn't see where.
Malleability on the encrypted descriptor_cookie bothers me for some reason I can't figure out; see note above.
The syntax on "superencrypted" doesn't seem right. The "encrypted-string" part should probably be after an NL, not an SP, right?
Descriptor-cookie needs to be random each time, yeah? Does the spec say so?
In all, this looks fine to me. I like the part where we do two layers of encryption unconditionally.
Hello George,
Inline comments:
Hello again,
I read the feedback on the thread and thought some more about this. Here are some thoughts based on received feedback. A torspec branch coming soon if people agree with my points below.
I'd also like to introduce a new topic of discussion here:
d) Should we introduce the concept of stealth auth again?
IIUC the current prop224 client auth solutions are not providing all the security properties that stealth auth did. Specifically, if Alice is an ex-authorized-client of a hidden service and she got revoked, she can still fetch the descriptor of a hidden service and hence learn the uptime/presence of the HS. IIUC, with stealth auth this was not previously possible.
I also share David's feeling here: presence hiding is not so critical and I am not sure if it's worth the engineering and additional code costs. We can add this feature any time after deployment anyway, because there are many open questions and we need some stats and an analysis of user demands in order to make the right decision here. Freezing this specific feature until further analysis shouldn't be a problem.
a) I think the most important problem here is that the authorization-key logic in the current prop224 is very suboptimal. Specifically, prop224 uses a global authorization-key to ensure that descriptors are only read by authorized clients. However, since that key is global, if we ever want to revoke a single client we need to change the keys for all clients. The current rend-spec.txt does not suffer from this issue, hence I adapted the current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes: https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization is not enabled? Otherwise, we leak to the HSDir whether client auth is enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the descriptor. So that we require subcredential knowledge to access the encrypted part, and then client_auth_cookie knowledge to get the encryption key to decrypt the intro points etc. I feel that this double-encryption design might be too annoying to implement, but perhaps it's worth it?
Seems like people preferred the double-encryption idea here, so that we reveal the least amount of information possible in the plaintext part of the desc.
I think this is a reasonable point since if we put the auth keys in the plaintext part of the descriptor, and we always pad (or fake clients) up to N authorized clients, it will be obvious to an HSDir if a hidden service has more than N authorized clients (since we will need to fake 2*N clients then).
Agreed.
WRT protocol, I guess the idea here is that if client auth is enabled, then we add some client authorization fields in the top of the encrypted section of the descriptor, that can be used to find the client-auth descriptor encryption key. Then we add another client-auth-encrypted blob inside the encrypted part, which contains the intro points etc. and is encrypted using the descriptor encryption key found above.
So the first layer is encrypted using the onion address, and the second layer is encrypted using the client auth descriptor key. This won't be too hard to implement, but it's also different from what's currently coded in #17238.
Do people feel OK with this?
Yes, sounds good.
Also, what should happen if client auth is not used? Should we fall back to the current descriptor format, or should we fake authorized clients and add a fake client-auth-encrypted-blob for uniformity? Feedback is welcome here, and I think the main issue here is engineering time and reuse of the current code.
I say Yes with a capital letter here. We should make as many descriptors as possible look alike. If client auth is not used, that descriptor should contain N fake clients (given we choose a reasonable N that will result in a reasonable average size for descriptors network wide).
Now WRT security, even if we do the double-encryption thing, and we consider an HSDir adversary that knows the onion address but is not an authorized client, we still need to add fake clients, otherwise that adversary will know the exact number of authorized clients. So fake clients will probably need to be introduced anyhow.
Of course, fake clients will patch this problem in a good way - an adversary that only knows the onion address but is not an authorized client will not have an _exact_ number of authorized clients no matter what. And the additional padding that comes with fake clients is helpful in making the majority of descriptors alike in terms of size, which is our best move here the way I see it (solves 2 things).
As David pointed out, this all boils down to how much we pad the encrypted part of the descriptor, otherwise we always leak info. If we are hoping for a leakless strategy here, we should be generous with our padding.
Agreed. A leakless solution here is to pad all descriptors to the hard limit, but this is not worth it. The hard limit is assumed to be reached only by HS-es that use auth and have a huge number of authorized clients, in which case they might look into scalability solutions, like running multiple onion hostnames for client groups linked to a single backend service. Anyway, what I am trying to say here is that I think these will be very isolated cases, if they occur at all; the rest of the descriptors will just look alike, with a reasonable N fake clients that don't grow the descriptors to the hard limit for everyone.
Let's see how much padding we need:
Each intro point adds about 1.1k bytes to the descriptor (according to david).
Each block of 16 authorized clients adds about 1k bytes to the descriptor (according to the format described below).
Apart from intro points and authorized clients, the rest of the descriptor is not that heavy: less than 1k bytes (right?)
To get an average size here, let's consider a normal descriptor with 5 intro points and 16 authorized clients. With the above values, the overhead on the encrypted part of the descriptor is about 7k bytes.
To get a maximum size here, let's consider a phat descriptor that contains 20 intro points and 160 authorized clients. With the above values, the overhead on the encrypted part of the descriptor will be 32k bytes.
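The two estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and "less than 1k bytes" for the rest of the descriptor is taken as its upper bound of 1000 bytes, so the results land slightly above the rounded figures quoted in the text.

```python
INTRO_POINT_BYTES = 1100    # "about 1.1k bytes" per intro point
CLIENT_BLOCK_BYTES = 1000   # "about 1k bytes" per block of 16 authorized clients
CLIENTS_PER_BLOCK = 16
REST_BYTES = 1000           # assumed upper bound for "less than 1k bytes"


def estimate_plaintext_size(intro_points: int, auth_clients: int) -> int:
    """Rough size in bytes of the encrypted section's plaintext."""
    client_blocks = -(-auth_clients // CLIENTS_PER_BLOCK)  # ceiling division
    return (intro_points * INTRO_POINT_BYTES
            + client_blocks * CLIENT_BLOCK_BYTES
            + REST_BYTES)


print(estimate_plaintext_size(5, 16))     # 7500
print(estimate_plaintext_size(20, 160))   # 33000
```

Both figures stay under the sizes discussed in this thread, which is what matters for the padding and max-size suggestions.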
Hence, here are some suggestions (read: magic numbers):
We always pad the encrypted section of the descriptor to the nearest multiple of 10k bytes (read: we pad the plaintext before we encrypt).
This should be enough to obfuscate the number of IPs and authorized clients on most hidden services out there.
Sounds good.
- If client auth is enabled, we always include a multiple of 16 authorized clients (and fake the extra if needed) in the encrypted portion.
Let's randomize a bit more here in order not to give an attacker that knows the onion address (and is a revoked client) one fixed number. To make assumptions - randomization is always better. Let's include a multiple of a random number between 8 and 32 fake authorized clients and fake extra if needed in the encrypted portion, *based on the size of the descriptor with no fake data*. Same fake clients random number will apply for hidden services that do not use auth at all.
Here we should ensure fake authorized client slots do not eat the space for real authorized clients in the descriptor. So the wording should be different than "we always include a multiple of Y authorized clients" - if a HS has real authorized clients configured that make the descriptor size 40k bytes, we should not add any fake clients obviously. Nobody can tell except the HS how many are fake and how many are real (referring to attackers that know the onion address here), so what we are doing with this is ensuring real data takes priority over padding data.
I am sure this is what you meant, just noted that it reads a little confusing so we should rephrase for torspec.
- We set the maximum allowed size of descriptors on HSDirs to 40k bytes. This should be enough to accommodate the fat descriptor described above.
As said, I was quite generous with the max size here, even though I doubt any actual hidden services will have such enormous descriptors; I guess allowing those might prove to be a good idea in the future.
I don't think 40k is that much in terms of size, especially when compared to things like the microdesc-consensus, which is like 1.4MB and is required for Tor to run.
The main issue with big max sizes here is assholes using our DHT as cloud storage. I don't think 40k is that bad in this regard, but I'm not sure how to evaluate this properly.
+1 on the 40k hard limit.
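For reference, the three magic numbers discussed above (10k padding multiple, multiple of 16 clients, 40k limit) could be applied roughly as follows. This is a sketch with hypothetical names. It assumes a service with zero real clients still gets one full bucket of fake ones, and it compares the padded plaintext directly against the 40k limit, ignoring encryption overhead and the plaintext part of the descriptor.

```python
PAD_MULTIPLE = 10_000     # pad the plaintext to a multiple of 10k bytes
MAX_DESC_SIZE = 40_000    # maximum descriptor size accepted by HSDirs
CLIENT_MULTIPLE = 16      # always list a multiple of 16 auth-client entries


def padded_client_count(real_clients: int, multiple: int = CLIENT_MULTIPLE) -> int:
    """Total auth-client entries to emit; real entries are never dropped."""
    buckets = max(1, -(-real_clients // multiple))  # ceiling, at least one bucket
    return buckets * multiple


def padded_plaintext_size(size: int) -> int:
    """Round the plaintext size up to the next 10k multiple, before encrypting."""
    padded = -(-size // PAD_MULTIPLE) * PAD_MULTIPLE
    if padded > MAX_DESC_SIZE:
        raise ValueError("descriptor would exceed the 40k limit")
    return padded


print(padded_client_count(3))          # 16 -> 13 fake entries are added
print(padded_client_count(17))         # 32
print(padded_plaintext_size(7_500))    # 10000
print(padded_plaintext_size(33_000))   # 40000
```

Note that padded_client_count only ever adds entries, so real authorized clients always take priority over fake ones.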
ii) Should we use the descriptor ASCII format to encode all the client-auth-desc-key data? Or is that weird binary format OK?
People said this is a good idea and I agree.
Here is a suggested informal format, that gets placed in the beginning of the encrypted section of the descriptor:
    desc-auth-type <auth-type>
    desc-auth-nonce <8-byte-nonce-base64>
    auth-client <client-id> <iv> <encrypted-cookie>

and we always include a multiple of 16 clients. Here is what it would look like in real life:

======================================================================
desc-auth-type cookie
desc-auth-nonce JMk/8BTbhB4
auth-client dkW2nw OTqqSv29icTL5TSZ5TVQ3A +PIt0D9oWlDfbpGtRxGmeA
auth-client z0/MMQ dw+pwJcLk9LB/FPfxFBL3g rFX9f6WUVZVUPEwFet428Q
auth-client tH/BEQ zFWL1T9H/1fyV6bYW5Ol/Q /hjW1SgF0S3BANJhZZZ/OQ
auth-client 2lxnoQ ggm/IraIMQ+L56V3R0OyHQ gI9Lh5azwxcunYwyFXxJSg
auth-client S88yFw S4072NBKCwbwGep7/bJv+Q j3GdtDLAiZWI2jv0z6wfNw
auth-client T7KbqA zhj5vu+HghqcMBRYpsGE0Q nQQtScbK91xx1G5l5gUWYg
auth-client xPROzQ /OAH9FwXOufKGmFlBkqEJQ sqzeo6n4uMnqyghv3Vj3ZA
auth-client l7lqEQ iZrRNH1Lg636j32tg7XfLQ HXeqg6nViGb7H4T1dYMK9Q
auth-client +9ZUZw FReeAD5/mQD03J+YiffTKw oK1q7l/4JX+P08dLKYOmlw
auth-client 0L9rXg xp9hvTWcWSmLBcyLN96Msg THWHP2nLlHBWWrwECOIg+A
auth-client +kJcyQ nl7dkTOA9r10jk3Bo6I5WQ sGqMNtLMOiLDVDOr9YxJAw
auth-client sa5PQQ oGqjP0Ko72fopFw2aAm2QA f+enrvjiDSXGJ3t77vDfAQ
auth-client m87zTQ Pl5ITgw/6nb5zJPXjl9GPA X0lIhGNjXZqhGf+oHDX/wQ
auth-client t8Ki0g GOPiP3WM+FQlDXLK1vUEOg 8bBZRrlxj6Ca392exkNuog
auth-client 1D9wbQ 0Y5FZJGg30M2WPWu+xahbQ aXwcRLMS5MFAYcBrGEibVA
auth-client UoLbLw jwM4/d5BUfch4FLpGogouQ r9P/aNX3pWseC7tlXx1I5Q
======================================================================
with a total size of 1090 bytes.
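The 1090 byte figure can be checked by generating a block in the same shape. This is a sketch, not torspec: the function names are hypothetical, and the raw field lengths (4-byte client-id, 16-byte IV, 16-byte encrypted cookie) are inferred from the base64 widths in the example above, with base64 "=" padding stripped and one newline per line.

```python
import base64
import os


def b64_nopad(n: int) -> str:
    """n random bytes as base64 without trailing '=' padding."""
    return base64.b64encode(os.urandom(n)).decode("ascii").rstrip("=")


def build_auth_section(num_clients: int = 16) -> str:
    lines = [
        "desc-auth-type cookie",
        "desc-auth-nonce " + b64_nopad(8),            # 8-byte nonce -> 11 chars
    ]
    for _ in range(num_clients):
        client_id = b64_nopad(4)                       # 6 chars
        iv = b64_nopad(16)                             # 22 chars
        encrypted_cookie = b64_nopad(16)               # 22 chars
        lines.append("auth-client %s %s %s" % (client_id, iv, encrypted_cookie))
    return "\n".join(lines) + "\n"


section = build_auth_section()
print(len(section))   # 1090
```

Each auth-client line is 65 bytes including its newline (16 * 65 = 1040), and the two header lines add 22 + 28 = 50 bytes.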
I think this looks much nicer than the binary format and easier to parse with the routerparse API as well.
+1.
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt does not do that, and IIUC that's OK because it uses a fresh key for every encryption (even though the plaintext and IV is the same).
People said this is a good idea, so in the example above I did it.
My main counter argument is the size increase, but perhaps being stingy here is stupid.
In any case, the size overhead comes to 23 bytes of base64 for every IV, so it's not that bad.
I think this is fine.