Hello,
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
I'm going to focus only on the subset of those categories that Roger/David told me are the most important for the sponsor. These are:
- Safe statistics collection
- Tor controller API improvements
- Performance improvements
- Opt-in HS indexing service
I haven't yet split projects into deliverables; this is a middle step to getting there. Next step is to filter and then ticketify what we have. After that we need to prioritize and pick the projects that will become deliverables.
In each category, I have slightly ordered the items (so, more important items will usually be on the top, but that's not always true). I have also tried to include all the tickets that are marked as SponsorR in trac.
So, let's go:
== Safe statistics collection ==
We've discussed this quite a bit over the past year and I think we all pretty much agree on which stats are safe to collect and which not.
I think we all agree that collecting the number of HS circuits and traffic volume from RPs (#13192) is harmless [0] and useful information to have. We need to clean up Roger's patch to add that information in extra-info descriptors, and then do some visualisations. That would give us a good idea of how much HSes are used.
OTOH, other statistics like "# of HS descriptors" are not that harmless and the upcoming HS redesign will block us from getting this information anyway.
For now, I think we should focus on #13192 for this project.
== Tor controller API improvements ==
To better refine this project, we should think about what we want to get out of it. Here are some outcomes:
a) A better control API allows us to perform better performance measurements for HSes.
Karsten in #1944 worked on performance measurements of HS circuit establishment. You can find his very useful results here: http://ec2-54-92-231-52.compute-1.amazonaws.com/
We should understand exactly how Karsten is gathering those events, and see whether we can improve the timing accuracy or if we are missing any events. We need to also figure out how to do useful measurements in causal events like the race between the INTRODUCE_ACK cell and the RENDEZVOUS2. We also need to find a way to match rendezvous circuits with introduction circuits: https://trac.torproject.org/projects/tor/ticket/1944#comment:35
All in all, this seems like a project worth doing right because it will be useful in the future. It can even act as an automated regression test.
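To make the timing question a bit more concrete, here is a minimal sketch of turning the timestamped events of one HS connection attempt into per-stage durations. The stage labels and the input format are hypothetical placeholders, not the control port events that Karsten's setup actually records.

```python
from typing import Dict, List, Tuple

# Hypothetical stage labels, in the order we would normally expect them.
STAGES = ["DESC_REQUESTED", "DESC_RECEIVED", "INTRO_SENT", "INTRO_ACKED", "REND_JOINED"]


def stage_durations(events: List[Tuple[float, str]]) -> Dict[str, float]:
    """Return seconds elapsed between consecutive observed stages.

    `events` is a list of (unix_timestamp, label) pairs for one attempt.
    Only the first occurrence of each label counts; missing stages are skipped.
    """
    first_seen: Dict[str, float] = {}
    for ts, label in sorted(events):
        first_seen.setdefault(label, ts)
    observed = [s for s in STAGES if s in first_seen]
    return {
        f"{a}->{b}": first_seen[b] - first_seen[a]
        for a, b in zip(observed, observed[1:])
    }
```

A negative duration means two events arrived in the opposite order from the one listed in STAGES, which is how the INTRODUCE_ACK / RENDEZVOUS2 race would show up in this kind of table.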
b) This might also be a good time to start working on automated integration tests for HSes.
It should be possible to spin up private Chutney networks and test that particular HSes are reachable. Or perform regression tests; for example, Roger recently suggested writing a regression test to make sure that clocks don't need to be synchronized to build HS circuits (#13494).
We should also make testing networks better for HS testing:
- #13401 TestingTorNetwork should crank down RendPostPeriod too?
c) Tor should better expose error messages of failed operations. For example, this could allow TBB to inform users whether they mistyped the onion address or the HS is actually down, and it would also let us do #13208. Proposal 229 and ticket #13212 are related to this. We should see whether the PT team is planning to implement proposal 229 and how we can synchronise.
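As a rough illustration of what a front end could do once Tor exposes such errors, here is a minimal sketch that separates a syntactically broken address from a lookup that simply found nothing. It assumes the 16-character base32 form of onion addresses; the function name, its inputs and the returned strings are hypothetical, and Tor does not currently hand this information to controllers in this shape.

```python
import re

# Assumption: 16-character base32 onion addresses (letters a-z, digits 2-7),
# optionally followed by ".onion".
_ONION_RE = re.compile(r"^[a-z2-7]{16}(\.onion)?$")


def classify_lookup_failure(address: str, descriptor_found: bool) -> str:
    """Return a coarse, user-facing reason for a failed HS connection."""
    if not _ONION_RE.match(address.strip().lower()):
        return "malformed address"
    if not descriptor_found:
        # A typo that is still valid base32 lands here too.
        return "no descriptor found (mistyped address or service offline)"
    return "descriptor found but service unreachable"
```

The syntax check only catches typos that leave the base32 alphabet or change the length; telling a well-formed typo apart from an offline service still needs help from Tor itself.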
d) There are various projects that are using HSes these days (TorChat, Pond, GlobaLeaks, Ricochet, etc.). We should think whether we want to support these use cases and how we can make their life easier. For example, Fabio has been asking for a way to spin up HSes using the control port (#5976). What other features do people want from the control port?
And here are some more tickets marked as SponsorR from this category:
- #8993 Better hidden service support on Tor control interface
- #13206 Write up walkthrough of control port events when accessing a hidden service
- #2554 extend torperf to record hidden service time components
== Performance Improvements ==
This is the most juicy section. How can we make HS performance better? IIUC, we are mainly interested in client-side performance, but if a change makes both sides faster that's even better.
Some projects:
a) Looking at Karsten's #1944 results http://ec2-54-92-231-52.compute-1.amazonaws.com/ we see that fetching HS descriptors takes much more time than it should. I wonder why this is the case. Is there another ntohl bug there?
We should perform measurements and get a good understanding of what's going on in this step. Here are some tickets that Roger opened to do exactly that:
- #13208 What's the average number of hsdir fetches before we get the hsdesc?
- #13209 Write a hidden service hsdir health measurer
And here is a ticket with a potential issue:
- #13207 Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
b) Improving the other parts of the circuit establishment process is also important:
- #8239 Hidden services should try harder to reuse their old intro points
- #3733 Tor should abandon rendezvous circuits that cause a client request to time out
- #13222 Clients accessing a hidden service can establish their rend point in parallel to fetching the hsdesc
Furthermore, an area of Tor that might give us better performance but we haven't really explored yet is preemptive circuits. #13239 is about building more internal circuits for HSes.
And here is a ticket suggesting more measurements:
- #13194 Track time between ESTABLISH_RENDEZVOUS and RENDEZVOUS1 cell
c) Another important project in this area is parallelizing HS crypto. I haven't looked at what this would actually entail, but it will probably involve implementing the undone parts of proposal 220/224.
d) This might be the time to implement Encrypted Services? Many people have been asking for this feature and this might be the right time to do it: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-enc...
e) Following the trail of #13207, we should look at all the magic numbers currently used by HSes, document them and see if they make sense. This includes the number of IPs (#8950), the number of HSDirs/replicas, the intro point expiration date, etc.
Also, we should revisit the flags used when doing path selection for RPs, IPs, etc.
f) On a more researchy tone, this might also be a good point to start poking at the HS scalability project since it will really affect HS performance.
We should look at Christopher Baines' ideas and write a Tor proposal out of them:
https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html
https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html
Last time I looked, Christopher's ideas required implementing proposal 225 and #8239.
g) All the projects above are aiming at improving circuit establishment performance, but none of them are dealing with performance improvements after the HS circuit has been established.
On an even more researchy tone, Qingping Hou et al wrote a proposal to reduce the length of HS circuits to 5 hops (down from 6). You can find their proposal here: https://lists.torproject.org/pipermail/tor-dev/2014-February/006198.html
The project is crazy and dangerous and needs lots of analysis, but it's something worth considering. Maybe this is a good time to do this analysis?
h) Back to the community again. There have recently appeared a few messaging protocols that are inherently using HSes to provide link layer confidentiality and anonymity [1]. Examples include Pond, Ricochet and TorChat.
Some of these applications are creating one or more HSes per user, with the assumption that HSes are something easy to make and there is no problem in having lots of them. People are wondering how well these applications scale and whether they are using the Tor network the right way. See John Brooks' mail for a small analysis: https://moderncrypto.org/mail-archive/messaging/2014/000434.html
It might be worth researching these use cases to see how well Tor supports them and how they can be supported better (or whether they are a bad idea entirely).
== Opt-in HS indexing service ==
This seems like a fun project that can be used in various ways in the future. Of course, the feature must remain opt-in so that only services that want to be public will surface.
For this project, we could make some sort of 'HS authority' which collects HS information (the HS descriptor?) from volunteering HSes. It's unclear who will run an HS authority; maybe we can work with ahmia so that they integrate it in their infrastructure?
If we are more experimental, we can even build a basic petname system using the HS authority [2]. Maybe just a "simple" NAME <-> PUBKEY database where HSes can register themselves in a FIFO fashion. This might cause tons of domain camping and attempts for dirty sybil attacks, but it might develop into something useful. Worst case we can shut it down and call the experiment done? AFAIK, I2P has been doing something similar at https://geti2p.net/en/docs/naming
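For illustration only, here is a minimal sketch of the first-come, first-served NAME <-> PUBKEY table. The class and method names are hypothetical, keys are treated as opaque strings, and limiting each key to a single name is an extra assumption of this sketch, not part of the idea above.

```python
from typing import Dict, Optional


class NameRegistry:
    """First-come, first-served NAME <-> PUBKEY table (illustrative only)."""

    def __init__(self) -> None:
        self._by_name: Dict[str, str] = {}
        self._by_key: Dict[str, str] = {}

    def register(self, name: str, pubkey: str) -> bool:
        """Bind name to pubkey unless either side is already taken."""
        name = name.lower()
        if name in self._by_name or pubkey in self._by_key:
            return False
        self._by_name[name] = pubkey
        self._by_key[pubkey] = name
        return True

    def lookup(self, name: str) -> Optional[str]:
        """Return the pubkey registered for name, or None."""
        return self._by_name.get(name.lower())
```

Nothing here prevents camping on names: whoever registers first keeps the name, which is exactly the weakness mentioned above.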
== Security / Miscellaneous ==
I also noticed that some tickets on trac were assigned to SponsorR but I couldn't fit them in the above categories. They are mainly security enhancements or code improvements. Here is a dump of the tickets:
Security:
- #13214 HS clients don't validate descriptor-id returned by HSDir
- #7803 Clients shouldn't send timestamps in INTRODUCE1 cells
- #8243 Getting the HSDir flag should require more effort
- #2715 Is rephist-calculated uptime the right metric for HSDir assignment?
Miscellaneous:
- #13223 Refactor rend_client_refetch_v2_renddesc()
- #13287 Investigate mysterious 24-hour lump in hsdir desc fetches
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
== Epilogue ==
What useful projects/tickets did I forget here?
Which of the tasks above should we not do? I just went ahead and wrote down all the projects I could think of, with the idea that we will filter stuff later.
Thanks!
Footnotes:
[0]: since RPs are picked at random by the client and not by the HS.
[1]: see https://moderncrypto.org/mail-archive/messaging/2014/000434.html
[2]: or if someone is more crazy, try to integrate GNUnet's GNS: https://gnunet.org/gns
On 10/20/2014 08:37 AM, George Kadianakis wrote:
If we are more experimental, we can even build a basic petname system using the HS authority [2]. Maybe just a "simple" NAME <-> PUBKEY database where HSes can register themselves in a FIFO fashion. This might cause tons of domain camping and attempts for dirty sybil attacks, but it might develop into something useful. Worst case we can shut it down and call the experiment done? AFAIK, I2P has been doing something similar at https://geti2p.net/en/docs/naming
Namecoin is playing around with a decentralized way of doing this. We'd be happy to work with Tor in this area.
Cheers,
-Jeremy Rand
On Oct 20, 2014, at 7:37 AM, George Kadianakis desnacked@riseup.net wrote:
d) There are various projects that are using HSes these days (TorChat, Pond, GlobaLeaks, Ricochet, etc.). We should think whether we want to support these use cases and how we can make their life easier. For example, Fabio has been asking for a way to spin up HSes using the control port (#5976). What other features do people want from the control port?
I’ve got a half-finished proposal for configuration of HS by controllers. I’ll try to finish it up soon.
I think it’s a compelling option even aside from the use case for Ricochet et al. For example, a HS operator could use a script to interactively decrypt and load his private key, instead of leaving it plainly on the filesystem.
h) Back to the community again. There have recently appeared a few messaging protocols that are inherently using HSes to provide link layer confidentiality and anonymity [1]. Examples include Pond, Ricochet and TorChat.
Some of these applications are creating one or more HSes per user, with the assumption that HSes are something easy to make and there is no problem in having lots of them. People are wondering how well these applications scale and whether they are using the Tor network the right way. See John Brooks' mail for a small analysis: https://moderncrypto.org/mail-archive/messaging/2014/000434.html
It might be worth researching these use cases to see how well Tor supports them and how they can be supported better (or whether they are a bad idea entirely).
I think the mail you meant to link was: https://moderncrypto.org/mail-archive/messaging/2014/000846.html
My intuition is that having a lot of low-usage hidden services isn’t difficult for the network. Introduction circuits aren’t a significant drain on resources, and posting 6 descriptors per hour isn’t a significant amount of traffic. I don’t have a good grasp on how expensive it is for the network to have an open circuit. If my vague assumptions hold, I don’t think it’s an issue to have many clients connected to many hidden services for low-throughput tasks, either.
The largest scalability issue here is checking whether a service is online. A successful hsdesc fetch reaches out to one directory, but an unsuccessful one will contact all six (each with its own new circuit). Stupid software - my own included - polling for connectivity causes a lot of traffic this way. I’d propose less-stupid software before changing tor, but lowering the directory mirrors from 6 to a more reasonable number would help.
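A back-of-the-envelope sketch of that polling cost, using only the figures from the paragraph above (one directory circuit for a successful fetch, six for an unsuccessful one). The function name and the `online_fraction` parameter are made up for this example, and treating every successful fetch as exactly one circuit is a simplification.

```python
HSDIRS_PER_SERVICE = 6  # figure quoted in the paragraph above


def directory_circuits(polls: int, online_fraction: float) -> float:
    """Expected number of directory circuits for `polls` connectivity checks.

    Assumes one circuit when the service is online and one circuit per
    responsible directory when it is not.
    """
    if not 0.0 <= online_fraction <= 1.0:
        raise ValueError("online_fraction must be between 0 and 1")
    offline_fraction = 1.0 - online_fraction
    return polls * (online_fraction * 1 + offline_fraction * HSDIRS_PER_SERVICE)
```

Under these assumptions, polling a contact who is offline half of the time costs 3.5 directory circuits per check on average, so cutting the polling rate helps in direct proportion.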
It might be worth thinking about what Tor could do to better support that kind of “peer-to-peer” hidden service usage, but I don’t think it’s a scalability issue for now. For now, the bigger problem is in trying to scale to very busy hidden services, e.g. #13287.
== Opt-in HS indexing service ==
This seems like a fun project that can be used in various ways in the future. Of course, the feature must remain opt-in so that only services that want to be public will surface.
For this project, we could make some sort of 'HS authority' which collects HS information (the HS descriptor?) from volunteering HSes. It's unclear who will run an HS authority; maybe we can work with ahmia so that they integrate it in their infrastructure?
What is the benefit to automatically submitting descriptors, instead of the operator opting in by submitting their .onion address on ahmia’s website?
If we are more experimental, we can even build a basic petname system using the HS authority [2]. Maybe just a "simple" NAME <-> PUBKEY database where HSes can register themselves in a FIFO fashion. This might cause tons of domain camping and attempts for dirty sybil attacks, but it might develop into something useful. Worst case we can shut it down and call the experiment done? AFAIK, I2P has been doing something similar at https://geti2p.net/en/docs/naming
I think this would become a target for adversaries looking to shut down (or impersonate) a particular hidden service. Don’t give up self-authenticating hostnames easily; being the ‘registrar’ is a lot of risk and responsibility.
- #8243 Getting the HSDir flag should require more effort
- #2715 Is rephist-calculated uptime the right metric for HSDir assignment?
These could have an interesting effect on scalability. If we dramatically reduce the number of HSDirs, it might change my equation above on the costs of many low-usage hidden services.
== Epilogue ==
What useful projects/tickets did I forget here?
rend-spec-ng ;)
Which of the tasks above should we not do? I just went ahead and wrote down all the projects I could think of, with the idea that we will filter stuff later.
Thanks!
Thanks for summarizing all of this!
- John
[Removed tor-dev from cc]
On Mon, Oct 20, 2014 at 9:37 AM, George Kadianakis desnacked@riseup.net wrote:
Hello,
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
Hi! Since I can't make the declared meeting time tomorrow, I thought I should brain-dump now about more ideas that might fit under the categories you list. I do not guarantee that these are the best places for these ideas.
I'm also deliberately writing these *before* I go through the rest of your email, so that I'm hopefully generating new things.
I'm going to focus only on the subset of those categories that Roger/David told me are the most important for the sponsor. These are:
- Safe statistics collection
To make statistics collection safe, I'd argue that we really need something like the design from proposal 224 that de-correlates multiple instances of a single hidden service. That way, aggregate statistics are safer to maintain.
- Tor controller API improvements
People have wanted something in this area for a while, but I don't think I've seen any really solid and comprehensive proposals. I think there are two ways to go at it, and we should do them both at once:
* Get somebody who wants to access hidden services over the controller API to explain what they want to build. Then design an API as needed to support it.
* Look at what somebody might want to do with hidden services via the controller API; then design an API to expose that.
I'm fairly well equipped to design something in the second area -- so are a lot of people. But for the first, I think we could benefit a bit from some conversations with people who'd use such improvements.
- Performance improvements
Tons of stuff from proposal 224 (and a fair bit from proposal 220) fits in this area. Parallelizing hidden service crypto should help some, and there are performance improvements on the ed25519 side that should make it quite a bit more CPU efficient.
But in the area of getting visible performance improvements, I also think we need to be data-directed in terms of instrumenting where the time is actually going in a connection to an HS. We should also run a busy HS and profile it, and do profile-directed optimizations.
Nick Mathewson nickm@torproject.org writes:
On Mon, Oct 20, 2014 at 9:37 AM, George Kadianakis desnacked@riseup.net wrote:
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
I see that #11291 is not on that list. I think it should be.
- Get somebody who wants to access hidden services over the controller API to explain what they want to build. Then design an API as needed to support it.
Currently, at least the way Tor is deployed on Debian, you cannot add a new hidden-service to a running Tor if you're "just" in the right group (in this case, debian-tor). #11291 would fix this, and is pretty close; dawuud has a branch that's got more utests etc (*not* asking for more review yet ;) )
- Look at what somebody might want to do with hidden services via the controller API; then design an API to expose that.
One issue with hidden services and the overall controller API is that they are the only "special" thing that has multi-line configuration where order matters. So controllers have to do "non-trivial" things to make hidden services "go".
Unfortunately, I don't have a concrete suggestion here, beyond "take a ground-up look at the controller API", which I'm guessing is out-of-scope? Basically, something more structured might make sense? *However* since hidden services are the only thing that actually makes order important (as far as I recall), perhaps re-thinking those alone within the existing framework would be much less disruptive *and* simplify controller logic (i.e. eliminate the "order is important" bit).
The only concrete use-case I can offer is carml's "pastebin" command, which would like to add and delete hidden services from a running Tor. Currently, it always launches a new Tor instance (so "add" is launch, and "delete" is kill). Perhaps this is the best way anyway, separation-wise...?
I can imagine that adding the equivalent of add/delete for authorize-client lines would be a Good Thing, too.
Just brainstorming here, but could both the above be accomplished with some sort of "change configuration" command? That is, instead of forcing controllers to remember enough to make a SETCONF work, the opportunity to add or delete things exists? (And perhaps only for hidden services, since they're the only "special" things currently anyway?) This probably implies an ID for each hidden service...
This also would map fairly well to most UIs, which then just have to remember what the user did (e.g. "clicked delete on the 3rd line, then clicked add with options X, Y, and Z").
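To show what controllers have to do today, and what an add/delete interface would hide, here is a minimal controller-side sketch. HiddenServiceDir, HiddenServicePort and SETCONF are the existing option and command names; the class, its methods and the per-service IDs are hypothetical and live only in the controller, since Tor knows nothing about them.

```python
from typing import Dict, List, Tuple


class HiddenServiceConfig:
    """Controller-side list of hidden services, addressable by an ID."""

    def __init__(self) -> None:
        self._services: Dict[int, Tuple[str, List[str]]] = {}
        self._next_id = 1

    def add(self, directory: str, ports: List[str]) -> int:
        """Remember a service and return its controller-side ID."""
        service_id = self._next_id
        self._next_id += 1
        self._services[service_id] = (directory, list(ports))
        return service_id

    def delete(self, service_id: int) -> None:
        """Forget a service; raises KeyError for an unknown ID."""
        del self._services[service_id]

    def setconf_command(self) -> str:
        """Render every remaining service as one ordered SETCONF line."""
        if not self._services:
            raise ValueError("no services left; clearing is out of scope here")
        parts: List[str] = []
        for directory, ports in self._services.values():
            # Order matters: each HiddenServiceDir must precede its ports.
            parts.append(f'HiddenServiceDir="{directory}"')
            parts.extend(f'HiddenServicePort="{port}"' for port in ports)
        # No escaping is done: values must not contain quotes or backslashes.
        return "SETCONF " + " ".join(parts)
```

Every add or delete forces the controller to resend the complete, correctly ordered list, which is the bookkeeping a dedicated "change configuration" command would take off its hands.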
Any Twisted application written in a network endpoint agnostic manner may be used with the txtorcon hidden service endpoint... For instance serving files from a Tor hidden service can be done with Meejah's one-liner: pip install txtorcon && twistd -n web --port "onion:80" --path ~/public_html
However I see the current txtorcon design (without 12911 resolved) as lacking security isolation since tor is launched as the same user as the python process. Using the control port to create hidden services seems like the obviously better way to do this.
Currently Tahoe-LAFS is used with torsocks and manually configured Tor hidden services in order to hide the identity of the tahoe client and server operators. We'd like for Tahoe-LAFS to have native Tor integration... Using the txtorcon endpoint would greatly simplify deployment for Tahoe-LAFS storage operators wishing to hide their identity/location.
David
On Tue, Oct 28, 2014 at 10:40 PM, meejah meejah@meejah.ca wrote:
Nick Mathewson nickm@torproject.org writes:
On Mon, Oct 20, 2014 at 9:37 AM, George Kadianakis desnacked@riseup.net wrote:
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
I see that #11291 is not on that list. I think it should be.
correction... I meant #11291.
On Wed, Oct 29, 2014 at 1:04 AM, David Stainton dstainton415@gmail.com wrote:
However I see the current txtorcon design (without 12911 resolved) as lacking security isolation since tor is launched as the same user as the python process. Using the control port to create hidden services seems like the obviously better way to do this.
Virgil Griffith i@virgil.gr writes:
- Opt-in HS indexing service
I offer to captain and lead development of this one.
Thanks for offering to help!
My main goal with this project would be to increase visibility of Hidden Services: make it easy for people to find Hidden Services that want to be found.
Search engines are very important for this, since they basically make the Internet easy and fast to navigate [0].
However, I see a few advantages of doing this directly on the Tor client instead of relying exclusively on HS search engines:
a) Currently, an HS operator that wants to get more visitors has to find an HS search engine and insert her HS in there. Or advertise in forums. Or hope that the HS gets noticed and linked from somewhere (and that existing HS search engines crawl links).
This feature would allow the HS operator to add:
    PublicHiddenService 1
in her torrc, and the HS would automatically register itself somewhere so that search engines could auto-learn about it [1].
b) By baking this feature in the Tor client, you can do digital signatures using the HS identity key which might allow secure naming systems to be built.
For example, you could send to the HS authority a signed name for your HS and a signed HS descriptor. And the HS authority could maintain a {petname : signed descriptor} map that would give assurance to clients that the name was actually chosen by the HS with that descriptor.
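A minimal sketch of that {petname : signed descriptor} map follows. The signature check is passed in as a callable because the scheme is left open here; `Verifier`, the class and its method names are all hypothetical, and conflict handling (what happens when a name is already taken) is deliberately left out.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical verifier: (public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


class PetnameAuthority:
    """Stores a name only when the HS key signed both name and descriptor."""

    def __init__(self, verify: Verifier) -> None:
        self._verify = verify
        self._entries: Dict[str, Tuple[bytes, bytes]] = {}

    def submit(self, name: str, descriptor: bytes, pubkey: bytes,
               name_sig: bytes, descriptor_sig: bytes) -> bool:
        """Record {name: (descriptor, pubkey)} if both signatures verify."""
        if not self._verify(pubkey, name.encode("utf-8"), name_sig):
            return False
        if not self._verify(pubkey, descriptor, descriptor_sig):
            return False
        self._entries[name] = (descriptor, pubkey)
        return True

    def resolve(self, name: str) -> Optional[Tuple[bytes, bytes]]:
        """Return (descriptor, pubkey) for name, or None if unknown."""
        return self._entries.get(name)
```

A client that resolves a name can re-run the same two checks itself, so it does not have to trust the authority about who chose the name.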
But to be honest, I haven't really thought about this topic and I don't believe strongly in my arguments above.
What I would do as the first step here would be to understand whether this idea has value. Maybe it's something that adds extra complexity, and HS operators should just do manually. To do that I think we should enumerate the various use cases and solutions that can be offered.
Use case examples:
- HS Social network that wants to increase its userbase
- IRC network that wants to increase its userbase
- HS website that suffers from phishing and vanity key attacks.
- ...
Notice that some use cases want visibility and others might want security. Can an Opt-In HS indexing service help them?
What solutions could be offered:
- An HS authority that archives HS names or descriptors. HS search engines and clients can look up descriptors. What's the threat model of the authority? Should it be hosted by Tor or not necessarily?
- An HS authority that facilitates some sort of petname scheme. But with what interface? A TBB plugin? How are the I2P guys doing it?
- Output a file in DataDirectory that people are supposed to submit to an HS authority if they want.
- A GNS setup that offers secure/decentralized/human-memorable naming system. But what to do with all those zones and master zones and stuff? I don't know how to make that usable (both for clients and HS operators).
- Maybe none of these things should happen, and this is entirely a bad idea that adds more code to Tor, has dangerous misconfiguration consequences, has dangerous phishing potential and doesn't really add any value.
- More ideas.
This is more of a braindump, but a more structured response would need to wait many days, so release early release often :)
Let me know if you find this interesting and what your thoughts are :)
[0]: See https://moderncrypto.org/mail-archive/messaging/2014/000944.html for an analysis on why people use search engines instead of the address bar.
[1]: Let's leave bikeshedding about the name of the torrc option and how alarmist it should be for later.
Thanks for offering to help!
My main goal with this project would be to increase visibility of Hidden Services: make it easy for people to find Hidden Services that want to be found.
Search engines are very important for this, since they basically make the Internet easy and fast to navigate [0].
Keep me posted on the funding. Assuming SponsorR comes through, I can write up the details for the search engine. Give me a mechanism that provides a list of HSs to be indexed, and I will ensure that the UX to search them is pleasant.
-V
On 20/10/14 14:37, George Kadianakis wrote:
On an even more researchy tone, Qingping Hou et al wrote a proposal to reduce the length of HS circuits to 5 hops (down from 6). You can find their proposal here: https://lists.torproject.org/pipermail/tor-dev/2014-February/006198.html
The project is crazy and dangerous and needs lots of analysis, but it's something worth considering. Maybe this is a good time to do this analysis?
One aspect of this proposal that might be problematic: the client and hidden service negotiate a random number and use it to pick a rendezvous point from a list of candidates. They must have matching lists of candidates.
With a similar idea in mind, I recently looked into how long it takes for two clients to obtain copies of the same consensus. I found out that this is never guaranteed to happen, because each client may skip a consensus each time it downloads a fresh one. That would need to be addressed before implementing the 5-hop proposal.
https://lists.torproject.org/pipermail/tor-dev/2014-September/007571.html
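To make the "matching lists" concern concrete, here is a minimal sketch of mapping a jointly negotiated random value onto a candidate list. It is not the mechanism from the 5-hop proposal itself: the function name, the use of SHA-256, and the relay names are all assumptions made for illustration.

```python
import hashlib


def pick_rendezvous_point(shared_random, candidates):
    """Map a negotiated random value onto one relay from a candidate list.

    Hypothetical helper: the hash and the sorting rule are illustrative
    choices, not something taken from the proposal.
    """
    if not candidates:
        raise ValueError("candidate list is empty")
    ordered = sorted(candidates)  # both sides must order the list identically
    digest = hashlib.sha256(shared_random).digest()
    index = int.from_bytes(digest, "big") % len(ordered)
    return ordered[index]


# Made-up relay names. The second view is missing one relay, as could
# happen when the two sides hold different consensuses.
view_client = ["relayA", "relayB", "relayC", "relayD"]
view_service = ["relayA", "relayB", "relayD"]

nonce = b"negotiated-random-value"
print(pick_rendezvous_point(nonce, view_client))
print(pick_rendezvous_point(nonce, view_service))
```

With identical lists both sides always land on the same relay. With lists that differ by even one entry, the index can point at different relays for many nonces, which is why the consensus-agreement question has to be settled first.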
Cheers, Michael
On Tue, Oct 21, 2014 at 8:23 AM, Michael Rogers michael@briarproject.org wrote:
With a similar idea in mind, I recently looked into how long it takes for two clients to obtain copies of the same consensus. I found out that this is never guaranteed to happen, because each client may skip a consensus each time it downloads a fresh one. That would need to be addressed before implementing the 5-hop proposal.
Yeah, Qingping assumed that clients did converge on the same consensus, and handwaved over the "and they prove to each other they are using the same consensus" part. I agree that needs to get fixed first.
Qingping graduated and my group has a lot of other projects to juggle right now, but we are in principle still interested in pursuing that to some sort of definite conclusion. Funding of course helps free up student time :)
zw
On Mon, Oct 20, 2014 at 02:37:49PM +0100, George Kadianakis wrote:
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
Thanks for getting this going!
== Safe statistics collection ==
We've discussed this quite a bit over the past year and I think we all pretty much agree on which stats are safe to collect and which not.
I think we know some safe first-round things to collect. But I bet there's a lot more that's grey-area.
I think we all agree that collecting the number of HS circuits and traffic volume from RPs (#13192) is harmless [0] and useful information to have. We need to clean up Roger's patch to add that information in extra-info descriptors, and then do some visualisations. That would give us a good idea of how much HSes are used.
Sounds good.
OTOH, other statistics like "# of HS descriptors" are not that harmless and the upcoming HS redesign will block us from getting this information anyway.
How will the upcoming design block hsdirs from knowing how many descriptors they got? I agree that it'll be harder to know how many underlying hidden services they represent; but given a predictable publishing schedule and a number of descriptors you got, you should be able to derive an estimate for how many hidden services published to you.
The main goal of (that part of) the hsdesc redesign was to prevent hsdirs from learning how to visit the hidden service.
Unless we have a plan for the security property you describe?
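The estimate described above (descriptor count plus a predictable publishing schedule) can be sketched as simple arithmetic. This is only an illustration: the function name is hypothetical, the input numbers are made up, and the model assumes every service publishes on the same schedule and that services are spread evenly over the HSDirs.

```python
def estimate_services(uploads_seen, uploads_per_service, num_hsdirs, dirs_per_service):
    """Rough estimate of hidden services from uploads seen at one HSDir.

    uploads_seen:        descriptor uploads this HSDir received in the window
    uploads_per_service: uploads one service sends to one HSDir in that window
    num_hsdirs:          number of HSDirs in the network
    dirs_per_service:    number of HSDirs each service publishes to
    All four values are assumptions supplied by the caller.
    """
    if uploads_per_service <= 0 or dirs_per_service <= 0:
        raise ValueError("schedule parameters must be positive")
    # Services that picked this HSDir during the window.
    local = uploads_seen / uploads_per_service
    # Scale up: each service occupies dirs_per_service of the num_hsdirs slots.
    network = local * num_hsdirs / dirs_per_service
    return local, network


# Made-up inputs, for illustration only.
local, network = estimate_services(
    uploads_seen=480, uploads_per_service=24, num_hsdirs=3000, dirs_per_service=6
)
print(local, network)
```

A single HSDir's view is noisy, so in practice one would want to combine reports from many HSDirs before trusting the extrapolated figure.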
For now, I think we should focus on #13192 for this project.
I'd like to branch out into a bunch of other stats too -- for example, how many intro points were established at my relay over the past 24 hours? How many rendezvous points? How many of those rendezvous points got used? What were the various usage stats of the intro points?
Basically, anything where we're pretty sure it can't be used for harm, but we can imagine finding numbers that surprise us, we should consider looking at, to see if we're surprised -- and we should collect the numbers over time, in case later they turn into something surprising.
The trickier examples, which we shouldn't deploy without more thought, would be things like "what is the median, 90th percentile, etc of bytes used on circuits where I'm the rendezvous point"? At the extreme, which is clearly in "don't do it" territory, we can imagine doing a website fingerprinting attack at the rendezvous point, to discover which hidden service it is and thus track popularity. See item #4 at the end of this mail for more next steps.
== Tor controller API improvements ==
To better refine this project, we should think about what we want to get out of it. Here are some outcomes:
a) A better control API allows us to perform better performance measurements for HSes.
All in all, this seems like a project worth doing right because it will be useful in the future. It can even act as an automated regression test.
I agree. I'd like us to get a really good answer in place here and then see what it tells us over time.
b) This might also be a good time to start working on automated integration tests for HSes.
It should be possible to spin up private Chutney networks and test that particular HSes are reachable. Or perform regression tests; for example, Roger recently suggested writing a regression test to make sure that clocks don't need to be synchronized to build HS circuits (#13494).
Yep. In general, I think there are many questions where we look at the code and wonder if it's behaving correctly. With a tool like Chutney, we should be able to instrument the whole network to *verify* that the code is behaving correctly. And with a bit more effort, we should be able to make that verification easier to do the second time.
Part of what I'm hoping we'll uncover with the Chutney tests is edge cases where most of the time it behaves as we expect, but every so often it goes really wrong. So we should think about what we want to capture from each relay in order to be able to track down the details of these anomalies.
And then we should figure out how to induce defects, like half of the hsdirs for the hidden service go down or drop the descriptor, and see if our algorithms still behave the way we expect.
Bonus points if it's pretty easy to reuse these steps on both Chutney and Shadow.
This work ties into https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorS/Integrat... so hopefully it will be a great area where SponsorR devs can collaborate with SponsorS devs, and where SponsorR devs can get up-to-speed on all our hidden service code.
c) Tor should better expose error messages of failed operations. For example, this could allow TBB to inform users whether they mistyped the onion address or the HS is actually down, and it would also let us do #13208. Proposal 229 and ticket #13212 are related to this. We should see whether the PT team is planning to implement proposal 229 and how we can synchronise.
Sounds great.
d) There are various projects that are using HSes these days (TorChat, Pond, GlobaLeaks, Ricochet, etc.). We should think whether we want to support these use cases and how we can make their life easier. For example, Fabio has been asking for a way to spin up HSes using the control port (#5976). What other features do people want from the control port?
Sounds great.
== Performance Improvements ==
This is the most juicy section. How can we make HS performance better? IIUC, we are mainly interested in client-side performance, but if a change makes both sides faster that's even better.
Remember that just as much as improving performance, we also want to reduce variance of performance.
So while many of the things you list are "I bet we could improve the design so in theory it should work better", we shouldn't skip over the "I wonder if we're actually doing the thing we think we are" questions, as well as the "ok yes we're doing it, but does it actually produce the results we think it does" questions.
It's pretty easy to justify refactoring of the code in the process of answering questions like these (which will hopefully pave the way for future design changes).
Some projects:
a) Looking at Karsten's #1944 results http://ec2-54-92-231-52.compute-1.amazonaws.com/ we see that fetching HS descriptors takes much more time than it should. I wonder why this is the case. Is there another ntohl bug there?
We should perform measurements and get a good understanding of what's going on in this step. Here are some tickets that Roger opened to do exactly that:
- #13208 What's the average number of hsdir fetches before we get the hsdesc?
- #13209 Write a hidden service hsdir health measurer
And here is a ticket with a potential issue:
- #13207 Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
Yep.
b) Improving the other parts of the circuit establishment process is also important:
- #8239 Hidden services should try harder to reuse their old intro points
- #3733 Tor should abandon rendezvous circuits that cause a client request to time out
- #13222 Clients accessing a hidden service can establish their rend point in parallel to fetching the hsdesc
Furthermore, an area of Tor that might give us better performance but we haven't really explored yet is preemptive circuits. #13239 is about building more internal circuits for HSes.
And here is a ticket suggesting more measurements:
- #13194 Track time between ESTABLISH_RENDEZVOUS and RENDEZVOUS1 cell
Yep.
c) Another important project in this area is parallelizing HS crypto. I haven't looked at what this would actually entail, but it will probably involve implementing the undone parts of proposal 220/224.
We should indeed ponder how useful we think this would be, but it will be hard to impress the funder with this one. That is, they won't object if we do this on our own time, but it isn't something I will easily be able to say "and look what else we did for you!" and get a good reaction.
d) This might be the time to implement Encrypted Services? Many people have been asking for this feature and this might be the right time to do it: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-enc...
Alas, this one also counts as "new design and development on hidden services", which isn't really in-scope. We can do it if we think it's important, but there are many things in that category.
e) Following the trail of #13207, we should look at all the magic numbers currently used by HSes, document them and see if they make sense. This includes the number of IPs (#8950), the number of HSDirs/replicas, the intro point expiration date, etc.
Yes!
f) On a more researchy tone, this might also be a good point to start poking at the HS scalability project since it will really affect HS performance.
We should look at Christopher Baines' ideas and write a Tor proposal out of them: https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html Last time I looked, Christopher's ideas required implementing proposal225 and #8239.
Sounds great. I think #8902 falls squarely in this area too.
h) Back to the community again. There have recently appeared a few messaging protocols that are inherently using HSes to provide link layer confidentiality and anonymity [1]. Examples include Pond, Ricochet and TorChat.
Some of these applications are creating one or more HSes per user, with the assumption that HSes are something easy to make and there is no problem in having lots of them. People are wondering how well these applications scale and whether they are using the Tor network the right way. See John Brooks' mail for a small analysis: https://moderncrypto.org/mail-archive/messaging/2014/000434.html
It might be worth researching these use cases to see how well Tor supports them and how they can be supported better (or whether they are a bad idea entirely).
Yes. My guess is that it's lightweight to establish a circuit with each of your friends, and then when it goes away you try to reestablish it and if you fail then your friend is probably gone. And my guess is that it's heavyweight to try rendezvousing with each of your friends every 5 minutes to see if they're still there.
We should put up some guidelines for eco-friendly use of hidden services in this situation.
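As one possible shape for such a guideline, here is a sketch of a reconnect schedule that backs off instead of polling at a fixed interval. The function name and the base, cap, and jitter values are assumptions chosen for illustration, not recommendations that have been measured against the network.

```python
import random


def reconnect_delays(base=30.0, cap=3600.0, attempts=8, jitter=0.0, rng=None):
    """Return the waits (in seconds) before each reconnect attempt.

    Exponential backoff with an upper cap. Optional jitter spreads clients
    out so they do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays


print(reconnect_delays())
```

For comparison, rendezvousing with an offline friend every 5 minutes costs 288 failed attempts per day per friend, while the schedule above settles at one attempt per hour once the cap is reached.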
== Opt-in HS indexing service ==
For this project, we could make some sort of 'HS authority' which collects HS information (the HS descriptor?) from volunteering HSes. It's unclear who will run an HS authority; maybe we can work with ahmia so that they integrate it in their infrastructure?
The question of whether this has to be built-in is a fine one to explore. I bet we'd get more people doing it if it were just a torrc option that you can uncomment. But it also seems inherently less safe, since it might mean more publishings by your Tor than the human would do.
== Security / Miscellaneous ==
I also noticed that some tickets on trac were assigned to SponsorR but I couldn't fit them in the above categories. They are mainly security enhancements or code improvements. Here is a dump of the tickets:
Security:
- #13214 HS clients don't validate descriptor-id returned by HSDir
- #7803 Clients shouldn't send timestamps in INTRODUCE1 cells
- #8243 Getting the HSDir flag should require more effort
- #2715 Is rephist-calculated uptime the right metric for HSDir assignment?
Miscellaneous:
- #13223 Refactor rend_client_refetch_v2_renddesc()
- #13287 Investigate mysterious 24-hour lump in hsdir desc fetches
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
== Epilogue ==
What useful projects/tickets did I forget here?
1) We should identify and describe the great use cases of hidden services, especially the ones that are not of the form "I want to run a website that the man wants to shut down."
What sorts of hidden service examples are we missing from the world that we'd really like to see, and that would help everybody understand the value and flexibility of hidden services?
Along these lines would be fleshing out the "hidden service challenge" idea I've been kicking around, where as a follow-up to the EFF relay challenge, we challenge everybody to set up a novel hidden service. We would somehow need to make it so people didn't just stick their current website behind a hidden service -- or maybe that would be an excellent outcome?
2) Early results indicate that there are 25k-50k hidden services running currently; whereas Ahmia and friends know about maybe 1400 of them. So there is a lot of, shall we call it, dark matter in hidden service space. What are some safe ways we can improve our knowledge of this other 95% of the space?
3) One of the things special said was really good: a successful hidden service rendezvous is pretty lightweight, but an unsuccessful one really spends a lot of time and energy failing to fetch stuff before it fails. Can we make unsuccessful rendezvous connections less of a burden on the network, without interfering (much) with successful ones?
4) On the "whether it's safe to collect that measurement" front, it is in-scope to look at the science of how to decide whether it's safe. For example, the simplified view I've been taking is "if you collect a data set, and then you imagine somewhere out there is somebody else with some other arbitrary data set, does publishing your data set help that other person learn things about Tor users that you wish they couldn't learn?" If there's a way to formalize this question, or tie it into differential privacy, or otherwise move us beyond "think real hard and then if you don't see any problems do it", that would be great.
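As a sketch of what tying this into differential privacy could look like, the snippet below adds Laplace noise to a counter before it is published. This is the standard Laplace mechanism, not something Tor does; the function name, the epsilon value, and in particular the sensitivity of 1 are assumptions. A single client can open many circuits, so choosing a defensible sensitivity is exactly the hard part.

```python
import random


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


# Made-up counter value, for illustration only.
print(noisy_count(1000, epsilon=0.5))
```

Smaller epsilon means more noise and stronger protection, at the cost of less accurate statistics. Aggregating over many relays recovers accuracy, since the noise averages out while each individual report stays fuzzy.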
5) If you have a list of thousands of hidden service websites, and you want to fetch the pages from each of them, what parameters should you set on your fetcher to not take too much time, but also not trigger scalability problems in Tor that lead to wrong results? It seems that you'll want to do the fetches in a depth first way rather than breadth first, so you can make use of the established circuit? What other advice is useful here? Are there easy fixes we can make in Tor to allow more aggressive parameters to work? Will the tor2web patches help, and can we make better patches?
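The depth-first idea can be sketched as a reordering of the fetch queue so that all pages of one onion address are requested back to back, while the circuit to that service is still up. The function name and the example addresses are hypothetical, and the sketch only covers ordering, not the actual fetching or rate limiting.

```python
from urllib.parse import urlsplit


def depth_first_order(urls):
    """Group URLs by host, keeping first-seen host order and per-host order."""
    by_host = {}
    for url in urls:
        host = urlsplit(url).hostname
        by_host.setdefault(host, []).append(url)
    return [url for group in by_host.values() for url in group]


# Made-up addresses, listed breadth-first as a crawler frontier might be.
queue = [
    "http://aaa.onion/",
    "http://bbb.onion/",
    "http://aaa.onion/about",
    "http://bbb.onion/faq",
]
for url in depth_first_order(queue):
    print(url)
```

Whether this actually reduces load depends on how long the rendezvous circuit stays usable between requests, which is one of the parameters the question above asks about.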
6) In general, anything that falls under the umbrella of "better understanding hidden services and their role in society" is fair game here. So far we've mostly emphasized the technical part of understanding them, which makes sense because we're mostly a technical organization. But we should think about whether there are steps we can take on the social side. And I think our funder will be sympathetic to "oh and we took these steps to improve the chance that hidden services will be used for good" too.
In other news, I plan at some point to write up a blog post explaining who the funder is and what exactly we're doing (and not doing!) for them. A few more things have to fall into place first though.
Thanks! --Roger
Roger Dingledine wrote:
h) Back to the community again. There have recently appeared a few messaging protocols that are inherently using HSes to provide link layer confidentiality and anonymity [1]. Examples include Pond, Ricochet and TorChat.
There are also a fair few IRC and XMPP servers floating around onionland (and soon to be many more via Stormy). I'm also really curious what impact Pond would have on the HS landscape if it became popular. Right now, there are probably only a handful of people who run their own independent Pond HS, but that could change.
There's also onionshare, which creates hidden services as-needed -- which are typically discarded after sharing a single file one time.
It might be worth researching these use cases to see how well Tor supports them and how they can be supported better (or whether they are a bad idea entirely).
Yes. My guess is that it's lightweight to establish a circuit with each of your friends, and then when it goes away you try to reestablish it and if you fail then your friend is probably gone. And my guess is that it's heavyweight to try rendezvousing with each of your friends every 5 minutes to see if they're still there.
We should put up some guidelines for eco-friendly use of hidden services in this situation.
Scott Ainslie and I came to the conclusion that two one-way video conversations over hidden services is a pretty decent replacement for Skype etc[2]. At a really crude level, this can be achieved using gstreamer (maybe with FreeNote[1]) and then sharing the hidden service addresses with each other. Some assembly required, obviously. It's my undying wish that someone create a proof-of-concept app for this using gtk or kivy or something.
== Opt-in HS indexing service ==
The question of whether this has to be built-in is a fine one to explore. I bet we'd get more people doing it if it were just a torrc option that you can uncomment. But it also seems inherently less safe, since it might mean more publishings by your Tor than the human would do.
It would definitely get more opt-ins than if there were additional steps. There's a measure of informed consent there, because if you are opting in intentionally, then you are saying that you want your hidden service publicized. Any given person running a library or art project might think "Oh nobody cares about my hidden service" and not bother going through additional steps, but would be perfectly happy to have more people look at their work.
The question, to me, is how to frame the torrc option so as to make sure people know it's optional.
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
I've been curious about this ticket for a while, and happy to structure and run a follow-up test on a controlled server. Since the original problem was with an IRC server, it makes sense to set one up for the purposes of a test, and then set up a secondary machine for 'user' connections and an extra monitoring point.
I suspect that there are other factors that might have influenced that report. Could it be an issue with one of the intermediary points? There certainly *seem* to be tons of people using the OFTC hidden service, but that could be perception (ie, still <100 concurrent users).
What useful projects/tickets did I forget here?
- We should identify and describe the great use cases of hidden services, especially the ones that are not of the form "I want to run a website that the man wants to shut down."
One thing that is interesting: in practice, onionshare (RetroShare et al) winds up being easier than trying to share a file with a friend using third-party services. Particularly for large-ish files or something where you want some measure of privacy (ohai dropbox), sending it to a third-party and then making it available to your friend and then deleting/hiding it again is a little annoying. (And there are of course privacy and cost tradeoffs with this as well).
People like to set up private IRC & Jabber chats to chat without attracting trolls and spambots, and get an extra layer of encryption from Tor.
What sorts of hidden service examples are we missing from the world that we'd really like to see, and that would help everybody understand the value and flexibility of hidden services?
Along these lines would be fleshing out the "hidden service challenge" idea I've been kicking around, where as a follow-up to the EFF relay challenge, we challenge everybody to set up a novel hidden service. We would somehow need to make it so people didn't just stick their current website behind a hidden service -- or maybe that would be an excellent outcome?
This could be fun. =) We could put out a blog post when Stormy reaches 1.0 about this too.
there is a lot of, shall we call it, dark matter in hidden service space. What are some safe ways we can improve our knowledge of this other 95% of the space?
:3 http://i.imgur.com/5pXuSFf.png
- In general, anything that falls under the umbrella of "better understanding hidden services and their role in society" is fair game here. So far we've mostly emphasized the technical part of understanding them, which makes sense because we're mostly a technical organization. But we should think about whether there are steps we can take on the social side. And I think our funder will be sympathetic to "oh and we took these steps to improve the chance that hidden services will be used for good" too.
In other news, I plan at some point to write up a blog post explaining who the funder is and what exactly we're doing (and not doing!) for them. A few more things have to fall into place first though.
I'd be happy to work on this more as well =) There are some good ways to discuss hidden services -- even outside of the easier pitches like whistleblower protection, hidden services are really awesome and need more positive attention from the outside non-hardcore-nerd world.
best, Griffin
== Such References ==
[1] https://github.com/ioerror/freenote
[2] Where'd he run off to?
Griffin Boyce griffin@cryptolab.net writes:
Roger Dingledine wrote:
<snip>
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
I've been curious about this ticket for a while, and happy to structure and run a follow-up test on a controlled server. Since the original problem was with an IRC server, it makes sense to set one up for the purposes of a test, and then set up a secondary machine for 'user' connections and an extra monitoring point.
Yes, someone testing this theory would be awesome!
I would be surprised if 100 connections is the _exact_ number where HSes start dying. However, I'd totally believe that there might be issues causing HSes to get more unreliable after some load. We should find these issues!
I suspect that there are other factors that might have influenced that report. Could it be an issue with one of the intermediary points? There certainly *seem* to be tons of people using the OFTC hidden service, but that could be perception (ie, still <100 concurrent users).
<snip>
What sorts of hidden service examples are we missing from the world that we'd really like to see, and that would help everybody understand the value and flexibility of hidden services?
Along these lines would be fleshing out the "hidden service challenge" idea I've been kicking around, where as a follow-up to the EFF relay challenge, we challenge everybody to set up a novel hidden service. We would somehow need to make it so people didn't just stick their current website behind a hidden service -- or maybe that would be an excellent outcome?
This could be fun. =) We could put out a blog post when Stormy reaches 1.0 about this too.
Ah Stormy!
I was a fan of the APAF project [0] and Stormy seems to be its successor. I liked APAF because it would be the LAMP of Hidden Services: it would make them easier to set up and configure.
For this reason, I think Stormy fits very well with Roger's hopes of "improving" the role of HSes in society.
I'm excited to learn what you've been cooking on this front.
Is there a document that describes what Stormy aims to do? It would be great if such a design document existed even if Stormy is not at 1.0 yet :) The document doesn't need to be big or detailed, but it would be great if we could learn what Stormy is about.
[0]: https://gitweb.torproject.org/apaf.git/blob/HEAD:/spec.txt
Is the stormy code available anywhere yet?
There seem to be no commits in the stormy repo (none in user/griffin/stormy.git either). A cursory web search didn't help either.
I'd love to review/test/contribute.
On November 4, 2014 8:18:49 AM EST, George Kadianakis desnacked@riseup.net wrote:
Griffin Boyce griffin@cryptolab.net writes:
Roger Dingledine wrote:
<snip>
On 20/10/14 14:37, George Kadianakis wrote:
f) On a more researchy tone, this might also be a good point to start poking at the HS scalability project since it will really affect HS performance.
We should look at Christopher Baines' ideas and write a Tor proposal out of them: https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html Last time I looked, Christopher's ideas required implementing proposal225 and #8239.
I am still around, and interested in helping out with this! :)
Christopher Baines cbaines8@gmail.com writes:
I am still around, and interested in helping out with this! :)
Hello,
glad to see you are still interested in this.
I think the first step here is to write a document with the various solutions, their requirements, and how they influence the threat model.
First of all, what do we mean by scalability and what properties are we trying to offer?
On concrete schemes now, what features are needed to implement your scheme [0]? Can these requirements be changed? For example, on the topic of selecting IPs, can we dump the requirement for global randomness, by using the long-term private key as a seed to picking IPs?
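To illustrate the question about seeding from the long-term key, here is a sketch in which every instance that holds the same key and the same relay list derives the same introduction points without any global randomness. It is not taken from Christopher's scheme: the function name, the HMAC-SHA256 ranking, the time-period input, and the default of three introduction points are all assumptions.

```python
import hashlib
import hmac


def pick_intro_points(long_term_key, period, relays, count=3):
    """Deterministically pick `count` relays from a key and a time period.

    Each relay is ranked by HMAC(key, period || relay name) and the lowest
    ranks win, so the result does not depend on the order of `relays`.
    """
    if count > len(relays):
        raise ValueError("not enough relays to choose from")

    def rank(relay):
        message = period.to_bytes(8, "big") + relay.encode()
        return hmac.new(long_term_key, message, hashlib.sha256).digest()

    return sorted(relays, key=rank)[:count]


# Made-up key and relay names, for illustration only.
relays = ["relay%d" % i for i in range(20)]
print(pick_intro_points(b"service-long-term-key", period=16384, relays=relays))
```

The threat-model question follows directly: anyone who learns the key can predict every future set of introduction points, and instances whose relay lists differ may still disagree, so this trades the global-randomness requirement for other assumptions.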
Also, what's the threat model of your scheme? What more information do the IPs learn? What more information do the clients learn?
What other ways are there to do HS scalability? How does their threat model change [1]? etc.
By the way, is your thesis somewhere public? I imagine that it tackles a few of these questions already.
Thanks!
[0]: https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html
[1]: https://lists.torproject.org/pipermail/tor-dev/2013-October/005683.html
On 29/10/14 13:01, George Kadianakis wrote:
Hello,
glad to see you are still interested in this.
Apologies for the very late reply. My project report is now available here, it probably covers some stuff that is relevant to the project, and lots of stuff that is not relevant [1].
1: http://cbaines.net/projects/tor/disths/report.pdf
I think the first step here is to write a document with the various solutions, their requirements, and how they influence the threat model.
I do this in the report, without much depth though.
First of all, what do we mean by scalability and what properties are we trying to offer?
So, I looked at distribution (removing bottlenecks), but never showed that this was a problem for scalability, or that anything I did improved the scalability.
With that in mind, here are the goals I looked at.
Primary Goals
Allow for the distribution of connections to a hidden service. This is the core goal, as achieving this would allow for the horizontal scaling (scaling through adding nodes to the system) of hidden services, by distributing the load across multiple machines.
A service must be accessible if one or more instances are in operation. In conjunction with the previous goal, this provides a means of achieving increased availability: the service can be hosted from multiple geographical locations, reducing the probability of all of the service instances going offline (e.g. due to power or network outage).
Secondary Goals
Obscure the number of hidden service instances. The number of instances needs only to be known to the hidden service operator.
Obscure the state of each service instance. The state (operational or not) of a service instance, if available, can help to compromise the anonymity offered by a hidden service.
On concrete schemes now, what features are needed to implement your scheme [0]? Can these requirements be changed? For example, on the topic of selecting IPs, can we dump the requirement for global randomness, by using the long-term private key as a seed to picking IPs?
This is covered in the report. In short:
- Remove the restriction on connections to an Introduction Point
- Remove Introduction Point Specific Keys
- Use one to many circuits when passing Introduction Requests
- Bootstrap Service via Initial Descriptor Check
- Introduction Point Instance Selection
- Selecting New Introduction Points
- Introduction Point Reconnection
All of these things I implemented (to some degree) for testing (disths branch of git@git.cbaines.net:tor).
As for the global randomness issue, I see no reason why using the private key would not work, but don't know enough to say it will.
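To make the private-key idea concrete, here is a minimal sketch of how every instance holding the same long-term key could derive the same set of introduction points without any shared global randomness. It is not taken from the report or from any Tor specification: the function name, the relay names, the `time_period` parameter and the use of HMAC-SHA256 are all assumptions made for illustration, and whether such a scheme is actually safe is exactly the open question above.

```python
import hashlib
import hmac


def pick_introduction_points(private_key: bytes, relays: list[str],
                             time_period: int, count: int = 3) -> list[str]:
    """Rank relays by a keyed hash and return the first `count` of them.

    Hypothetical helper: anyone holding `private_key` computes the same
    ranking, so instances agree on the result without coordinating.
    """
    def score(relay: str) -> bytes:
        # Mix in the time period so the selection rotates over time.
        msg = time_period.to_bytes(8, "big") + relay.encode("utf-8")
        return hmac.new(private_key, msg, hashlib.sha256).digest()

    return sorted(relays, key=score)[:count]


# Illustrative inputs only; these are not real relays or keys.
relays = [f"relay{i}" for i in range(20)]
key = b"hypothetical long-term service key"

first_instance = pick_introduction_points(key, relays, time_period=1)
second_instance = pick_introduction_points(key, relays, time_period=1)
```

Both calls return the same list because the ranking depends only on the key, the relay list and the time period. An observer without the key cannot recompute the ranking, but this sketch says nothing about what the chosen introduction points themselves can learn.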
Also, what's the threat model of your scheme? What more information do the IPs learn? What more information do the clients learn?
In the way that I implemented it, introduction points are able to determine the state and number of hidden service instances.
I think this could be helped if each service instance connected to each introduction point via 1 to n circuits. Each circuit looks identical, so instead of knowing the exact number of instances, you just know that it is <= n. By closing these circuits as part of the normal operation of the hidden service, I think you could also mask instance failure.
This approach might however make it easier to locate instances, due to multiple circuits being used.
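As a rough worked example of what an introduction point could still infer under the multi-circuit idea, the sketch below computes bounds on the number of instances from the number of circuits it sees. It assumes, as one reading of the paragraph above, that every live instance holds at least one and at most `max_per_instance` indistinguishable circuits; the function and parameter names are hypothetical.

```python
def instance_count_bounds(observed_circuits: int,
                          max_per_instance: int) -> tuple[int, int]:
    """Return (lower, upper) bounds on the number of service instances.

    Assumes each instance holds between 1 and `max_per_instance`
    circuits and that circuits cannot be told apart.
    """
    if observed_circuits < 0 or max_per_instance < 1:
        raise ValueError("need observed_circuits >= 0 and max_per_instance >= 1")
    # Fewest instances: every instance uses the maximum number of circuits.
    lower = -(-observed_circuits // max_per_instance)  # ceiling division
    # Most instances: every instance uses exactly one circuit.
    upper = observed_circuits
    return lower, upper


one_circuit_each = instance_count_bounds(6, 1)   # (6, 6): exact count leaks
up_to_three_each = instance_count_bounds(6, 3)   # (2, 6): only a range leaks
```

With one circuit per instance the count is exposed exactly, while allowing up to three circuits per instance widens it to a range. The upper bound is always the number of circuits seen, which matches the "<= n" observation above.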
What other ways are there to do HS scalability? How does their threat model change [1]? etc.
I cover a few ways of doing distribution in the report.
By the way, is your thesis somewhere public? I imagine that it tackles a few of these questions already.
See above.
Hi all,
NRL is effectively partnered with the Tor Project Inc. for the SponsorR efforts. Our (NRL's) tasking is largely overlapping and somewhat complementary to that of TPI. As such I thought it would be good to mention the basics of what we are working on to better inform and coordinate the planning George et al. have begun discussing in this thread.
Our tasks are:
1. to identify which statistics about hidden services can be collected and reported without harming user security.
This is also directly part of TPI's tasking, and I expect we will be collaborating on this directly. We will probably start working on this in about a month.
2. to develop passive measurement techniques to measure information about hidden services. This would allow, for example, the collection of information about the relative popularity of different types of hidden services, such as what fraction of hidden service connections are highly interactive vs. large data downloads, etc. It also includes developing techniques to infer global activity from local observations.
Some of this has already begun. About a month ago, Roger deployed a test on a few relays to see whether a connection was for HSes vs. something else. And we did some initial analysis of the global projection based on an estimate of how much bandwidth those relays saw, which varied wildly, although there are lots of potential explanations for that. Roger has also already touched in this thread on some statistics that are interesting but require thought before deciding how/if to collect them.
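To illustrate what projecting global activity from a local observation can look like, here is a minimal sketch. It assumes the simplest possible model, in which a relay sees a share of HS activity equal to its share of network capacity; that model, the function name and all of the numbers below are invented for illustration and are not the analysis or the measurements mentioned above.

```python
def extrapolate_total(observed: float, relay_fraction: float) -> float:
    """Scale a local observation up to a network-wide estimate.

    `relay_fraction` is the (assumed) share of total activity that this
    relay is expected to see, e.g. its share of consensus weight.
    """
    if not 0 < relay_fraction <= 1:
        raise ValueError("relay_fraction must be in (0, 1]")
    return observed / relay_fraction


# Invented (observation, fraction) pairs for three hypothetical relays.
observations = [(120.0, 0.002), (45.0, 0.0015), (300.0, 0.001)]
estimates = [extrapolate_total(seen, share) for seen, share in observations]
spread = max(estimates) / min(estimates)
```

In this made-up example the three per-relay estimates differ by a factor of about ten. That kind of spread is what you would expect if the assumed share is wrong for some relays, which is one reason such projections depend on accurate bandwidth attribution.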
A primary focus of NRL's work between now and the end of the year has been and will be on devising a secure and accurate relay bandwidth measurement scheme, with an emphasis on something that should be much better than what is now available but also practical and compatible enough that it could be rolled out in Tor within about a year (and we'll also be considering designs that are less directly implementable but more theoretically solid).
This is one of Tor's biggest current vulnerabilities. It is pretty easy to get fake inflated BW numbers so as to have a consensus weight that allows you to observe amounts of traffic quite disproportionate to the amount you have actually been carrying in the past. There have been many published attacks based on bandwidth inflation, and Tor's current torflow design was not intended to be secure---and could use some accuracy attention as well.
This also becomes important in the context of gathering HS statistics. If we are going to be deploying statistic gathering code in a way that is safe for users and hidden services, it is not enough to say what statistics are safe to honestly collect. We also need to make Tor's system of data gathering for those statistics robust to abuse. And one of the easiest ways to abuse statistics gathering to undermine user and service security is to manipulate BW attribution to increase the raw data that is available to malicious entities. Of course any statistics that rely on accurate BW measurement will benefit from this work as well.
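The effect of bandwidth inflation can be shown with a small sketch. It assumes a simplified model in which a relay's chance of being picked is just its weight divided by the total weight; real Tor path selection also applies position-dependent weights and flags, so this only illustrates the proportionality. The relay names and weights are made up.

```python
def traffic_share(weights: dict[str, float], relay: str) -> float:
    """Fraction of selections that land on `relay` when relays are
    picked in proportion to their weight (simplified model)."""
    return weights[relay] / sum(weights.values())


# Hypothetical network of three equally capable relays.
honest = {"relay_a": 100.0, "relay_b": 100.0, "mallory": 100.0}
# Same network, but one relay gets credited with ten times its weight.
inflated = dict(honest, mallory=1000.0)

share_before = traffic_share(honest, "mallory")    # one third
share_after = traffic_share(inflated, "mallory")   # 1000 / 1200
```

In this toy network the inflating relay goes from a third of the selections to five sixths of them without carrying any more traffic than before. The same multiplier applies to whatever statistics that relay gets to observe and report.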
3. Designing and testing HS performance improvements, particularly as they affect the crawling and measuring activities on HSes that SponsorR is interested in.
Again we expect lots of collaboration in this area, although our focus will be on the above first.
4. Evaluate planned and future changes to HSes for security and performance, particularly to see how intended SponsorR measuring, crawling, and indexing techniques for HSes may be affected. For example, a technique that assumed directories could know when a new HS is listed would be affected by design changes in proposal 224.
Same comment as for task 3.
George Kadianakis desnacked@riseup.net writes:
Hello,
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
FWIW, I skimmed the thread and collected all the tasks that were proposed: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorRtasklist
I might have missed one or two things, so please append anything I missed to the wiki. Also, I haven't really split these into deliverables, so some tasks from that list might actually be very abstract or very hard to do.
The idea is that in the future we will add more stuff to the list and then filter it to decide the future deliverables of SponsorR.
On 11/25/2014 06:19 AM, George Kadianakis wrote:
[snip]
Hi,
Is it worth mentioning Namecoin under section 4, since other naming systems are mentioned?
Cheers, -Jeremy Rand
Jeremy Rand biolizard89@gmail.com writes:
Hi,
Is it worth mentioning Namecoin under section 4, since other naming systems are mentioned?
Cheers, -Jeremy Rand
Added. Thanks.
On 10/20/14 3:37 PM, George Kadianakis wrote:
Hello,
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
[snip]
== Performance Improvements ==
This is the most juicy section. How can we make HS performance better? IIUC, we are mainly interested in client-side performance, but if a change makes both sides faster that's even better.
I suggest also considering making the so-called Tor2web Mode a standard part of Tor, as a way to improve Tor Hidden Services.
While Tor2web Mode was born with the goal of reducing the number of hops for a Tor client used together with Tor2web software, it can also provide great benefit to TorHS owners.
A TorHS owner may or may not wish to hide their location.
If a TorHS owner enables Tor2web Mode, it is assumed that they don't want "location anonymity", while all other properties of TorHS (link-level encryption, self-authenticating URI, etc.) are preserved.
With the latest improvements in #12844, the performance of Tor2web Mode will be even better.
For a TorHS like Facebook, or other resources that *do not need* location anonymity, having shorter circuits is a great performance improvement in both latency and bandwidth.
I would suggest making Tor2web mode (or something named differently) usable on stock Tor software, to enable quick optimization for TorHS owners that need performance, at the cost of sacrificing location anonymity.
Fabio Pietrosanti - lists lists@infosecurity.ch writes:
[snip]
I fully agree that a server-side equivalent of Tor2web mode should be made. The closest design we have so far is Roger's "encrypted services" proposal: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-enc...
Before implementation, the proposal needs some polishing and we should think of any further optimizations that can be done (e.g. the IP-equivalent of #12844 or something?). Implementation is not super hard, but not trivial either. It would be great if we could do this as part of SponsorR.
Then, HSes with the Facebook threat model (who don't care about location privacy) would be able to use this mode so that they are faster and also cause less traffic to the network.
I agree as well and have discussed this use case with NRL colleagues for a while now. Some thoughts that we have had:
1. “Encrypted service” is a terrible name because it sounds like it’s only providing encryption and is also too generic. The idea needs a name that communicates that it is different from location-hidden services. Some ideas: “Tor-only service”, “Tor-aware service”, “Tor client-protected service”, and all of the preceding with “onion” in place of “Tor”. I’ll use “Tor-only service”.
2. Providing a Tor-only service would split the current anonymity set of hidden services for anybody who can distinguish such connections. Using timing and packet counting, this should be possible for any relay on the path between client and service, including especially the client’s guard.
3. A Tor-only service could actually use the fact that its location is not hidden for good: it could choose to place servers in strategic locations so that users could pick the ones that put them at minimum risk for traffic correlation, for example, by placing servers in a location with good rule of law and data privacy regulations.
Cheers, Aaron
On Nov 25, 2014, at 5:43 PM, George Kadianakis desnacked@riseup.net wrote:
[snip]
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev