Hi David, Roger,
I think it would be great if we could show off some HS performance improvements at the next Sponsor R PI meeting in January, both as a way to show that we are making progress on the performance front and also to help demonstrate our agility and how quickly we can get results on new designs. I’m here to advocate the use of Shadow to help in this regard.
So far, the list of deliverables for which we could possibly show improvements is quite small: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR#Tickets
From the list of completed tickets, the only one that stands out to me as providing a performance boost is #13211 (Allow optimistic data on connections to hidden services): https://trac.torproject.org/projects/tor/ticket/13211
This seems like a somewhat small change; despite that, do we think it may be worth simulating, to verify that it works as expected and to understand the extent to which it reduces time to first byte for HS downloads?
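For concreteness: since optimistic data lets the client send its request before the confirmation cell comes back, the expected gain should be roughly one circuit round trip off the time to first byte, and a measurement harness only needs to timestamp the request and the first response byte. A minimal sketch, assuming an already-established connection (in practice one opened through Tor's SOCKS port) and a made-up helper name:

    /* Hypothetical client-side helper, not part of tor: measure time to
     * first byte on an already-established connection (e.g., one opened
     * through Tor's SOCKS port to a hidden service). */
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>

    static double
    time_to_first_byte(int fd, const char *request)
    {
      struct timespec start, end;
      char first;

      clock_gettime(CLOCK_MONOTONIC, &start);
      if (send(fd, request, strlen(request), 0) < 0)
        return -1.0;
      /* Block until the very first byte of the response arrives. */
      if (recv(fd, &first, 1, 0) != 1)
        return -1.0;
      clock_gettime(CLOCK_MONOTONIC, &end);

      return (end.tv_sec - start.tv_sec) +
             (end.tv_nsec - start.tv_nsec) / 1e9;
    }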
Are there other HS performance improvements that we think may be ready by January?
Best,
Rob
On 20 Nov (14:45:12), Rob Jansen wrote:
> Hi David, Roger,
Hello!
> I think it would be great if we could show off some HS performance improvements at the next Sponsor R PI meeting in January, both as a way to show that we are making progress on the performance front and also to help demonstrate our agility and how quickly we can get results on new designs. I’m here to advocate the use of Shadow to help in this regard.
> So far, the list of deliverables for which we could possibly show improvements is quite small: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR#Tickets
> From the list of completed tickets, the only one that stands out to me as providing a performance boost is #13211 (Allow optimistic data on connections to hidden services): https://trac.torproject.org/projects/tor/ticket/13211
> This seems like a somewhat small change; despite that, do we think it may be worth simulating, to verify that it works as expected and to understand the extent to which it reduces time to first byte for HS downloads?
> Are there other HS performance improvements that we think may be ready by January?
Here is what we (Karsten, George, and I) are up to on that front for January. (Please, guys, feel free to fill in the blanks if I am missing anything.)
We are aware of the January deadline, so we've split the work into two parts. George and Karsten are working on a proposal to gather HS statistics on the real Tor network, to answer some questions presented here:
https://people.torproject.org/~karsten/volatile/238-hs-relay-stats-2014-11-2...
On my part, I have a chutney network with an HS and clients that fetch data from it. I'm currently working on instrumenting the HS subsystem so we can gather performance data and analyze it for meaningful pointers on where the contention points are, confirm expected behaviors, etc. I'll soon begin updating the following ticket with more information on the work I'm doing. (I'm in Boston right now collaborating with Nick for the week, so things are a bit slower on this front until Monday.)
https://trac.torproject.org/projects/tor/ticket/13792
This could also be used with Shadow, I presume. Since the deadline is near, I chose chutney here for simplicity. I'll have a talk with Nick tomorrow on how we can possibly get this instrumentation upstream (either logs, controller events, and/or tracing).
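To make the log option concrete, a hypothetical instrumentation point using tor's existing logging API could be as simple as the following (the "hs-perf" marker and the helper name are made up):

    /* Hypothetical instrumentation point for the log-based option, using
     * tor's existing log_info() and the LD_REND (rendezvous/HS) log
     * domain. The "hs-perf" marker and this helper are made up. */
    #include <time.h>
    #include "torlog.h"   /* tor's logging API (src/common/torlog.h) */

    static void
    note_hs_perf_event(const char *event, unsigned int circ_id)
    {
      struct timespec now;
      clock_gettime(CLOCK_MONOTONIC, &now);
      log_info(LD_REND, "hs-perf %s circ=%u t=%ld.%09ld",
               event, circ_id, (long)now.tv_sec, now.tv_nsec);
    }

Timestamped lines like that are easy to grep out of the chutney nodes' logs and feed to an analysis script.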
Please note that Karsten is helping everyone here with both parts! :).
On the host side of HS (meaning the HS relay itself), we have profiled a hidden service being hammered with hundreds of connections. The client has also been fixed to handle that, btw (#13698). I've talked this one out with Nick as well, and we have an idea for a solution that is mentioned in the ticket. Fixing that would, in theory, improve the host-side performance of HSes quite a bit. You can find the info here:
https://trac.torproject.org/projects/tor/ticket/13739
Things are going forward; we still have some work ahead to gather the HS performance baseline and start trying to improve it. I'm fairly confident that the performance statistics from a private network will give us good insight into the current situation.
Feel free to propose anything that could be useful to make this thing more efficient/faster/useful :).
Cheers!
David
> Best,
> Rob
On Nov 20, 2014, at 4:59 PM, David Goulet dgoulet@ev0ke.net wrote:
> On 20 Nov (14:45:12), Rob Jansen wrote:
>> Are there other HS performance improvements that we think may be ready by January?
> On my part, I have a chutney network with an HS and clients that fetch data from it. I'm currently working on instrumenting the HS subsystem so we can gather performance data and analyze it for meaningful pointers on where the contention points are, confirm expected behaviors, etc. I'll soon begin updating the following ticket with more information on the work I'm doing. (I'm in Boston right now collaborating with Nick for the week, so things are a bit slower on this front until Monday.)
> https://trac.torproject.org/projects/tor/ticket/13792
> This could also be used with Shadow, I presume. Since the deadline is near, I chose chutney here for simplicity.
Chutney is the right tool for tracing CPU resource problems. Shadow is the right tool for gathering realistic network-level performance statistics and for testing code at scale. Also, Shadow potentially runs faster than real time if you are only using a handful of nodes. If you are not using Shadow because it is too complex, then please, please let me help with that.
> I'll have a talk with Nick tomorrow on how we can possibly get this instrumentation upstream (either logs, controller events, and/or tracing).
That would be great! Making it easy to gather data, even if only in TestingTorNetwork mode, will pay dividends.
> Things are going forward; we still have some work ahead to gather the HS performance baseline and start trying to improve it. I'm fairly confident that the performance statistics from a private network will give us good insight into the current situation.
> Feel free to propose anything that could be useful to make this thing more efficient/faster/useful :).
I totally agree that a private network is the right approach. A small network will be useful for isolating some performance issues, but I think we also need to make sure we test at a larger scale, with realistic background traffic added, etc., so that we understand the performance benefits in a more realistic environment. Shadow allows us to do this and to get stats across the entire network on the order of hours. I have the resources to run at least 6000 relays and 30000 clients in a private ShadowTor deployment, and I hope that having results at this scale will impress our funder in January.
Perhaps after you finish your traces in chutney and work out some of the code bottlenecks, I can run some more realistic network experiments in Shadow. (Separate branches for each improvement would help here.) Would this actually be helpful? Or do we think that by the time we get to the Shadow step we would have already learned everything we need to know?
-Rob
On 21 Nov (10:14:23), Rob Jansen wrote:
> On Nov 20, 2014, at 4:59 PM, David Goulet dgoulet@ev0ke.net wrote:
>> On 20 Nov (14:45:12), Rob Jansen wrote:
>>> Are there other HS performance improvements that we think may be ready by January?
>> On my part, I have a chutney network with an HS and clients that fetch data from it. I'm currently working on instrumenting the HS subsystem so we can gather performance data and analyze it for meaningful pointers on where the contention points are, confirm expected behaviors, etc. I'll soon begin updating the following ticket with more information on the work I'm doing. (I'm in Boston right now collaborating with Nick for the week, so things are a bit slower on this front until Monday.)
>> https://trac.torproject.org/projects/tor/ticket/13792
>> This could also be used with Shadow, I presume. Since the deadline is near, I chose chutney here for simplicity.
> Chutney is the right tool for tracing CPU resource problems. Shadow is the right tool for gathering realistic network-level performance statistics and for testing code at scale. Also, Shadow potentially runs faster than real time if you are only using a handful of nodes. If you are not using Shadow because it is too complex, then please, please let me help with that.
Yes, considering the amount of resources you have for a big private Tor network, Shadow is definitely a good idea!
The plan right now is for me to start instrumenting the Tor code base (for now it will only be the HS subsystem) and start collecting data under chutney to make sure it works and is useful.
Once this is done, we should definitely move that experiment to Shadow and run it on a huge network with many kinds of activity (clients, non-Tor traffic, etc.).
Please see https://trac.torproject.org/projects/tor/ticket/13802 about the instrumentation part. We'll definitely have to talk more about integrating Shadow and a userspace tracer, but from what I got from Nick, it sounds totally doable without too much trouble.
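For reference, a minimal lttng-ust tracepoint provider for this kind of instrumentation might look roughly like this; the provider name (tor_hs), event name, and fields are hypothetical, not necessarily what the ticket will end up with:

    /* tor_hs_tp.h -- sketch of an lttng-ust tracepoint provider header.
     * In exactly one .c file, #define TRACEPOINT_DEFINE before including
     * this header; everywhere else, just include it. */
    #undef TRACEPOINT_PROVIDER
    #define TRACEPOINT_PROVIDER tor_hs

    #undef TRACEPOINT_INCLUDE
    #define TRACEPOINT_INCLUDE "./tor_hs_tp.h"

    #if !defined(TOR_HS_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
    #define TOR_HS_TP_H

    #include <stdint.h>
    #include <lttng/tracepoint.h>

    TRACEPOINT_EVENT(
      tor_hs,                       /* provider name (hypothetical) */
      relay_cell,                   /* event name (hypothetical) */
      TP_ARGS(uint32_t, circ_id, int, command),
      TP_FIELDS(
        ctf_integer(uint32_t, circ_id, circ_id)
        ctf_integer(int, command, command)
      )
    )

    #endif /* TOR_HS_TP_H */

    #include <lttng/tracepoint-event.h>

Since the tracer timestamps every event, a single event per processed cell already gives basic timing data.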
>> I'll have a talk with Nick tomorrow on how we can possibly get this instrumentation upstream (either logs, controller events, and/or tracing).
> That would be great! Making it easy to gather data, even if only in TestingTorNetwork mode, will pay dividends.
Yes, and having it upstream will make performance analysis easier to scale in the future. :)
>> Things are going forward; we still have some work ahead to gather the HS performance baseline and start trying to improve it. I'm fairly confident that the performance statistics from a private network will give us good insight into the current situation.
>> Feel free to propose anything that could be useful to make this thing more efficient/faster/useful :).
> I totally agree that a private network is the right approach. A small network will be useful for isolating some performance issues, but I think we also need to make sure we test at a larger scale, with realistic background traffic added, etc., so that we understand the performance benefits in a more realistic environment. Shadow allows us to do this and to get stats across the entire network on the order of hours. I have the resources to run at least 6000 relays and 30000 clients in a private ShadowTor deployment, and I hope that having results at this scale will impress our funder in January.
That is a HUGE network, love it. For sure, we should definitely run the HS stats in your setup ;).
> Perhaps after you finish your traces in chutney and work out some of the code bottlenecks, I can run some more realistic network experiments in Shadow. (Separate branches for each improvement would help here.) Would this actually be helpful? Or do we think that by the time we get to the Shadow step we would have already learned everything we need to know?
I think I mostly answered these above, but to answer the last question here: I think having a network this large will most probably show us things we could NOT observe in a small chutney network.
Cheers!
David
> -Rob
On Nov 21, 2014, at 10:40 AM, David Goulet dgoulet@ev0ke.net wrote:
> Please see https://trac.torproject.org/projects/tor/ticket/13802 about the instrumentation part. We'll definitely have to talk more about integrating Shadow and a userspace tracer, but from what I got from Nick, it sounds totally doable without too much trouble.
If we want the tracer to also work inside of Shadow, then the biggest potential problem I can think of right now is thread safety. Shadow uses several worker threads, each of which is assigned to run hundreds to thousands of Tor nodes. If Tor is using lttng as a dynamic library and it is not thread-safe, we will run into issues.
One way to avoid those issues could be to statically link lttng to Tor. However, even this could go bad if lttng uses global state, because that would mean that those hundreds of Tor nodes assigned to a Shadow worker thread would be sharing that state. Probably not what we want. To get around the global state issue, Shadow would have to compile lttng specially, using the same LLVM pass to hoist out the global variables as we use for Tor. That may get messy.
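To illustrate the global-state hazard with a made-up example (not actual lttng code):

    /* Made-up example of the hazard (not actual lttng code): any
     * per-process global in a traced library is shared by every virtual
     * tor node hosted in that process. */
    #include <stdint.h>

    static uint64_t trace_events_emitted = 0;  /* one copy per process */

    void
    on_trace_event(void)
    {
      /* Hundreds of virtual nodes would all bump the same counter (or,
       * worse, share one buffer registry), mixing their state together
       * unless globals like this are hoisted out per node. */
      trace_events_emitted++;
    }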
So it really depends on how robust lttng is, and as I have no experience with it, I can only speculate. But if you let me know when you have some minimal instrumentation ready, I can test it in Shadow early enough that we could adjust if needed.
-Rob
On 21 Nov (12:59:43), Rob Jansen wrote:
> On Nov 21, 2014, at 10:40 AM, David Goulet dgoulet@ev0ke.net wrote:
>> Please see https://trac.torproject.org/projects/tor/ticket/13802 about the instrumentation part. We'll definitely have to talk more about integrating Shadow and a userspace tracer, but from what I got from Nick, it sounds totally doable without too much trouble.
> If we want the tracer to also work inside of Shadow, then the biggest potential problem I can think of right now is thread safety. Shadow uses several worker threads, each of which is assigned to run hundreds to thousands of Tor nodes. If Tor is using lttng as a dynamic library and it is not thread-safe, we will run into issues.
> One way to avoid those issues could be to statically link lttng to Tor. However, even this could go bad if lttng uses global state, because that would mean that those hundreds of Tor nodes assigned to a Shadow worker thread would be sharing that state. Probably not what we want. To get around the global state issue, Shadow would have to compile lttng specially, using the same LLVM pass to hoist out the global variables as we use for Tor. That may get messy.
LTTng is an in-process library that spawns a thread to handle all the tracing and the interaction with the main LTTng tracing registry (which manages the buffers, clients, consumers, streaming, etc.).
Nick told me that Shadow moves the clock forward itself, so as long as you hijack clock_gettime for monotonic time, we'll be fine :).
> So it really depends on how robust lttng is, and as I have no experience with it, I can only speculate. But if you let me know when you have some minimal instrumentation ready, I can test it in Shadow early enough that we could adjust if needed.
The LTTng userspace tracer is thread safe, no issue with that :).
I already have a couple of tracepoints in the HS client subsystem as we speak; I'm currently adding more to do some very basic measurements of the timing of each client HS cell (in rend_process_relay_cell()).
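As a sketch of what a call site could look like, assuming a provider header in the style of the tor_hs sketch earlier in the thread (names are illustrative):

    /* Hypothetical call site, assuming a provider header like the tor_hs
     * sketch earlier in the thread (provider "tor_hs", event
     * "relay_cell"); variable names are illustrative. */
    #include <stdint.h>
    #include "tor_hs_tp.h"

    static void
    trace_hs_relay_cell(uint32_t circ_id, int command)
    {
      /* lttng timestamps every event, so one event per processed cell is
       * enough for basic inter-cell timing. */
      tracepoint(tor_hs, relay_cell, circ_id, command);
    }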
Once I have something that you can try, I'll send you a link to the branch with the instrumentation, and you can see if you can make it happen with Shadow :).
Cheers!
David
> -Rob
On Nov 21, 2014, at 1:06 PM, David Goulet dgoulet@ev0ke.net wrote:
> On 21 Nov (12:59:43), Rob Jansen wrote:
>> On Nov 21, 2014, at 10:40 AM, David Goulet dgoulet@ev0ke.net wrote:
>>> Please see https://trac.torproject.org/projects/tor/ticket/13802 about the instrumentation part. We'll definitely have to talk more about integrating Shadow and a userspace tracer, but from what I got from Nick, it sounds totally doable without too much trouble.
>> If we want the tracer to also work inside of Shadow, then the biggest potential problem I can think of right now is thread safety. Shadow uses several worker threads, each of which is assigned to run hundreds to thousands of Tor nodes. If Tor is using lttng as a dynamic library and it is not thread-safe, we will run into issues.
>> One way to avoid those issues could be to statically link lttng to Tor. However, even this could go bad if lttng uses global state, because that would mean that those hundreds of Tor nodes assigned to a Shadow worker thread would be sharing that state. Probably not what we want. To get around the global state issue, Shadow would have to compile lttng specially, using the same LLVM pass to hoist out the global variables as we use for Tor. That may get messy.
> LTTng is an in-process library that spawns a thread to handle all the tracing and the interaction with the main LTTng tracing registry (which manages the buffers, clients, consumers, streaming, etc.).
> Nick told me that Shadow moves the clock forward itself, so as long as you hijack clock_gettime for monotonic time, we'll be fine :).
Great! Shadow does interpose clock_gettime (among other time functions).
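For the curious, the shape of such an interposer is roughly the following; simulated_now_ns() is a stand-in for the simulator's clock, and a real shim would also dispatch on the clock id and cover time(), gettimeofday(), and friends:

    /* Sketch of a preloadable time shim: build it as a shared object and
     * load it with LD_PRELOAD so this definition overrides libc's.
     * simulated_now_ns() is a hypothetical hook into the simulator. */
    #include <stdint.h>
    #include <time.h>

    extern uint64_t simulated_now_ns(void);  /* hypothetical hook */

    int
    clock_gettime(clockid_t clk_id, struct timespec *tp)
    {
      (void)clk_id;  /* a real shim would dispatch per clock id */
      uint64_t ns = simulated_now_ns();
      tp->tv_sec = (time_t)(ns / 1000000000ULL);
      tp->tv_nsec = (long)(ns % 1000000000ULL);
      return 0;
    }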
>> So it really depends on how robust lttng is, and as I have no experience with it, I can only speculate. But if you let me know when you have some minimal instrumentation ready, I can test it in Shadow early enough that we could adjust if needed.
> The LTTng userspace tracer is thread safe, no issue with that :).
That’s a relief!
> I already have a couple of tracepoints in the HS client subsystem as we speak; I'm currently adding more to do some very basic measurements of the timing of each client HS cell (in rend_process_relay_cell()).
> Once I have something that you can try, I'll send you a link to the branch with the instrumentation, and you can see if you can make it happen with Shadow :).
OK, great!
-Rob
> Cheers!
> David
>> -Rob