On 10/1/13 3:03 AM, Kevin Butler wrote:
Hi Karsten, Sathya,
Hope you've both had great weekends, please see inline!
Hi Kevin,
apologies for not replying earlier! Finally, replying now.
Want to help define the remaining data formats? I think we need these formats:
- file_upload would be quite similar to file_download, but for the POST performance experiment. Or maybe we can generalize file_download to cover either GET or POST requests and the respective timings.
- We'll need a document format for hidden_service_request that not only contains timings, but also references the client-side circuits used to fetch the hidden service descriptor, the rendezvous circuit, and the introduction circuit, as well as the server-side introduction and rendezvous circuits.
- These data formats are all for fetching/posting static files. We
should decide on a data format for actual website fetches. Rob van der Hoeven suggested HAR, which I included in a footnote. So, maybe we should extend HAR to store the tor-specific stuff, or we should come up with something else.
- Are there any other data formats missing?
I think extending the HAR format (with minimal changes really, it's already reasonably generic) would be a good fit for the real fetches indeed. Do you feel HAR is overkill for the others?
I'm not sure, because I don't know the HAR format (this was a suggestion that didn't look crazy to me, so I added it to the PDF). But I think we could use HAR for all kinds of requests. We'll probably need to use something else for stream and circuit information, because they can be unrelated to specific requests.
I think it wouldn't be such a bad idea to use it for all; perhaps this could be a future requirement if not an initial one. (E.g., the static_file_downloads experiment would have that in 'creator', but it would have multiple 'entries', each representing a static file, with our own fields added: filesize, tor_version, etc.)
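To make that concrete, here is a sketch (in Python; the tor-specific field names are my own invention, not a settled schema) of what a HAR 1.2 log with custom fields might look like. The HAR spec reserves names starting with "_" for application-specific fields, which is where the tor data would go:

```python
import json

# Hypothetical HAR 1.2 log for one static file download.
# Fields prefixed with "_" are custom additions (the HAR spec reserves
# the "_" prefix for application-specific data); the tor-specific
# names here are illustrative only.
har = {
    "log": {
        "version": "1.2",
        "creator": {"name": "torperf/static_file_download", "version": "0.1"},
        "entries": [{
            "startedDateTime": "2013-10-01T03:03:00.000+00:00",
            "time": 1234,  # total elapsed time in milliseconds
            "request": {"method": "GET", "url": "http://example.com/50KiB.bin"},
            "response": {"status": 200, "bodySize": 51200},
            "_filesize": 51200,          # custom: requested file size
            "_tor_version": "0.2.4.x",   # custom: tor version under test
            "_circuit_id": 42,           # custom: reference to circuit data
        }],
    }
}

# HAR is plain JSON, so serialising is trivial.
serialised = json.dumps(har, indent=2)
```

Multiple downloads in one results set would just be further dicts in the 'entries' list.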
Plausible, though I can't really comment on this with my limited knowledge of the HAR format.
There are a number of perks with using HAR:
- TBB probably already knows how to record .HAR files, so the selenium work would basically just be to open a browser and record a few navigations to .HAR. (I know Chrome can do this easily, so I'm assuming our TBB version of Firefox is also capable.)
Probably. Though we'll need to add stream/circuit references. Can we do that?
- We can benefit from any tooling built around HAR, either to statistically analyse or to provide visualisation.
- There is a decent amount of research around HAR compression (although it basically seems to just be gzipping), but if we can support compressed HAR then we can allow servers to store a lot more history.
There is also the negative that the HAR files will probably provide *too much* data, but we could probably prune the files before archiving them or as a stage before total deletion.
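Since HAR is just JSON, the compression point needs nothing HAR-specific; a minimal sketch with plain gzip:

```python
import gzip
import json

# A tiny stand-in HAR document; real files would be much larger and
# compress far better, since HAR JSON is verbose and repetitive.
har_bytes = json.dumps({"log": {"version": "1.2", "entries": []}}).encode("utf-8")

compressed = gzip.compress(har_bytes)

# Round-trips losslessly, so servers could archive .har.gz files and
# decompress on demand for analysis or pruning.
restored = json.loads(gzip.decompress(compressed))
```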
While spending some time implementing these things, I have been playing with a faked alexa experiment, and I think the HAR format, or at least something that allows for multiple entries per experiment result set, is necessary for all our experiments (even static file).
I'll get back to you regarding these data formats in future when I have time to actually look at what the other experiments need. (I've mainly focused on alexa & static downloading thus far.)
Makes sense.
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (While still somehow allowing experiments to be added... is this implying hardcoded experiments?)
I guess experiment types will be hard-coded, but experiment instances will be configurable.
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?)
Yes, there might be such a thing as short-lived experiments. We'd probably commit such a patch to a separate branch and decide after the experiment if it's worth adding the experiment to the master branch.
Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
I think we should only merge experiments that are general enough for others to run.
This doesn't entirely address my concern. If upstream branches are made for short lived experiments (rather than just sharing a folder between people), how do the users install that? (since they would have initially apt-get installed? not git/svn?)
And my concern around non-general or non-shared experiments skips the issue, how will they distribute them to whoever needs to run it? Their own git repo infrastructure? (But of course I agree we should only upstream general things)
I'm mostly thinking of developers who would run custom branches. And if somebody cannot handle Git, we can give them a tarball. But really, custom experiments should be the exception, not the rule.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
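A sketch of how Torperf might consume such a setting (the config shape and paths here are invented examples matching the snippet above, not an agreed-on format):

```python
# Hypothetical Torperf config fragment: map tor versions to binaries.
config = {
    "tor_versions": {
        "0.2.3.x": "/Path/to/0/2/3/tor",
        "0.2.4.x": "/Path/to/0/2/4/tor",
    }
}

def tor_binary(version):
    """Return the configured tor binary path for a version, or fail loudly."""
    try:
        return config["tor_versions"][version]
    except KeyError:
        raise ValueError("no tor binary configured for version %s" % version)
```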
Is there requirement to run the *same* experiment across different Tor versions at the same time (literally parallel) or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results."?
You would typically not run an experiment a single time, but set it up to run for a few days. And you'd probably set up parallel experiments to start with a short time offset. (Not sure if this answers your question.)
I messed up the question with that single word; I'll address this below. I meant 'ran torperf once' and 'got my results periodically as the schedule defines'.
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
For each experiment there's one tor instance running a given version. You wouldn't stop, downgrade/upgrade, and restart the tor instance while the experiment is running. If you want to run an experiment on different tor versions, you'd start multiple experiments. For example:
- download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
every five minutes starting at :01 of the hour.
- download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
every five minutes starting at :02 of the hour.
- download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
every five minutes starting at :03 of the hour.
I don't think it's a good idea to require humans to spell things out so specifically. They should be able to just define the 50KiB download (and probably more file sizes) with a five-minute interval for versions x, y, z. (I also don't think the user should define the socks port to use, but that's a minor detail.)
I think you've answered my question here though. I'll summarise below!
I think the discussion above is talking about two different things, I think it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
Not sure which of these questions are still open. We should definitely get this clear in the design document. What would we write, and where in the document would we put it?
Perhaps we can cover this near 2.1.2 "If the service operator wants to run multiple experiments...."
I think you've defined above that no experiment should be run at the same time as another, i.e. the main service should not be executing code for different experiments at the same time. (Above you've manually arranged each one to start a minute after the previous, which assumes each experiment takes under a minute to execute --- what happens if there are timeouts or slow networks?)
Ah, I didn't mean that experiments should finish under 1 minute. They can run up to five minutes. Starting them at :01, :02, and :03 was just a naive way of avoiding bottlenecks during connection establishment.
I would agree with this as it will help to keep experiments more accurate (e.g. static file download for a 50MB file didn't adversely affect the performance for a hidden service test that started at the same time)
(If somebody configures their Torperf to download a 50MB file, they deserve that their tests break horribly. Let's leave some bandwidth for actual users. ;))
I think the service itself should handle scheduling things on a periodic basis and should make it clear how far it drifted from its desired schedule. (E.g. one experiment took longer than a minute, so the next started 5 seconds late, so it should start in (interval - 5 seconds) time.)
How this would happen in practice would be, the service starts up, checks results files for last runtimes for each experiment, then runs any that haven't run in their last INTERVAL seconds. Assuming experiments execute in time lower than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS then on average the system will perfectly schedule by default.
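A minimal sketch of that startup catch-up logic, assuming the last run times have already been parsed from the results files (all names here are invented):

```python
import time

# On startup, an experiment is due if its last recorded run -- read
# from the results files -- is older than its configured interval.
def due_experiments(experiments, last_runs, now=None):
    """Return the experiments that haven't run within their interval.

    experiments -- list of dicts with "name" and "interval" in seconds
    last_runs   -- name -> unix timestamp of last run (missing = never ran)
    """
    now = time.time() if now is None else now
    return [exp for exp in experiments
            if now - last_runs.get(exp["name"], 0) >= exp["interval"]]
```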
There could be problems with this approach though. For example, what happens if there is a large experiment, e.g. Alexa when the network is slow, which takes, say, 5 minutes? If this is scheduled to run every 10 minutes and there are some other experiments that are scheduled to run every 2 minutes, then we have a problem. Either the 2-minute experiments run on their intended schedule with potential inaccuracies caused, or the 2-minute interval is not a 2-minute interval. I think we should aim for the latter and warn the user when they have made schedules like the above.
- Another option is to break up experiments into chunks, where overall only one request is going at a time, so a request becomes our atomic scheduling unit, but that is harder to coordinate and is highly inefficient in terms of network throughput.
We could monitor experiments average runtime and determine 'optimal' scheduling based on that, but I think the best thing in the short term is just to say 'don't schedule long experiments to be run frequently if you plan to run lots of other small experiments'
We could make the service smart enough not to start all requests at the same time, e.g., by adding random delays. And we could allow users to override this by defining a manual offset for each experiment. That way, new users don't have to care, and expert users can fine-tune things.
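That default-jitter-with-expert-override idea could be sketched like this (parameter names are invented for illustration):

```python
import random

# Use a manual per-experiment offset when the expert user sets one,
# otherwise a small random delay so parallel experiments don't all
# open their connections at the same instant.
def start_delay(experiment, max_jitter=30.0):
    """Seconds to wait before an experiment's first run."""
    if "offset" in experiment:
        return float(experiment["offset"])  # expert override
    return random.uniform(0.0, max_jitter)  # default: random jitter
```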
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
Yes. Hope my "experiment type" vs. "experiment instance" was not too confusing. ;)
That's perfectly clear to me :)
How do we proceed? Would you mind sending me a diff of the changes to the design document that make these things clearer to you?
Also, I'm thinking about publishing the tech report, even though there's no running code yet (AFAIK). The reason is that I'd like to call this report the output of sponsor F deliverable 8. Originally, we promised code, but a design document is better than nothing.
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year3
I can include changes until, say, Monday, October 28.
Thanks in advance! And sorry again for the long delay!
All the best, Karsten