On 10/1/13 3:03 AM, Kevin Butler wrote:
Hi Karsten, Sathya,
Hope you've both had great weekends, please see inline!
Hi Kevin,
apologies for not replying earlier! Finally, replying now.
Want to help define the remaining data formats? I think we need these formats:
- file_upload would be quite similar to file_download, but for the POST performance experiment. Or maybe we can generalize file_download to cover either GET or POST requests and the respective timings.
- We'll need a document format for hidden_service_request that not only contains timings, but also references the client-side circuits used to fetch the hidden service descriptor, the rendezvous circuit, and the introduction circuit, as well as the server-side introduction and rendezvous circuits.
- These data formats are all for fetching/posting static files. We
should decide on a data format for actual website fetches. Rob van der Hoeven suggested HAR, which I included in a footnote. So, maybe we should extend HAR to store the tor-specific stuff, or we should come up with something else.
- Are there any other data formats missing?
I think extending the HAR format (with minimal changes really, it's already reasonably generic) would be a good fit for the real fetches indeed. Do you feel HAR is overkill for the others?
I'm not sure, because I don't know the HAR format (this was a suggestion that didn't look crazy to me, so I added it to the PDF). But I think we could use HAR for all kinds of requests. We'll probably need to use something else for stream and circuit information, because they can be unrelated to specific requests.
I think it wouldn't be such a bad idea to use it for all; perhaps this could be a future requirement if not an initial one. (E.g., the static_file_downloads experiment would have that in 'creator', but it would have multiple 'entries', each representing a static file, with our own fields added: filesize, tor_version, etc.)
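To make that concrete, here is a sketch (in Python; the tor-specific field names are my own invention, not a settled schema) of what a HAR 1.2 log with custom fields might look like. The HAR spec reserves names starting with "_" for application-specific fields, which is where the tor data would go:

```python
import json

# Hypothetical HAR 1.2 log for one static file download.
# Fields prefixed with "_" are custom additions (the HAR spec reserves
# the "_" prefix for application-specific data); the tor-specific
# names here are illustrative only.
har = {
    "log": {
        "version": "1.2",
        "creator": {"name": "torperf/static_file_download", "version": "0.1"},
        "entries": [{
            "startedDateTime": "2013-10-01T03:03:00.000+00:00",
            "time": 1234,  # total elapsed time in milliseconds
            "request": {"method": "GET", "url": "http://example.com/50KiB.bin"},
            "response": {"status": 200, "bodySize": 51200},
            "_filesize": 51200,          # custom: requested file size
            "_tor_version": "0.2.4.x",   # custom: tor version under test
            "_circuit_id": 42,           # custom: reference to circuit data
        }],
    }
}

# HAR is plain JSON, so serialising is trivial.
serialised = json.dumps(har, indent=2)
```

Multiple downloads in one results set would just be further dicts in the 'entries' list.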
Plausible, though I can't really comment on this with my limited knowledge of the HAR format.
There are a number of perks with using HAR:
- TBB probably already knows how to record .HAR files, so the selenium work would basically just be to open a browser and record a few navigations to .HAR. (I know Chrome can do this easily, so I'm assuming our TBB version of Firefox is also capable.)
Probably. Though we'll need to add stream/circuit references. Can we do that?
- We can benefit from any tooling built around HAR, either to statistically analyse or to provide visualisation.
- There is a decent amount of research around HAR compression (although it basically seems to just be gzipping), but if we can support compressed HAR then we can allow servers to store a lot more history.
There is also the negative that the HAR files will probably provide *too much* data, but we could probably prune the files before archiving them or as a stage before total deletion.
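Since HAR is just JSON, the compression point needs nothing HAR-specific; a minimal sketch with plain gzip:

```python
import gzip
import json

# A tiny stand-in HAR document; real files would be much larger and
# compress far better, since HAR JSON is verbose and repetitive.
har_bytes = json.dumps({"log": {"version": "1.2", "entries": []}}).encode("utf-8")

compressed = gzip.compress(har_bytes)

# Round-trips losslessly, so servers could archive .har.gz files and
# decompress on demand for analysis or pruning.
restored = json.loads(gzip.decompress(compressed))
```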
While spending some time implementing these things, I have been playing with a faked alexa experiment, and I think the HAR format, or at least something that allows for multiple entries per experiment result set, is necessary for all our experiments (even static file).
I'll get back to you regarding these data formats in future when I have time to actually look at what the other experiments need. (I've mainly focused on alexa & static downloading thus far.)
Makes sense.
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (While still somehow allowing experiments to be added... is this implying hardcoded experiments?)
I guess experiment types will be hard-coded, but experiment instances will be configurable.
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?)
Yes, there might be such a thing as short-lived experiments. We'd probably commit such a patch to a separate branch and decide after the experiment if it's worth adding the experiment to the master branch.
Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
I think we should only merge experiments that are general enough for others to run.
This doesn't entirely address my concern. If upstream branches are made for short lived experiments (rather than just sharing a folder between people), how do the users install that? (since they would have initially apt-get installed? not git/svn?)
And my concern around non-general or non-shared experiments skips the issue, how will they distribute them to whoever needs to run it? Their own git repo infrastructure? (But of course I agree we should only upstream general things)
I'm mostly thinking of developers who would run custom branches. And if somebody cannot handle Git, we can give them a tarball. But really, custom experiments should be the exception, not the rule.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
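A sketch of how Torperf might consume such a setting (the config shape and paths here are invented examples matching the snippet above, not an agreed-on format):

```python
# Hypothetical Torperf config fragment: map tor versions to binaries.
config = {
    "tor_versions": {
        "0.2.3.x": "/Path/to/0/2/3/tor",
        "0.2.4.x": "/Path/to/0/2/4/tor",
    }
}

def tor_binary(version):
    """Return the configured tor binary path for a version, or fail loudly."""
    try:
        return config["tor_versions"][version]
    except KeyError:
        raise ValueError("no tor binary configured for version %s" % version)
```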
Is there requirement to run the *same* experiment across different Tor versions at the same time (literally parallel) or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results."?
You would typically not run an experiment a single time, but set it up to run for a few days. And you'd probably set up parallel experiments to start with a short time offset. (Not sure if this answers your question.)
I messed up the question with that single word; I'll address this below. I meant 'ran torperf once' and 'got my results periodically as the schedule defines'.
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
For each experiment there's one tor instance running a given version. You wouldn't stop, downgrade/upgrade, and restart the tor instance while the experiment is running. If you want to run an experiment on different tor versions, you'd start multiple experiments. For example:
- download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
every five minutes starting at :01 of the hour.
- download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
every five minutes starting at :02 of the hour.
- download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
every five minutes starting at :03 of the hour.
I don't think it's a good idea to require humans to spell things out so specifically. They should be able to just define the 50KiB download (and probably more file sizes) with a five-minute interval for versions x, y, z. (I also don't think the user should define the socks port to use, but that's a minor detail.)
I think you've answered my question here though. I'll summarise below!
I think the discussion above is talking about two different things, I think it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
Not sure which of these questions are still open. We should definitely get this clear in the design document. What would we write, and where in the document would we put it?
Perhaps we can cover this near 2.1.2 "If the service operator wants to run multiple experiments...."
I think you've defined above that no experiment should be run at the same time as another, i.e. the main service should not be executing code for different experiments at the same time. (Above you've manually arranged each one to start a minute after the previous, which assumes each experiment takes under a minute to execute --- what happens if there are timeouts or slow networks?)
Ah, I didn't mean that experiments should finish under 1 minute. They can run up to five minutes. Starting them at :01, :02, and :03 was just a naive way of avoiding bottlenecks during connection establishment.
I would agree with this as it will help to keep experiments more accurate (e.g. static file download for a 50MB file didn't adversely affect the performance for a hidden service test that started at the same time)
(If somebody configures their Torperf to download a 50MB file, they deserve that their tests break horribly. Let's leave some bandwidth for actual users. ;))
I think the service itself should handle scheduling things on a periodic basis and should make it clear how far it drifted from its desired schedule. (E.g. one experiment took longer than a minute, so the next started 5 seconds late, so it should start in (interval - 5 seconds) time.)
How this would happen in practice would be, the service starts up, checks results files for last runtimes for each experiment, then runs any that haven't run in their last INTERVAL seconds. Assuming experiments execute in time lower than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS then on average the system will perfectly schedule by default.
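A minimal sketch of that startup catch-up logic, assuming the last run times have already been parsed from the results files (all names here are invented):

```python
import time

# On startup, an experiment is due if its last recorded run -- read
# from the results files -- is older than its configured interval.
def due_experiments(experiments, last_runs, now=None):
    """Return the experiments that haven't run within their interval.

    experiments -- list of dicts with "name" and "interval" in seconds
    last_runs   -- name -> unix timestamp of last run (missing = never ran)
    """
    now = time.time() if now is None else now
    return [exp for exp in experiments
            if now - last_runs.get(exp["name"], 0) >= exp["interval"]]
```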
There could be problems with this approach though. For example, what happens if there is a large experiment, e.g. Alexa when the network is slow, which takes, say, 5 minutes? If this is scheduled to run every 10 minutes and there are some other experiments that are scheduled to run every 2 minutes, then we have a problem. Either the 2-minute experiments run on their intended schedule with potential inaccuracies caused, or the 2-minute interval is not a 2-minute interval. I think we should aim for the latter and warn the user when they have made schedules like the above.
- Another option is to break up experiments into chunks, where overall only one request is going at a time, so a request becomes our atomic scheduling unit, but that is harder to coordinate and is highly inefficient in terms of network throughput.
We could monitor experiments average runtime and determine 'optimal' scheduling based on that, but I think the best thing in the short term is just to say 'don't schedule long experiments to be run frequently if you plan to run lots of other small experiments'
We could make the service smart enough not to start all requests at the same time, e.g., by adding random delays. And we could allow users to override this by defining a manual offset for each experiment. That way, new users don't have to care, and expert users can fine-tune things.
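That default-jitter-with-expert-override idea could be sketched like this (parameter names are invented for illustration):

```python
import random

# Use a manual per-experiment offset when the expert user sets one,
# otherwise a small random delay so parallel experiments don't all
# open their connections at the same instant.
def start_delay(experiment, max_jitter=30.0):
    """Seconds to wait before an experiment's first run."""
    if "offset" in experiment:
        return float(experiment["offset"])  # expert override
    return random.uniform(0.0, max_jitter)  # default: random jitter
```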
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
Yes. Hope my "experiment type" vs. "experiment instance" was not too confusing. ;)
That's perfectly clear to me :)
How do we proceed? Would you mind sending me a diff of the changes to the design document that make these things clearer to you?
Also, I'm thinking about publishing the tech report, even though there's no running code yet (AFAIK). The reason is that I'd like to call this report the output of sponsor F deliverable 8. Originally, we promised code, but a design document is better than nothing.
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year3
I can include changes until, say, Monday, October 28.
Thanks in advance! And sorry again for the long delay!
All the best, Karsten