On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
Ian Goldberg <iang@cs.uwaterloo.ca> wrote:
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Me too, although 25 to 50 percent seems to be more of a best-case scenario, and for some requests it's unlikely to make a difference.
The only requests for which it wouldn't make a difference are, I think, ones that can reuse an existing stream (that is, you've visited the same website recently enough that your browser is reusing an open TCP connection to the HTTP server, but not so recently (e.g. at the same time) that it's opening parallel connections). So I think most of the time you'll see the benefit.
Filename: xxx-optimistic-data-client.txt
Title: Optimistic Data for Tor: Client Side
Author: Ian Goldberg
Created: 2-Jun-2011
Status: Open
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
- SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
No, just ones that need a new TCP stream. The optimistic data stuff is about quickly sending data in just-being-constructed streams within an already-constructed circuit.
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually compared to requests that can reuse an already existing one (or even reuse the whole connection)?
Assuming you mean "stream" instead of "circuit" here, then, as above, I think most HTTP connections would be in this category. It might be interesting to examine some HTTP traces to see, though. Kevin, you were looking at some HTTP traces for other reasons, right? Anything in there that may help answer this question?
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
I think overall across sites would be a better number, no?
Did you also measure the differences between requests that need a new circuit and requests that only need a new connection from the exit node to the destination server?
If there's a new connection from the exit node to the destination server, then there's a new stream, and you would see the full benefit of this proposal.
This change will save one OP<->Exit round trip (down to one from two). There are still two SOCKS Client<->OP round trips (negligible time) and two Exit<->Server round trips. Depending on the ratio of the Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will decrease the latency by 25 to 50 percent. Experiments validate these predictions. [Goldberg, PETS 2010 rump session; see https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
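To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python; the RTT values are illustrative assumptions, and only their ratio matters:

    # Simple additive model of the round trips described above.
    def ttfb(tor_rtt, server_rtt, optimistic):
        # Non-optimistic: two OP<->Exit round trips; optimistic: one.
        # Both variants keep the two Exit<->Server round trips.
        op_exit_rtts = 1 if optimistic else 2
        return op_exit_rtts * tor_rtt + 2 * server_rtt

    for server_rtt in (0.0, 1.0):  # Exit<->Server RTT as a fraction of OP<->Exit RTT
        old = ttfb(1.0, server_rtt, optimistic=False)
        new = ttfb(1.0, server_rtt, optimistic=True)
        print("server RTT %.1f: %.0f%% faster" % (server_rtt, 100 * (old - new) / old))
    # Prints 50% when the Exit<->Server RTT is negligible, and 25% when it
    # equals the OP<->Exit RTT, matching the range quoted above.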
Can you describe the experiment some more?
A webfetch client, using a single circuit, downloaded a web page from a fixed server (to eliminate variance due to different server RTTs and performance) 950 times. Each time, it randomly decided whether to use optimistic data (the "SOCKS 4b" line) or not (the "SOCKS 4a" line). The time from the start of the request (webfetch making the SOCKS connection) to the time the first data byte of the HTTP response ("HTTP/1.1 200 OK") arrived at webfetch was recorded, and the two CDFs of those values were plotted.
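In outline, the measurement loop looked like this (the real experiment used webfetch; fetch_ttfb() below is a hypothetical stand-in that performs one request and returns the seconds elapsed until the first response byte arrives):

    import random

    samples = {"socks4a": [], "socks4b": []}  # non-optimistic vs. optimistic
    for _ in range(950):
        variant = random.choice(sorted(samples))  # randomize per request
        # fetch_ttfb() is hypothetical: time from making the SOCKS
        # connection until "HTTP/1.1 200 OK" starts arriving.
        samples[variant].append(fetch_ttfb(variant))
    # Sorting each list of samples and plotting it gives the two CDFs.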
I'm a bit puzzled by your "Results" graph. How many requests does it actually represent, and what kind of requests were used?
As above, approximately 475 HTTP GET requests of each type. Note that the size of the fetched page is irrelevant to this measurement.
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit on how much data Tor will accept?
One stream window.
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
I suspect it's rare that an HTTP request doesn't fit in one stream window (~250 KB).
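That figure follows from Tor's usual flow-control constants, assuming I'm remembering them correctly:

    # Assumed constants: a stream-level window of 500 cells, and 498 bytes
    # of payload per RELAY_DATA cell.
    STREAM_WINDOW_CELLS = 500
    RELAY_PAYLOAD_BYTES = 498
    print(STREAM_WINDOW_CELLS * RELAY_PAYLOAD_BYTES)  # 249000 bytes, i.e. ~250 KB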
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
What? No, I think you misunderstand. The time-to-first-byte is the time until the first byte of the *response* is received back at the client. That's when the user's screen will start changing; previous work (does someone have a cite handy?) has indicated that if a page takes too long to start to change, users get frustrated with the slowness.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
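As a minimal sketch of that behaviour over SOCKS 4a (host, port, and the request bytes are placeholders; this is an illustration, not production code):

    import socket
    import struct

    def optimistic_connect(host, port, request, socks_addr=("127.0.0.1", 9050)):
        s = socket.create_connection(socks_addr)
        # SOCKS 4a CONNECT: VN=4, CD=1, DSTPORT, DSTIP=0.0.0.1 (meaning
        # "hostname follows"), empty userid, then the NUL-terminated hostname.
        s.sendall(struct.pack("!BBH", 4, 1, port) + b"\x00\x00\x00\x01"
                  + b"\x00" + host.encode() + b"\x00")
        # The optimistic part: send the request without waiting for the reply.
        s.sendall(request)
        # The first 8 bytes read back are the SOCKS reply; CD=90 means granted.
        reply = b""
        while len(reply) < 8:
            chunk = s.recv(8 - len(reply))
            if not chunk:
                raise ConnectionError("closed before SOCKS reply")
            reply += chunk
        if reply[1] != 90:
            s.close()
            raise ConnectionError("SOCKS connect failed (code %d)" % reply[1])
        return s  # everything after the reply is the normal data stream

    # Hypothetical usage:
    # s = optimistic_connect("example.com", 80,
    #                        b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")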
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to SOCKS server/Deal with the response
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to SOCKS server/Deal with the response
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it.
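Here is a rough sketch of that small-request special case; every helper and the size cutoff below are invented for illustration (this is not Privoxy code):

    SMALL_REQUEST_LIMIT = 64 * 1024  # assumed cutoff for "small", not Privoxy's value

    def handle_client(client_sock):
        head = read_request_headers(client_sock)          # hypothetical helper
        if content_length(head) <= SMALL_REQUEST_LIMIT:   # hypothetical helper
            body = read_request_body(client_sock, head)   # whole request up front
            server = socks_connect(optimistic_data=head + body)  # CONNECT + request together
        else:
            server = socks_connect(optimistic_data=None)  # large request: no optimism,
            server.sendall(head)                          # fall back to the
            stream_request_body(client_sock, server)      # step-by-step relay loop
        relay_response(server, client_sock)               # finally, the response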
A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data wherever possible. It certainly matters for responses: I seem to remember that privoxy indeed reads the whole response from the HTTP server before starting to send it to the web client, which adds a ton of extra delay in TTFB. I'm pretty sure polipo doesn't do that, but just streams the data as it arrives. Each program is likely to do the corresponding thing with the requests.
I wouldn't be surprised if there's a difference for some browsers, too.
How so? The browser sends the HTTP request to the proxy, and reads the response. What different behaviour might it have? The only one I can think of is "pipelining" requests, which some browsers/proxies/servers support and others don't. That is, if you've got 4 files to download from the same server, send the 4 HTTP requests on the same TCP stream before getting responses from any of them. In that case, you'll see the benefit for the first request in the stream, but not the others, since the stream will already be open.
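For concreteness, pipelining two requests looks like this on the wire (example.com and the paths are placeholders):

    import socket

    sock = socket.create_connection(("example.com", 80))
    # Both requests go out back-to-back, before either response is read;
    # the server answers them in order on the same TCP stream. Only the
    # first would benefit from optimistic data, since the Tor stream is
    # already open by the time the second is sent.
    sock.sendall(b"GET /a.css HTTP/1.1\r\nHost: example.com\r\n\r\n"
                 b"GET /b.js HTTP/1.1\r\nHost: example.com\r\n\r\n")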
And even if there isn't, it may still be useful to only implement it for some requests to reduce the memory footprint of the local Tor process.
Roger/Nick: is there indeed a limit to how much data from the SOCKS client the OP will queue up today before the stream is open?
Security implications:
ORs (for sure the Exit, and possibly others, by watching the pattern of packets), as well as possibly end servers, will be able to tell that a particular client is using optimistic data. This of course has the potential to fingerprint clients, dividing the anonymity set.
If some clients only use optimistic data for certain requests, that would divide the anonymity set some more, so maybe the proposal should make a suggestion, and maybe Tor should even enforce a limit on the client side.
I think the worse case is when most clients use optimistic data but some older clients don't; those will stand out more readily.
Performance and scalability notes:
OPs may queue a little more data, if the SOCKS client pushes it faster than the OP can write it out. But that's also true today after the SOCKS CONNECT returns success, right?
It's my impression that there's currently a limit on how much data Tor will read and buffer from the SOCKS client. Otherwise Tor could end up buffering the whole request, which could be rather large.
It would be. As above, is this actually true, though?
- Ian