Ian Goldberg iang@cs.uwaterloo.ca wrote:
On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
Ian Goldberg iang@cs.uwaterloo.ca wrote:
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
- SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
No, just ones that need a new TCP stream. The optimistic data stuff is about quickly sending data in just-being-constructed streams within an already-constructed circuit.
I see.
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually compared to requests that can reuse an already existing one (or even reuse the whole connection)?
Assuming you mean "stream" instead of "circuit" here, then, as above, I think most HTTP connections would be in this category. It might be interesting to examine some HTTP traces to see, though. Kevin, you were looking at some HTTP traces for other reasons, right? Anything in there that may help answer this question?
I actually meant "how many HTTP requests that need a new circuit are there usually compared to requests that only need a new stream (or reuse the whole connection)?"
You've already written above that HTTP requests that need a completely new circuit aren't affected anyway, so that leaves the requests that need a new stream and those that don't, and I can get a rough idea about those myself by looking at my logs (your mileage is likely to vary, of course):
fk@r500 ~ $privoxy-log-parser.pl --statistics /usr/jails/privoxy-jail/var/log/privoxy/privoxy.log.*
Client requests total: 430598
[...]
Outgoing requests: 300971 (69.90%)
Server keep-alive offers: 156193 (36.27%)
New outgoing connections: 237488 (55.15%)
Reused connections: 63483 (14.74%; server offers accepted: 40.64%)
Empty responses: 5244 (1.22%)
Empty responses on new connections: 430 (0.10%)
Empty responses on reused connections: 4814 (1.12%)
[...]
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
I think overall across sites would be a better number, no?
Sure.
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit on how much data Tor will accept?
One stream window.
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
I suspect it's rare that an HTTP request doesn't fit in one stream window (~250 KB).
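(Back of the envelope, and only if I have the constants right; the figures below are assumptions about Tor's usual defaults, not something from the proposal itself:)

# Back-of-the-envelope only; the constants are assumptions about Tor's
# defaults (stream window of 500 cells, ~498 bytes of relay payload per cell).
STREAM_WINDOW_CELLS = 500
RELAY_PAYLOAD_BYTES = 498
limit = STREAM_WINDOW_CELLS * RELAY_PAYLOAD_BYTES
print("%d bytes, roughly %d KiB" % (limit, limit // 1024))  # 249000 bytes, ~243 KiB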
I agree, I expected the stream window to be a lot smaller.
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
What? No, I think you misunderstand. The time-to-first-byte is the time until the first byte of the *response* is received back at the client.
Makes sense. Thanks for the clarification.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
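A minimal sketch of that ordering in Python (this is not polipo's code; the SOCKS port, destination and request below are made-up examples, and error handling is reduced to the bare minimum):

# Hedged sketch of a SOCKS5 client that sends its application data right
# after the CONNECT command, before reading the SOCKS reply.
import socket
import struct

SOCKS_HOST, SOCKS_PORT = "127.0.0.1", 9050   # e.g. a local Tor SocksPort (assumption)
DEST_HOST, DEST_PORT = "www.example.com", 80
REQUEST = b"GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"

s = socket.create_connection((SOCKS_HOST, SOCKS_PORT))

# SOCKS5 method negotiation: offer "no authentication" only.
s.sendall(b"\x05\x01\x00")
if s.recv(2) != b"\x05\x00":
    raise RuntimeError("SOCKS server refused the no-auth method")

# CONNECT to a host name (address type 0x03), immediately followed by the
# application data -- the "optimistic" part: we don't wait for the SOCKS
# reply before sending the HTTP request.
connect = (b"\x05\x01\x00\x03" +
           bytes([len(DEST_HOST)]) + DEST_HOST.encode("ascii") +
           struct.pack(">H", DEST_PORT))
s.sendall(connect + REQUEST)

# The first thing read back is still the SOCKS reply; only after that comes
# whatever the destination sent.
reply = s.recv(4)
if len(reply) < 4 or reply[1] != 0x00:
    # Failure: do whatever the client used to do when SOCKS CONNECT failed.
    raise RuntimeError("SOCKS CONNECT failed")

# Skip the bound address/port in the reply (IPv4 form assumed for brevity),
# then read the stream as normal.
s.recv(6)
print(s.recv(4096))
s.close()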
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to the SOCKS server/deal with the SOCKS reply
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to the SOCKS server/deal with the SOCKS reply
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it (a rough sketch of the simpler flow follows below).
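To make the first flow concrete, a rough sketch in Python (not Privoxy code; the buffer limit and the read_whole_request name are made up, and only Content-Length bodies are handled):

# Hedged sketch of the "small request" path: read the whole client request
# first, then open the SOCKS connection and send it in one go.
import socket

MAX_SMALL_REQUEST = 32 * 1024   # made-up threshold for "small enough to buffer"

def read_whole_request(client: socket.socket) -> bytes:
    """Read the request line, headers and a Content-Length body in full."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = client.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before sending complete headers")
        data += chunk
        if len(data) > MAX_SMALL_REQUEST:
            raise ValueError("request too large to buffer up front")
    headers, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in headers.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.decode("ascii", "replace").strip() or 0)
    if length > MAX_SMALL_REQUEST:
        raise ValueError("request body too large to buffer up front")
    while len(body) < length:
        chunk = client.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before sending the complete body")
        body += chunk
    return headers + b"\r\n\r\n" + body

# Only after read_whole_request() has returned would the proxy connect to the
# SOCKS server, deal with (or optimistically skip waiting for) its reply, send
# the buffered request in one piece, and then read the response.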
A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data wherever possible.
Sure, but even polipo can't stream data the client hasn't sent yet, and for requests larger than a few MTUs (file uploads, for example) the SOCKS connection is probably established before the whole client request has been received.
Certainly for responses: I seem to remember that privoxy indeed reads the whole response from the HTTP server before starting to send it to the web client, which adds a ton of extra delay in TTFB.
Privoxy buffers the whole response if it's configured to filter it (as the decision to modify the first response byte could depend on the last byte).
If no filters are enabled (this seems to be the case for the configuration Orbot uses), or no filters apply, the response data is forwarded to the client as it arrives.
I wouldn't be surprised if there's a difference for some browsers, too.
How so? The browser sends the HTTP request to the proxy, and reads the response. What different behaviour might it have? The only one I can think of is "pipelining" requests, which some browsers/proxies/servers support and others don't. That is, if you've got 4 files to download from the same server, send the 4 HTTP requests on the same TCP stream before getting responses from any of them. In that case, you'll see the benefit for the first request in the stream, but not the others, since the stream will already be open.
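(Pipelining in its rawest form is just writing the requests back-to-back on one connection and reading the responses in order; a sketch, with placeholder host and paths, and none of the care a real client would take:)

# Illustration only; www.example.com and the paths are placeholders.
import socket

HOST = "www.example.com"
PATHS = ["/a", "/b", "/c", "/d"]

s = socket.create_connection((HOST, 80))

# Send all four requests on the same TCP stream before reading any response.
for i, path in enumerate(PATHS):
    request = "GET %s HTTP/1.1\r\nHost: %s\r\n" % (path, HOST)
    if i == len(PATHS) - 1:
        request += "Connection: close\r\n"   # ask the server to end the stream after the last response
    s.sendall((request + "\r\n").encode("ascii"))

# Responses come back in order; a real client would parse Content-Length or
# chunked encoding to find the boundaries instead of just counting bytes.
total = 0
while True:
    chunk = s.recv(4096)
    if not chunk:
        break
    total += len(chunk)
print(total, "bytes of pipelined responses")
s.close()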
I was thinking about file uploads. Currently the client doesn't have to read the whole file before the SOCKS connection is even established, but it would have to in order to send it (or at least the first part of it) optimistically.
Even old browsers support file uploads up to ~2GB, so this would also be a case where the request might be too large to fit in the stream window.
While file uploads are certainly rare (and successfully pushing 2GB through Tor might be a challenge), it might be worth thinking about how to handle them anyway. Letting the Tor client cache the whole file is probably not the best solution above a certain file size.
I also thought about another case where it's not obvious to me what to do: an HTTPS connection made by a browser through an HTTP proxy.
Currently the browser will not start sending data for the server until the proxy has signaled that the connection has been established, which the proxy doesn't know until it is told so by the SOCKS server.
If the HTTP proxy "optimistically lies" to the client it will not be able to send a proper error message if the SOCKS connection actually can't be established. Of course this only matters if the client does something useful with the error message, and at least Firefox stopped doing that a while ago.
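To spell out that trade-off (a sketch only; answer_connect, socks_reply_ok and optimistic are made-up names, and this is not how Privoxy actually structures its CONNECT handling):

# Hedged sketch of the CONNECT dilemma for HTTPS through an HTTP proxy;
# socks_reply_ok stands in for "the SOCKS server reported success".
import socket

def answer_connect(client: socket.socket, socks_reply_ok: bool, optimistic: bool) -> None:
    if optimistic:
        # "Optimistic lie": tell the browser the tunnel exists before we know.
        # The browser immediately starts its TLS handshake, so if the SOCKS
        # connection then fails, all that's left is dropping the connection --
        # there is no clean place for an HTTP error message any more.
        client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
        if not socks_reply_ok:
            client.close()
        return

    # Conventional behaviour: wait for the SOCKS reply first and report
    # failures as a proper HTTP error the browser could (in principle) show.
    if socks_reply_ok:
        client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
    else:
        client.sendall(b"HTTP/1.1 502 Bad Gateway\r\nContent-Length: 0\r\n\r\n")
        client.close()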
The impact on SSL connections is probably less significant anyway, though.
Thanks a lot for the detailed response.
Fabian