Sorry this took so long. As usual, things got inserted ahead of it in the priority queue. :-p
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Discuss. ;-)
- Ian
Ian Goldberg iang@cs.uwaterloo.ca wrote:
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Me too, although 25 to 50 percent seems to be more of a best-case scenario, and for some requests it's unlikely to make a difference.
Filename: xxx-optimistic-data-client.txt
Title: Optimistic Data for Tor: Client Side
Author: Ian Goldberg
Created: 2-Jun-2011
Status: Open
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
1. SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually compared to requests that can reuse an already existing one (or even reuse the whole connection)?
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
Did you also measure the differences between requests that need a new circuit and requests that only need a new connection from the exit node to the destination server?
2. OPs to optimistically send DATA cells on streams in the CONNECT_WAIT state
3. Exit nodes to accept and queue DATA cells while in the EXIT_CONN_STATE_CONNECTING state
This particular proposal deals with #1 and #2.
For more details (in general and for #3), see the sibling proposal 174 (Optimistic Data for Tor: Server Side), which has been implemented in 0.2.3.1-alpha.
Motivation:
This change will save one OP<->Exit round trip (down to one from two). There are still two SOCKS Client<->OP round trips (negligible time) and two Exit<->Server round trips. Depending on the ratio of the Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will decrease the latency by 25 to 50 percent. Experiments validate these predictions. [Goldberg, PETS 2010 rump session; see https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
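The 25-to-50 percent range follows directly from this round-trip accounting; a quick sketch (function and variable names are mine, RTT values illustrative):

```python
# Time to first response byte, per the accounting above.
# Without optimistic data: 2 OP<->Exit RTTs + 2 Exit<->Server RTTs.
# With optimistic data:    1 OP<->Exit RTT  + 2 Exit<->Server RTTs.

def ttfb(rtt_tor, rtt_server, optimistic):
    op_exit_trips = 1 if optimistic else 2
    return op_exit_trips * rtt_tor + 2 * rtt_server

def savings(rtt_tor, rtt_server):
    before = ttfb(rtt_tor, rtt_server, optimistic=False)
    after = ttfb(rtt_tor, rtt_server, optimistic=True)
    return 1 - after / before

# Negligible Exit<->Server RTT: save one of two round trips -> 50%.
print(savings(rtt_tor=0.5, rtt_server=0.0))   # 0.5
# Exit<->Server RTT equal to the Tor RTT: save one of four -> 25%.
print(savings(rtt_tor=0.5, rtt_server=0.5))   # 0.25
```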
Can you describe the experiment some more?
I'm a bit puzzled by your "Results" graph. How many requests does it actually represent and what kind of request were used?
Design:
Currently, data arriving on the SOCKS connection to the OP on a stream in AP_CONN_STATE_CONNECT_WAIT is queued, and transmitted when the state transitions to AP_CONN_STATE_OPEN. Instead, when data arrives on the SOCKS connection to the OP on a stream in AP_CONN_STATE_CONNECT_WAIT (connection_edge_process_inbuf):
- Check to see whether optimistic data is allowed at all (see below).
- Check to see whether the exit node for this stream supports optimistic data (according to tor-spec.txt section 6.2, this means that the exit node's version number is at least 0.2.3.1-alpha). If you don't know the exit node's version number (because it's not in your hashtable of fingerprints, for example), assume it does *not* support optimistic data.
- If both are true, transmit the data on the stream.
Also, when a stream transitions *to* AP_CONN_STATE_CONNECT_WAIT (connection_ap_handshake_send_begin), do the above checks, and immediately send any already-queued data if they pass.
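A sketch of that decision logic (Python pseudocode standing in for the C in connection_edge_process_inbuf; everything except the state names and the version threshold is my invention):

```python
# Hedged sketch of the checks described above; Tor's real code is C.
OPTIMISTIC_DATA_MIN_VERSION = (0, 2, 3, 1)  # 0.2.3.1-alpha, per tor-spec.txt 6.2

class Stream:
    """Minimal stand-in for an edge connection."""
    def __init__(self, state, exit_version):
        self.state = state
        self.exit_version = exit_version  # None if unknown
        self.sent, self.queue = [], []
    def send(self, data):
        self.sent.append(data)

def exit_supports_optimistic_data(exit_version):
    # Unknown exit version (e.g. not in the fingerprint table):
    # assume NO support, as the proposal requires.
    if exit_version is None:
        return False
    return exit_version >= OPTIMISTIC_DATA_MIN_VERSION

def handle_socks_data(stream, data, allow_optimistic):
    if stream.state == "AP_CONN_STATE_OPEN":
        stream.send(data)
    elif stream.state == "AP_CONN_STATE_CONNECT_WAIT":
        if allow_optimistic and exit_supports_optimistic_data(stream.exit_version):
            stream.send(data)          # transmit optimistically
        else:
            stream.queue.append(data)  # current behaviour: queue until OPEN
```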
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit on how much data Tor will accept?
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
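A patched client's send path might look like this minimal SOCKS 4a sketch (the helper names are mine, and this has not been tested against a real tor):

```python
import socket
import struct

def socks4a_connect_msg(host, port, userid=b""):
    """SOCKS 4a CONNECT request: version 4, command 1, then the port,
    the invalid address 0.0.0.1 (signalling that a hostname follows),
    the NUL-terminated userid, and the NUL-terminated hostname."""
    return (struct.pack(">BBH", 4, 1, port)
            + bytes([0, 0, 0, 1])
            + userid + b"\x00"
            + host.encode("ascii") + b"\x00")

def optimistic_fetch(socks_host, socks_port, host, port, request):
    """Send the CONNECT and the request back-to-back, then read the
    SOCKS reply before reading the response proper."""
    s = socket.create_connection((socks_host, socks_port))
    s.sendall(socks4a_connect_msg(host, port))
    s.sendall(request)            # optimistic: don't wait for the reply
    reply = s.recv(8)
    if len(reply) < 8 or reply[1] != 90:   # 90 = request granted
        s.close()
        raise ConnectionError("SOCKS connect failed")
    return s                      # caller reads the response as usual
```

On failure the client falls back to whatever it did before, exactly as the proposal describes; the optimistically sent bytes were simply wasted.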
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to SOCKS server/Deal with the response
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to SOCKS server/Deal with the response
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it.
I wouldn't be surprised if there's a difference for some browsers, too.
And even if there isn't, it may still be useful to only implement it for some requests to reduce the memory footprint of the local Tor process.
Security implications:
ORs (for sure the Exit, and possibly others, by watching the pattern of packets), as well as possibly end servers, will be able to tell that a particular client is using optimistic data. This of course has the potential to fingerprint clients, dividing the anonymity set.
If some clients only use optimistic data for certain requests it would divide the anonymity set some more, so maybe the proposal should make a suggestion and maybe Tor should even enforce a limit on the client side.
Performance and scalability notes:
OPs may queue a little more data, if the SOCKS client pushes it faster than the OP can write it out. But that's also true today after the SOCKS CONNECT returns success, right?
It's my impression that there's currently a limit on how much data Tor will read and buffer from the SOCKS client. Otherwise Tor could end up buffering the whole request, which could be rather large.
Fabian
On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
Ian Goldberg iang@cs.uwaterloo.ca wrote:
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Me too, although 25 to 50 percent seems to be more of a best-case scenario, and for some requests it's unlikely to make a difference.
The only requests for which it wouldn't make a difference are I think ones that can reuse an existing stream (that is, you've visited that same website recently enough that your browser is reusing an open TCP connection to the HTTP server, but not so recently (e.g. at the same time) that it's opening parallel connections). So I think most of the time, you'll see the benefit.
Filename: xxx-optimistic-data-client.txt
Title: Optimistic Data for Tor: Client Side
Author: Ian Goldberg
Created: 2-Jun-2011
Status: Open
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
- SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
No, just ones that need a new TCP stream. The optimistic data stuff is about quickly sending data in just-being-constructed streams within an already-constructed circuit.
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually compared to requests that can reuse an already existing one (or even reuse the whole connection)?
Assuming you mean "stream" instead of "circuit" here, then, as above, I think most HTTP connections would be in this category. It might be interesting to examine some HTTP traces to see, though. Kevin, you were looking at some HTTP traces for other reasons, right? Anything in there that may help answer this question?
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
I think overall across sites would be a better number, no?
Did you also measure the differences between requests that need a new circuit and requests that only need a new connection from the exit node to the destination server?
If there's a new connection from the exit node to the destination server, then there's a new stream, and you would see the full benefit of this proposal.
This change will save one OP<->Exit round trip (down to one from two). There are still two SOCKS Client<->OP round trips (negligible time) and two Exit<->Server round trips. Depending on the ratio of the Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will decrease the latency by 25 to 50 percent. Experiments validate these predictions. [Goldberg, PETS 2010 rump session; see https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
Can you describe the experiment some more?
A webfetch client, using a single circuit, downloaded a web page from a fixed server (to eliminate variance due to different server RTTs and performance) 950 times. Each time, it randomly decided whether to use optimistic data (the "SOCKS 4b" line) or not (the "SOCKS 4a" line). The time from the start of the request (webfetch making the SOCKS connection) to the time the first data byte of the HTTP response ("HTTP/1.1 200 OK") arrived at webfetch was recorded, and the two CDFs of those values were plotted.
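For reference, the plotted CDFs can be reproduced from raw timing samples with something like the following (the timing values here are made up; the real experiment had ~475 per variant):

```python
def empirical_cdf(samples):
    """Sorted (value, cumulative fraction) pairs, as plotted in the slides."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def median(samples):
    """First sample whose cumulative fraction reaches 0.5."""
    return next(x for x, f in empirical_cdf(samples) if f >= 0.5)

# Hypothetical TTFB samples in seconds, one list per SOCKS variant.
socks4a = [2.6, 2.1, 2.9, 2.4]   # without optimistic data
socks4b = [1.5, 1.2, 1.8, 1.4]   # with optimistic data

# Fractional improvement at the median.
print(1 - median(socks4b) / median(socks4a))
```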
I'm a bit puzzled by your "Results" graph. How many requests does it actually represent and what kind of request were used?
As above, approximately 475 HTTP GET requests of each type. Note that the size of the fetched page is irrelevant to this measurement.
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit on how much data Tor will accept?
One stream window.
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
I suspect it's rare that an HTTP request doesn't fit in one stream window (~250 KB).
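The ~250 KB figure follows from Tor's flow control; a quick check (constants taken from tor-spec.txt as of the 0.2.x series, so treat them as an assumption):

```python
# Rough arithmetic behind the "one stream window ~ 250 KB" figure.
CELL_PAYLOAD_SIZE = 509     # payload bytes in a fixed-size 512-byte cell
RELAY_HEADER_SIZE = 11      # relay command + recognized + stream id + digest + length
RELAY_PAYLOAD_SIZE = CELL_PAYLOAD_SIZE - RELAY_HEADER_SIZE  # 498 data bytes per cell
STREAM_WINDOW = 500         # cells a stream may have in flight before a SENDME

print(STREAM_WINDOW * RELAY_PAYLOAD_SIZE)  # 249000 bytes, i.e. ~250 KB
```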
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
What? No, I think you misunderstand. The time-to-first-byte is the time until the first byte of the *response* is received back at the client. That's when the user's screen will start changing; previous work (does someone have a cite handy?) has indicated that if a page takes too long to start to change, users get frustrated with the slowness.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to SOCKS server/Deal with the response
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to SOCKS server/Deal with the response
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it.
A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data wherever possible. Certainly for responses: I seem to remember that privoxy indeed reads the whole response from the HTTP server before starting to send it to the web client, which adds a ton of extra delay in TTFB. I'm pretty sure polipo doesn't do that, but just streams the data as it arrives. Each program is likely to do the corresponding thing with the requests.
I wouldn't be surprised if there's a difference for some browsers, too.
How so? The browser sends the HTTP request to the proxy, and reads the response. What different behaviour might it have? The only one I can think of is "pipelining" requests, which some browsers/proxies/servers support and others don't. That is, if you've got 4 files to download from the same server, send the 4 HTTP requests on the same TCP stream before getting responses from any of them. In that case, you'll see the benefit for the first request in the stream, but not the others, since the stream will already be open.
And even if there isn't, it may still be useful to only implement it for some requests to reduce the memory footprint of the local Tor process.
Roger/Nick: is there indeed a limit to how much data from the SOCKS client the OP will queue up today before the stream is open?
Security implications:
ORs (for sure the Exit, and possibly others, by watching the pattern of packets), as well as possibly end servers, will be able to tell that a particular client is using optimistic data. This of course has the potential to fingerprint clients, dividing the anonymity set.
If some clients only use optimistic data for certain requests it would divide the anonymity set some more, so maybe the proposal should make a suggestion and maybe Tor should even enforce a limit on the client side.
I think the worse case is when most people are using optimistic data, but some older clients don't. They'll stand out more readily.
Performance and scalability notes:
OPs may queue a little more data, if the SOCKS client pushes it faster than the OP can write it out. But that's also true today after the SOCKS CONNECT returns success, right?
It's my impression that there's currently a limit on how much data Tor will read and buffer from the SOCKS client. Otherwise Tor could end up buffering the whole request, which could be rather large.
It would be. As above, is this actually true, though?
- Ian
Assuming you mean "stream" instead of "circuit" here, then, as above, I think most HTTP connections would be in this category. It might be interesting to examine some HTTP traces to see, though. Kevin, you were looking at some HTTP traces for other reasons, right? Anything in there that may help answer this question?
This is a great question.
Google released a study about a year ago [1] that characterized crawled web pages in terms of their total size, the number of distinct destination hosts per page, the number of HTTP GETs per page, and other attributes. Their data indicates that the median web page requires that the client connect to 5 distinct destination hosts and issue 6.25 GETs per host.
Put another way, a typical web page requires 5 streams, each of which issues multiple GETs. The full distributions of these statistics are available at [1].
Assuming persistent HTTP connections, with this proposal, only a stream's initial GET request would experience an improvement in time-to-first-byte, while subsequent GETs would be unaffected.
However, by reducing the time-to-first-byte for even just the first request per stream, the user is able to start fetching subsequent streams sooner, and thus, retrieve the whole page faster.
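A rough per-page model of that saving, under Kevin's numbers (everything here is an assumption: each new stream's first GET saves one OP<->Exit RTT, and nothing else changes):

```python
def sequential_saving(streams, rtt_tor):
    """Upper bound: streams are opened one after another, so each
    stream's first GET contributes a full saved OP<->Exit RTT."""
    return streams * rtt_tor

def parallel_saving(rtt_tor):
    """Lower bound: all streams open concurrently, so their savings overlap."""
    return rtt_tor

# Median page from the study: 5 distinct hosts -> 5 streams;
# rtt_tor is an assumed OP<->Exit RTT in seconds.
print(parallel_saving(0.5), sequential_saving(5, 0.5))
```

Real pages fall between the two bounds, since browsers open some streams in parallel and discover others only after parsing earlier responses.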
Kevin
Ian Goldberg iang@cs.uwaterloo.ca wrote:
On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
Ian Goldberg iang@cs.uwaterloo.ca wrote:
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
- SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
No, just ones that need a new TCP stream. The optimistic data stuff is about quickly sending data in just-being-constructed streams within an already-constructed circuit.
I see.
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually compared to requests that can reuse an already existing one (or even reuse the whole connection)?
Assuming you mean "stream" instead of "circuit" here, then, as above, I think most HTTP connections would be in this category. It might be interesting to examine some HTTP traces to see, though. Kevin, you were looking at some HTTP traces for other reasons, right? Anything in there that may help answer this question?
I actually meant "how many HTTP requests that need a new circuit are there usually compared to requests that only need a new stream (or reuse the whole connection)?"
You've already written above that HTTP requests that need a completely new circuit aren't affected anyway, so that leaves the requests that need a new stream and those that don't, and I can get a rough idea about those myself by looking at my logs (your mileage is likely to vary of course):
fk@r500 ~ $ privoxy-log-parser.pl --statistics /usr/jails/privoxy-jail/var/log/privoxy/privoxy.log.*
Client requests total: 430598
[...]
Outgoing requests: 300971 (69.90%)
Server keep-alive offers: 156193 (36.27%)
New outgoing connections: 237488 (55.15%)
Reused connections: 63483 (14.74%; server offers accepted: 40.64%)
Empty responses: 5244 (1.22%)
Empty responses on new connections: 430 (0.10%)
Empty responses on reused connections: 4814 (1.12%)
[...]
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
I think overall across sites would be a better number, no?
Sure.
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit on how much data Tor will accept?
One stream window.
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
I suspect it's rare that an HTTP request doesn't fit in one stream window (~250 KB).
I agree, I expected the stream window to be a lot smaller.
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
What? No, I think you misunderstand. The time-to-first-byte is the time until the first byte of the *response* is received back at the client.
Makes sense. Thanks for the clarification.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to SOCKS server/Deal with the response
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to SOCKS server/Deal with the response
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it.
A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data wherever possible.
Sure, but even polipo can't stream data the client hasn't sent yet, and for requests larger than a few MTUs, for example file uploads, the SOCKS connection is probably established before the whole client request has been received.
Certainly for responses: I seem to remember that
privoxy indeed reads the whole response from the HTTP server before starting to send it to the web client, which adds a ton of extra delay in TTFB.
Privoxy buffers the whole response if it's configured to filter it (as the decision to modify the first response byte could depend on the last byte).
If no filters are enabled (this seems to be the case for the configuration Orbot uses), or no filters apply, the response data is forwarded to the client as it arrives.
I wouldn't be surprised if there's a difference for some browsers, too.
How so? The browser sends the HTTP request to the proxy, and reads the response. What different behaviour might it have? The only one I can think of is "pipelining" requests, which some browsers/proxies/servers support and others don't. That is, if you've got 4 files to download from the same server, send the 4 HTTP requests on the same TCP stream before getting responses from any of them. In that case, you'll see the benefit for the first request in the stream, but not the others, since the stream will already be open.
I was thinking about file uploads. Currently the client doesn't need to read the whole file before the SOCKS connection is established, but it would need to in order to send it (or at least its first part) optimistically.
Even old browsers support file uploads up to ~2GB, so this would also be a case where the request might be too large to fit in the stream window.
While file uploads are certainly rare (and successfully pushing 2GB through Tor might be a challenge), it might be worth thinking about how to handle them anyway. Letting the Tor client cache the whole file is probably not the best solution above a certain file size.
I also thought about another case where it's not obvious to me what to do: an HTTPS connection made by a browser through an HTTP proxy.
Currently the browser will not start sending data for the server until the proxy has signaled that the connection has been established, which the proxy doesn't know until told so by the SOCKS server.
If the HTTP proxy "optimistically lies" to the client it will not be able to send a proper error message if the SOCKS connection actually can't be established. Of course this only matters if the client does something useful with the error message, and at least Firefox stopped doing that a while ago.
The impact on SSL connections is probably less significant anyway, though.
Thanks a lot for the detailed response.
Fabian
On Thu, Jun 2, 2011 at 8:45 PM, Ian Goldberg iang@cs.uwaterloo.ca wrote:
Sorry this took so long. As usual, things got inserted ahead of it in the priority queue. :-p
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Added as proposal 181.
I'm a little worried about the robustness issue: currently, if an exit node refuses a BEGIN request (because of its exit policy typically) the Tor client will retry at another exit node. But if optimistic data is in use, it seems that the client's initial data will be lost, unless the client keeps a copy around to send to other exits as required.
As for the application support matter, I wonder how hard it will be to actually get support. We're trying to phase out HTTP proxies in our bundles, so it seems we'd need to tweak browsers to send optimistically.
On Sat, Jun 04, 2011 at 11:15:53PM -0400, Nick Mathewson wrote:
On Thu, Jun 2, 2011 at 8:45 PM, Ian Goldberg iang@cs.uwaterloo.ca wrote:
Sorry this took so long. As usual, things got inserted ahead of it in the priority queue. :-p
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Added as proposal 181.
I'm a little worried about the robustness issue: currently, if an exit node refuses a BEGIN request (because of its exit policy typically) the Tor client will retry at another exit node. But if optimistic data is in use, it seems that the client's initial data will be lost, unless the client keeps a copy around to send to other exits as required.
That's a good point. Perhaps the latter is the right thing to do? That would be sort of a combination of what we do now and the above proposal: buffer the data (as we do now), but also send it (as the proposal). When you eventually receive the CONNECTED, flush anything in the buffer you've already sent. If you eventually receive END instead of CONNECTED, try another circuit, using the buffered data?
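That buffer-and-send-anyway scheme could be sketched roughly like this (all names are hypothetical; the real change would live in Tor's C edge-connection code):

```python
class FakeCircuit:
    """Stand-in for a circuit; records what was sent on it."""
    def __init__(self):
        self.sent = []
    def send(self, data):
        self.sent.append(data)

class OptimisticStream:
    """Send optimistically but keep a copy until CONNECTED arrives,
    so an END can be retried on another circuit."""
    def __init__(self, circuit):
        self.circuit = circuit
        self.pending = []        # sent optimistically, not yet acknowledged
        self.connected = False

    def write(self, data):
        if not self.connected:
            self.pending.append(data)  # keep a copy for a possible retry
        self.circuit.send(data)        # ...but also send it right away

    def on_connected(self):
        self.connected = True
        self.pending.clear()           # exit accepted the stream; drop copies

    def on_end(self, new_circuit):
        # Exit refused the BEGIN (e.g. exit policy): retry on a fresh
        # circuit, replaying everything sent optimistically.
        self.circuit = new_circuit
        for chunk in self.pending:
            self.circuit.send(chunk)
```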
As for the application support matter, I wonder how hard it will be to actually get support. We're trying to phase out HTTP proxies in our bundles, so it seems we'd need to tweak browsers to send optimistically.
Don't we already modify the browser in the bundle? You need _something_ (either modifications to the browser or a privacy-aware proxy) to remove any application-level privacy leaks that might exist, right?
- Ian
On Sun, Jun 5, 2011 at 8:29 AM, Ian Goldberg iang@cs.uwaterloo.ca wrote:
On Sat, Jun 04, 2011 at 11:15:53PM -0400, Nick Mathewson wrote:
On Thu, Jun 2, 2011 at 8:45 PM, Ian Goldberg iang@cs.uwaterloo.ca wrote:
Sorry this took so long. As usual, things got inserted ahead of it in the priority queue. :-p
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Added as proposal 181.
I'm a little worried about the robustness issue: currently, if an exit node refuses a BEGIN request (because of its exit policy typically) the Tor client will retry at another exit node. But if optimistic data is in use, it seems that the client's initial data will be lost, unless the client keeps a copy around to send to other exits as required.
That's a good point. Perhaps the latter is the right thing to do? That would be sort of a combination of what we do now and the above proposal: buffer the data (as we do now), but also send it (as the proposal). When you eventually receive the CONNECTED, flush anything in the buffer you've already sent. If you eventually receive END instead of CONNECTED, try another circuit, using the buffered data?
Maybe! It seems plausible to me, though we should definitely ponder the security/performance implications.
As for the application support matter, I wonder how hard it will be to actually get support. We're trying to phase out HTTP proxies in our bundles, so it seems we'd need to tweak browsers to send optimistically.
Don't we already modify the browser in the bundle? You need _something_ (either modifications to the browser or a privacy-aware proxy) to remove any application-level privacy leaks that might exist, right?
Yeah. Actually, handling application-level privacy leaks *can't* be done with a regular proxy, unless the proxy gets to MITM ssl connections, which probably isn't a good idea.
All I'm saying here, though, is that I'm wondering how hard the change will be to actually make. Most socks client code tends to get isolated in an application's network layer as a replacement for "connect", so unless the application is already set up to do "connect, send send send" rather than "connect, wait for connect, send send send", the application modifications will be tricky.
As an alternative, the socks proxy (Tor) could be told to say "connected" immediately, so that the app starts sending. I don't know how badly this would break browsers, though. Probably not a good idea.
yrs,
On Wed, Jun 08, 2011 at 05:51:41PM -0400, Nick Mathewson wrote:
I'm a little worried about the robustness issue: currently, if an exit node refuses a BEGIN request (because of its exit policy typically) the Tor client will retry at another exit node. But if optimistic data is in use, it seems that the client's initial data will be lost, unless the client keeps a copy around to send to other exits as required.
That's a good point. Perhaps the latter is the right thing to do? That would be sort of a combination of what we do now and the above proposal: buffer the data (as we do now), but also send it (as the proposal). When you eventually receive the CONNECTED, flush anything in the buffer you've already sent. If you eventually receive END instead of CONNECTED, try another circuit, using the buffered data?
Maybe! It seems plausible to me, though we should definitely ponder the security/performance implications.
Indeed. They don't seem bad at first glance, at least.
All I'm saying here, though, is that I'm wondering how hard the change will be to actually make. Most socks client code tends to get isolated in an application's network layer as a replacement for "connect", so unless the application is already set up to do "connect, send send send" rather than "connect, wait for connect, send send send", the application modifications will be tricky.
Right. So it turns out this is a case where using an HTTP proxy makes things easier. Hasn't some Tor person been fiddling with Firefox code? Maybe even Firefox SOCKS code?
As an alternative, the socks proxy (Tor) could be told to say "connected" immediately, so that the app starts sending. I don't know how badly this would break browsers, though. Probably not a good idea.
You also lose the ability to tell the SOCKS client if the connection ended up failing.
- Ian
Ian Goldberg iang@cs.uwaterloo.ca wrote:
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
After patching my client to support this (for most requests) I'd be interested to know how much this actually improves things in the real world.
Does anyone already have a script that reads a Tor log file and generates statistics that answer this question?
Fabian