Hello Zack,
In testing the retransmission algorithm, I encountered situations that the ack/retransmission algorithm is unable to resolve by design. It happens when the header of a packet gets delivered but the middle part gets dropped. For example, consider the following scenario:
- Server sends packet 1 (len 4k); packet 1 gets lost.
- Server sends packet 2 (len 64k); the last 60k of packet 2 gets lost.
- Client receives packet 2 partially (4k) and waits for the remaining 60k to come.
- Client sends an ack to the server saying that packet 1 is lost.
- Server retransmits packet 1; client considers packet 1 a part of packet 2, and thinks that it now has 8k of packet 2.
- Client again sends an ack saying that packet 1 is lost; server retransmits packet 1; client thinks it now has 12k of packet 2.
- Client acks again; server retransmits packet 1; client thinks it now has 16k of packet 2; etc.
Soon the transmit queue on the client side will fill up with ack packets, because the client won't realize its mistake until it has received 64k of garbage and is able to compute the MAC. Hence the server and the client will remain in this deadlock forever.
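To make the loop concrete, here is a minimal sketch of the client-side bookkeeping (all numbers come from the scenario above; the variable names are made up):

```python
# Simulates the client-side confusion described above: every retransmitted
# copy of packet 1 (4k) is misfiled as more payload for packet 2, so the
# client only detects the problem after it has collected a full 64k of
# mixed-up data and the MAC check fails.

EXPECTED_LEN = 64 * 1024   # total length announced in packet 2's header
FIRST_CHUNK  = 4 * 1024    # the part of packet 2 that actually arrived
RETX_LEN     = 4 * 1024    # each retransmitted copy of packet 1

received = FIRST_CHUNK
retransmits = 0
while received < EXPECTED_LEN:
    # client acks "packet 1 missing"; server retransmits packet 1;
    # client counts those 4k as more of packet 2
    received += RETX_LEN
    retransmits += 1

# Only now can the client compute the MAC over the 64k blob and
# discover that most of it is garbage.
print(retransmits)  # 15 retransmits before the mistake becomes visible
```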
Originally, StegoTorus was dropping any circuit with a full transmit queue. But this was causing lots of problems, because a lot of the time the client lags behind the server for a while, processing all the packets before it sends acks.
If ack/retransmit by design wasn't meant to recover from partial packet loss and was only a defense against complete packet loss, then the dropper proxy isn't doing a good job of simulating the intended threat model.
Otherwise, if ack/retransmit was meant to withstand partial packet drops, then we need to redesign it so it doesn't get trapped in such a deadlock. Maybe we should have a timer, so that circuits with a full transmit queue get a grace period before being dropped.
This is not specific to the chop protocol. I'm wondering how TCP deals with such a situation.
Thanks, Vmon
On Thu, Apr 18, 2013 at 3:49 PM, vmonmoonshine@gmail.com wrote:
In testing the retransmission algorithm, I encountered situations that the ack/retransmission algorithm is unable to resolve by design. It happens when the header of a packet gets delivered but the middle part gets dropped. For example, consider the following scenario:
Server sends packet 1 (len 4k); packet 1 gets lost. Server sends packet 2 (len 64k); the last 60k of packet 2 gets lost.
This isn't supposed to be possible. The steg-in-use (whichever one it is) is supposed to ask the chopper for blocks that are small enough that they will either be delivered entirely or not at all.
That isn't supposed to mean that we have to limit ourselves to blocks smaller than the MTU, because TCP is supposed to deliver entire streams reliably. In your example, "the last 60kb of packet 2" ought to be delivered by TCP before anything else can arrive on that connection, and the retransmit of packet 1 ought to be happening on a *different* connection, if we have one. (Which steg are you using?)
...
If ack/retransmit by design wasn't meant to recover from partial packet loss and was only a defense against complete packet loss, then the dropper proxy isn't doing a good job of simulating the intended threat model.
So I'm not understanding exactly what the "dropper proxy" does. Does it prevent TCP from providing reliable delivery? If so, how?
I'm not saying you're wrong, btw. The ack/retransmit design happened in a tearing hurry at a dog-and-pony show last year, in response to terrible network conditions at the venue (we were seeing something like one out of five IP packets just disappear), and we (me and Vinod) *thought* we had it right, but we never did finish debugging, and it's entirely possible that it doesn't work.
TCP *did* cope with the terrible network, it was just stegotorus that didn't, and honestly I'm not sure the ack-retransmit notion was the right way to go. It was never 100% clear to me what was happening to the connections that froze up. ST absolutely *does* need a congestion control mechanism, though, to prevent the entire circuit from getting killed because it overran the fixed-size reassembly queue, and as long as we have to do that ...
zw
Hey Zack,
Thanks for replying so fast.
Zack Weinberg zackw@panix.com writes:
and the retransmit of packet 1 ought to be happening on a *different* connection, if we have one. (Which steg are you using?)
I'm using the nosteg steg. I thought that if something is going to work, it had better work with the simplest steg. nosteg only opens one connection and passes everything through it as long as there is no reason to drop it (bad header, etc.).
So I'm not understanding exactly what the "dropper proxy" does. Does it prevent TCP from providing reliable delivery? If so, how?
It sits at socket_read_cb and sometimes doesn't copy what libevent has read from one side of the communication into the buffer of the other side. So what you say means that libevent's socket_read_cb calls are finer-grained than the TCP packets that TCP guaranteed to deliver. I.e., libevent breaks something that TCP guaranteed to deliver into smaller parts.
If this is true, I guess that because TCP is giving us a stream there is no way for us to know where to drop in order to have a legitimate simulation of real-life packet drop in that stream, unless I incorporate part of chop into the dropper proxy to read the headers and detect the end of a packet.
ST absolutely *does* need a congestion control mechanism, though, to prevent the entire circuit from getting killed because it overran the fixed-size reassembly queue, and as long as we have to do that ...
Then maybe I should just start an axe timer whenever I want to send while the transmit queue is full, and delete the timer whenever the queue moves.
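Something like this (a rough sketch with made-up names; plain timestamps stand in for real libevent timers):

```python
# Give a circuit whose transmit queue is full a grace period before axing
# it, and cancel the axe as soon as the queue drains again.

GRACE = 30.0  # seconds; illustrative value only

class Circuit:
    def __init__(self):
        self.axe_deadline = None   # None = no axe timer pending

    def queue_full(self, now):
        """Called when we want to send but the transmit queue is full."""
        if self.axe_deadline is None:
            self.axe_deadline = now + GRACE   # start the axe timer

    def queue_moved(self):
        """Called whenever the queue drains at all: progress was made."""
        self.axe_deadline = None              # cancel the axe timer

    def should_drop(self, now):
        return self.axe_deadline is not None and now >= self.axe_deadline

c = Circuit()
c.queue_full(now=0.0)
print(c.should_drop(now=10.0))   # False: still within the grace period
c.queue_moved()                  # queue drained; the slow peer caught up
print(c.should_drop(now=100.0))  # False: the timer was cancelled
```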
Cheers, Vmon
On Thu, Apr 18, 2013 at 5:08 PM, vmonmoonshine@gmail.com wrote:
Zack Weinberg zackw@panix.com writes:
and the retransmit of packet 1 ought to be happening on a *different* connection, if we have one. (Which steg are you using?)
I'm using the nosteg steg. I thought that if something is going to work, it had better work with the simplest steg. nosteg only opens one connection and passes everything through it as long as there is no reason to drop it (bad header, etc.).
OK. We shouldn't even try to retransmit with a 1-connection steg mode (unless it's not using TCP ... worry about that later)
So I'm not understanding exactly what the "dropper proxy" does. Does it prevent TCP from providing reliable delivery? If so, how?
It sits at socket_read_cb and sometimes doesn't copy what libevent has read from one side of the communication into the buffer of the other side.
Ah. Yeah, that's not going to work. TCP will think that the data _has_ been delivered, so the lower-level retransmit that we're relying on will never happen.
So what you say means that libevent's socket_read_cb calls are finer-grained than the TCP packets that TCP guaranteed to deliver. I.e., libevent breaks something that TCP guaranteed to deliver into smaller parts.
TCP is a stream-oriented protocol. It guarantees to provide reliable, ordered delivery of a _sequence of bytes_. It does _not_ guarantee anything whatsoever about packet boundaries. In particular, the amount of data libevent hands you in one read callback is completely meaningless.
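For instance, here's a sketch of reading length-prefixed records from a byte stream (hypothetical framing, not chop's actual header format): the result is identical no matter how the stream happens to be chunked into read callbacks.

```python
# TCP delivers a byte stream; any record structure has to be rebuilt on
# top of it (here via a 2-byte big-endian length prefix). The reassembly
# below is independent of how the bytes were split into read calls.
import struct

def frames(chunks):
    """Reassemble 2-byte-length-prefixed frames from arbitrary chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while len(buf) >= 2:
            (n,) = struct.unpack("!H", buf[:2])
            if len(buf) < 2 + n:
                break                  # frame incomplete; wait for more
            yield buf[2:2 + n]
            buf = buf[2 + n:]

msg = struct.pack("!H", 5) + b"hello"
# Two different chunkings of the same byte stream...
a = list(frames([msg]))
b = list(frames([msg[:1], msg[1:4], msg[4:]]))
print(a == b == [b"hello"])   # True: framing is independent of read sizes
```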
If this is true, I guess that because TCP is giving us a stream there is no way for us to know where to drop in order to have a legitimate simulation of real-life packet drop in that stream, unless I incorporate part of chop into the dropper proxy to read the headers and detect the end of a packet.
What you need to do is implement the dropping _below_ TCP, so that TCP is aware of it and does do its retransmits. You can do this on Linux with netem (http://www.linuxfoundation.org/collaborate/workgroups/networking/netem) and on many of the *BSDs (including OSX) with dummynet (http://info.iet.unipi.it/~luigi/dummynet/).
ST absolutely *does* need a congestion control mechanism, though, to prevent the entire circuit from getting killed because it overran the fixed-size reassembly queue, and as long as we have to do that ...
Then maybe I should just start an axe timer whenever I want to send while the transmit queue is full, and delete the timer whenever the queue moves.
How do you know that your peer's receive queue is full, if it doesn't tell you?
But we probably could get away with something much simpler if we don't bother doing retransmits, e.g. something akin to Tor's SENDME cells.
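Roughly like this (a toy sketch with made-up names and illustrative numbers, loosely modeled on Tor's circuit windows):

```python
# SENDME-style flow control: the sender may have up to WINDOW cells in
# flight; the receiver sends a SENDME back after every SENDME_INTERVAL
# cells it has fully processed, which tops the sender's window back up.
# No retransmission is involved; TCP provides reliability underneath.
WINDOW = 1000
SENDME_INTERVAL = 100

class Sender:
    def __init__(self):
        self.window = WINDOW
    def can_send(self):
        return self.window > 0
    def sent(self):
        self.window -= 1
    def got_sendme(self):
        self.window += SENDME_INTERVAL

class Receiver:
    def __init__(self):
        self.since_sendme = 0
    def delivered(self):
        """Call when a cell has been processed; True means 'send a SENDME'."""
        self.since_sendme += 1
        if self.since_sendme == SENDME_INTERVAL:
            self.since_sendme = 0
            return True
        return False

s, r = Sender(), Receiver()
sent = 0
# If the receiver processes everything promptly, the sender never stalls:
for _ in range(5000):
    if s.can_send():
        s.sent(); sent += 1
        if r.delivered():
            s.got_sendme()
print(sent)  # 5000: SENDMEs refill the window as fast as it drains
```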
zw
Hey Zack,
I'm not sure if you were following Iranian filtering in the few days leading up to the election. It was basically an HTTP whitelist. Psiphon was sending a few 'GET / HTTP/1.1' requests before starting to send any real data, and it was able to fool the box. But the filtering is going to get more intelligent next time, and hence I feel StegoTorus is more important than ever. I couldn't test StegoTorus, because SSH wasn't working and I couldn't access my VPS there :(
My time has freed up a bit, and I thought we should deal with the ack design problem once and for all.
The fundamental thing that is not clear to me is this: if we assume that TCP is reliable, then all packets are going to arrive sooner or later. In such a situation, what's the need for retransmission?
The only scenario I can imagine where retransmit is useful is the following: the reassembly queue has 255 slots. If a packet is delayed so much that 255 packets arrive after it, then we are not able to reconstruct the information. Otherwise, if our reassembly queue had infinite length, at least theoretically we wouldn't need ack/retransmission at all, because TCP's guarantee would be enough for us. This situation is imaginable only if we have more than one connection; otherwise TCP would also assure the order in which our pseudo-packets arrive.
Is that statement correct? If it is, then we can modify the design to deal with the above situation instead of being a full-fledged ack mechanism. That would save a lot of traffic compared to blindly retransmitting big packets all the time.
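To make the 255-slot case concrete (a sketch with hypothetical sequence numbering; chop's real bookkeeping may differ):

```python
# With a fixed 255-slot reassembly window, a block delayed on one
# connection can fall outside the window once enough later blocks have
# arrived on other connections, and then something has to give: either
# the late block's data is lost (so we need retransmit) or the circuit
# overruns its queue.
WINDOW = 255

def fits(next_expected, seqno):
    """A block fits in the reassembly queue iff its sequence number is
    within WINDOW of the next block we're still waiting for."""
    return next_expected <= seqno < next_expected + WINDOW

# Block 0 is delayed on connection A while connection B delivers 1..255:
next_expected = 0
arrived = range(1, 256)
overflow = [s for s in arrived if not fits(next_expected, s)]
print(overflow)  # [255]: the 255th later block no longer fits
```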
Thanks for helping me with this.
Best, Vmon
On 2013-06-18 11:52 , Vmon wrote:
Hey Zack,
I'm not sure if you were following Iranian filtering in the few days leading up to the election. It was basically an HTTP whitelist. Psiphon was sending a few 'GET / HTTP/1.1' requests before starting to send any real data, and it was able to fool the box. But the filtering is going to get more intelligent next time, and hence I feel StegoTorus is more important than ever.
A little birdy whispered in my ear that a much-revised version with a lot of new features and various fixes, including a lot of ack/retransmit fixes, should become available soon, but it is pending $org review... that version even works fine on Windows, btw.
The fundamental thing that is not clear to me is this: if we assume that TCP is reliable, then all packets are going to arrive sooner or later. In such a situation, what's the need for retransmission?
The problem is not about TCP, but more about HTTP, where a censor, or a rate-limiter in general (e.g. hotel WiFi tends to have issues), does not always answer every HTTP request, and thus one loses packets. Retransmit is also important in those cases.
Greets, Jeroen
Hey Jeroen,
Thank you and thanks to the little birdy :)
Jeroen Massar jeroen@massar.ch writes:
A little birdy whispered in my ear that a much-revised version with a lot of new features and various fixes, including a lot of ack/retransmit fixes, should become available soon, but it is pending $org review... that version even works fine on Windows, btw.
That is great news. Does the little birdy also have some news about the encryption and the handshake? As far as I can see, in the current source it is symmetric with a hardcoded key, while the original design called for a double elliptic curve system (a curve and its twist). I think that was a dead important priority.
The problem is not about TCP, but more about HTTP, where a censor, or a rate-limiter in general (e.g. hotel WiFi tends to have issues), does not always answer every HTTP request, and thus one loses packets. Retransmit is also important in those cases.
That makes the situation clearer. When I told Zack that retransmission on a single connection has a problem (it gets stuck), he told me that he wasn't sure whether we should do retransmission when we only have a single connection (a single TCP connection, like the nosteg module). Clearly, even in such a situation retransmit is still useful if we fear proxy intervention.
I'm looking forward to seeing the new release. Meanwhile, these are some speed tests I did one day after the election in Iran (unfortunately SSH was blacklisted before the election and I didn't have access to my box to run the tests):
connection:        dn/up (mbps)
No proxy:          2.09/2.08
Psiphon:           0.76/0.2
StegoTorus nosteg: 2.04/0.46*
StegoTorus http:   0.17/0.07
Thanks, Vmon
*(obviously this is because of upload limit on my machine)