On Sat, 22 Aug 2015 14:40:08 -0700, Kevin P Dyer <kpdyer@gmail.com> wrote:
Ah, gotcha. It's not RFC compliant. RFC 2616 was published in 1999, and there have been tons of HTTP-like implementations since then that, ostensibly, don't need to follow it (e.g., an HTTP-like client and server that only talk to each other). A network monitor must deal with these cases too, and they'll still advertise HTTP/1.1 in their headers.
This [1] paper is a bit dated (2007), but my intuition is that real-world implementations have drifted even further from the RFC over the last 8 years. I swear there's a more recent paper on this topic, but I couldn't find it...
I'd be surprised if there were lots of clients that advertise HTTP/1.1 but don't include a Host header, since clients that are broken in that manner will not be able to talk to Apache/nginx/Tomcat/etc.[0].
Then again, fteproxy is an example of such a thing, so I may be rather sad at the results of an actual survey.
Since requests of that sort should trigger the error path on RFC-compliant servers, it's a really good distinguisher: legitimate clients will not do such a thing. Existing realistic adversaries already have "identify 'suspicious behavior', call back to confirm" style filtering in production, so the false positive rate can be reduced to 0 if needed.
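For concreteness, here is a minimal sketch of the host-header check being discussed, assuming the monitor already has the reassembled client request as raw bytes. The function name is hypothetical, and this is not fteproxy's code or any deployed censor's code; it only encodes the RFC 2616 rule that every HTTP/1.1 request must carry a Host header (section 14.23), which compliant servers enforce by answering 400 Bad Request.

```python
def advertises_http11_without_host(raw_request: bytes) -> bool:
    """Hypothetical heuristic: True if the request line claims HTTP/1.1
    but no Host header is present (a violation of RFC 2616, sec. 14.23)."""
    # Keep only the header block; any body after the blank line is ignored.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split(b"\r\n")
    # Request line: METHOD SP Request-URI SP HTTP-Version
    parts = request_line.split(b" ")
    if len(parts) != 3 or parts[2] != b"HTTP/1.1":
        return False  # not an HTTP/1.1 request line; the rule does not apply
    for line in header_lines:
        name, sep, _ = line.partition(b":")
        # Header field names are case-insensitive.
        if sep and name.lower() == b"host":
            return False
    return True


# Usage: an HTTP/1.1 request with no Host header is flagged.
print(advertises_http11_without_host(b"GET / HTTP/1.1\r\n\r\n"))
```

On its own this only nominates candidates; as argued in this thread, broken-but-legitimate implementations exist, so a hit here is a reason to look closer rather than a verdict.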
Based on our exploration of the data, we found there's a wide range of implementations, most of which have non-RFC-compliant behaviors. See Section 4 of our paper for more details. For that reason I'd be very surprised if a host-header check could result in a 0 FP rate.
The point isn't to use non-compliance as the sole discriminator (since people do write broken code), but to cut the candidate IP/port list down to something that's reasonable for whatever active probing infrastructure exists to manage.
From there, some delta-T later, separate infrastructure attempts a full FTE + Tor handshake, and blacklists/RST-injects/etc. the candidates that succeed.
The second step gets to 0 FP, and precisely this sort of thing is how China currently handles obfs3. The delay is (anecdotally) about 10 minutes.
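A rough sketch of the two-stage filtering described above (a cheap passive heuristic nominates endpoints, then an active probe after delta-T confirms them). Everything here is hypothetical: `TwoStageFilter`, `observe`, `run_probes` and the `probe` callable are illustrative names, the probe is left abstract rather than performing a real FTE + Tor handshake, and the 600-second default only mirrors the anecdotal 10-minute delay. Time is passed in explicitly so the logic can be exercised without sleeping.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class TwoStageFilter:
    # Hypothetical active probe: returns True if a full handshake succeeds.
    probe: Callable[[Endpoint], bool]
    delay: float = 600.0  # delta-T in seconds (assumed ~10 minutes)
    pending: Dict[Endpoint, float] = field(default_factory=dict)
    blocked: Set[Endpoint] = field(default_factory=set)

    def observe(self, endpoint: Endpoint, suspicious: bool, now: float) -> None:
        # Stage 1: the passive heuristic only nominates candidates; it blocks nothing.
        if suspicious and endpoint not in self.blocked:
            self.pending.setdefault(endpoint, now + self.delay)

    def run_probes(self, now: float) -> List[Endpoint]:
        # Stage 2: once delta-T has elapsed, actively probe each candidate
        # and block only those where the probe succeeds.
        newly_blocked = []
        for endpoint, due in list(self.pending.items()):
            if due <= now:
                del self.pending[endpoint]
                if self.probe(endpoint):
                    self.blocked.add(endpoint)
                    newly_blocked.append(endpoint)
        return newly_blocked


# Usage: pretend only the endpoint on port 8080 completes the handshake.
f = TwoStageFilter(probe=lambda ep: ep[1] == 8080)
f.observe(("192.0.2.1", 8080), suspicious=True, now=0.0)
f.observe(("192.0.2.2", 80), suspicious=True, now=0.0)
print(f.run_probes(now=600.0))
```

A stage-1 false positive only costs one probe; nothing is blocked unless the confirmation succeeds, which is the sense in which the second step "gets to 0 FP" regardless of how noisy the first-stage heuristic is.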
Intuitively I think that "missing Host header" will be extremely rare but I don't have a way to get traces to prove/disprove it.
With that being said, I'll add the host-header-check to the list of experiments that we want to do for the full version of our paper. Would be interesting to learn what the data tells us.
I would be interested in seeing the results.