While experimenting with another tunnel built on KCP and smux, I
discovered that performance improved substantially when I enlarged the
smux buffers. It's likely that doing the same can also improve
performance in Snowflake.
There are two relevant parameters, MaxReceiveBuffer and MaxStreamBuffer.
MaxStreamBuffer seems to be the more important one to increase.
https://pkg.go.dev/github.com/xtaci/smux#Config
    // MaxReceiveBuffer is used to control the maximum
    // number of data in the buffer pool
    MaxReceiveBuffer int
    // MaxStreamBuffer is used to control the maximum
    // number of data per stream
    MaxStreamBuffer int
The default values are 4 MB for MaxReceiveBuffer and 64 KB for
MaxStreamBuffer.
https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66…
    MaxReceiveBuffer: 4194304,
    MaxStreamBuffer: 65536,
kcptun, a prominent KCP/smux tunnel, has defaults of 4 MB (--smuxbuf)
and 2 MB (--streambuf) in both client and server:
https://github.com/xtaci/kcptun/blob/9a5b31b4706aba4c67bcb6ebfe108fdb564a90…
In my experiment, I changed the values (MaxReceiveBuffer /
MaxStreamBuffer) to 4 MB / 1 MB on the client and 16 MB / 1 MB on the
server. This change increased download speed by about a factor of 3:
    default buffers     477.4 KB/s
    enlarged buffers   1388.3 KB/s
Values of MaxStreamBuffer higher than 1 MB didn't seem to have much of
an effect. 256 KB did not help as much.
My guess, based on intuition, is that on the server we should set a
large value of MaxReceiveBuffer, as it is a global limit shared among
all clients, and a relatively smaller value of MaxStreamBuffer, because
there are expected to be many simultaneous streams. On the client, don't
set MaxReceiveBuffer too high, because it's on an end-user device, but
go ahead and set MaxStreamBuffer high, because there's expected to be
only one or two streams at a time.
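The client/server split could be sketched roughly as follows. This is
only an illustration, not Snowflake code: the bufferSizes type and the
sizesFor function are hypothetical names, and the numbers are simply the
ones from the experiment above. In a real program the two values would
be assigned to the MaxReceiveBuffer and MaxStreamBuffer fields of a
smux.Config.

```go
package main

import "fmt"

// bufferSizes is a hypothetical stand-in for the two smux.Config fields
// discussed above. It is not part of smux.
type bufferSizes struct {
	MaxReceiveBuffer int // overall receive buffer limit, in bytes
	MaxStreamBuffer  int // per-stream buffer limit, in bytes
}

const (
	kb = 1024
	mb = 1024 * kb
)

// sizesFor returns the values tried in the experiment: 16 MB / 1 MB on
// the server and 4 MB / 1 MB on the client. They are starting points for
// further testing, not tuned recommendations.
func sizesFor(isServer bool) bufferSizes {
	if isServer {
		return bufferSizes{MaxReceiveBuffer: 16 * mb, MaxStreamBuffer: 1 * mb}
	}
	return bufferSizes{MaxReceiveBuffer: 4 * mb, MaxStreamBuffer: 1 * mb}
}

func main() {
	for _, isServer := range []bool{false, true} {
		s := sizesFor(isServer)
		fmt.Printf("server=%v MaxReceiveBuffer=%d MaxStreamBuffer=%d\n",
			isServer, s.MaxReceiveBuffer, s.MaxStreamBuffer)
	}
}
```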
I discovered this initially by temporarily setting smux to protocol v1
instead of v2. My understanding is that v1 lacks some kind of receive
window mechanism that v2 has, and by default is more willing to expend
memory receiving data. See "Per-stream sliding window to control
congestion.(protocol version 2+)":
https://pkg.go.dev/github.com/xtaci/smux#readme-features
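One way to see why a per-stream window could cap throughput: if a sender
may have at most one window of unacknowledged data in flight, it can
deliver at most one window per round trip. The sketch below works out
that ceiling for a few window sizes. The 100 ms round-trip time is an
assumed figure for illustration, not a measurement, and whether the
smux v2 window behaves exactly like this is a guess on my part.

```go
package main

import "fmt"

// windowLimit returns an upper bound on throughput, in bytes per second,
// for a sender limited to window bytes in flight per round trip.
// rttSeconds is the round-trip time in seconds.
func windowLimit(window int, rttSeconds float64) float64 {
	return float64(window) / rttSeconds
}

func main() {
	const rtt = 0.1 // assumed round-trip time in seconds, illustration only
	for _, window := range []int{64 * 1024, 256 * 1024, 1024 * 1024} {
		fmt.Printf("window %7d bytes: at most %.0f KB/s\n",
			window, windowLimit(window, rtt)/1024)
	}
}
```

Under this model the ceiling scales linearly with the window, so a
larger MaxStreamBuffer stops helping once something else (the link, KCP,
or the proxy) becomes the bottleneck.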
Past performance ticket: "Reduce KCP bottlenecks for Snowflake"
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snow…
I was looking at https://snowflake-broker.torproject.net/debug just now,
and saw:
    current snowflakes available: 317
        standalone proxies: 216
        browser proxies: 0
        webext proxies: 101
        unknown proxies: 0
    NAT Types available:
        restricted: 278
        unrestricted: 2
        unknown: 37
About 2/3 of proxies are standalone, which is more than I would have
supposed. Has there been word getting out about how to run one, or
something?