While experimenting with another tunnel built on KCP and smux, I discovered that enlarging the smux buffers greatly increased performance. It's likely that doing the same can also improve performance in Snowflake.
There are two relevant parameters, MaxReceiveBuffer and MaxStreamBuffer. MaxStreamBuffer seems to be the more important one to increase.
https://pkg.go.dev/github.com/xtaci/smux#Config

    // MaxReceiveBuffer is used to control the maximum
    // number of data in the buffer pool
    MaxReceiveBuffer int

    // MaxStreamBuffer is used to control the maximum
    // number of data per stream
    MaxStreamBuffer int

The default values are 4 MB and 64 KB.
https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66/...

    MaxReceiveBuffer: 4194304,
    MaxStreamBuffer:  65536,

kcptun, a prominent KCP/smux tunnel, has defaults of 4 MB (--smuxbuf) and 2 MB (--streambuf) in both client and server: https://github.com/xtaci/kcptun/blob/9a5b31b4706aba4c67bcb6ebfe108fdb564a905...
In my experiment, I changed the values (MaxReceiveBuffer / MaxStreamBuffer) to 4 MB / 1 MB on the client and 16 MB / 1 MB on the server. This change increased download speed by about a factor of 3:

    default buffers     477.4 KB/s
    enlarged buffers   1388.3 KB/s

Values of MaxStreamBuffer higher than 1 MB didn't seem to have much of an effect. 256 KB did not help as much.
My guess, based on intuition, is that on the server we should set a large value of MaxReceiveBuffer, as it is a global limit shared among all clients, and a relatively smaller value of MaxStreamBuffer, because there are expected to be many simultaneous streams. On the client, don't set MaxReceiveBuffer too high, because it's on an end-user device, but go ahead and set MaxStreamBuffer high, because there's expected to be only one or two streams at a time.
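As a rough sketch of the asymmetric choice described above, the following picks buffer sizes by role, using the values from the experiment. The bufferConfig type and the buffersFor and validate helpers are hypothetical stand-ins, not part of smux; in real code the two fields would be set on a smux.Config (for example the one returned by smux.DefaultConfig()). The check that the per-stream limit does not exceed the overall limit is a sanity check I am assuming is sensible, not something measured here.

```go
package main

import (
	"errors"
	"fmt"
)

// bufferConfig is a hypothetical stand-in for the two relevant
// fields of smux.Config. It is not part of the smux package.
type bufferConfig struct {
	MaxReceiveBuffer int // overall receive buffer limit, in bytes
	MaxStreamBuffer  int // per-stream buffer limit, in bytes
}

const mib = 1 << 20

// buffersFor returns the sizes used in the experiment:
// 16 MB / 1 MB on the server and 4 MB / 1 MB on the client.
func buffersFor(isServer bool) bufferConfig {
	if isServer {
		return bufferConfig{MaxReceiveBuffer: 16 * mib, MaxStreamBuffer: 1 * mib}
	}
	return bufferConfig{MaxReceiveBuffer: 4 * mib, MaxStreamBuffer: 1 * mib}
}

// validate rejects non-positive sizes and a per-stream limit
// larger than the overall limit (an assumed sanity check).
func (c bufferConfig) validate() error {
	if c.MaxReceiveBuffer <= 0 || c.MaxStreamBuffer <= 0 {
		return errors.New("buffer sizes must be positive")
	}
	if c.MaxStreamBuffer > c.MaxReceiveBuffer {
		return errors.New("MaxStreamBuffer must not exceed MaxReceiveBuffer")
	}
	return nil
}

func main() {
	for _, isServer := range []bool{false, true} {
		c := buffersFor(isServer)
		fmt.Printf("server=%v receive=%d stream=%d valid=%v\n",
			isServer, c.MaxReceiveBuffer, c.MaxStreamBuffer, c.validate() == nil)
	}
}
```

The right values for Snowflake may well differ from these; the point is only that the client and server need not use the same pair.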
I discovered this initially by temporarily setting smux to protocol v1 instead of v2. My understanding is that v1 lacks some kind of receive window mechanism that v2 has, and by default is more willing to expend memory receiving data. See "Per-stream sliding window to control congestion.(protocol version 2+)": https://pkg.go.dev/github.com/xtaci/smux#readme-features
Past performance ticket: "Reduce KCP bottlenecks for Snowflake" https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...