Hello,
So I've been somewhat productive as of late and have been working on the successor to obfs4. I have an "oh my god, you wrote how much code, with no documentation" minimum-viable-product-ish release that appears to work, though ABSOLUTELY NO ONE SHOULD USE IT YET, because I will intentionally break bridge line/protocol wire compatibility at least once before release.
The code: https://git.schwanenlied.me/yawning/basket2 (Really, don't use it, at all. I will ignore cries for help, mercilessly mock people that attempt to chase a moving target, and laugh as I break things).
The documentation: Hahahahahahahahahahahahahahaha (It should be self-explanatory).
A brief overview of the improvements:
* Client can now negotiate the link padding/delay algorithm at handshake time with the server (no more `iat-mode=blah` argument), and defaults to something which injects delay.
* Link cryptography changed to an XChaCha20/Poly1305 construct, with AVX2 used for the ChaCha20 when available, and the framing construct itself is a lot better designed.
* Handshake is X25519 + NewHope or X448 + NewHope based (available algorithms specified as part of the bridge line), with X25519 (Elligator2) for authentication.
* SHA-3/SHAKE are used as the handshake routines.
* When adding IAT on a burst basis, the code attempts to detect bulk data transfer, resulting in less delay for large downloads/uploads.
* Server-side buffering reduced by 50% (not sure if that'll stick, I need to think about some of this stuff).
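
To illustrate the negotiation idea in the first bullet, here is a minimal Go sketch. It is not taken from the basket2 code. The method names, their numeric values, and the "first mutually supported entry in the client's preference order wins" rule are all assumptions made for the example, and the real wire format may look nothing like this.

```go
package main

import "fmt"

// PaddingMethod identifies a link padding/delay algorithm.
// These names and values are hypothetical, not basket2's wire values.
type PaddingMethod uint8

const (
	PaddingNone     PaddingMethod = 0 // no padding, no added delay
	PaddingBurst    PaddingMethod = 1 // burst padding, no added delay
	PaddingBurstIAT PaddingMethod = 2 // burst padding plus injected delay
)

// negotiate returns the first method in the client's preference-ordered
// offer that the server also supports. ok is false when there is no overlap.
func negotiate(clientOffer, serverSupported []PaddingMethod) (method PaddingMethod, ok bool) {
	supported := make(map[PaddingMethod]bool, len(serverSupported))
	for _, m := range serverSupported {
		supported[m] = true
	}
	for _, m := range clientOffer {
		if supported[m] {
			return m, true
		}
	}
	return 0, false
}

func main() {
	// The client prefers a method that injects delay, mirroring the
	// "defaults to something which injects delay" behaviour described above.
	clientOffer := []PaddingMethod{PaddingBurstIAT, PaddingBurst, PaddingNone}
	serverSupported := []PaddingMethod{PaddingNone, PaddingBurst}

	if m, ok := negotiate(clientOffer, serverSupported); ok {
		fmt.Printf("negotiated padding method: %d\n", m)
	} else {
		fmt.Println("no common padding method, aborting handshake")
	}
}
```

The point of a scheme like this is that the choice travels inside the handshake rather than in the bridge line, so a per-bridge `iat-mode` style argument is no longer needed.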
TODO:
* Implement an "expensive" padding option for bridges that can afford to run such a thing. (WTF-Pad, CS-BuFlo, or Tamaraw, not sure yet. WTF-Pad is a bit harder to implement than I'd like...)
* Improve the lightweight padding algorithms instead of using what is essentially the same as obfs
* Finish support for user authentication.
* Write a formal specification.
* Debug, debug, debug.
* Write an AVX2 Poly1305 so it goes faster on boxes that matter (my laptop).
* (Maybe) Write NEON and 32-bit x86 assembly-optimized versions of the custom crypto, so it runs faster on various low-end shit boxes. NEON is more important to me than 32-bit Intel (and a low-end Atom does ChaCha20 at 40 MiB/s or so anyway...).
Anyway, this is sort of a heads up that something like this is in the works, and is approaching alpha state, though DID I MENTION NOT TO USE IT YET?
Questions/Comments/Feedback welcome as always.
Regards,