Stem seems like a good choice, and it's something I've been meaning to get back into. I've been using the old Python library. I'd be happy to work on this project and get up to speed with what Mike wrote a while back.
Great! Just let me know if you have any Stem questions.
I've got to go through the old SoaT code, but can anyone tell me a structural reason not to begin by re-implementing his original tests (DNS manipulations, HTTP traffic tampering, etc.)?
Either porting SoaT or reimplementing the same tests to initially aim for feature parity would be a fine place to start. I'd suggest taking a look at its codebase and deciding for yourself which would be better.
Cheers! -Damian