Hello everyone,
This is the second status report for my Google Summer of Code project, which is to implement consensus diffs for Tor. We meet every Wednesday at 14:00 UTC with the project mentors, Nick and Sebastian.
When I sent my first report I was a bit behind schedule as per my own timeline, mainly due to finals. These past two weeks I've been catching up on that timeline. I am now finishing the first part of it, which is to implement the diff generation code.
If you read the last report, you might remember that the biggest issue at the time was performance. The code I had worked, but it was very expensive: computing a consensus diff took quadratic time, which translated into running times of 10 to 20 seconds. In that e-mail I explained an idea I had in mind to overcome that.
That idea has now been implemented and tested, and it works as expected. Generating a diff between two consensus files now takes roughly 0.04 seconds, which is almost the same amount of time that 'diff -e' takes. The resulting diffs do not seem to have lost any precision, but we'll have to run some edge-case tests to verify that.
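For readers unfamiliar with the output format of 'diff -e', here is a minimal Python sketch of producing an ed-style script from two lists of lines. It is only an illustration: it relies on Python's difflib rather than the algorithm used in the project's C code, the function name gen_ed_diff is hypothetical, and it assumes no input line consists of a single "." (which would end a text block early).

```python
import difflib


def gen_ed_diff(old, new):
    """Return an ed-style script (a list of lines) that turns `old` into `new`.

    Illustrative only: uses difflib, not the project's own algorithm.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    script = []
    # Emit hunks from the end of the file towards the start, so that an
    # earlier command never shifts the line numbers used by a later one.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "equal":
            continue
        if tag == "insert":
            # "Na" appends the text block after line N (N may be 0).
            script.append(f"{i1}a")
        else:
            rng = f"{i1 + 1}" if i2 - i1 == 1 else f"{i1 + 1},{i2}"
            script.append(rng + ("d" if tag == "delete" else "c"))
        if tag != "delete":
            # Append and change commands carry text terminated by ".".
            script.extend(new[j1:j2])
            script.append(".")
    return script
```

Emitting the commands in reverse line order mirrors what 'diff -e' does, and it is what lets an applier process the script top to bottom without tracking line-number offsets.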
The code was also missing documentation and some tor_asserts to make it more robust and understandable. That is what I'm finishing up at the moment, and then the tests will be the only task left.
On to the second timeline item, which is the diff application code. The code sample I wrote for my application was a very simple version of it, supporting only 'delete' ed commands and just writing to stdout line by line. Taking that code as a starting point, and re-using some chunks from the diff generation code, I was able to get a working version in a few hours.
So, as with the diff generation bit, now that it seems to work it's time to document it properly and write some tests. I have until July 5th to finish these two tasks as per my own timeline, so I should be able to have them done by then.
The upcoming timeline item is integrating these chunks of code into Tor, at which point I expect some issues to arise. I would therefore like to get started on that as soon as possible, to have more time to get everything working and tested before we run out of time.
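To make the ed commands mentioned above concrete, here is a minimal Python sketch of applying an ed-style script that uses the append (a), change (c) and delete (d) commands. It is an illustration under assumptions, not the project's C implementation: the name apply_ed_diff is hypothetical, and the script is assumed to list its commands from the end of the file towards the start, as 'diff -e' does.

```python
import re

# "N" or "N,M" followed by one of the supported commands.
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_diff(old, script):
    """Apply an ed-style script (a list of lines) to `old`; return the new lines."""
    lines = list(old)
    it = iter(script)
    for cmd in it:
        m = _CMD.match(cmd)
        if m is None:
            raise ValueError(f"unrecognised ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        op = m.group(3)
        if op == "a" and m.group(2):
            raise ValueError("append takes a single line number")
        if end < start or end > len(lines) or (op != "a" and start < 1):
            raise ValueError(f"line range out of bounds: {cmd!r}")
        text = []
        if op in "ac":
            # Collect the text block up to the terminating "." line.
            for line in it:
                if line == ".":
                    break
                text.append(line)
            else:
                raise ValueError("text block not terminated by '.'")
        if op == "a":
            lines[start:start] = text    # insert after line `start`
        else:
            lines[start - 1:end] = text  # for "d", text is empty
    return lines
```

For example, applying ["4a", "e", ".", "2c", "x", "."] to the lines a, b, c, d appends e after line 4 and then replaces line 2 with x. Building the result in a list instead of writing to stdout line by line makes it easier to validate the whole script before anything is emitted.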
As usual, any comments or ideas are very welcome.