Sharif Olorin:
I would expect most US universities to be logging netflow at the very least. Even if the Tor operator isn't keeping logs, it seems safe to assume the network operator is.
I'd be surprised if it were different for non-US universities - I'd expect this to be the case for every university with its own AS, and probably most without. It's not specific to universities either; it would be a rare ISP that doesn't retain netflow for traffic accounting purposes. It's often somewhat aggregated, but to varying degrees - the last such system I worked on was designed to retain data indefinitely at sub-minute granularity for training/cross-validation of network anomaly detection.
Green & Sharif (& any others with direct netflow experience) -
At what resolution is this type of netflow data typically captured?
Are we talking about all connection 5-tuples, per-direction or total transfer byte counts, and open and close timestamps, or more (or less) detail than this?
Are timestamps always included? Are bidirectional transfer byte counts always included? Are subsampled packet headers (or contents) sometimes/often included?
What about UDP sessions? IPv6?
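
To make the question concrete, here is a rough sketch of the level of detail I have in mind: one record per 5-tuple, with first/last timestamps and packet and byte counters. The field names and the `aggregate_flows` helper are my own invention for illustration, not taken from any particular netflow exporter or collector, and each direction is treated as a separate flow.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# (src_ip, src_port, dst_ip, dst_port, ip_protocol) -- hypothetical layout
FiveTuple = Tuple[str, int, str, int, int]
# (timestamp, src_ip, src_port, dst_ip, dst_port, ip_protocol, length_in_bytes)
Packet = Tuple[float, str, int, str, int, int, int]


@dataclass
class FlowRecord:
    """One unidirectional flow, as assumed in this sketch."""
    key: FiveTuple
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FiveTuple, FlowRecord]:
    """Collapse per-packet observations into per-5-tuple flow records."""
    flows: Dict[FiveTuple, FlowRecord] = {}
    for ts, src, sport, dst, dport, proto, length in packets:
        key = (src, sport, dst, dport, proto)
        rec = flows.get(key)
        if rec is None:
            # First packet seen for this 5-tuple opens a new record.
            rec = flows[key] = FlowRecord(key, first_seen=ts, last_seen=ts)
        rec.first_seen = min(rec.first_seen, ts)
        rec.last_seen = max(rec.last_seen, ts)
        rec.packets += 1
        rec.octets += length
    return flows


if __name__ == "__main__":
    observed = [
        (0.0, "10.0.0.1", 40000, "192.0.2.1", 9001, 6, 100),
        (0.5, "10.0.0.1", 40000, "192.0.2.1", 9001, 6, 200),
        (0.7, "192.0.2.1", 9001, "10.0.0.1", 40000, 6, 50),
    ]
    for record in aggregate_flows(observed).values():
        print(record)
```

If what gets retained in practice is coarser than this (sampled packets, records cut by timeouts, counters rolled up per time bucket), that difference is exactly what I'd like to understand.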
I think for various reasons (including this one), we're going to want some degree of padding traffic on the Tor network relatively soon. Having more information about what is typically recorded in these cases would be very useful to inform how we might want to design padding and connection usage against this and other issues.
Information about how UDP is treated would also be useful if/when we manage to switch to a UDP transport protocol, independent of any padding.
Thanks a bunch!