grarpamp:
On Wed, Aug 12, 2015 at 7:45 PM, Mike Perry mikeperry@torproject.org wrote:
At what resolution is this type of netflow data typically captured?
Are we talking about all connection 5-tuples, bidirectional/total transfer byte totals, and open and close timestamps, or more (or less) detail than this?
All of the above depends on which flow export version / aggregation you choose, until you get to v9 and IPFIX, for which you can define your fields. In short... yes.
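To make the "which fields get exported" question concrete, here is a rough sketch of the per-flow fields a NetFlow v5-style record carries (field names are illustrative, not the wire format; v9/IPFIX templates can add or drop fields):

```python
from dataclasses import dataclass

# Illustrative sketch of a NetFlow v5-style unidirectional flow record.
# Field names are approximate; this is not the binary export format.
@dataclass
class FlowRecord:
    src_addr: str   # 5-tuple: source IP
    dst_addr: str   # 5-tuple: destination IP
    src_port: int   # 5-tuple: source port
    dst_port: int   # 5-tuple: destination port
    protocol: int   # 5-tuple: IP protocol (6 = TCP)
    packets: int    # total packets seen in the flow
    octets: int     # total bytes seen in the flow
    first: int      # timestamp of first packet (ms, relative in v5)
    last: int       # timestamp of last packet

# A hypothetical record for a client->guard TCP connection:
rec = FlowRecord("10.0.0.1", "203.0.113.5", 51234, 9001, 6,
                 packets=420, octets=512_000,
                 first=1_000_000, last=4_600_000)
print(rec.last - rec.first)  # flow duration: 3600000 ms (one hour)
```

Even this minimal field set gives an observer endpoints, total volume, and open/close timing per flow, which is exactly the data the correlation question below turns on.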
But consider looking at average flow lifetimes on the internet. There may be a case for going longer, or for bundling/spreading traffic across a range of ports to falsely trigger extra records and bloat the collector, packet switching, and so forth.
This interests me, but we need more details to determine what this looks like in practice.
I suspect that this is one case where the switch to one guard may have helped us. However, Tor still closes the TCP connection after just one hour of inactivity. What if we kept it open longer? Or what if the first hop was an encrypted UDP-based PT, where it was not clear whether the session was torn down or still open?
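To illustrate why connection lifetime matters here, consider a toy model of a flow exporter (hypothetical parameters; real exporters also split long flows on an active timeout): a new record starts whenever the inter-packet gap exceeds the inactive timeout, so an idle client shows up as several discrete records with telltale start/stop times, while light keepalive traffic merges everything into one.

```python
def flow_records(packet_times, inactive_timeout):
    """Count the flow records a simple exporter would emit: a new
    record starts whenever the gap between consecutive packets
    exceeds the inactive timeout. (Toy model for illustration.)"""
    records = 1
    for prev, cur in zip(packet_times, packet_times[1:]):
        if cur - prev > inactive_timeout:
            records += 1
    return records

# A client that bursts, idles ~92 minutes, then bursts again,
# against an assumed 15-minute (900 s) inactive timeout:
idle = [0, 60, 120, 5640]              # seconds; long gap splits the flow
print(flow_records(idle, 900))          # -> 2

# The same client with a keepalive every 10 minutes stays under
# the timeout, so the whole session collapses into one record:
padded = [0, 60, 120] + list(range(720, 5641, 600))
print(flow_records(padded, 900))        # -> 1
```

The second case is what keeping the connection open (or padding it) buys: the exporter's records no longer delineate the user's activity periods.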
Knowing what gets recorded in these cases would be very useful to inform how we might design padding and connection usage against this and other issues.
"Typical" is really defined by the use case of whoever needs the flows, be it provisioning, engineering, security, operations, billing, bigdata, etc. And only limited by the available formats, storage, postprocessing, and customization. IPFIX and v9 in particular let you define the fields yourself.
"Typically"... I appreciate your answers, grarpamp. They're "typically" correct, but sometimes they have more flavor than I'm looking for, and in this case I am worried that may end up silencing the people I'd really like to hear from. I want real data from the field here, not speculation on what is possible.
I think for various reasons (including this one), we're going to want some degree of padding traffic on the Tor network relatively soon.
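As a sketch of what such padding might look like (an illustrative scheduler, not Tor's actual design; `max_idle` and the randomized threshold are assumptions), the idea is simply to ensure the wire never goes quiet long enough for an exporter's inactive timeout to fire:

```python
import random

def pad_events(real_times, horizon, max_idle, rng):
    """Illustrative link-padding sketch (not Tor's design): merge
    real send times with padding cells so no idle gap on the wire
    exceeds a randomized threshold below max_idle."""
    out = []
    last = 0.0
    pending = sorted(real_times)
    while last < horizon:
        # Randomize the padding timer so padding cells themselves
        # don't create a recognizable periodic signature.
        deadline = last + rng.uniform(0.5 * max_idle, max_idle)
        if pending and pending[0] <= deadline:
            last = pending.pop(0)        # real traffic resets the timer
            out.append(("data", last))
        else:
            last = min(deadline, horizon)
            if last < horizon:
                out.append(("padding", last))
    return out

rng = random.Random(42)
events = pad_events([3.0, 20.0], horizon=60.0, max_idle=10.0, rng=rng)
gaps = [b - a for (_, a), (_, b) in zip(events, events[1:])]
print(max(gaps) <= 10.0)  # True: no observable gap exceeds max_idle
```

Under this scheme, the flow records seen by a netflow collector show a single long-lived, low-rate flow regardless of when the user is actually active, at the cost of the padding bandwidth.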
Really? I can haz cake nao? Or only after I pump in this 3k email and watch 3k come out the other side to someone otherwise idling ;)
You can say that, but then why isn't this being done in the real world? The Snowden leaks seem to indicate exploitation is the weapon of choice.
I suspect other factors are at work that prevent dragnet correlation from being reliable, in addition to the economics of exploits today (which may be subject to change). These factors are worth investigating in detail, and ideally before the exploit cost profiles change.
As such, I still look forward to hearing from someone who has worked at an ISP/University/etc where this is actually practiced. What is in *those* logs?
Specifically: Can we get someone (hell, anyone really) from Utah to weigh in on this one? ;)
Otherwise, the rest is just paranoid speculation, and bordering on trolled-up misinformation. :/