Re: [tor-relays] clarification on what Utah State University exit relays store ("360 gigs of log files")

13 Aug 2015


On Wed, Aug 12, 2015 at 7:45 PM, Mike Perry <mikeperry@torproject.org> wrote:
...
> At what resolution is this type of netflow data typically captured?
Routers originally exported at 100% coverage. Many of them then
started supporting sampling at various rates (because routers were
choking and buggy anyway, and netheads were happy with averages),
and some only do sampling. Plug flow probes into network taps and you
can do whatever you want (netsec loves this and other tools).
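
To make the sampling point concrete, here is a minimal sketch of deterministic 1-in-N packet sampling with the counts scaled back up afterwards. The function names and the list-of-packet-sizes input are assumptions made for illustration, not any particular router's implementation.

```python
def sample_packets(packet_sizes, rate):
    """Keep every `rate`-th packet (deterministic 1-in-N sampling).

    rate=1 corresponds to the original 100% coverage.
    """
    return [size for i, size in enumerate(packet_sizes) if i % rate == 0]


def estimate_totals(sampled_sizes, rate):
    """Scale sampled counts by the sampling rate to estimate the real totals."""
    return {
        "packets": len(sampled_sizes) * rate,
        "octets": sum(sampled_sizes) * rate,
    }


# Hypothetical traffic: 1000 packets of 100 bytes each, sampled 1-in-10.
traffic = [100] * 1000
sampled = sample_packets(traffic, rate=10)
estimate = estimate_totals(sampled, rate=10)
```

The averages come out fine for large flows, which is what the netheads want, but a flow shorter than the sampling interval can be missed entirely.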
...
> Are we talking about all connection 5-tuples, bidirectional/total
> transfer byte totals, and open and close timestamps, or more (or less)
> detail than this?
> Are timestamps always included? Are bidirectional transfer bytecounts
> always included? Are subsampled packet headers (or contents)
> sometimes/often included?
> What about UDP sessions? IPv6?
> Information about how UDP is treated would also be useful if/when we
> manage to switch to a UDP transport protocol, independent of any
> padding.
All of the above depends on which flow export version / aggregation you
choose, until you get to v9 and IPFIX, for which you can define your fields.
In short... yes.
Flow end time is the last matching packet seen, but a flow can span records
when the mandatory expiry timers hit (they limit time, and therefore space, i.e. RAM).
UDP flows end via those timers, TCP flows usually via flags. Records can also
span flows where no other semantic key exists to tell them apart, as is often
the case with UDP.
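
The expiry behaviour above can be sketched as a toy flow cache keyed on the 5-tuple. This is only an illustration: the timeout values, class names and record fields are assumptions, and real exporters make all of them configurable.

```python
from dataclasses import dataclass

ACTIVE_TIMEOUT = 1800.0    # seconds; assumed value, cuts long-lived flows
INACTIVE_TIMEOUT = 15.0    # seconds; assumed value, ends idle flows
FIN, RST = 0x01, 0x04      # TCP flag bits


@dataclass
class FlowRecord:
    key: tuple             # (src_ip, dst_ip, src_port, dst_port, proto)
    first: float           # timestamp of first matching packet
    last: float            # timestamp of last matching packet
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0     # OR of all TCP flags seen


class FlowCache:
    def __init__(self):
        self.active = {}
        self.exported = []

    def _export(self, key):
        self.exported.append(self.active.pop(key))

    def expire(self, now):
        """Export records that hit the inactive or active timer."""
        for key, rec in list(self.active.items()):
            idle = now - rec.last > INACTIVE_TIMEOUT
            too_old = now - rec.first > ACTIVE_TIMEOUT
            if idle or too_old:
                self._export(key)

    def observe(self, ts, key, size, tcp_flags=0):
        """Account one packet; TCP FIN/RST closes the record immediately."""
        self.expire(ts)
        rec = self.active.get(key)
        if rec is None:
            rec = self.active[key] = FlowRecord(key, first=ts, last=ts)
        rec.last = ts
        rec.packets += 1
        rec.octets += size
        rec.tcp_flags |= tcp_flags
        if tcp_flags & (FIN | RST):
            self._export(key)


# A long-lived flow (one packet every 10 s) gets cut by the active timer,
# so a single flow ends up spanning more than one record.
long_cache = FlowCache()
tcp_key = ("10.0.0.1", "10.0.0.2", 40000, 443, 6)
for t in range(0, 2001, 10):
    long_cache.observe(float(t), tcp_key, 512)
```

With UDP there are no flags to watch, so only the timers end a record; two unrelated exchanges on the same 5-tuple within the inactive timeout land in one record.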
But DPI can also be used in the exporter to do all sorts of fun stuff and enable
other downstream uses (obviously TLS / IPSEC / crypto break some things there).
Tor already bundles multiple logical flows (only TCP for users today) into some
number of physical TCP flows, so a UDP transport there might not need
anything special.
But consider looking at average flow lifetimes on the internet. There may
be a case for going longer, bundling or turfing across a range of ports to falsely
trigger a record / bloat, packet switching and so forth.
...
> and having more information about what is typically
> recorded in these cases would be very useful to inform how we might want
> to design padding and connection usage against this and other issues.
"Typical" is really defined by the use case of whoever needs the flows,
be it provisioning, engineering, security, operations, billing, bigdata, etc.
And only limited by the available formats, storage, postprocessing,
and customization. IPFIX and
https://en.wikipedia.org/wiki/NetFlow
https://en.wikipedia.org/wiki/IP_Flow_Information_Export
https://www.google.com/search?q=(netflow%7CIPFIX)+(probe%7Cexporter%7Cparser)
http://www.freebsd.org/cgi/man.cgi?query=ng_netflow&sektion=4
...
> I think for various reasons (including this one), we're soon going to
> want some degree of padding traffic on the Tor network at some point
> relatively soon
Really? I can haz cake nao? Or only after I pump in this 3k email and
watch 3k come out the other side to someone otherwise idling ;)
https://cdn.plixer.com/images/slider-3-icon.png
... and/or some other bigdata systems ...
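
The "3k in, 3k out" remark can be illustrated with a toy matcher over flow records from both sides. Everything here is hypothetical: the record fields, the names, the 10% size tolerance and the 2 second window are made up for the sketch, and a real analysis would be far noisier.

```python
def correlate(entry_records, exit_records, max_delay=2.0, tolerance=0.1):
    """Pair entry-side and exit-side records that look like the same transfer.

    Two records match when the exit record ends within `max_delay` seconds
    after the entry record and their byte counts agree within `tolerance`.
    """
    matches = []
    for e in entry_records:
        for x in exit_records:
            close_in_time = 0 <= x["end"] - e["end"] <= max_delay
            close_in_size = abs(x["octets"] - e["octets"]) <= tolerance * e["octets"]
            if close_in_time and close_in_size:
                matches.append((e["client"], x["server"]))
    return matches


# Hypothetical records: one small email-sized transfer, one large one.
entry = [
    {"client": "alice", "octets": 3072, "end": 10.0},
    {"client": "bob", "octets": 50000, "end": 10.2},
]
exits = [
    {"server": "mail.example", "octets": 3100, "end": 10.5},
    {"server": "video.example", "octets": 49000, "end": 30.0},
]
pairs = correlate(entry, exits)
```

Padding changes the byte counts and timing that a matcher like this depends on, which is why what the records actually contain matters for the design.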
