On Thu, 13 Aug 2015, Mike Perry wrote:
As such, I still look forward to hearing from someone who has worked at an ISP/University/etc where this is actually practiced. What is in *those* logs?
I can speak to my experience in that capacity.
As an ISP, I need to be able to answer questions about the data flowing through my network, such as:
- Run what-if scenarios for capacity planning, peering, etc., which requires understanding how much traffic (in bits/sec at the 95th percentile) is coming from and going to each BGP prefix (route).
- Be able to shut down customers that are attacking the outside world, and to classify the type of attack. To do this, I need some way to differentiate good traffic from attack traffic.
- Be able to quickly blackhole customer IPs that are taking a volumetric DDoS attack large enough to threaten my other customers' connectivity.
- Be able to bill customers for the quantity of bandwidth that they use. For most ISPs, this comes from per-port counters, but ISPs that bill differently for peering traffic or for international traffic need more detail on bandwidth sent per customer to various destinations.
- Be able to track a given connection back to a customer to be able to react to abuse complaints for at least a few weeks. In my environment, I can just look at which IPs are assigned to a port, but other ISPs will need to consult DHCP logs. Some ISPs do NAT at the ISP level, and if a single IP can be used simultaneously by multiple customers, they may need to keep per-connection logs.
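To make the billing point concrete, here is a minimal sketch of 95th-percentile ("burstable") billing from per-interval rate samples. The nearest-rank method and the toy sample values are my own illustration, not taken from any particular ISP's billing system.

```python
# Sketch: 95th-percentile billing from periodic rate samples.
# Assumes one bits/sec sample per 5-minute interval per customer port;
# the sample series below is made up for illustration.

def percentile_95(samples):
    """Return the 95th-percentile rate: sort the interval samples and
    discard the top 5%, as is conventional for burstable billing."""
    ordered = sorted(samples)
    # Nearest-rank index for the 95th percentile.
    idx = max(0, int(len(ordered) * 0.95) - 1)
    return ordered[idx]

# A month of 5-minute samples would be ~8640 values; use a toy series.
rates_bps = [10e6, 12e6, 11e6, 90e6, 13e6, 12e6, 11e6, 10e6, 12e6, 11e6]
print(percentile_95(rates_bps))  # the 90e6 burst is discarded
```

The point of the 95th-percentile convention is visible here: the single 90 Mbit/s spike falls in the discarded top 5%, so the customer is billed at the sustained rate.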
Netflow is the standardized way to collect packet header information that can answer all of the above, and from what I can tell it is in use by at least a significant fraction of ISPs. There are a few options for capturing netflow data:
- Limited netflow capability is built into all of my big routers, and is probably the easiest to set up.
- If doing the netflow collection on the router itself is not viable, either port mirroring or specialized copper or fiber network taps can easily send a copy of all traffic to a collector box running something like nProbe (http://www.ntop.org/products/netflow/nprobe/).
The netflow record format will be something like one of these: http://www.cisco.com/c/en/us/td/docs/net_mgmt/netflow_collection_engine/3-6/...
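As a concrete illustration of what one of those records holds, here is a sketch that unpacks a single NetFlow v5 flow record using Python's struct module. The 48-byte field layout follows the published v5 format; the function name and the dict it returns are my own, and it parses a hand-built record rather than live export data.

```python
import socket
import struct

# Sketch: unpack one 48-byte NetFlow v5 flow record.  Field order per
# the v5 export format: addresses, interfaces, counters, timestamps,
# ports, flags, protocol, ASes, masks.
V5_RECORD = struct.Struct("!IIIHHIIIIHHBBBBHHBBH")

def parse_v5_record(data):
    (src, dst, nexthop, in_if, out_if, pkts, octets,
     first, last, sport, dport, _pad1, tcp_flags, proto,
     tos, src_as, dst_as, src_mask, dst_mask, _pad2) = V5_RECORD.unpack(data)
    return {
        "src": socket.inet_ntoa(struct.pack("!I", src)),
        "dst": socket.inet_ntoa(struct.pack("!I", dst)),
        "sport": sport, "dport": dport, "proto": proto,
        "pkts": pkts, "octets": octets, "tcp_flags": tcp_flags,
    }
```

Note how little is kept per flow: header fields and counters only, no payload.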
The default netflow logging is per-flow: one record is saved per {protocol, source IP, source port, destination IP, destination port} tuple per 5-30 minutes. Connections that live longer than the flow timeout will span multiple records. Timestamps have second granularity, and TCP flags are ORed together, so you can tell whether the connection started/ended/continued based on the TCP flags.
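The ORed-flags trick can be sketched like this. The flag bit values are the standard TCP header bits (RFC 793); the classification labels are my own shorthand for the start/end/continue distinction described above.

```python
# Sketch: interpreting the cumulative (ORed) TCP-flags byte of a flow
# record.  Bit values per the TCP header definition (RFC 793).
FIN, SYN, RST = 0x01, 0x02, 0x04

def classify(tcp_flags):
    """Lifecycle hints from ORed flags: a SYN means the connection
    started within this flow; FIN or RST means it ended; neither means
    it continued across a flow-timeout boundary."""
    started = bool(tcp_flags & SYN)
    ended = bool(tcp_flags & (FIN | RST))
    if started and ended:
        return "complete"
    if started:
        return "started"
    if ended:
        return "ended"
    return "continued"

print(classify(0x1B))  # SYN+FIN+PSH+ACK all seen -> "complete"
```

A long-lived connection shows up as a "started" record, some number of "continued" records, and finally an "ended" one.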
I try to avoid storing any raw per-flow data to disk. At the scale I'm operating, I can't store it for very long, and walking through it again is too slow. If I wanted to throw more hardware at netflow log processing, it's at least possible to do, though. Of the people I've heard doing this, they are mostly paranoid companies (not ISPs) who want to be able to trace security incidents after the fact.
Instead, I try to only store much smaller aggregate data, such as packets and bytes sent and received per 5 minutes to and from each /24, and the results of the netflow-based attack detector, which processes the flows as it gets them.
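The aggregation step can be sketched as follows: collapse per-flow records into per-/24, per-5-minute packet and byte counters. The flow dicts here are a made-up minimal shape for illustration, not the output format of any real collector.

```python
import ipaddress
from collections import defaultdict

# Sketch: roll per-flow records up into (destination /24, 5-minute
# bucket) counters, the kind of small aggregate described above.

def aggregate(flows):
    buckets = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for f in flows:
        # Collapse the destination address to its covering /24.
        net = ipaddress.ip_network(f["dst"] + "/24", strict=False)
        # Floor the timestamp to a 5-minute (300 s) bin.
        bucket_start = f["ts"] - (f["ts"] % 300)
        key = (str(net), bucket_start)
        buckets[key]["packets"] += f["pkts"]
        buckets[key]["bytes"] += f["octets"]
    return dict(buckets)

flows = [
    {"dst": "203.0.113.7", "ts": 1000, "pkts": 10, "octets": 6000},
    {"dst": "203.0.113.9", "ts": 1100, "pkts": 5,  "octets": 2500},
]
print(aggregate(flows))
```

The storage win is that the number of buckets is bounded by (number of active /24s) x (intervals retained), regardless of how many individual flows passed through.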
-- Aaron