5.3 How NetFlow Works

The following sections explore NetFlow in detail.

5.3.1 Flows

NetFlow is based on the idea of a flow of network traffic. What is a flow? It is what you would naturally think of as one full network conversation. The Transmission Control Protocol (TCP) connection a workstation opens to retrieve a Web page is a flow. The start of the TCP connection is the beginning of the flow, and when the connection is closed, the flow is complete. Another example of a flow would be the series of ICMP packets that make up a ping test. The flow begins with the first ICMP packet and ends with the last.

Each flow is identified by a unique combination of the following values:

Source IP address
Destination IP address
Source port
Destination port
IP protocol number
Type of service field
Input interface

These values must remain the same for the entire flow. Retrieving multiple Web pages over a single TCP connection, for example, is all one flow. But if multiple TCP connections are made to the same Web server, different port numbers will be used for each connection, so each connection will be a different flow. Similarly, an attacker who is scanning destination IP addresses on your network will generate many flows: one for each source/destination pair.
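As a rough illustration of how these seven values act together as a key, the sketch below groups packets into flows in Python. The type name `FlowKey`, its field names, and the helper `count_packets_per_flow` are invented for this example and are not part of any router software; the addresses come from documentation ranges.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven values that identify a flow (names are illustrative)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP
    tos: int        # type of service field
    input_if: int   # index of the interface the packet arrived on


def count_packets_per_flow(packets):
    """Group packets by their flow key and count the packets in each flow."""
    flows = Counter()
    for key in packets:
        flows[key] += 1
    return flows


# Two TCP connections to the same Web server differ only in source port,
# so they are counted as two separate flows.
packets = [
    FlowKey("192.0.2.10", "198.51.100.80", 40001, 80, 6, 0, 1),
    FlowKey("192.0.2.10", "198.51.100.80", 40001, 80, 6, 0, 1),
    FlowKey("192.0.2.10", "198.51.100.80", 40002, 80, 6, 0, 1),
]
flows = count_packets_per_flow(packets)
print(len(flows))  # prints 2
```

Changing any one of the seven values, such as the source port here, is enough to start a new flow.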

Also note that a flow is unidirectional. That is, one of the addresses is only the source of traffic and the other is only the destination. This means that a flow represents just the traffic moving in one direction. In communications with a Web server there are really two flows present: one is the traffic from the workstation to the server, and the other is the traffic from the server to the workstation. The router will always report flows for traffic that enters an interface, not traffic that leaves an interface. So if NetFlow is enabled on the interface where your workstation resides, your workstation will be the source address in any flow data. If NetFlow is also enabled on interfaces where another machine is sending data to your workstation, the router will then report your workstation as the destination address for those flows.
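To make the direction point concrete, here is a small Python sketch in which a single Web conversation appears as two distinct flows. The `Endpoints` type and the `reverse` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Endpoints(NamedTuple):
    """Direction-sensitive part of a flow key (hypothetical helper type)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


def reverse(flow: Endpoints) -> Endpoints:
    """Return the flow that carries the replies: source and destination swapped."""
    return Endpoints(flow.dst_addr, flow.dst_port, flow.src_addr, flow.src_port)


# Workstation to server, and the matching server-to-workstation flow.
request = Endpoints("192.0.2.10", 40001, "198.51.100.80", 80)
reply = reverse(request)

# The two directions are different flows; reversing twice restores the original.
print(request != reply, reverse(reply) == request)  # prints True True
```

A collector that wants to view whole conversations would have to pair such flows itself, since the router reports each direction separately.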

Of course, it is not always easy to determine what constitutes one session's worth of traffic and therefore when a particular flow ends. For a TCP connection, it is relatively easy: the session begins with a TCP SYN and ends with a TCP FIN or RST. For UDP or Internet Control Message Protocol (ICMP), it is more difficult, and the router must use heuristics to decide when the flow is complete.

Because the router has only a limited amount of memory available for holding flow data, it must occasionally remove flows from the cache in order to make room for new ones. For this reason, it is important to realize that when a router reports on a flow, it may not really be the entire conversation as expected. For example, one flow may represent the first part of a TCP connection, another two flows might be the middle, and yet another flow, the end. Or it is just as possible for an entire TCP connection to be reported in a single flow. It depends on whether the router needs to free up resources. By default, a Cisco router will expire a flow when one of the following occurs:

The end of a TCP connection is found.
No traffic has been present in the flow for 15 seconds.
The flow has been running for over 30 minutes.
The table of flows in the router is filled.
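The four default conditions above can be sketched as a single check. This is a simplified Python model, not the router's actual implementation: the `CachedFlow` type, the `expiry_reason` function, and the reason strings are assumptions made for the example, and a real router chooses which flows to evict when its table fills rather than expiring every flow it examines.

```python
from dataclasses import dataclass
from typing import Optional

INACTIVE_TIMEOUT = 15       # seconds with no traffic before a flow expires
ACTIVE_TIMEOUT = 30 * 60    # seconds a flow may run before it is expired


@dataclass
class CachedFlow:
    first_seen: float         # time the first packet was seen, in seconds
    last_seen: float          # time the most recent packet was seen
    tcp_closed: bool = False  # set once the end of the TCP connection is seen


def expiry_reason(flow: CachedFlow, now: float,
                  cache_full: bool = False) -> Optional[str]:
    """Return why the flow should be expired, or None to keep it cached."""
    if flow.tcp_closed:
        return "tcp-end"
    if now - flow.last_seen >= INACTIVE_TIMEOUT:
        return "inactive"
    if now - flow.first_seen > ACTIVE_TIMEOUT:
        return "active"
    if cache_full:
        # Simplification: any flow examined while the table is full is a
        # candidate for removal.
        return "cache-full"
    return None


# A long-lived connection that is still sending traffic is cut at 30 minutes,
# which is how one conversation can end up reported as several flows.
print(expiry_reason(CachedFlow(first_seen=0, last_seen=1795), now=1801))  # prints active
```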

5.3.2 NetFlow and Switching Paths

NetFlow is sometimes called NetFlow Accounting or NetFlow Switching. Both are acceptable terms, but one very prevalent myth should be dispelled about the latter. NetFlow is not a switching path. That is, it is not a system the router uses to figure out which packets are sent to which interface. NetFlow always relies on one of the other switching paths, such as Cisco Express Forwarding (CEF) or optimum switching, to make these decisions. NetFlow does interact with the switching path, however. Decisions on whether a packet matches an access-list filter are optimized so that the decision is made only once per flow, instead of for every packet in the flow. This practice also has an impact on access-list counters, which will no longer represent the number of packets matched when used in conjunction with NetFlow.

5.3.3 Exporting NetFlow Data

Routers can export NetFlow data so that it can be received by a server and then processed in any way the network administrator wishes. The host or software that receives the data is called a flow collector. The NetFlow data is exported in UDP packets to an IP address and UDP port number that are configured on the router. Because routers send the exported data using UDP, it is possible for packets of NetFlow data to be lost. As described below, version 5 of NetFlow introduces sequence numbers so that if data is lost, the flow collector will at least be aware of the loss. There is, however, no way to retrieve the lost data.
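A collector might use the version 5 sequence numbers along the following lines. This Python sketch assumes the commonly documented 24-byte version 5 header, in which the sequence field counts the total flows exported so far and a separate count field gives the number of flow records in the datagram; verify the layout against Cisco's export format documentation before relying on it. The names `parse_v5_header` and `LossTracker` are invented for the example, and the tracker assumes datagrams arrive in order.

```python
import struct

# Assumed version 5 header layout, network byte order: version, count,
# sysUpTime, unix_secs, unix_nsecs, flow_sequence, engine_type, engine_id,
# sampling_interval (24 bytes in total).
V5_HEADER = struct.Struct("!HHIIIIBBH")


def parse_v5_header(datagram: bytes):
    """Return (count, flow_sequence) from a version 5 export datagram."""
    (version, count, _uptime, _secs, _nsecs,
     flow_sequence, _engine_type, _engine_id, _sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError(f"not a version 5 datagram: version={version}")
    return count, flow_sequence


class LossTracker:
    """Counts flow records that were exported but never reached the collector."""

    def __init__(self):
        self.expected = None  # sequence number the next datagram should carry
        self.lost = 0         # total flow records known to be missing

    def observe(self, count: int, flow_sequence: int) -> int:
        """Record one datagram and return how many flows were missed before it."""
        missed = 0
        if self.expected is not None:
            # The sequence field is 32 bits wide, so compare modulo 2**32.
            missed = (flow_sequence - self.expected) % 2**32
        self.lost += missed
        self.expected = (flow_sequence + count) % 2**32
        return missed
```

In a real collector the datagrams would come from a UDP socket bound to the configured port, with one tracker kept per exporting router. The tracker can report how many flows went missing, but not what they contained.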

On a Cisco router, only one IP address can be specified as the recipient of exported flows, so if you need to send NetFlow data to more than one collector, it will have to be forwarded from one collector to the others. Note that one piece of data that is not included in a flow is the router from which the flow was sent. Typically, a collector knows which router the flow came from based on the source IP address of the exported packets. However, if a flow is forwarded from one collector to another, this data may be lost unless the forwarder either specially includes the information via some other mechanism or forges the source address of the forwarded packets to be that of the original router.
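One way a forwarder could carry the router's identity "via some other mechanism" is to prefix each forwarded datagram with the original exporter's address. The framing below is purely hypothetical (it is not a NetFlow feature or an existing tool's format), and it requires the downstream collectors to understand the prefix; `wrap`, `unwrap`, and `forward` are names made up for this Python sketch.

```python
import socket


def wrap(exporter_ip: str, datagram: bytes) -> bytes:
    """Prefix a datagram with the 4-byte IPv4 address of the original router."""
    return socket.inet_aton(exporter_ip) + datagram


def unwrap(payload: bytes):
    """Split a wrapped payload back into (exporter_ip, original_datagram)."""
    if len(payload) < 4:
        raise ValueError("payload too short to contain an exporter address")
    return socket.inet_ntoa(payload[:4]), payload[4:]


def forward(sock, datagram: bytes, exporter_ip: str, collectors):
    """Send one wrapped copy of the datagram to each downstream collector.

    `sock` is a UDP socket; `collectors` is a list of (host, port) tuples.
    """
    payload = wrap(exporter_ip, datagram)
    for address in collectors:
        sock.sendto(payload, address)
```

The first collector would call `forward` with the source address returned by `recvfrom`, and each downstream collector would call `unwrap` before processing the flow data as usual.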

5.3.4 NetFlow Versions

There are several different versions of NetFlow available. Version 1 was the first, and each flow contained the following information:

Source IP address
Destination IP address
Source port
Destination port
Next hop router address
Input interface number
Output interface number
Number of packets sent
Number of bytes sent
sysUpTime when flow began
sysUpTime when flow ended
IP protocol number
IP type of service
Logical OR of all TCP flags seen
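The counters, timestamps, and flag field in this list are built up packet by packet. The Python sketch below models that accumulation; `FlowRecord` and its field names are illustrative only, the use of milliseconds for sysUpTime is an assumption, and this is not the on-the-wire record layout.

```python
from dataclasses import dataclass

# Standard TCP flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


@dataclass
class FlowRecord:
    packets: int = 0    # number of packets sent
    octets: int = 0     # number of bytes sent
    first: int = 0      # sysUpTime when the flow began (assumed milliseconds)
    last: int = 0       # sysUpTime when the flow ended
    tcp_flags: int = 0  # logical OR of all TCP flags seen

    def update(self, length: int, uptime_ms: int, flags: int = 0) -> None:
        """Fold one packet into the record."""
        if self.packets == 0:
            self.first = uptime_ms
        self.packets += 1
        self.octets += length
        self.last = uptime_ms
        self.tcp_flags |= flags


# One direction of a short TCP connection: SYN, ACK, PSH+ACK, FIN+ACK.
record = FlowRecord()
record.update(60, 1000, SYN)
record.update(52, 1010, ACK)
record.update(500, 1020, PSH | ACK)
record.update(52, 1500, FIN | ACK)
print(record.packets, record.octets, hex(record.tcp_flags))  # prints 4 664 0x1b
```

Because the flags are ORed together, the record shows which flags appeared at some point in the flow, but not in what order or how often.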

There are currently four other versions of NetFlow:

Version 5. Includes Autonomous System (AS) numbers from Border Gateway Protocol (BGP) if available and also includes sequence numbers, which allow checking for lost packets.
Version 7. Used only on the Cisco Catalyst 5000 product series, which also requires a special NetFlow Feature Card in order to perform NetFlow accounting.
Version 8. Introduces Router-Based NetFlow Aggregation, a feature that can greatly reduce the amount of data sent when flows are exported from the router. By choosing an appropriate aggregation scheme, you can instruct the router to group flows with similar data into a single aggregate flow that is reported on behalf of all the others.
Version 9. Introduces a template-based scheme for reporting flows, which allows a client receiving exported flow data to process the data without necessarily having prior knowledge of what kinds of data will be present.
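To give a feel for the template idea in version 9, the following Python sketch decodes a record using a template that lists each field's type and length. The handful of field type numbers shown are recalled from the published version 9 field definitions and should be checked against the specification; `FIELD_NAMES` and `decode_record` are names invented for this example, and every value is decoded as a plain unsigned integer for simplicity.

```python
# A few version 9 field types (assumed values; verify against the specification).
FIELD_NAMES = {
    1: "in_bytes",
    2: "in_pkts",
    4: "protocol",
    7: "src_port",
    8: "src_addr",
    11: "dst_port",
    12: "dst_addr",
}


def decode_record(template, data: bytes) -> dict:
    """Decode one flow record using a template of (field_type, length) pairs."""
    record = {}
    offset = 0
    for field_type, length in template:
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("record is shorter than its template")
        # Unknown field types are still decoded, under a generic name.
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# The template says which fields are present, in what order, and how wide.
template = [(8, 4), (12, 4), (7, 2), (11, 2), (4, 1), (2, 4)]
data = (bytes([192, 0, 2, 10]) + bytes([198, 51, 100, 80])
        + (40001).to_bytes(2, "big") + (80).to_bytes(2, "big")
        + bytes([6]) + (12).to_bytes(4, "big"))
record = decode_record(template, data)
print(record["src_port"], record["dst_port"], record["protocol"], record["in_pkts"])
```

The decoder never hard-codes a record layout: if the router sends a different template, the same function handles the new records unchanged.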