Introduction

Section 1: Understanding The Data Format

Here is a sample of the comma-separated-values (CSV) form of the data:

128.32.155.79:2048,217.136.200.35#1,20,34151,2868684,0,0,34151,2868684
128.32.155.79:2048,200.203.191.38#1,144,2872,241248,0,0,2872,241248
128.32.155.79:2048,62.233.232.204#1,144,2866,240744,0,0,2866,240744
128.32.155.79:2048,*:*,4186,8244,844409,0,0,8244,844409
128.32.155.79#1,217.136.200.35#1,20,0,0,33967,2853228,33967,2853228
128.32.155.79#1,62.233.232.204#1,144,0,0,2854,239736,2854,239736
128.32.155.79#1,200.203.191.38#1,144,0,0,2849,239316,2849,239316
128.32.155.79#1,*:*,4295,0,0,8093,832388,8093,832388
128.32.155.79:*,213.165.64.100:*,100,332,17822,278,20420,610,38242
128.32.155.79:*,217.72.192.149:25,28,200,13216,164,12983,364,26199
128.32.155.79:*,213.144.130.172:*,335,335,24455,0,0,335,24455
128.32.155.79:*,*:*,3670,10936,803464,5127,381308,16063,1184772

The fields are as follows:

Column  Field                          Example
1       Host IP and port [1]           128.32.155.79:*
2       Host IP and port [1] (peer)    213.144.130.172:*
3       Total Flows [2]                518
4       Incoming Packets               471877
5       Incoming Bytes                 484982747
6       Outgoing Packets               95359
7       Outgoing Bytes                 41749837
8       Total Packets                  567236
9       Total Bytes                    526732584

  1. The port field may have special meanings. See below.
  2. A flow is a group of packets with the same source and destination as seen by the router. Roughly speaking, a flow is a burst of packets from point A to point B. Flows are separated at 15 minutes for accounting purposes.
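
As an illustration of the field order, here is a minimal Python sketch that splits one CSV line into named fields. The names FlowRecord and parse_csv_line, and the field names themselves, are hypothetical and chosen for this example only; they are not part of the report format.

```python
import csv
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """One CSV row; field names are illustrative, not defined by the report."""
    host: str            # host IP and port (may use special port notation)
    peer: str            # peer IP and port (may use special port notation)
    flows: int
    in_packets: int
    in_bytes: int
    out_packets: int
    out_bytes: int
    total_packets: int
    total_bytes: int


def parse_csv_line(line: str) -> FlowRecord:
    """Split one CSV line into its nine fields and convert the counters to int."""
    row = next(csv.reader([line]))
    if len(row) != 9:
        raise ValueError(f"expected 9 fields, got {len(row)}: {line!r}")
    counters = [int(value) for value in row[2:]]
    return FlowRecord(row[0], row[1], *counters)


record = parse_csv_line(
    "128.32.155.79:*,213.165.64.100:*,100,332,17822,278,20420,610,38242"
)
print(record.peer, record.flows, record.total_bytes)
```

In the sample rows, the total columns equal the sum of the incoming and outgoing columns, which makes a handy sanity check after parsing.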

And here is a sample of the "human-readable" format:

Host                  Peer                    F  I-P   I-O  O-P   O-O  T-P   T-O
128.32.155.79:20      128.3.183.99:*        68K  66M 51.2G  23M  925M  89M 52.1G
128.32.155.79:20      128.218.214.102:4655    2   22   888   39 45.5K   61 46.4K
128.32.155.79:20      64.54.98.71:*          24   50  2096   86 43.8K  136 45.9K
128.32.155.79:20      *:*                    28  129 65.4K  115 27.6K  244   93K
128.32.155.79:80      132.239.1.232:*        76  25K 1338K  91K  103M 116K  105M
128.32.155.79:80      193.140.202.64:*      .4K  15K  651K  23K   29M  37K 29.6M
128.32.155.79:80      134.157.7.9:*          63  13K  541K  24K 28.4M  37K 28.9M
128.32.155.79:80      *:*                   66K 569K 44.8M 944K 1045M 1.5M 1089M
128.32.155.79:*       128.3.183.99:*        74K 332K 18.1M 339K 23.5M 671K 41.5M
128.32.155.79:*       128.135.229.50:*       29  12K  477K  22K 26.1M  34K 26.6M
128.32.155.79:*       64.156.215.5:25        2K  19K 1045K  25K 17.4M  43K 18.4M
128.32.155.79:*       *:*                   90K 1.1M 83.6M 1.2M  610M 2.3M  694M
128.32.155.79:25      128.3.41.61:*         .2K  22K 25.1M 8.8K  418K  31K 25.5M
128.32.155.79:25      128.218.114.108:*      16  12K 14.4M   2K 83.1K  14K 14.5M
128.32.155.79:25      131.243.248.26:*      .1K  11K 12.1M 2.2K  130K  13K 12.3M
128.32.155.79:25      *:*                   37K 500K  358M 339K 28.7M 839K  386M
128.32.155.79         TOTAL:                .3M  69M 51.8G  26M 2837M  95M 54.5G
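
The human-readable format's columns mirror those of the CSV version: F is the total flows, the I-, O-, and T- prefixes stand for incoming, outgoing, and total, and the -P and -O suffixes stand for packets and bytes (octets). For readability, the values are reported in scaled units (K, M, and G suffixes) as described here.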
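
Converting a scaled value back to an approximate number can be sketched as follows. This is an assumption-laden example: the exact multiplier behind each suffix is not stated in this document, so the function takes it as a parameter (base), and the default of 1024 is a guess. The name parse_scaled is hypothetical.

```python
import re

# Matches values such as "888", "51.2G", ".4K" as they appear in the sample.
_SCALED = re.compile(r"^(?P<number>\d*\.?\d+)(?P<suffix>[KMG]?)$")


def parse_scaled(text: str, base: int = 1024) -> float:
    """Return the approximate value of a scaled field.

    ASSUMPTION: K, M and G mean base**1, base**2 and base**3. Whether the
    report uses 1000 or 1024 is not stated here, so pass base explicitly
    if you know which applies.
    """
    match = _SCALED.match(text.strip())
    if match is None:
        raise ValueError(f"not a scaled value: {text!r}")
    exponent = {"": 0, "K": 1, "M": 2, "G": 3}[match.group("suffix")]
    return float(match.group("number")) * base ** exponent


print(parse_scaled("888"))               # no suffix: value is used as-is
print(parse_scaled("2K"))                # scaled with the assumed base
print(parse_scaled("1.5M", base=1000))   # same idea with a decimal base
```

Because the report rounds these figures, the result is only an estimate; use the CSV form when exact counts matter.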

Usually, the "Host IP and port" field lists an IP address, followed by a ":", followed by a port number. This indicates one endpoint of the network conversation. There are also several special meanings that can be assigned to the port number section. An entry of ":*" indicates "all ports not otherwise listed". For example, consider the following:

128.32.155.79:80      132.239.1.232:*        76  25K 1338K  91K  103M 116K  105M

In this case, 132.239.1.232 made several connections, from varying ports, to port 80 (used by web servers) of 128.32.155.79. The various flows were aggregated to form this single entry. This aggregation is motivated by the typical pattern of internet servers and clients, wherein a server accepts connections on a Well Known Port (such as port 80) while the client uses varying ports on its end.

Another special meaning is denoted by the "#" symbol, as below:

128.32.18.151#11     217.136.200.176#11     10  17K 1466K    0     0  17K 1466K

Some entries do not have a port number, because some internet traffic is neither TCP nor UDP (the IP protocols that use port numbers). Such traffic is instead classified by its IP protocol number. In the above example, 217.136.200.176 sent a number of packets corresponding to protocol number 11 (very unusual; abusive network behavior perhaps?). The most common non-TCP/UDP protocol seen in these reports is ICMP (#1). ICMP packets carry a type and a code, which are shown in hexadecimal. The first byte is the ICMP type, the second is the ICMP code.
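
The ":" and "#" notations described so far can be told apart mechanically. The following Python sketch is one way to do it; Endpoint, parse_endpoint, and the kind labels are hypothetical names invented for this example. It covers only the "address:port", "address:*", "address#protocol", and "*:*" forms, and raises ValueError for anything else.

```python
import re
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    address: str            # dotted IP address, or "*" for the catch-all row
    kind: str               # "port", "any_port", or "protocol" (illustrative labels)
    number: Optional[int]   # port or IP protocol number; None for ":*"


_ENDPOINT = re.compile(
    r"^(?P<addr>\*|\d{1,3}(?:\.\d{1,3}){3})(?P<sep>[:#])(?P<value>\*|\d+)$"
)


def parse_endpoint(text: str) -> Endpoint:
    """Classify an endpoint field by its separator and port/protocol part."""
    match = _ENDPOINT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized endpoint: {text!r}")
    addr, sep, value = match.group("addr", "sep", "value")
    if sep == "#":
        if value == "*":
            raise ValueError(f"protocol number expected after '#': {text!r}")
        return Endpoint(addr, "protocol", int(value))
    if value == "*":
        return Endpoint(addr, "any_port", None)
    return Endpoint(addr, "port", int(value))


print(parse_endpoint("128.32.155.79:80"))
print(parse_endpoint("132.239.1.232:*"))
print(parse_endpoint("217.136.200.176#11"))
```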

Read more about ICMP here.

The example below shows several "ICMP ECHO" communications:

128.32.18.151x800     217.136.200.176#1      10  17K 1466K    0     0  17K 1466K

Netflow recognizes the ICMP traffic specifically, and uses the port number of the destination to record additional information. So, x800 should actually be interpreted as the ICMP type and code bytes in hexadecimal notation. Referencing the above web site, we see that type 8, code 0 corresponds to ECHO.
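
The type and code bytes can be recovered from the hexadecimal field with a shift and a mask. The Python sketch below shows the arithmetic; decode_icmp is a hypothetical helper, and ICMP_TYPE_NAMES lists only a few common types for illustration.

```python
from typing import Tuple

# A small, deliberately incomplete subset of ICMP type names.
ICMP_TYPE_NAMES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo",
    11: "Time Exceeded",
}


def decode_icmp(field: str) -> Tuple[int, int]:
    """Split a field such as "x800" into (ICMP type, ICMP code)."""
    if not field.startswith("x"):
        raise ValueError(f"expected a leading 'x': {field!r}")
    value = int(field[1:], 16)
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value does not fit in two bytes: {field!r}")
    # High byte is the type, low byte is the code.
    return value >> 8, value & 0xFF


icmp_type, icmp_code = decode_icmp("x800")
print(icmp_type, icmp_code, ICMP_TYPE_NAMES.get(icmp_type, "unknown"))
```

Note that hexadecimal 800 is 2048 in decimal. The CSV sample shows port 2048 on rows whose peer is "#1"; reading that as the same type and code value is an inference from the sample, not something this document states.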

The human-readable section also includes "*:*" and "TOTAL:". "TOTAL:" is as it seems. "*:*" means "other" for the given category on the left (a category being defined by a host IP and port). So, referencing the above example:

128.32.155.79:20      128.3.183.99:*        68K  66M 51.2G  23M  925M  89M 52.1G
128.32.155.79:20      128.218.214.102:4655    2   22   888   39 45.5K   61 46.4K
128.32.155.79:20      64.54.98.71:*          24   50  2096   86 43.8K  136 45.9K
128.32.155.79:20      *:*                    28  129 65.4K  115 27.6K  244   93K

The traffic category of the last line should be read as "traffic between port 20 on 128.32.155.79 and all other endpoints not listed above (on local port 20)".

The above data corresponds to socrates.berkeley.edu, an interactive Unix machine serving many campus users. Ports 20, 80, and 25 correspond to file-transfer, www, and email traffic, respectively. The usage shown above is quite consistent with the role this machine is supposed to play.

Feel free to email Mike Hunter if you have any questions or comments regarding this document.


Last revised: May 21, 2003