7.7. Systemwide Statistics

The following tools allow us to observe network statistics, including statistics for TCP, IP, and each network interface, throughout the system.

7.7.1. netstat Command

The Solaris netstat command is the catch-all for a number of different network status programs.

$ netstat -i
Name  Mtu  Net/Dest      Address        Ipkts    Ierrs Opkts    Oerrs Collis Queue
lo0   8232 localhost     localhost      191      0     191      0     0      0
ipge0 1500 waterbuffalo  waterbuffalo   31152163 0     24721687 0     0      0

$ netstat -i 3
    input   ipge0     output           input  (Total)    output
packets  errs  packets  errs  colls  packets  errs  packets  errs  colls
31152218 0     24721731 0     0      31152409 0     24721922 0     0

$ netstat -I ipge0 -i 3
    input   ipge0     output           input  (Total)    output
packets  errs  packets  errs  colls  packets  errs  packets  errs  colls
31152284 0     24721797 0     0      31152475 0     24721988 0     0

netstat -i, mentioned earlier, prints only packet counts. We don't know if they are big packets or small packets, and we cannot use them to accurately determine how utilized the network interface is. Other performance monitoring tools plot this as a "be all and end all" value; this is wrong.

Packet counts may help as an indicator of activity. A packet count of less than 100 per second can be treated as fairly idle; a worst case for Ethernet makes this around 150 Kbytes/sec (based on maximum MTU size).

The netstat -i output may be much more valuable for its error counts, as discussed in Section 7.5.

netstat -s dumps various network-related counters from kstat. This shows that Kstat does track at least some details in terms of bytes.

$ netstat -s | grep Bytes
        tcpOutDataSegs      =37367847   tcpOutDataBytes     =166744792
        tcpRetransSegs      =153437     tcpRetransBytes     =72298114
        tcpInAckSegs        =25548715   tcpInAckBytes       =148658291
        tcpInInorderSegs    =35290928   tcpInInorderBytes   =3637819567
        tcpInUnorderSegs    =324309     tcpInUnorderBytes   =406912945
        tcpInDupSegs        =152795     tcpInDupBytes       =73998299
        tcpInPartDupSegs    =  7896     tcpInPartDupBytes   =5821485
        tcpInPastWinSegs    =    38     tcpInPastWinBytes   =971347352

However, the byte values above are for TCP in total, including loopback traffic that didn't travel through the network interfaces. These statistics can still be of some value, especially if large numbers of errors are observed. For more details on these and a reference table, see Section 7.9.

netstat -k on Solaris 9 and earlier dumped all kstat counters. From the output we can see that there are byte counters (rbytes64, obytes64) for the hme0 interface, which is just what we need to measure per-interface traffic. However, netstat -k was an undocumented switch that has now been dropped in Solaris 10. This is fine since there are better ways to get to kstat, including the C library, which is used by tools such as vmstat.

$ netstat -k | awk '/^hme0/,/^$/'
hme0:
ipackets 70847004 ierrors 6 opackets 73438793 oerrors 0 collisions 0
defer 0 framing 0 crc 0 sqe 0 code_violations 0 len_errors 0
ifspeed 100000000 buff 0 oflo 0 uflo 0 missed 6 tx_late_collisions 0
retry_error 0 first_collisions 0 nocarrier 0 nocanput 0 allocbfail 0
runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0 rx_late_error 0
slv_parity_error 0 tx_parity_error 0 rx_parity_error 0 slv_error_ack 0
tx_error_ack 0 rx_error_ack 0 tx_tag_error 0 rx_tag_error 0 eop_error 0
no_tmds 0 no_tbufs 0 no_rbufs 0 rx_late_collisions 0 rbytes 289601566
obytes 358304357 multircv 558 multixmt 73411 brdcstrcv 3813836
brdcstxmt 1173700 norcvbuf 0 noxmtbuf 0 newfree 0 ipackets64 70847004
opackets64 73438793 rbytes64 47534241822 obytes64 51897911909 align_errors 0
fcs_errors 0 sqe_errors 0 defer_xmts 0 ex_collisions 0 macxmt_errors 0
carrier_errors 0 toolong_errors 0 macrcv_errors 0 link_duplex 0 inits 31
rxinits 0 txinits 0 dmarh_inits 0 dmaxh_inits 0 link_down_cnt 0 phy_failures 0
xcvr_vendor 524311 asic_rev 193 link_up 1

7.7.2. kstat Command

The Solaris Kernel Statistics framework tracks network usage, and as of Solaris 8, the kstat command fetches these details (see Chapter 11). This command has a variety of options for selecting statistics and can be executed by non-root users.

The -m option for kstat matches on a module name. In the following example, we use it to display all available statistics for the networking modules.

$ kstat -m tcp
module: tcp                             instance: 0
name:   tcp                             class:    mib2
        activeOpens                     803
        attemptFails                    312
        connTableSize                   56
...
$ kstat -m ip
module: ip                              instance: 0
name:   icmp                            class:    mib2
        crtime                          3.207830752
        inAddrMaskReps                  0
        inAddrMasks                     0
...
$ kstat -m hme
module: hme                             instance: 0
name:   hme0                            class:    net
        align_errors                    0
        allocbfail                      0
...

These commands fetch statistics for ip, tcp, and hme (our Ethernet card).
The first group of statistics (others were truncated) from the tcp and ip modules states their class as mib2. These statistic groups are maintained by the TCP and IP code for MIB-II and then copied into kstat during a kstat update.

The following kstat command fetches byte statistics for our network interface, printing output every second.

$ kstat -p 'hme:0:hme0:*bytes64' 1
hme:0:hme0:obytes64     51899673435
hme:0:hme0:rbytes64     47536009231

hme:0:hme0:obytes64     51899673847
hme:0:hme0:rbytes64     47536009709
...

Using kstat in this manner is the best way to fetch network interface statistics with tools currently shipped with Solaris. Other tools exist that take the final step and print this data in a more meaningful way: Kbytes/sec or percent utilization. Two such tools are nx.se and nicstat.

7.7.3. nx.se Tool

The SE Toolkit provides a language, SymbEL, that lets us write our own performance monitoring tools. It also contains a collection of example tools, including nx.se, which helps us calculate network utilization.

$ se nx.se 1
Current tcp RtoMin is 400, interval 1, start Sun Oct  9 10:36:42 2005

10:36:43 Iseg/s Oseg/s InKB/s OuKB/s  Rst/s  Atf/s  Ret%  Icn/s  Ocn/s
tcp       841.6    4.0  74.98   0.27   0.00   0.00   0.0   0.00   0.00
Name     Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0      845.5  420.8 119.91  22.56  0.000  0.000   0.0   0.00   0.00

10:36:44 Iseg/s Oseg/s InKB/s OuKB/s  Rst/s  Atf/s  Ret%  Icn/s  Ocn/s
tcp       584.2    5.0  77.97   0.60   0.00   0.00   0.0   0.00   0.00
Name     Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0      579.2  297.1 107.95  16.16  0.000  0.000   0.0   0.00   0.00

Having KB/s lets us determine how busy our network interfaces are. Other useful fields include collision percent (Coll%), no-can-puts per second (NoCP/s), and defers per second (Defr/s), which may be evidence of network saturation. nx.se also prints useful TCP statistics above the interface lines.

7.7.4. nicstat Tool

nicstat, a tool from the freeware K9Toolkit, reports network utilization and saturation by interface. It is available as a C or Perl kstat consumer.

$ nicstat 1
    Time   Int   rKb/s   wKb/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
10:48:30  hme0    4.02    4.39    6.14    6.36  670.73  706.50    0.07    0.00
10:48:31  hme0    0.29    0.50    3.00    4.00   98.00  127.00    0.01    0.00
10:48:32  hme0    1.35    4.23   14.00   15.00   98.79  289.00    0.05    0.00
10:48:33  hme0   67.73   19.08  426.00  207.00  162.81   94.39    0.71    0.00
10:48:34  hme0  315.22  128.91 1249.00  723.00  258.44  182.58    3.64    0.00
10:48:35  hme0  529.96   67.53 2045.00 1046.00  265.37   66.11    4.89    0.00
10:48:36  hme0  454.14   62.16 2294.00 1163.00  202.72   54.73    4.23    0.00
10:48:37  hme0   93.55   15.78  583.00  295.00  164.31   54.77    0.90    0.00
10:48:38  hme0   74.84   32.41  516.00  298.00  148.52  111.38    0.88    0.00
10:48:39  hme0    0.76    4.17    7.00    9.00  111.43  474.00    0.04    0.00

See K9Toolkit; nicstat.c or nicstat.pl

In this example output of nicstat, we can see a small amount of network traffic, peaking at 4.89% utilization.

The following are the switches available from version 0.98 of the Perl version of nicstat.

$ nicstat -h
USAGE: nicstat [-hsz] [-i int[,int...]] | [interval [count]]

   eg, nicstat               # print a 1 second sample
       nicstat 1             # print continually every 1 second
       nicstat 1 5           # print 5 times, every 1 second
       nicstat -s            # summary output
       nicstat -i hme0       # print hme0 only

The utilization measurement is based on the current throughput divided by the maximum speed of the interface (if available through kstat). The saturation measurement is a value that reflects errors due to saturation if kstat found any.

This method for calculating utilization does not account for other per-packet costs, such as Ethernet preamble. These costs are generally minor, and we assume they do not greatly affect the utilization value.

7.7.5. SNMP

It's worth mentioning that useful data is also available in SNMP, which is used by software such as MRTG (a popular freeware network utilization plotter).
A full install of Solaris 10 provides Net-SNMP, putting many of the commands under /usr/sfw/bin.

Here we demonstrate the use of snmpget to fetch interface statistics.

$ snmpget -v1 -c public localhost ifOutOctets.2 ifInOctets.2
IF-MIB::ifOutOctets.2 = Counter32: 10016768
IF-MIB::ifInOctets.2 = Counter32: 11932165

The .2 corresponds to our primary interface. These values are the outbound and inbound bytes. In Solaris 10 a full description of the IF-MIB statistics can be found in /etc/sma/snmp/mibs/IF-MIB.txt.

Other software products fetch and present data from the IF-MIB, which is a valid and desirable approach for monitoring network interface activity. Solaris 10's Net-SNMP supports SNMPv3, which provides the User-based Security Model (USM) for the creation of user accounts and encrypted sessions, and the View-based Access Control Model (VACM) to restrict users to viewing only the statistics they need. When configured, they greatly enhance the security of SNMP. For information on each, see snmpusm(1M) and snmpvacm(1M).

Net-SNMP also provides a version of netstat called snmpnetstat. Besides the standard output using -i, snmpnetstat has a -o option to print octets (bytes) instead of packets.

$ snmpnetstat -v1 -c public -i localhost
Name     Mtu Network    Address          Ipkts   Ierrs    Opkts Oerrs Queue
lo0     8232 loopback   localhost         6639       0     6639     0     0
hme0    1500 192.168.1  titan           385635       0    86686     0     0
hme0:1  1500 192.168.1  192.168.1.204        0       0        0     0     0

$ snmpnetstat -v1 -c public -o localhost
Name    Network    Address          Ioctets  Ooctets
lo0     loopback   localhost              0        0
hme0    192.168.1  titan           98241462 55500788
hme0:1  192.168.1  192.168.1.204          0        0

Input bytes (Ioctets) and output bytes (Ooctets) can be seen. Now all we need is an interval for this information to be of real value.

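To illustrate what an interval adds, the following Python sketch turns two samples of an octet counter into a byte rate. It is a minimal illustration only, not part of Net-SNMP: the function names are hypothetical, the second sample value is invented for the example, and the code assumes the counter wrapped at most once between samples (the Counter32 type shown by snmpget wraps at 2^32).

```python
# Sketch: convert two Counter32 samples (for example, ifOutOctets.2 fetched
# twice with snmpget) into bytes per second. Function names are hypothetical.

COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2^32 - 1


def counter32_delta(previous, current):
    """Return the increase between two samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def bytes_per_second(previous, current, interval_seconds):
    """Average byte rate over the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return counter32_delta(previous, current) / interval_seconds


if __name__ == "__main__":
    first = 10016768          # ifOutOctets.2 from the snmpget example
    second = first + 150000   # hypothetical second sample, 30 seconds later
    print("%.1f bytes/sec" % bytes_per_second(first, second, 30))
```

The modulo arithmetic keeps the delta correct when the second sample is numerically smaller than the first because the counter wrapped. On a busy link a 32-bit counter can wrap more than once within a long interval, in which case this simple approach undercounts.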
When we ran snmpnetstat with the -o option and also provided an interval (10 seconds), the command reverted to printing packet counts. Also, the statistics that SNMP uses are only updated every 30 seconds. Future versions of snmpnetstat may correctly print octets with intervals.

7.7.6. checkcable Tool

Sometimes network performance problems can be caused by incorrect auto-negotiation that selects a lower speed or duplex. There is a way to retrieve the settings that a particular network card has chosen, but there is not one way that works for all cards. It usually involves poking around with the ndd command and using a lookup table for your particular card to decipher the output of ndd.

Consistent data for network cards should be available from Kstat, and Sun does have a standard in place. However, many of the network drivers were written before the standard existed, and some were written by third-party companies. The state of consistent Kstat data for network cards is improving and at some point in the future should boil down to a few well-understood one-liners of the kstat command, such as: kstat -p | grep <interfacename>.

In the meantime, it is not always that easy. Some data is available from kstat, much of it from ndd. The following example demonstrates fetching ndd data for an hme card.

# ndd /dev/hme link_status
1
# ndd /dev/hme link_speed
1
# ndd /dev/hme link_mode
1

These numbers indicate a connected or unconnected cable (link_status), the current speed (link_speed), and the duplex (link_mode). What 1 or some other number means depends on the card. The available ndd variables for this card can be listed with ndd -get /dev/hme \? (the -get is optional).

SunSolve has Infodocs to explain what these numbers mean for various cards. If you have mainly one type of card at your site, you eventually remember what the numbers mean. As a very general rule, "1" is often good, "0" is often bad; so "0" for link_mode probably means half duplex.

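The lookup-table approach can be sketched in a few lines of Python. Everything here is illustrative: the table name and function are hypothetical, and the meanings listed for the hme driver are assumptions that follow the general rule above, so check them against the documentation for your card before relying on them.

```python
# Sketch: decode raw ndd values with a per-driver lookup table.
# HME_NDD_MEANINGS is a hypothetical table; the meanings are assumptions
# for the hme driver and will differ for other cards.

HME_NDD_MEANINGS = {
    "link_status": {0: "DOWN", 1: "UP"},
    "link_speed": {0: "10", 1: "100"},     # Mbits/sec
    "link_mode": {0: "HALF", 1: "FULL"},   # duplex
}


def decode_ndd(variable, raw_value, table=HME_NDD_MEANINGS):
    """Translate one ndd value; return "UNKNOWN" if it is not in the table."""
    try:
        return table[variable][int(raw_value)]
    except (KeyError, ValueError, TypeError):
        return "UNKNOWN"


if __name__ == "__main__":
    # Raw values as printed by the three ndd commands in the example.
    for name in ("link_status", "link_speed", "link_mode"):
        print(name, decode_ndd(name, "1"))
```

Returning "UNKNOWN" rather than guessing matters here, because a value that is valid for one driver can mean something different for another.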
The checkcable tool, available from the K9Toolkit, deciphers many card types for you.[3] It uses both kstat and ndd to retrieve the network settings because neither kstat nor ndd alone provides all of the data.

# checkcable
Interface    Link    Duplex    Speed    AutoNEG
hme0         UP      FULL      100      ON

# checkcable
Interface    Link    Duplex    Speed    AutoNEG
hme0         DOWN    FULL      100      ON

The first output has the hme0 interface as link-connected (UP), full duplex, 100 Mbits/sec, and auto-negotiation on; the second output was with the cable disconnected. The speed and duplex must match what the switch is using for the network link to function correctly.

There are still some cards that checkcable is unable to view. The state of card statistics is slowly getting better; eventually, checkcable will not be needed to translate these numbers.

7.7.7. ping Tool

ping is the classic network probe tool; it uses ICMP messages to test the response time of round-trip packets.

$ ping -s mars
PING mars: 56 data bytes
64 bytes from mars (192.168.1.1): icmp_seq=0. time=0.623 ms
64 bytes from mars (192.168.1.1): icmp_seq=1. time=0.415 ms
64 bytes from mars (192.168.1.1): icmp_seq=2. time=0.464 ms
^C
----mars PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.415/0.501/0.623/0.11

So we discover that mars is up and that it responds within 1 millisecond. Solaris 10 enhanced ping to print three decimal places for the times. ping is handy to see if a host is up, but that's about all.

7.7.8. traceroute Tool

traceroute sends a series of UDP packets with an increasing TTL, and by watching the ICMP time-expired replies, we can discover the hops to a host (assuming the hops actually decrement the TTL):

$ traceroute www.sun.com
traceroute: Warning: Multiple interfaces found; using 260.241.10.2 @ hme0:1
traceroute to www.sun.com (209.249.116.195), 30 hops max, 40 byte packets
 1  tpggate (260.241.10.1)  21.224 ms  25.933 ms  25.281 ms
 2  172.31.217.14 (172.31.217.14)  49.565 ms  27.736 ms  25.297 ms
 3  syd-nxg-ero-zeu-2-gi-3-0.tpgi.com.au (220.244.229.9)  25.454 ms  22.066 ms  26.237 ms
 4  syd-nxg-ibo-l3-ge-0-2.tpgi.com.au (220.244.229.132)  42.216 ms *  37.675 ms
 5  220-245-178-199.tpgi.com.au (220.245.178.199)  40.727 ms  38.291 ms  41.468 ms
 6  syd-nxg-ibo-ero-ge-1-0.tpgi.com.au (220.245.178.193)  37.437 ms  38.223 ms  38.373 ms
 7  Gi11-2.gw2.syd1.asianetcom.net (202.147.41.193)  24.953 ms  25.191 ms  26.242 ms
 8  po2-1.gw1.nrt4.asianetcom.net (202.147.55.110)  155.811 ms  169.330 ms  153.217 ms
 9  Abovenet.POS2-2.gw1.nrt4.asianetcom.net (203.192.129.42)  150.477 ms  157.173 ms *
10  so-6-0-0.mpr3.sjc2.us.above.net (64.125.27.54)  240.077 ms  239.733 ms  244.015 ms
11  so-0-0-0.mpr4.sjc2.us.above.net (64.125.30.2)  224.560 ms  228.681 ms  221.149 ms
12  64.125.27.102 (64.125.27.102)  241.229 ms  235.481 ms  238.868 ms
13  * *^C

The times may provide some idea of where a network bottleneck is. We must also remember that networks are dynamic and that this may not be the permanent path to that host (and could even change as traceroute executes).

7.7.9. snoop Tool

The power to capture and inspect network packets live from the interface is provided by snoop, an indispensable tool. When network events don't seem to be working, it can be of great value to verify that the packets are actually arriving in the first place.

snoop places a network device in "promiscuous mode" so that all network traffic, addressed to this host or not, is captured.
You ought to have permission to be sniffing network traffic, as often snoop displays traffic contents, including user names and passwords.

# snoop
Using device /dev/hme (promiscuous mode)
     jupiter -> titan        TCP D=22 S=36570 Ack=1602213819 Seq=1929072366 Len=0 Win=49640
       titan -> jupiter      TCP D=36570 S=22 Push Ack=1929072366 Seq=1602213819 Len=128 Win=49640
     jupiter -> titan        TCP D=22 S=36570 Ack=1602213947 Seq=1929072366 Len=0 Win=49640
...

The most useful options include the following: don't resolve hostnames (-r), change the device (-d), output to a capture file (-o), input from a capture file (-i), print semi-verbose (-V, one line per protocol layer), print full-verbose (-v, all details), and send packets to /dev/audio (-a). Packet filter syntax can also be applied.

By using output files, you can try different options when reading them (-v, -V). Moreover, outputting to a file incurs less CPU overhead than the default live output.

7.7.10. TTCP

Test TCP is a freeware tool that tests the throughput between two hosts. It needs to be run on both the source and destination, and a Java version of TTCP runs on many different operating systems. Beware, it floods the network with traffic to perform its test.

The following is run on one host as a receiver. The options used here made the test run for a reasonable duration, around 60 seconds.

$ java ttcp -r -n 65536
Receive: buflen= 8192  nbuf= 65536 port= 5001

Then the following was run on the second host as the transmitter:

$ java ttcp -t jupiter -n 65536
Transmit: buflen= 8192  nbuf= 65536 port= 5001
Transmit connection:
  Socket[addr=jupiter/192.168.1.5,port=5001,localport=46684].
Transmit: 536870912 bytes in 46010 milli-seconds = 11668.57 KB/sec (93348.56 Kbps).

This example shows that the speed between these hosts for this test is around 11.6 megabytes per second.

It is not uncommon for people to test the speed of their network by transferring a large file around. This may be better than it sounds; any test is better than none.

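The figures in the TTCP summary line can be reproduced with a little arithmetic. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes that TTCP's "KB" means 1000 bytes, which is the interpretation that matches the printed numbers.

```python
# Sketch: reproduce the TTCP summary line from bytes sent and elapsed time.
# Assumption: 1 KB = 1000 bytes, which matches the figures TTCP printed.


def ttcp_throughput(total_bytes, elapsed_ms):
    """Return (Kbytes/sec, Kbits/sec) for a transfer."""
    seconds = elapsed_ms / 1000.0
    kbytes_per_sec = (total_bytes / 1000.0) / seconds
    kbits_per_sec = kbytes_per_sec * 8
    return kbytes_per_sec, kbits_per_sec


if __name__ == "__main__":
    total = 65536 * 8192  # nbuf x buflen = 536870912 bytes
    kb, kbps = ttcp_throughput(total, 46010)
    print("%.2f KB/sec (%.2f Kbps)" % (kb, kbps))
    # prints: 11668.57 KB/sec (93348.56 Kbps)
```

Expressed in bits, the result is roughly 93 Mbits/sec, which is a convenient form for comparing against the rated speed of the interface.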
7.7.11. pathchar Tool

After writing traceroute, Van Jacobson wrote pathchar, an amazing tool that identifies network bottlenecks. It operates like traceroute, but rather than printing response time to each hop, it prints bandwidth between each pair of hops.

# pathchar 192.168.1.1
pathchar to 192.168.1.1 (192.168.1.1)
 doing 32 probes at each of 64 to 1500 by 32
 0 localhost
 |    30 Mb/s,   79 us (562 us)
 1 neptune.drinks.com (192.168.2.1)
 |    44 Mb/s,   195 us (1.23 ms)
 2 mars.drinks.com (192.168.1.1)
2 hops, rtt 547 us (1.23 ms), bottleneck  30 Mb/s, pipe 7555 bytes

This tool works by sending "shaped" traffic over a long interval and carefully measuring the response times. It doesn't flood the network like TTCP does.

Binaries for pathchar can be found on the Internet, but the source code has yet to be released. Some open source versions, based on the ideas from pathchar, are in development.

7.7.12. ntop Tool

ntop sniffs network traffic and issues comprehensive reports through a web interface. It is very useful, so long as you can (and are allowed to) snoop the traffic of interest. It is driven from a web browser aimed at localhost:3000.

# ntop
ntop v.1.3.1 MT [sparc-sun-solaris2.8] listening on [hme0,hme0:0,hme0:1].
Copyright 1998-2000 by Luca Deri <deri@ntop.org>
Get the freshest ntop from http://www.ntop.org/

Initialising...
Loading plugins (if any)...
WARNING: Unable to find the plugins/ directory.
Waiting for HTTP connections on port 3000...
Sniffying...

7.7.13. NFS Client Statistics: nfsstat -c

$ nfsstat -c

Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs   timers
202499     0          0          0          0          0          0
cantconn   nomem      interrupts
0          0          0
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds   badverfs
0          0          0          0          0          0          0
timers     nomem      cantsend
0          0          0

Client nfs:
calls      badcalls   clgets     cltoomany
200657     0          200657     7
Version 2: (0 calls)
null       getattr    setattr    root       lookup     readlink   read       wrcache
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
write      create     remove     rename     link       symlink    mkdir      rmdir
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
readdir    statfs
0 0%       0 0%
Version 3: (0 calls)
null       getattr    setattr    lookup     access     readlink
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
read       write      create     mkdir      symlink    mknod
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
remove     rmdir      rename     link       readdir    readdirplus
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
fsstat     fsinfo     pathconf   commit
0 0%       0 0%       0 0%       0 0%

Client statistics printed include retransmissions (retrans), unmatched replies (badxids), and timeouts. See nfsstat(1M) for verbose descriptions.

7.7.14. NFS Server Statistics: nfsstat -s

The server version of nfsstat prints a screenful of statistics to pick through. Of interest are the badcalls value and the counts for each file operation.

$ nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
5897288    0          0          0          0          372803     0
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
87324      0          0          0          0          0          0
...
Version 4: (949163 calls)
null                compound
3175 0%             945988 99%
Version 4: (3284515 operations)
reserved            access              close               commit
0 0%                72954 2%            199208 6%           2948 0%
create              delegpurge          delegreturn         getattr
4 0%                0 0%                16451 0%            734376 22%
getfh               link                lock                lockt
345041 10%          6 0%                101 0%              0 0%
locku               lookup              lookupp             nverify
101 0%              145651 4%           5715 0%             171515 5%
open                openattr            open_confirm        open_downgrade
199410 6%           0 0%                271 0%              0 0%
putfh               putpubfh            putrootfh           read
914825 27%          0 0%                581 0%              130451 3%
readdir             readlink            remove              rename
5661 0%             11905 0%            15 0%               201 0%
renew               restorefh           savefh              secinfo
30765 0%            140543 4%           146336 4%           277 0%
setattr             setclientid         setclientid_confirm verify
23 0%               26 0%               26 0%               10 0%
write               release_lockowner   illegal
9118 0%             0 0%                0 0%
...