Checking the Network | Troubleshooting Linux Firewalls

Refer back to the troubleshooting methodology covered in Chapter 4 and the OSI model in Chapter 5. The first thing to check when diagnosing network problems is the arp cache to make sure that the problem with the firewall or host is not just related to a bad arp entry and a problem at Layer 2 of the OSI model. To view the arp cache, the command is very straightforward:

 arp -n
 Address             HWtype    HWaddress           Flags Mask     Iface
 192.168.10.1        ether     00:00:12:34:56:78   C              eth1

Or in the Linux style:

 arp -e
 Address             HWtype    HWaddress           Flags Mask     Iface
 foo.bar.edu         ether     00:00:12:34:56:78   C              eth1

In this example, there is just one host, our firewall, as seen from a client. Let's imagine that the arp entry is reporting the wrong hardware address for our firewall. You can flush this entry from the arp table with the -d switch:

 arp -d 192.168.10.1

Then when viewing the arp cache on your host, you will see that it's empty:

 arp -n
 Address             HWtype    HWaddress           Flags Mask     Iface

Please check the arp man page for information on other arp options. The intent here is to show that you will want to work your way up the OSI model, ruling out simple problems before moving on to more complex ones. When arp is ruled out, you can move on to IP and so on.
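The arp check can also be scripted. The following is a minimal Python sketch, not part of the original text, that parses output in the `arp -n` layout shown above and reports entries whose hardware address differs from the one you expect. The function names and the "expected" address ending in :99 are hypothetical, chosen only for illustration.

```python
def parse_arp_table(text):
    """Return {ip: mac} parsed from `arp -n` style output."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and lines too short to hold a MAC.
        if len(fields) < 3 or fields[0] == "Address":
            continue
        # Only resolved Ethernet entries carry a usable hardware address.
        if fields[1] == "ether":
            entries[fields[0]] = fields[2].lower()
    return entries


def mac_mismatches(entries, expected):
    """Return {ip: (seen, wanted)} for each expected IP whose cached MAC differs."""
    bad = {}
    for ip, wanted in expected.items():
        seen = entries.get(ip)
        if seen is not None and seen != wanted.lower():
            bad[ip] = (seen, wanted.lower())
    return bad


# Sample text copied from the listing above.
SAMPLE = """\
Address             HWtype    HWaddress           Flags Mask     Iface
192.168.10.1        ether     00:00:12:34:56:78   C              eth1
"""

table = parse_arp_table(SAMPLE)
# Hypothetical "known good" address for the firewall, for illustration only.
print(mac_mismatches(table, {"192.168.10.1": "00:00:12:34:56:99"}))
```

A mismatch reported here would be the cue to flush the entry with arp -d and look for the cause at Layer 2.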

Next, we move on to ping. As the reader is no doubt aware, ping is one of the easiest-to-use tools for testing a connection and can help determine whether a host is up or whether there is an outage between two hosts. Again, the intent is to rule out root problems at lower layers of the OSI model before moving on to higher layers. The standard ping included with Linux and other operating systems is limited in what it can test. Depending on your firewall rules, ping might help you determine whether a problem lies at a lower or higher layer of the OSI model, indicating where you should start looking. The previous chapter included some examples of how to use ping to diagnose Layer 2 problems. Here we present some additional uses of ping that test more than basic connectivity between hosts. For example, ping can map out the route between two hosts with the -R switch:

 ping -R www.gotroot.com
 PING gotroot.com (205.241.45.98) 56(124) bytes of data.
 64 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=1 ttl=50 time=92.7 ms
 RR:     liberty.gmsociety.org (216.218.240.134)
         pos2-0.gsr12012.fmt.he.net (64.62.249.121)
         ix-4-0.core2.PaloAlto.Teleglobe.net (64.86.84.154)
         if-8-0.core2.PaloAlto.Teleglobe.net (207.45.222.26)
         sl-teleg-2-0.sprintlink.net (160.81.205.142)
         sl-gw28-ana-0-0.sprintlink.net (144.232.1.30)
         sl-bb21-ana-6-0.sprintlink.net (144.232.1.61)
         sl-bb25-ana-8-0.sprintlink.net (144.232.9.64)
         sl-bb24-fw-14-0.sprintlink.net (144.232.11.73)
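The listing above shows exactly nine recorded addresses and a packet size of 56(124) bytes. Both numbers follow from the IPv4 header layout, as this short Python sketch works out. It assumes that ping reserves the entire 40-byte IP option area for the record-route option, which is consistent with the sizes in the listing but is an assumption about this particular ping build.

```python
IP_HEADER = 20           # bytes, IPv4 header without options
ICMP_HEADER = 8          # bytes, ICMP echo header
MAX_IP_HEADER = 60       # bytes, largest IPv4 header the length field allows
RR_OPTION_OVERHEAD = 3   # bytes: option type, option length, pointer
ADDRESS_SIZE = 4         # bytes per recorded IPv4 address

# Space left for options once the fixed header is accounted for.
option_space = MAX_IP_HEADER - IP_HEADER

# How many router addresses fit in the record-route option.
max_recorded_hops = (option_space - RR_OPTION_OVERHEAD) // ADDRESS_SIZE


def packet_size_with_record_route(payload=56):
    """Total IP packet size for a ping payload when the option area is full."""
    return payload + ICMP_HEADER + IP_HEADER + option_space


print(max_recorded_hops)                 # 9
print(packet_size_with_record_route())   # 124
```

Because only nine addresses fit, ping -R cannot describe a complete path once the route is longer than that, which is one reason traceroute uses a different method.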

The other ping switch we want to mention is -s. It sets the size of the data sent in each packet and can help with diagnosing MTU problems or other packet-size problems with your firewall and the networks between your firewall and the target of your ping. If a normal ping, which is only 84 bytes, gets through, and a large ping of 2000 bytes does not, the problem is most likely an MTU and fragmentation problem.

 ping -s 2000 www.gotroot.com
 PING gotroot.com (205.241.45.98) 2000(2028) bytes of data.
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=1 ttl=50 time=105 ms
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=2 ttl=50 time=153 ms
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=3 ttl=50 time=116 ms
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=4 ttl=50 time=183 ms
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=5 ttl=50 time=106 ms
 2008 bytes from plesk.shinn.net (205.241.45.98): icmp_seq=6 ttl=50 time=117 ms
 --- gotroot.com ping statistics ---
 6 packets transmitted, 6 received, 0% packet loss, time 5008ms
 rtt min/avg/max/mdev = 105.469/130.371/183.593/28.654 ms
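The sizes in this listing (2000 bytes of data, 2008 bytes received, 2028 bytes total) and the 84-byte figure for a default ping come from adding the ICMP and IP headers to the payload. The Python sketch below reproduces that arithmetic and estimates how many fragments a ping needs. The 1500-byte MTU default is an assumption (typical Ethernet); substitute the MTU of the link you are testing.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def on_wire_size(payload):
    """Size of the unfragmented IP packet for a given `ping -s` payload."""
    return payload + ICMP_HEADER + IP_HEADER


def fragment_count(payload, mtu=1500):
    """Number of IP fragments needed to carry the ping across a link."""
    ip_payload = payload + ICMP_HEADER
    # Fragment data must be a multiple of 8 bytes (except in the last fragment).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(on_wire_size(56))      # 84, the default ping
print(on_wire_size(2000))    # 2028, as in "2000(2028) bytes of data"
print(fragment_count(2000))  # 2 fragments on a 1500-byte MTU
```

Under that assumption, the largest payload that still fits in one packet is 1472 bytes, so comparing ping -s 1472 with ping -s 1473 is a quick way to see whether fragmentation is where things break.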

nmap is another tool that can ping hosts, but unlike ping, it's not limited to ICMP pings. To send an ICMP ping with nmap, use the -sP switch in this manner, replacing 10.10.10.192 with the IP address or hostname of the host you want to ping:

 nmap -sP 10.10.10.192
 Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-07-11 23:09 EDT
 Host printer.int.shinn.net (10.10.10.192) appears to be up.
 Nmap run completed -- 1 IP address (1 host up) scanned in 0.324 seconds

nmap also has the ability to "sweep" a network block, pinging all the hosts in that block and then reporting which hosts responded to the ICMP pings. This can be useful for determining whether a particular host is having a problem at a lower layer of the OSI model, and for ruling out a network-wide problem. Here is an example of using nmap in that manner:

 nmap -sP 10.10.10.0/24
 Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-07-11 23:12 EDT
 Host a.foo.com (10.10.10.192) appears to be up.
 Host x.foo.com (10.10.10.253) appears to be up.
 Nmap run completed -- 256 IP addresses (2 hosts up) scanned in 294.960 seconds
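When a sweep covers many hosts, it can help to compare the results against the hosts you expect to be up. The Python sketch below does that for output in the format shown in the listing above. The regular expression is tied to that nmap 3.50 format and would need adjusting for other nmap versions, and the helper names and the 10.10.10.1 address in the usage line are hypothetical.

```python
import re

# Matches lines such as: Host a.foo.com (10.10.10.192) appears to be up.
UP_LINE = re.compile(
    r"^Host (\S+) \((\d+\.\d+\.\d+\.\d+)\) appears to be up\.",
    re.MULTILINE,
)


def hosts_up(nmap_output):
    """Return {ip: hostname} for every host the sweep reported as up."""
    return {ip: name for name, ip in UP_LINE.findall(nmap_output)}


def silent_hosts(nmap_output, expected_ips):
    """Return the expected IPs that did not answer the sweep."""
    up = hosts_up(nmap_output)
    return [ip for ip in expected_ips if ip not in up]


# Sample text copied from the listing above.
SAMPLE = """\
Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-07-11 23:12 EDT
Host a.foo.com (10.10.10.192) appears to be up.
Host x.foo.com (10.10.10.253) appears to be up.
Nmap run completed -- 256 IP addresses (2 hosts up) scanned in 294.960 seconds
"""

print(hosts_up(SAMPLE))
# 10.10.10.1 is a made-up address standing in for a host you expected to see.
print(silent_hosts(SAMPLE, ["10.10.10.192", "10.10.10.1"]))
```

If only one expected host is missing, the problem is probably local to that host; if most of them are missing, suspect the network or the firewall rules instead.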

nmap can also generate what is referred to as a TCP ping. This is accomplished by sending a SYN, ACK, SYN+ACK, or any other packet to a host and registering the response that comes back from the port that was "pinged." This is an excellent way of testing whether a service is up when you know the host is up. Keep in mind that this sort of "ping" occurs at Layer 4 of the OSI model and as such might not be an effective way of ruling out problems at lower layers.

If you can ping the host with ICMP and you aren't losing any packets, including large packets, then you can rule out a Layer 2 problem. The problem lies somewhere in Layers 3 through 7.

For example, to test whether a web server is answering on TCP port 80:

 nmap -sT -p 80 www.gotroot.com
 Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-04-20 21:10 EDT
 Interesting ports on www.gotroot.com (205.241.45.98):
 PORT   STATE SERVICE
 80/tcp open  http
 Nmap run completed -- 1 IP address (1 host up) scanned in 0.715 seconds
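If nmap is not installed on the machine you are testing from, a plain TCP connect gives a similar answer to the -sT scan above. This is a minimal Python sketch using only the standard library; the function name and the three-second timeout are arbitrary choices for illustration, not something taken from nmap.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a full TCP connection to host:port can be established."""
    try:
        # create_connection performs the complete three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling tcp_port_open("www.gotroot.com", 80) would mirror the nmap example. Note that a False result does not distinguish a closed port from a packet silently dropped by a firewall; both simply fail to connect.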

The next tool, traceroute, is a key piece of any network diagnosis toolkit. traceroute can record the route a packet takes as it moves through the network. This is similar to using ping with the -R switch, but traceroute uses a different method for recording routes. There are two versions of traceroute: the venerable classic traceroute, which uses UDP and ICMP, and tcptraceroute, which uses TCP to trace the route to a host. Both of these tools have their place, so neither one is better than the other.

Classic traceroute determines the route to a host by sending UDP packets, starting with an initial TTL of 1. The first packet goes out to the destination with a TTL of 1, causing the first machine or device it reaches along the way to send back an ICMP time exceeded message because the packet has "lived too long." TTL, or Time To Live, is a field in the IP header that tells a compliant device along the way how much "life" is left in the packet. Each device in a route is required to reduce the TTL by one before passing the packet on to the next device in the route. traceroute uses this to record the route by sending multiple packets with increasing TTLs, starting at 1 and increasing from there.

A device is required to send back an ICMP time exceeded packet anytime the TTL reaches 0. So, when traceroute sends out its initial packet, it has already stacked the deck so that the first hop will have to reduce the TTL to 0 by subtracting 1 from the initial TTL of 1. It's a pretty neat trick, and it does this for each step along the way. Each time, traceroute records this information as the next hop in the route by looking for those ICMP time exceeded packets.

So, in our example, traceroute sends the initial packet with a TTL of 1, and the first hop subtracts 1 and sends back an ICMP time exceeded message. Then traceroute sends another packet with the TTL set to 2, causing the first hop to decrease the TTL in the header of the packet to 1 and send it on to the second hop in the route. This second hop decrements the TTL to 0, drops the packet, and sends back a time exceeded message. This process keeps happening, with traceroute patiently adding 1 to the TTL, until a packet finally reaches the destination host, which normally answers with an ICMP port unreachable message because nothing is listening on the UDP port traceroute chose.
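The TTL walk can be modeled in a few lines. The Python sketch below is a toy simulation, not a real traceroute: it sends no packets and simply applies the decrement rule to a made-up path. Intermediate devices answer with ICMP time exceeded (type 11) and the destination answers with port unreachable; the hop names are hypothetical.

```python
def probe(path, ttl):
    """Follow one probe along `path` (routers first, destination last).

    Returns (responder, message) for the device that answers the probe.
    """
    *routers, destination = path
    for router in routers:
        ttl -= 1                  # every forwarding device subtracts 1
        if ttl == 0:
            return router, "time exceeded"
    # The probe survived every router and reached the end host.
    return destination, "port unreachable"


def traceroute(path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "port unreachable":
            break
    return hops


# Hypothetical path: a firewall, two routers, then the target host.
for hop in traceroute(["firewall", "router-a", "router-b", "target"]):
    print(hop)
```

Each probe is answered by exactly one device, which is why traceroute needs one probe (in practice, several) per hop to build the full list.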

This is where you can, with iptables, play some neat tricks on traceroute by using the iptables TTL target (-j TTL --ttl-inc <value>) to increase the TTL, without traceroute's permission, by any value you like. The packet then appears to bypass the firewall, and the firewall becomes "invisible" to traceroute. For example, this rule increases the TTL for UDP traceroute packets by one:

 iptables -t mangle -A PREROUTING -p udp \
   --dport 33434:33542 -j TTL --ttl-inc 1

which causes this behavior to occur with traceroute when the packet is sent from behind the firewall to a host on the Internet:

 traceroute to liberty.gmsociety.org (216.218.240.134), 30 hops max, 38 byte packets
  1  ip68-100-72-1.dc.dc.cox.net (68.100.72.1) 25.685 ms 12.024 ms 18.446 ms
  2  ip68-100-0-1.dc.dc.cox.net (68.100.0.1) 17.811 ms 12.891 ms 15.039 ms
  3  ip68-100-0-137.dc.dc.cox.net (68.100.0.137) 15.712 ms 18.692 ms 21.172 ms
  5  68.1.1.4 (68.1.1.4) 16.610 ms 42.524 ms 17.100 ms
  4  68.1.1.3 (68.1.1.3) 16.699 ms 13.659 ms 15.814 ms
  5  ashbbbpc01pos0100.r2.as.cox.net (68.1.1.19) 16.710 ms 12.788 ms *
  6  ash-ix.he.net (206.223.115.37) 23.008 ms 18.023 ms 23.396 ms
  7  pos7-0.gsr12012.pao.he.net (216.218.254.205) 117.154 ms 104.813 ms 123.567 ms
  8  pos2-0.gsr12012.fmt.he.net (64.62.249.121) 102.942 ms 106.383 ms 102.014 ms
 11  * * *
 [...]
 21  * * *
 [...]

And this is what the traceroute would look like without that rule:

 traceroute to liberty.gmsociety.org (216.218.240.134), 30 hops max, 38 byte packets
  1  192.168.10.1
  2  ip68-100-72-1.dc.dc.cox.net (68.100.72.1) 25.685 ms 12.024 ms 18.446 ms
  3  ip68-100-0-1.dc.dc.cox.net (68.100.0.1) 17.811 ms 12.891 ms 15.039 ms
  4  ip68-100-0-137.dc.dc.cox.net (68.100.0.137) 15.712 ms 18.692 ms 21.172 ms
  5  68.1.1.4 (68.1.1.4) 16.610 ms 42.524 ms 17.100 ms
  5  68.1.1.3 (68.1.1.3) 16.699 ms 13.659 ms 15.814 ms
  6  ashbbbpc01pos0100.r2.as.cox.net (68.1.1.19) 16.710 ms 12.788 ms *
  7  ash-ix.he.net (206.223.115.37) 23.008 ms 18.023 ms 23.396 ms
  8  pos7-0.gsr12012.pao.he.net (216.218.254.205) 117.154 ms 104.813 ms 123.567 ms
  9  pos2-0.gsr12012.fmt.he.net (64.62.249.121) 102.942 ms 106.383 ms 102.014 ms
 12  * * *
 [...]
 21  * * *
 [...]
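The difference between the two listings, where 192.168.10.1 is present in one and absent in the other, can be reproduced with a toy model. The Python sketch below sends no packets. It assumes the firewall applies the TTL increment before it performs the normal decrement, which is how a rule in the mangle PREROUTING chain would be expected to behave; the "target" hop is a hypothetical end host, and the other addresses are taken from the listings.

```python
def probe(path, ttl, ttl_inc_at=None, inc=1):
    """Return the device that answers a probe sent with the given TTL.

    `ttl_inc_at` names the one device that adds `inc` to the TTL before
    decrementing it, modeling a `-j TTL --ttl-inc` rule on that device.
    """
    *routers, destination = path
    for router in routers:
        if router == ttl_inc_at:
            ttl += inc            # the mangle rule runs first
        ttl -= 1                  # then the usual forwarding decrement
        if ttl == 0:
            return router
    return destination


def visible_hops(path, ttl_inc_at=None):
    """List the responders traceroute would see, one per TTL value."""
    seen = []
    for ttl in range(1, len(path) + 1):
        responder = probe(path, ttl, ttl_inc_at)
        seen.append(responder)
        if responder == path[-1]:
            break
    return seen


PATH = ["192.168.10.1", "68.100.72.1", "68.100.0.1", "target"]

print(visible_hops(PATH))                             # firewall is hop 1
print(visible_hops(PATH, ttl_inc_at="192.168.10.1"))  # firewall is missing
```

With the increment in place the firewall never sees a TTL reach 0, so it never sends a reply, and every later hop appears one position earlier.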

The astute reader will notice that in both cases the traceroute did not succeed because the ISP at the end of that route is filtering out normal traceroute traffic. We illustrate this to remind you of one of the downsides of traceroute: many sites filter it. Nevertheless, vanilla traceroute is still an extremely valuable tool for diagnosing the path a packet takes through a network, even though filtering means it will not always work.

In those circumstances where vanilla traceroute does not work, in steps tcptraceroute. As you may have already guessed, tcptraceroute uses TCP to trace the route to a host. tcptraceroute can take advantage of the fact that it is much more difficult to filter legitimate connections to hosts on open ports, such as TCP port 80 for a web server. And, much like vanilla traceroute, tcptraceroute uses that same incrementing TTL trick to get back time exceeded messages from intermediate hosts along the way. Here is the same example using tcptraceroute:

 tcptraceroute liberty.gmsociety.org 22
 Selected device ipsec0, address 192.168.10.12 for outgoing packets
 Tracing the path to liberty.gmsociety.org (216.218.240.134) on TCP port 22, 30 hops max
  1  192.168.10.1
  2  ip68-100-72-1.dc.dc.cox.net (68.100.72.1) 67.165 ms * 17.423 ms
  3  ip68-100-0-1.dc.dc.cox.net (68.100.0.1) 13.608 ms 12.317 ms 14.783 ms
  4  ip68-100-0-137.dc.dc.cox.net (68.100.0.137) 18.330 ms 11.879 ms 11.650 ms
  5  68.1.1.4 (68.1.1.4) 13.771 ms 16.825 ms 12.600 ms
  6  68.1.1.3 (68.1.1.3) 12.492 ms 14.013 ms 14.942 ms
  7  ashbbbpc01pos0100.r2.as.cox.net (68.1.1.19) 14.581 ms 31.069 ms 13.775 ms
  8  ash-ix.he.net (206.223.115.37) 13.028 ms 12.797 ms 12.083 ms
  9  pos7-0.gsr12012.pao.he.net (216.218.254.205) 101.619 ms 85.920 ms 106.626 ms
 10  pos2-0.gsr12012.fmt.he.net (64.62.249.121) 104.472 ms 108.409 ms 109.053 ms
 11  liberty.gmsociety.org (216.218.240.134) [open] 122.805 ms 113.582 ms 113.513 ms

This time, you will notice that tcptraceroute was able both to defeat our iptables -j TTL rule and to get around the upstream ISP's traceroute filtering. The key with tcptraceroute is to find a port that will get through all the hops along the way. If your traceroute is not working, it's possible that your packets are just getting filtered, so try another port. If you're having trouble finding a port to use, start with the most obvious services you might expect to be running on that system. If you are still having trouble finding an open port, try using a tool like nmap to scan the system for open ports.

The bottom line with using tools like traceroute and tcptraceroute is to determine if you have a route to the remote host in question. If the route is up and correct, then the problem lies farther up the OSI model.