23.3 Keepalive Examples

We'll now go through scenarios 2, 3, and 4 from the previous section, to see the packets exchanged using the keepalive option.

Other End Crashes

Let's see what happens when the server host crashes and does not reboot. To simulate this we'll do the following steps:

  • Establish a connection between a client (our sock program on the host bsdi) and the standard echo server on the host svr4. The client enables the keepalive option with the -K option.

  • Verify that data can go across the connection.

  • Watch the client's TCP send keepalive packets every 2 hours, and see them acknowledged by the server's TCP.

  • Disconnect the Ethernet cable from the server, and leave it off until the example is complete. This makes the client think the server host has crashed.

  • We expect the client to send 10 keepalive probes, 75 seconds apart, before declaring the connection dead.
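As a rough illustration of the first step, the following Python sketch shows what enabling the keepalive option on a client socket might look like. It is not the source of the sock program. The function name make_keepalive_socket is hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options are platform-specific assumptions that are applied only where the system provides them. The default values mirror the timings expected above: 2 hours of idle time, then 10 probes 75 seconds apart.

```python
import socket


def make_keepalive_socket(idle=7200, interval=75, count=10):
    """Create a TCP socket with SO_KEEPALIVE enabled.

    Hypothetical helper: the per-socket timer options below are not
    available everywhere, so they are applied on a best-effort basis.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The portable part: ask TCP to probe an idle connection.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Assumption: some systems expose per-socket timer tuning.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # probes sent before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue  # option not defined on this platform
        try:
            s.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # keep the system-wide default if the value is rejected
    return s
```

On systems without the per-socket options, the timers remain whatever the kernel is configured to use, which is the situation in the examples that follow.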

Here is the interactive output on the client:

 bsdi % sock -K svr4 echo              -K for keepalive option
 hello, world                          type this at beginning, to verify connection is up
 hello, world                          and see this echoed
                                       disconnect Ethernet cable after 4 hours
 read error: Connection timed out      this happens about 6 hours and 11 minutes after start

Figure 23.1 shows the tcpdump output. (We have removed the connection establishment and the window advertisements.)

Figure 23.1. Keepalive packets that determine that a host has crashed.

Lines 1, 2, and 3 send the line "hello, world" from the client to the server and back. The first keepalive probe occurs 2 hours (7200 seconds) later, starting on line 4. The first thing we see is an ARP request and an ARP reply, before the TCP segment on line 6 can be sent. The keepalive probe on line 6 elicits a response from the other end (line 7). The same sequence of packets is exchanged 2 hours later in lines 8-11.

If we could see all the fields in the keepalive probes, lines 6 and 10, we would see that the sequence number field is one less than the next sequence number to be sent (i.e., 13 in this example, when it should be 14), but because there is no data in the segment, tcpdump does not print the sequence number field. (It only prints the sequence number for empty segments that contain the SYN, FIN, or RST flags.) It is the receipt of this incorrect sequence number that forces the server's TCP to respond with an ACK to the keepalive probe. The response tells the client the next sequence number that the server is expecting (14).
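The sequence number trick can be sketched with a toy model in Python. This is only an illustration of the rule described above, not code from any TCP implementation. The function names keepalive_probe_seq and reply_to_empty_segment are hypothetical, and the model handles only segments that carry no data.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


def keepalive_probe_seq(snd_nxt):
    """Sequence number put in a data-less keepalive probe.

    It is one less than the next sequence number the sender would use.
    """
    return (snd_nxt - 1) % SEQ_SPACE


def reply_to_empty_segment(rcv_nxt, seg_seq):
    """Toy receiver for a segment with no data.

    Returns the acknowledgment number sent in reply, or None when the
    segment carries the expected sequence number and needs no reply.
    """
    if seg_seq == rcv_nxt:
        return None      # in sequence and empty: nothing to answer
    return rcv_nxt       # unexpected sequence number: ACK what we expect


# The example in the text: the next sequence number to be sent is 14.
probe = keepalive_probe_seq(14)
ack = reply_to_empty_segment(rcv_nxt=14, seg_seq=probe)
print(probe, ack)
```

With the numbers from this example, the probe carries sequence number 13 and the reply acknowledges 14.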

Some older implementations based on 4.2BSD do not respond to these keepalive probes unless the segment contains data. Some systems can be configured to send one garbage byte of data in the probe to elicit the response. The garbage byte causes no harm, because it's not the expected byte (it's a byte that the receiver has previously received and acknowledged), so it's thrown away by the receiver. Other systems send the 4.3BSD-style segment (no data) for the first half of the probe period, and if no response is received, switch to the 4.2BSD-style segment for the last half.

We then disconnect the cable and expect the next probe, 2 hours later, to fail. When this next probe takes place, notice that we never see the TCP segments on the cable, because the host is not responding to ARP requests. We can still see that the client sends 10 probes, spaced 75 seconds apart, before giving up. We can see from our interactive script that the error code returned to the client process by TCP gets translated into "Connection timed out," which is what happened.
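The timing in this example can be checked with a little arithmetic. The Python sketch below assumes the last answered probe was the one sent 4 hours after the start, and that the timers are the ones quoted in this section (2 hours idle, 75 seconds between probes, 10 probes). The function names probe_schedule and hms are hypothetical.

```python
def probe_schedule(last_heard, idle=7200, interval=75, count=10):
    """Times (seconds from start) at which unanswered probes are sent.

    last_heard is the time of the last segment received from the peer.
    """
    first = last_heard + idle
    return [first + k * interval for k in range(count)]


def hms(seconds):
    """Format a number of seconds as h:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Assumption: the peer last answered at the 4-hour mark.
times = probe_schedule(last_heard=4 * 3600)
for number, t in enumerate(times, start=1):
    print(f"probe {number:2d} at {hms(t)}")
```

Under these assumptions the first unanswered probe goes out at 6:00:00 and the tenth at 6:11:15, which agrees with the error appearing about 6 hours and 11 minutes after the start.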

Other End Crashes and Reboots

In this example we'll see what happens when the server crashes and reboots. The initial scenario is the same as before, but after we verify that the connection is up, we disconnect the server from the Ethernet, reboot it, and then reconnect it to the Ethernet. We expect the next keepalive probe to generate a reset from the server, because the server now knows nothing about this connection. Here is the interactive session:

 bsdi % sock -K svr4 echo                -K to enable keepalive option
 hi there                                type this to verify connection is up
 hi there                                and this is echoed back from other end
                                         here server is rebooted while disconnected from Ethernet
 read error: Connection reset by peer

Figure 23.2 shows the tcpdump output. (We have removed the connection establishment and the window advertisements.)

Figure 23.2. Keepalive example when other host has crashed and rebooted.

We establish the connection and send 9 bytes of data from the client to the server (lines 1-3). Two hours later the first keepalive probe is sent by the client, and the response is a reset from the server. The client application prints the error "Connection reset by peer," which makes sense.

Other End Is Unreachable

In this example the server has not crashed, but is not reachable during the 10-minute period when the keepalive probes are sent. An intermediate router may have crashed, a phone line may be temporarily out of order, or something similar.

To simulate this example we'll establish a TCP connection from our host slip through our dialup SLIP link to the host vangogh.cs.berkeley.edu, and then take the link down. First, here is the interactive output:

 slip % sock -K vangogh.cs.berkeley.edu echo
 testing                                 we type this line
 testing                                 and see it echoed
                                         sometime in here the dialup SLIP link is taken down
 read error: No route to host

Figure 23.3 shows the tcpdump output that was collected on the router bsdi. (The connection establishment and window advertisements have been removed.)

Figure 23.3. Keepalive example when other end is unreachable.

We start the example the same as before: lines 1-3 verify that the connection is up. The first keepalive probe 2 hours later is fine (lines 4 and 5), but before the next one occurs in another 2 hours, we bring down the SLIP connection between the routers sun and netb. (Refer to the inside front cover for the topology.)

The keepalive probe in line 6 elicits an ICMP network unreachable from the router sun. As we described in Section 21.10, this is just a soft error to the receiving TCP on the host slip. It records that the ICMP error was received, but the receipt of the error does not take down the connection. Eight more keepalive probes are sent, 75 seconds apart, before the sending host gives up. The error returned to the application generates a different message this time: "No route to host." We saw in Figure 6.12 that this corresponds to the ICMP network unreachable error.
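The three outcomes seen in this section (timeout, reset, and soft error) can be summarized in a toy model. The Python sketch below is only an illustration of the behavior described in the text, not an excerpt from a TCP implementation. The function name keepalive_outcome, the event names, and the max_probes parameter are all hypothetical.

```python
import errno


def keepalive_outcome(events, max_probes=10):
    """Return the errno reported to the application, or None if still up.

    events is a sequence of strings describing what follows each probe:
      "ack"                  the peer answered the probe
      "rst"                  the peer answered with a reset
      "icmp_net_unreachable" a router reported the network unreachable
      "no_reply"             the probe went unanswered
    """
    soft_error = None
    unanswered = 0
    for event in events:
        if event == "ack":
            unanswered = 0                     # peer is alive
        elif event == "rst":
            return errno.ECONNRESET            # peer rebooted: hard error
        elif event == "icmp_net_unreachable":
            soft_error = errno.EHOSTUNREACH    # remembered, not fatal
        elif event == "no_reply":
            unanswered += 1
            if unanswered >= max_probes:
                # Giving up: prefer the recorded soft error, if any.
                return soft_error if soft_error is not None else errno.ETIMEDOUT
    return None


crashed = ["ack", "ack"] + ["no_reply"] * 10
rebooted = ["rst"]
unreachable = ["ack", "icmp_net_unreachable"] + ["no_reply"] * 10
for name, events in (("crashed", crashed), ("rebooted", rebooted),
                     ("unreachable", unreachable)):
    print(name, errno.errorcode[keepalive_outcome(events)])
```

The point of the model is the last branch: the ICMP error by itself never ends the connection, but it changes which error the application sees once the probes are exhausted.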



TCP/IP Illustrated, Vol. 1: The Protocols (Addison-Wesley Professional Computing Series)
ISBN: 0201633469
Year: 1993
Pages: 378
