Diagnosing TCP Session Problems

Problem

You want to figure out why the BGP session is not being established.

Solution

Start by looking at the current state of the TCP sessions on the router:

	aviva@RouterF> show system connections extensive

Also look at the information in the system logging files:

	aviva@RouterF> show log messages

Check that the TCP session can pass Internet control packets:

	aviva@RouterF> ping tos 0xc0 RouterD

Discussion

When two BGP peers have a problem establishing a BGP session, one of the first indications is BGP hold-time expired error messages in the routers' system logging files. You also see that the State field in the show bgp neighbor command output is not Established and that the State field in the show bgp summary command output is Active or Connect, indicating that the BGP session is not established.

The hold-time expired errors usually occur because the TCP session between a pair of peers cannot effectively transmit data between the routers, not because of a problem with BGP itself. When the TCP session doesn't work properly, the BGP session times out, and BGP signals the problem by sending hold-time expired messages and generating a BGP Notification message to the remote peer. Notification messages are logged at the system logging severity level warning.

Some of the most frequent causes of hold-time expired errors are MTU issues on a directly connected link, issues related to forwarding of Internet control packets, and IGP failures on IBGP sessions.

To understand the MTU behavior, first look at the TCP session. By default, a TCP session transmits 576 bytes in a single packet to minimize the chances that the packet will be fragmented at a device along the path to the destination, even though most links use an MTU of at least 1,500 bytes. Path MTU discovery allows BGP to dynamically determine how large the packets in a TCP session can be without being fragmented, but it is disabled by default in JUNOS BGP. This means that BGP tries to use 576-byte packets for its TCP sessions. However, on directly connected EBGP sessions, TCP uses MTU-sized packets. If there is an MTU mismatch between the two sides of the TCP connection, the BGP session cannot be established. One workaround is to enable path MTU discovery within the BGP group:

	[edit protocols bgp group external]
	aviva@RouterF# set mtu-discovery

When path MTU discovery is enabled, the don't-fragment (DF) bit is set on all TCP packets sent by the BGP session.
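
The packet sizes mentioned above translate into TCP payload sizes once the headers are subtracted. The following Python sketch shows the arithmetic. It assumes IPv4 and TCP headers of 20 bytes each with no IP options, and a TCP timestamp option padded to 12 bytes; the function name mss_for_mtu is purely illustrative and is not part of JUNOS.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
TCP_HEADER = 20        # bytes, assuming no TCP options
TIMESTAMP_OPTION = 12  # bytes, assuming the 10-byte option is padded to 12


def mss_for_mtu(mtu, timestamps=False):
    """Return the TCP payload bytes that fit in one unfragmented packet."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if timestamps:
        payload -= TIMESTAMP_OPTION
    return payload


print(mss_for_mtu(576))                    # 536
print(mss_for_mtu(1500))                   # 1460
print(mss_for_mtu(1500, timestamps=True))  # 1448
```

The last value is consistent with the mss: 1448 field in the show system connections extensive output shown later in this recipe, where the timestamp flags are set.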

When you are testing session connectivity, in addition to the standard ping command, send packets in which the Internet control CoS bit is set:

	aviva@RouterF> ping tos 0xc0 RouterD

If the QoS parameters are misconfigured on a transit router, TCP connectivity can work for regular best-effort traffic but will break for Internet control traffic. The same behavior can happen when you are testing new software or new PICs.
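
To see why 0xc0 is the value to test with, it helps to decode the ToS byte. The short Python sketch below splits it into the IP precedence (top three bits) and the DSCP (top six bits). The helper name decode_tos is an illustration only.

```python
def decode_tos(tos):
    """Split an IPv4 ToS byte into (precedence, dscp)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    precedence = tos >> 5  # top 3 bits of the byte
    dscp = tos >> 2        # top 6 bits of the byte
    return precedence, dscp


precedence, dscp = decode_tos(0xC0)
print(precedence, dscp)  # 6 48
```

Precedence 6 is the internetwork control class, and DSCP 48 is the CS6 code point. A transit router that mishandles this class can therefore drop BGP traffic while ordinary pings succeed.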

Another way to get information about the TCP session and what might be malfunctioning is to look at the current state of TCP sessions:

	aviva@RouterF> show system connections extensive | find tcp
	tcp4 0 2 192.168.70.143.23 172.17.28.108.3350 ESTABLISHED
	 sndsbcc: 2 sndsbmbcnt: 256 sndsbmbmax: 266432
	sndsblowat: 2048 sndsbhiwat: 33304
	 rcvsbcc: 0 rcvsbmbcnt: 0 rcvsbmbmax: 463360
	rcvsblowat: 1 rcvsbhiwat: 57920
	 iss: 2677798142 sndup: 2677853922 sndcc: 0
	 snduna: 2677853922 sndnxt: 2677853924 sndwnd: 57920
	 sndmax: 2677853924 sndcwnd: 65535 sndssthresh: 1073725440
	 irs: 1577022682 rcvup: 1577023284 rcvcc: 0
	 rcvnxt: 1577023292 rcvadv: 1577081212 rcvwnd: 57920
	 rtt: 200130618 srtt: 301 rttv: 12
	 rttmin: 100 duration: 0 mss: 1448
	 flags: REQ_SCALE RCVD_SCALE REQ_TSTMP RCVD_TSTMP [0x1e0]
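
One quick sanity check on this output is to compare the send sequence numbers with the send socket buffer count. The Python sketch below does this for two lines copied from the output above. It assumes that sndsbcc is the number of bytes in the send socket buffer, which the field name suggests but which this recipe does not state; the function name parse_counters is hypothetical.

```python
import re

# Two lines copied from the sample output above.
SAMPLE = """\
 sndsbcc: 2 sndsbmbcnt: 256 sndsbmbmax: 266432
 snduna: 2677853922 sndnxt: 2677853924 sndwnd: 57920
"""


def parse_counters(text):
    """Collect every 'name: integer' pair in the text into a dict."""
    pairs = re.findall(r"(\w+):\s+(\d+)", text)
    return {name: int(value) for name, value in pairs}


counters = parse_counters(SAMPLE)
# Sequence numbers are 32-bit, so take the difference modulo 2**32.
in_flight = (counters["sndnxt"] - counters["snduna"]) % 2**32
print(in_flight, counters["sndsbcc"])  # 2 2
```

Here the two bytes sent but not yet acknowledged match the two bytes held in the send buffer, which is what you expect on a healthy session.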

Also, use the information in the system logging files, which is very extensive and is similar to the output of the show system connections extensive command:

	Aug 24 13:15:46 RouterF rpd[2797]: bgp_traffic_timeout: NOTIFICATION sent to 192.168.14.1
	(Internal AS 3356): code 4 (Hold Timer Expired Error), Reason: holdtime expired for
	192.168.14.1 (Internal AS 3356), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una:
	1404695285 snd_nxt: 1404695285 snd_wnd: 16384 rcv_nxt: 4086106368 rcv_adv: 4086157473,
	keepalive timer 0

You can learn a lot about the TCP connection from the socket buffer information in the system logging message, which is a subset of the BSD transmission control block (TCB) parameters:


sndcc

Bytes on send buffer. A full send buffer typically means that packets from this host are not being acknowledged.


rcvcc

Bytes on receive buffer. Expect 0 bytes here because RPD should not declare a hold-time expiration if data is still waiting in the receive buffer.


snd_una


snd_nxt

The difference between these two (snd_nxt - snd_una) is the amount of unacknowledged data on the TCP session.


snd_wnd

Size of the window advertised by the peer.


rcv_adv


rcv_nxt

The difference between these two (rcv_adv - rcv_nxt) is the size of the window advertised by the local TCP stack.
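
The field descriptions above can be applied mechanically to a logged message. The following Python sketch extracts the socket buffer fields from the sample log entry shown earlier and computes the two differences. The function names seq_diff and summarize and the keys of the returned dictionary are illustrative choices, and the parsing assumes the fields always appear as a name followed by a colon and an integer.

```python
import re

# Socket buffer portion of the sample log message, joined onto one line.
LOG = ("socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 1404695285 "
       "snd_nxt: 1404695285 snd_wnd: 16384 rcv_nxt: 4086106368 "
       "rcv_adv: 4086157473, keepalive timer 0")


def seq_diff(later, earlier):
    """Distance between two 32-bit TCP sequence numbers, allowing wraparound."""
    return (later - earlier) % 2**32


def summarize(log):
    """Derive the quantities described above from a log message string."""
    fields = {k: int(v) for k, v in re.findall(r"(\w+):\s+(\d+)", log)}
    return {
        "unacked_bytes": seq_diff(fields["snd_nxt"], fields["snd_una"]),
        "local_window": seq_diff(fields["rcv_adv"], fields["rcv_nxt"]),
        "peer_window": fields["snd_wnd"],
        "send_buffer_bytes": fields["sndcc"],
        "receive_buffer_bytes": fields["rcvcc"],
    }


summary = summarize(LOG)
print(summary["unacked_bytes"], summary["local_window"])  # 0 51105
```

For this message, nothing is waiting to be acknowledged and both buffers are empty, while the local stack is still advertising a 51,105-byte window. Running the same calculation on the message logged by the remote peer lets you compare the two directions of the session.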

It is important to try to collect the information on both sides of the session. This gives an indication about whether the data path failure is unidirectional, bidirectional, or dependent on packet size.

If you are seeing hold-time expired errors between IBGP peers, check the IGP logs. If the errors correlate with a link failure in your IGP, that failure should probably be your starting point for diagnostics.

See Also

For information about BSD TCBs, see TCP/IP Illustrated (Addison-Wesley).


JUNOS Cookbook
Junos Cookbook (Cookbooks (OReilly))
ISBN: 0596100140
EAN: 2147483647
Year: 2007
Pages: 290
Authors: Aviva Garrett
