11.5. Project 11.3. Trace Both Ends of a Call Setup with Log Comparison
220.127.116.11 What you need for this project:
To set up an interop problem in the test lab, we'll configure Asterisk to require an MD5 secret for a SIP peer, without informing the SIP client, X-Lite, what the MD5 secret is. This will simulate lack of support for MD5 SIP authentication by the calling endpoint, a common incompatibility . You can require an MD5 secret for any SIP peer by finding the section for the peer in /etc/asterisk/sip.conf and adding:
whatever signifies any string you want, since our purpose is merely to require MD5 support from the X-Lite SIP client. Once you've added this token to the sip.conf file, issue a reload at the Asterisk CLI and then launch X-Lite on a nearby PC. Attempt to place a call using X-Lite configured for the extension number whose SIP peer you just gave the md5secret token.
Using X-Lite's Diagnostic Log window, you can see the REGISTER method that the SIP client is sending to the Asterisk server:
SEND TIME: 158566682 SEND >> 10.1.1.10:5060 REGISTER sip:10.1.1.10 SIP/2.0 Via: SIP/2.0/UDP 10.1.1.201: 5060;rport;branch=z9hG4bKA98898C43B1E11D99DF4000A958240F6 From: Ted Wallingford <sip:firstname.lastname@example.org>;tag=1055640066 To: Ted Wallingford <sip:email@example.com> Contact: "Ted Wallingford" <sip:firstname.lastname@example.org:5060> Call-ID: 383F74D93AA611D99DF4000A958240F6@10.1.1.10 CSeq: 62767 REGISTER Expires: 1800 Max-Forwards: 70 User-Agent: X-Lite release 1103m Content-Length: 0
The call ID, shown in bold, is the SIP user agent's unique identifier for this particular signaling exchange. It will be used for the duration of the exchange, both in requests and responses, regardless of the outcome. In this example, the outcome is a 403 Forbidden response from the registrar because of failed registration. Of course, in a real troubleshooting scenario, we still wouldn't know why the registration is failing. We would know only that, for one reason or another, we aren't authorized to register. Here's the Forbidden response as recorded in X-Lite's diagnostic log:
RECEIVE TIME: 158747423 RECEIVE << 10.1.1.10:5060 SIP/2.0 403 Forbidden Via: SIP/2.0/UDP 10.1.1.201:5060;branch=z9hG4bK15429F973B1F11D99DF4000A958240F6 From: Ted Wallingford <sip:email@example.com>;tag=1055640066 To: Ted Wallingford <sip:firstname.lastname@example.org>;tag=as0e35e19d Call-ID: 383F74D93AA611D99DF4000A958240F6@10.1.1.10 CSeq: 62774 REGISTER User-Agent: Asterisk PBX Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER Contact: <sip:email@example.com> Content-Length: 0
The call ID matches the request with the response. Up until this point, the reason for the 403 Forbidden result is still unknown. The next troubleshooting step is to compare X-Lite's version of the failure, which reveals little, with Asterisk's version, which we hope will reveal more.
Depending on Asterisk's logger.conf configuration, the level of granularity and location of the logs may vary (Project 10.3 describes log tweaking). In this instance, let's say we've got a highly detailed Asterisk message log at /var/log/asterisk/full . By issuing this command, we can quickly find all the entries that have the call ID that matches X-Lite's diagnostic log:
# cd /var/log/asterisk # cat full grep 383F74D93AA611D99DF4000A958240F6@10.1.1.10 more
You can use copy and paste to bring the call ID from X-Lite into a terminal window so you don't have to type that length call ID string. The output of the cat command will look something like this:
Nov 20 12:23:17 DEBUG[ 1090935488 ]: Auto destroying call '383F74D93AA611D99DF4000A958240F6@10.1.1.10' Nov 20 12:23:17 VERBOSE[ 1090935488 ]: Destroying call '383F74D93AA611D99DF4000A958240F6@10.1.1.10'
By itself, this output doesn't tell us much, only that Asterisk decided to terminate the call associated with that call ID at some point during the exchange. But this isn't a dead end in the troubleshooting process, because this output reveals Asterisk's identifier for the Linux thread handling this call, 1090935488, shown in bold. (Non-Linux Asterisk uses the PID instead of the thread ID.)
Asterisk doesn't use the SIP call ID in log entries nearly as often as it uses the thread identifier, so we'll need to comb the log again for entries that contain the thread identifier. Entries that occur with the same thread identifier in rapid succession within 1 to 2 seconds following the instance of the SIP call ID are likely to be the entries we're looking for. If you have the time frame of the event you're troubleshooting narrowed down to a 2 or 3 minute window, so you can further grep the output with a reference to the time window in question:
# cat full grep 1090935488 grep "Nov 20 12:2" more ^^^^^^^^^^^
Now, you'll be able to examine all the output related to thread 1090935488:
Nov 20 12:26:01 VERBOSE: 12 headers, 13 lines Nov 20 12:26:01 VERBOSE: Using latest request as basis request Nov 20 12:26:01 VERBOSE: Sending to 10.1.1.201 : 5060 (non-NAT) Nov 20 12:26:01 VERBOSE: Found RTP audio format 0 Nov 20 12:26:01 VERBOSE: Found RTP audio format 8 Nov 20 12:26:01 VERBOSE: Found RTP audio format 3 Nov 20 12:26:01 VERBOSE: Peer audio RTP is at port 10.1.1.201:8000 Nov 20 12:26:01 DEBUG: Peer audio RTP is at port 10.1.1.201:8000 Nov 20 12:26:01 VERBOSE: Found description format pcmu Nov 20 12:26:01 VERBOSE: Found description format pcma Nov 20 12:26:01 VERBOSE: Found description format gsm Nov 20 12:26:01 VERBOSE: Found description format telephone-event Nov 20 12:26:01 VERBOSE: Capabilities: - us 0x60e(GSMULAWALAWSPEEXILBC), - peer audio=0x60e(GSMULAWALAWSPEEXILBC)/video=0x0(EMPTY), - combined 0x60e(GSMULAWALAWSPEEXILBC) Nov 20 12:26:01 VERBOSE: Non-codec capabilities: us - 0x1(G723), peer - 0x1(G723), combined - 0x1(G723) Nov 20 12:26:01 DEBUG: Setting NAT on RTP to 0 Nov 20 12:26:01 VERBOSE: Found user '203' Nov 20 12:26:01 NOTICE: Failed to authenticate user Ted Wallingford <sip:firstname.lastname@example.org>;tag=435868921
The first dozen lines of output are Asterisk's logging of the SDP attributes sent when X-Lite tried to place a call with a SIP INVITE method. The giveaway that our 403 Forbidden is caused by an authentication problem is in the last two lines of the output.
Here, we have it narrowed down to an invalid password, a mismatched MD5 secret, or nonsupport of MD5. We've eliminated an invalid username as the root cause, because the username is reportedly valid. To determine whether the problem is related to MD5 or just a plain-old password mismatch, we need to find out whether the SIP peer, in this case extension 203, is required to use an MD5 secret. Since we set up this failure on purpose, we know that it is. (Without MD5, the same steps could have been used to troubleshoot a simple password mismatch.) In a less concocted troubleshooting situation, you would go to /etc/asterisk/sip.conf and check for an MD5 requirement for the peer labeled  .
To resolve the simulated incompatibility, remove the md5secret token from sip.conf and attempt to place a call with the X-Lite softphone again.
11.5.2. Troubleshooting Quality-of-Service Issues
Not all troubleshooting situations involve signaling. As you may have read in prior chapters, noise problems and dropouts stemming from jitter and packet loss are cause for troubleshooting. Fortunately, the standards for sound transmissionRTP and codecsare simple compared to call signaling, so troubleshooting sound issues is generally easier.
That's not to say all sound-related issues are a snap to dismiss. Sound issues are affected by many disciplines of networking: switching, routing, QoS, and packetization. The way congestion on a routed link shows up is in dropouts, or in busy signals if you're using a QoS solution. On a switched link, like Ethernet, busy signals may occur long before dropouts are perceptible. So symptoms are different on different links.
As with signaling, good diagnostic logging is valuable for troubleshooting. Packet analysis is less useful when examining sound quality problems. One can certainly identify jitter using packet capture, but it's just easier to identify jitter across IP links using tools like traceroute and pathping , which succinctly show the variances in round-trip latency on every hop of a route. If you need to dig deeper, capture tools like Ethereal and its commercial equivalents can assist you in isolating jitter at the RTP level.
18.104.22.168 Commercial packet analysis tools
If firewalls or other filters block ICMP or TCP traffic along the call path you're troubleshooting, then you may use a packet capture tool with advanced VoIP capabilities, like Network General's ubiquitous Sniffer or WildPacket's EtherPeek VX. Both permit analysis of signaling and media packets, readily understand the structure of RTP, can identify jitter, and can reveal sources of congestion on your network.
22.214.171.124 When the going gets tough, call in the end users
The human ear is a great instrument for troubleshooting. You can carry an IP phone from one server closet to the next, listening to calls to the same destination from different points along the call path. This process of elimination will help you narrow down a group of potential congestion points until you've identified the one that's to blame. On large, expansive networks, where this can't be done by a single person, you could enlist help from other people within the organization. Even end users are pretty reliable when it comes to assisting with this type of troubleshooting. Ask them to make a call that you expect to traverse the call path you're looking at, and then ask them how it sounds or if they get a busy signal. Both can be indicators of congestion, though a busy signal will happen only in networks with QoS.