Section 7.3. SIP | Switching to VoIP

7.3. SIP

The Session Initiation Protocol (SIP) was developed by the Internet Engineering Task Force as a way of signaling multiuser distributed telephony and messaging applications on an IP network. SIP has garnered much praise from IT professionals, while suffering some criticism from traditional telecommunications people. The main reason for its less-than-perfect repute with telecom pros is its origin outside the telecom world. But many telecom guys have had to forgive this, because they're learning that SIP has almost no shortcomings when compared to its ITU-inspired cousin.

The essential duties and building blocks of SIP are the same as those of H.323. That is, there are VoIP endpoints of varying capabilities, and there are servers that participate in the signaling process and establish policy for the voice network. Unlike H.323, however, SIP is far more extensible. It is more than just a set of voice and video telephony protocols. Rather, it's a packaging framework for all types of message-based applications, from intercom calling to instant messaging and AV services.

Companies like Broadvox, Voicepulse, Broadvoice, Packet8, and others have emerged as frontier providers for dial-tone-style services delivered over the Internet, using SIP as the signaling system. Under these service offerings, consumers can purchase telephone calling capabilities that use the Internet, rather than a POTS line, as the transport for their phone service.

Avaya, Cisco, Siemens, Alcatel, and the other major telephony hardware vendors have indicated strong support for SIP, and some have even backed away from H.323 investments. This bodes well for SIP's future, and there are already more SIP IP phones installed worldwide than there are H.323 ones.

This makes SIP both an easy decision and a challenging one. SIP's extensibility comes by way of a non-telephony mindset.

Traditional telecom engineers have balked at the wordiness, or bulkiness, of SIP's message structure. Instead of using compact, machine-friendly message packets like H.323, SIP uses lengthy, human-readable headers like SMTP or HTTP. Proponents of SIP counter that this human readability makes SIP easier to troubleshoot, and I tend to agree.

SIP is currently in Version 2.0. Its definition is found in RFCs 3261 through 3265. The defined purpose of SIP is to coordinate and facilitate monitoring of media sessions on the network. It supports a variety of addressing schemes and can be designed as a centralized or distributed topology.

7.3.1. SIP Nodes

SIP endpoints and servers are called nodes. A SIP phone is a node. SIP phones can communicate directly with each other in order to establish media sessions, just as H.323 terminals can establish direct channels. But more often than not, especially in an enterprise setting, SIP is used with a SIP server. SIP phones normally report to a dedicated SIP server node called a registrar upon boot-up.

7.3.1.1 SIP registrar

The SIP registrar is a database server that communicates with SIP nodes in order to collect, store, and disperse information about the whereabouts of SIP users. When a SIP node registers with a registrar, it tells that registrar how to get hold of the user, specifically what IP address and port to use for future SIP communication. You could think of the registrar as a router, because its main purpose is to give advice on how to reach SIP users, just as a TCP/IP router's purpose is to give advice on how to reach other networks.
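The registrar's bookkeeping can be pictured with a small sketch. This is only an illustration, assuming an in-memory table; the class name `Registrar`, the user `alice@example.com`, and the IP addresses are hypothetical, and a real registrar also handles expiry, authentication, and the REGISTER message format.

```python
from collections import defaultdict


class Registrar:
    """Toy in-memory registrar: maps a user to the places that user can be reached."""

    def __init__(self):
        # user address -> list of (ip, port) contacts, in registration order
        self._bindings = defaultdict(list)

    def register(self, user, ip, port=5060):
        """Record that `user` can be reached at ip:port (duplicates are ignored)."""
        contact = (ip, port)
        if contact not in self._bindings[user]:
            self._bindings[user].append(contact)

    def lookup(self, user):
        """Return every known contact for `user`, or an empty list if unknown."""
        return list(self._bindings.get(user, []))


# Hypothetical user and addresses, for illustration only.
registrar = Registrar()
registrar.register("alice@example.com", "10.1.1.103")
registrar.register("alice@example.com", "10.1.1.104", 5062)
print(registrar.lookup("alice@example.com"))
```

Keeping a list of contacts per user, rather than a single address, leaves room for the same user to be registered at more than one phone.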

7.3.1.2 URIs

SIP endpoints can be referenced using Uniform Resource Identifiers, but so can SIP users. Consider this URI:

 sip:lerxt@sip.bytor.com

This convention indicates both the user to be contacted and the server that is expected to know the address of that user's SIP endpoint. In this case, the user is lerxt and the server is sip.bytor.com. Secure SIP URIs, that is, those that indicate an encrypted signaling connection, use the sips: prefix instead of sip:. Encryption of SIP signals, if desired, occurs by way of Transport Layer Security, defined in RFC 2246.
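As a rough illustration of how the pieces of such a URI separate, here is a minimal sketch. The function name `parse_sip_uri` is hypothetical, and the sketch assumes the simple user@host form shown above; it ignores ports, URI parameters, and URIs that carry no user part.

```python
def parse_sip_uri(uri):
    """Split a simple SIP URI into (is_secure, user, host).

    Handles only the plain "sip:user@host" / "sips:user@host" form.
    """
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, at, host = rest.rpartition("@")
    if not at or not user or not host:
        raise ValueError(f"expected user@host in: {uri!r}")
    # "sips" signals that the signaling connection should be encrypted
    return scheme == "sips", user, host


print(parse_sip_uri("sip:lerxt@sip.bytor.com"))
```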

A SIP URI doesn't always correspond to a single phone. If a user is available at one of several phones, then all of those phones can ring simultaneously, or in a specific sequence, based on the handling server's configuration. Most SIP registrars support simultaneous registration of the same user at multiple phones, and the most common way of handling this situation is to ring them all when a call is received for that user.

7.3.1.3 SIP methods and responses

SIP signals fall into 10 categories called methods. Each method accomplishes a different function for SIP:

INVITE: This method is used to start sessions and advertise endpoint capabilities.
ACK: This method is used to acknowledge to the called SIP peer that an INVITE has succeeded.
BYE: This method occurs when the call is completed, that is, one user at a minimum wishes to end the call.
CANCEL: This method is used during attempts to override a prior request that hasn't yet been completed.
OPTIONS: This method is used to query a SIP peer for its capabilities information, without actually establishing a media channel.
REGISTER: This method notifies the SIP server at which endpoints a particular user can be reached.
INFO: This method is used to transmit telephony application signals through the SIP signaling path; these signals can include dialed digits.
PRACK: This method (Provisional ACK) is used to acknowledge a provisional (1xx) response, such as a ringing indication, that must be delivered reliably before the call is answered. It is not a substitute for the final ACK.
SUBSCRIBE: This method provides a way of establishing event handlers within SIP telephony applications, for example, "Tell me when Bob misses a call" or "Tell Bob when I am registered with the server."
NOTIFY: This method delivers messages between endpoints as events occur, for example, "Bob missed a call."

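When a call must be started, ended, or altered, a SIP method is employed. The SIP methods in the preceding list are similar in concept to the HTTP methods GET and POST, and like HTTP, SIP expects response codes when it sends a method. SIP's numeric response codes are three digits long and break down into six categories:

1xx: Provisional (informational) responses, such as 100 Trying.
2xx: Success responses, such as 200 OK.
3xx: Redirection responses.
4xx: Client error responses, such as 404 Not Found.
5xx: Server error responses.
6xx: Global failure responses.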

A complete list of SIP responses is found in Appendix A.

Project 11.2 shows how to use a packet capture tool to observe SIP methods and responses.

Typically, a SIP caller initiates a method directed to a SIP callee, and that SIP callee initiates a response, according to the success or failure of the caller's method.
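Because the first digit of a SIP response code identifies its category, a caller can decide how to react before looking at the specific code. The sketch below shows that idea; the function name `response_class` is hypothetical, and the category labels follow the six response classes of SIP 2.0.

```python
# First digit of the three-digit code -> response category
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code):
    """Return the category name for a three-digit SIP response code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid SIP response code: {code}")
    return RESPONSE_CLASSES[code // 100]


for code in (100, 200, 404):
    print(code, response_class(code))
```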

In Figure 7-9, you can see that a (highly simplified) call from an Internet host (A), in the form of a SIP INVITE, to 5150@oreilly.com, would ordinarily result in a 200 OK response, clearing the way for the call to begin. Now, if the INVITE method specified a SIP peer whom the SIP server didn't know how to reach, a 404 Not Found response would be in order, as in Figure 7-10.

Figure 7-9. A call to 5150@oreilly.com would normally result in a 200 OK response, if 5150@oreilly.com were registered on the SIP registrar labeled B

A SIP INVITE header looks like this:

 INVITE sip:5150@oreilly.com SIP/2.0
 Via: SIP/2.0/UDP oreilly.com:5060;branch=9889gg1424
 Max-forwards: 7
 To: 5150 <5150@oreilly.com>
 From: 1984 <1984@vh.com>
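Since the header is plain text, it can be picked apart with ordinary string handling. The following is a minimal sketch, not a full SIP parser: the function name `parse_sip_request` is hypothetical, header names are lowercased for lookup, and a repeated header (such as multiple Via lines) would simply overwrite the earlier one.

```python
def parse_sip_request(text):
    """Parse a SIP request into (method, request_uri, version, headers)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # The request line has three space-separated parts
    method, request_uri, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line separates the headers from any body
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers


INVITE_TEXT = """
INVITE sip:5150@oreilly.com SIP/2.0
Via: SIP/2.0/UDP oreilly.com:5060;branch=9889gg1424
Max-forwards: 7
To: 5150 <5150@oreilly.com>
From: 1984 <1984@vh.com>
"""

method, uri, version, headers = parse_sip_request(INVITE_TEXT)
print(method, uri, version)
print(headers["to"])
```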

Figure 7-10. In the same setup as Figure 7-9, a call to 1138@oreilly.com, which is not registered in the registrar labeled B, will result in a 404 response

7.3.1.4 SIP proxies

Calls from one SIP endpoint to another can be considered local if they are both registered with the same SIP registrar, even if the physical endpoints are on different continents. The point is, they are on the same domain and are therefore peers on a local network of sorts. Nonlocal calls, however, are routed through specialized SIP server software known as a SIP proxy.

A SIP proxy is a server that routes or redirects SIP INVITE methods on behalf of one or more domains, just as a web server provides responses to HTTP methods for certain domains. So, when an incoming call from a foreign network is recognized, the SIP proxy's job is to connect it to the called user's endpoint, if possible.

Outbound SIP proxies serve the task of connecting calls, but on behalf of a local network of SIP users. Many users may share the same SIP proxy because they work in the same office or perhaps because they subscribe to the same SIP dial-tone service provider. Outbound SIP proxies are often used to overcome network communications problems posed by NAT firewalls.

Some local calls may benefit from being routed through a SIP proxy, too. Forcing even local SIP endpoints to use a SIP proxy allows for easy enforcement of a dial-plan, greater administrative control over the voice network, and the ability to do centralized telephony applications such as call recording.

7.3.1.5 SIP user agent elements

All SIP proxies and endpoints are composed of two key software elements, the user agent client (UAC) and the user agent server (UAS). All SIP devices, be they softphones, hardphones, voice mail servers, or full-blown PBX servers, must be able to speak the SIP protocol, and the UAC and UAS elements are their mouthpieces. The UAC sends methods and receives responses, so its logical equivalent in HTTP is the web browser. The UAS receives SIP methods, processes them, and returns responses, so it's more like a web server. In varying degrees of completeness, all SIP endpoints and servers have both a UAC and a UAS.

In Figure 7-11, you can see the signaling process for a nonlocal call. This particular example uses an inbound call, which is fielded by a proxy server. By the time the SIP signaling begins, the calling endpoint already knows what host to contact for calls destined for the receiving endpoint's domain. This occurs by way of a DNS lookup for the hostname in the form sip.domain.com. The calling endpoint can then contact the proxy server for the domain in question, and send it an INVITE method.

Figure 7-11. The SIP signaling process for a call from endpoint A to endpoint C through proxy B

The proxy server may immediately respond with an informational Trying response, or it may save the Trying response until after it has forwarded the proxied INVITE method to the appropriate endpoint on the local network. Incidentally, the way the proxy knows which local endpoint to contact is by performing a database lookup with the SIP registrar. If the user being called doesn't exist in the registrar, then the proxy will return a 404 Not Found response.

In this example, though, the user, jake@oreilly.com, does exist and is able to respond to the proxied INVITE method. His endpoint's responses are proxied back to the caller, and ultimately, the 200 OK response is sent, indicating the call is clear to proceed. One of the most important pieces of this startup signaling process occurs during the INVITE methods and 200 responses: SDP capabilities negotiation.
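The proxy's first decision, whether the called user is known to the registrar, can be sketched as follows. This is a simplified illustration only: the function name `route_invite`, the dictionary standing in for the registrar database, and the contact address are all hypothetical, and no real network traffic is involved.

```python
def route_invite(registrations, user):
    """Decide how a toy proxy answers an INVITE for `user`.

    `registrations` stands in for the registrar lookup: user -> list of contacts.
    Returns (status_code, reason, contacts_to_forward_to).
    """
    contacts = registrations.get(user, [])
    if not contacts:
        # The registrar has no record of this user
        return (404, "Not Found", [])
    # The user is known: tell the caller we are working on it, then forward
    return (100, "Trying", list(contacts))


# Hypothetical registrar contents for illustration.
registrations = {"jake@oreilly.com": [("10.1.1.50", 5060)]}

print(route_invite(registrations, "jake@oreilly.com"))
print(route_invite(registrations, "1138@oreilly.com"))
```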

7.3.1.6 SIP redirect

When a SIP server responds to a calling endpoint's INVITE method with a 3xx response, that SIP server is redirecting the calling endpoint to a different SIP server. The calling endpoint should then contact that server with an INVITE method for further assistance in connecting the media stream. This feature is not implemented on all systems that support SIP. In fact, where complex, signaling-neutral dial-plan programming is available (like in Asterisk), SIP redirection isn't always necessary. The use of SIP redirects is more common in large, SIP-only networks with multiple servers, such as those that span the Internet.
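From the calling endpoint's point of view, handling a redirect amounts to retrying the INVITE at the address the server handed back. The sketch below simulates that loop under stated assumptions: `follow_redirects`, `fake_network`, the `max_hops` limit, and the hostname branch.oreilly.com are hypothetical, and the "network" is just a lookup table returning a status code and an optional new target.

```python
def follow_redirects(send_invite, target, max_hops=5):
    """Send an INVITE and follow 3xx responses until a final answer arrives.

    `send_invite(target)` must return (status_code, new_target_or_None).
    Returns (final_status_code, target_that_answered).
    """
    for _ in range(max_hops):
        code, new_target = send_invite(target)
        if 300 <= code <= 399 and new_target:
            target = new_target  # retry at the server we were pointed to
            continue
        return code, target
    raise RuntimeError("too many redirects")


def fake_network(target):
    """Stand-in for real signaling: a fixed table of responses."""
    table = {
        "sip:jake@oreilly.com": (302, "sip:jake@branch.oreilly.com"),
        "sip:jake@branch.oreilly.com": (200, None),
    }
    return table.get(target, (404, None))


print(follow_redirects(fake_network, "sip:jake@oreilly.com"))
```

The hop limit is a defensive choice in this sketch so that two servers redirecting to each other cannot loop forever.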

7.3.1.7 Session Description Protocol

SDP is the de facto session capabilities protocol of SIP, similar to the H.245 protocol in H.323. It is defined in RFC 2327. When a call is placed from one SIP endpoint to another, an SDP capabilities construct is sent as text payload in a SIP INVITE, following the SIP packet header Content-type that indicates an SDP message is to follow. Here's a sample SDP payload:

 v=0
 o=HanSolo 7575 440 IN IP4 10.1.1.103
 c=IN IP4 10.1.1.103:16385
 m=audio 16385 RTP/AVP 1
 a=rtpmap:1 G726/8000

This particular construct is requesting a G.726 media channel originating from user HanSolo at 10.1.1.103 using RTP for media packaging on UDP port 16385.

Using SDP, the calling endpoint can request certain codecs, sampling rates, or even a packaging protocol other than RTP (this is very rare, though). In the previous SDP block, the v token indicates the version of SDP being used, though SIP doesn't care, as SIP could theoretically use any version. The o token is a string of identifiers that uniquely name this SDP request, often including an NTP timestamp and the IP address and protocol designation of the sender. The c token tells which IP address to use for the media channel.

The m token describes the UDP port number and media framing protocol (RTP in this example), followed by a numeric identifier for this framing capability, called an RTP profile. More than one m token definition may be sent if the requesting SIP endpoint is advertising more than one set of RTP capabilities. The a token tells the RTP profile identifier, codec, and sample rate to use for the media channel. For an exploration of SDP messages captured by the Ethereal packet sniffer, see Chapter 11.
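To make the token structure concrete, here is a minimal sketch that splits the sample payload into its single-letter tokens and pulls out the media and codec details. The function names `parse_sdp` and `summarize_media` are hypothetical, and the sketch handles only the tokens discussed here, not the full SDP grammar.

```python
def parse_sdp(payload):
    """Split an SDP payload into an ordered list of (token, value) pairs."""
    fields = []
    for line in payload.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        token, sep, value = line.partition("=")
        if not sep or len(token) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        fields.append((token, value))
    return fields


def summarize_media(fields):
    """Collect m= media lines and a=rtpmap codec mappings."""
    summary = {"media": [], "rtpmap": {}}
    for token, value in fields:
        if token == "m":
            kind, port, proto, *formats = value.split()
            summary["media"].append(
                {"type": kind, "port": int(port), "proto": proto, "formats": formats}
            )
        elif token == "a" and value.startswith("rtpmap:"):
            ident, encoding = value[len("rtpmap:"):].split(None, 1)
            codec, rate = encoding.split("/")[:2]
            summary["rtpmap"][ident] = (codec, int(rate))
    return summary


SDP_TEXT = """
v=0
o=HanSolo 7575 440 IN IP4 10.1.1.103
c=IN IP4 10.1.1.103:16385
m=audio 16385 RTP/AVP 1
a=rtpmap:1 G726/8000
"""

fields = parse_sdp(SDP_TEXT)
summary = summarize_media(fields)
print(summary)
```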

In Asterisk, the status of SIP peers can be displayed at the Asterisk command line, using sip show peers and sip show users. Calls in progress can be tracked using Asterisk Manager (astman).

7.3.1.8 Real-Time Streaming Protocol

Unlike H.323, the SIP protocol family provides a built-in recommendation for streaming prerecorded audio and video, in this case the RTSP protocol. This is the same protocol used by RealOne and similar media player applications. It's defined by RFC 2326.

7.3.1.9 SIP packet encoding

While H.323 uses the obscure ASN.1 (Abstract Syntax Notation One) encoding, SIP uses plain text. More specifically, SIP uses a text-based conversational approach for its signaling messages, whereas H.323 uses the Q.931 ISDN approach, one that is, by all accounts, poorly understood outside telecommunications engineering circles.

Although SIP has many functional equivalents in H.323, it's fair to say that SIP's approach is more distributed, and certainly more Internet-like, while H.323's is more PSTN-like or mainframe-like. Table 7-2 illustrates this concept.

Table 7-2. Comparison of SIP, H.323, and IAX protocol families

Function/Characteristic	SIP	H.323	IAX
Endpoint discovery and admission	SIP REGISTER methods	RAS Protocol	IAX REG Control Frames
Call setup and teardown	SIP INVITE methods	H.225 Protocol	IAX NEW and HANGUP Control Frames
Capabilities negotiation, codec selection, and media session port selection	Session Description Protocol	H.245 Protocol	IAX Capability Information Meta Frame
Packetization and sound sample transmission	RTP/RTCP Protocol	RTP/RTCP Protocol	IAX Voice/Data Full and Mini Frame
Streaming of recorded audio and video	RTSP	None recommended	None recommended
Frame encoding	Text (ASCII & similar)	ASN.1	Binary/proprietary
Messaging approach	HTTP-like	ISDN-like / Q.931	Proprietary
SoftPBX call path is called a...	Proxy	Gatekeeper-routed	SoftPBX
Call-routing reference device is called a...	Registrar	Gatekeeper	Server
Independent call path is called...	Redirect	Directed signaling	Direct signaling
PSTN interface approach	None recommended	H.323 gateway	None recommended
Encryption of signaling messages	TLS/TCP	None recommended	A work in progress
Endpoint identification	SIP URI, email address, E.164 address, or alias	E.164 address	Email address, E.164 address, or alias
Connections through firewalls	Proxy/softPBX call path	Gatekeeper/softPBX call path	No proxy needed
Multiplexed trunks	Yes	Yes	Yes
UDP port number	5060/5061	1503,1720,1731	5036 (v1)

7.3.1.10 SIP versus H.323: the great debate

H.323 is older than SIP, and more considerate of legacy infrastructure, too. It defines standard procedures and best practices for interfacing with old-school telephony technology: practices like the use of gateways with built-in support of legacy protocols like ISDN and FXO/FXS.

SIP makes no such provisions, but is still the favorite among those schooled in Internet thought: it's extensible and reusable and does a whole lot more for telephony apps, ultimately, than H.323. SIP allows multiple endpoints to register the same alias, allows freedom from E.164, and enables other, non-voice applications like instant messaging and presence. For these reasons, some in the industry have pronounced SIP the winner of the battle for telephony signaling.

SIP doesn't define gateways for interaction with the PSTN, so a legacy-aware signaling system like MGCP, MEGACO, or H.323 is often used alongside SIP in order to facilitate legacy gateways.

The fact is, both protocol suites are very much necessary. Parts of the H.323 recommendation, RTP and PSTN gateways in particular, are in use by SIP networks, so the importance of H.323's features is obvious.

One day, SIP may contain some legacy interfacing of its own, but for now, H.323 fills that role quite aptly. All the big VoIP equipment vendors use its recommendations in order to interface with legacy systems like the PSTN. SIP, conversely, was built for the new network: the Internet. Some mix of H.323 or MEGACO/H.248 (covered later) is in order if your SIP system plans on talking to a traditional telephony system.