21.4 Standards for Mobile Video Telephony



3GPP has specified standards for mobile video telephony, taking into account the nature of the mobile network channel. In fact, as introduced in the previous section, two different types of channels can enable mobile video telephony applications: circuit-switched and packet-switched channels. Following this dual approach, 3GPP has defined two different sets of standard specifications:

  1. Specifications for CS mobile video telephony are based on the ITU-T H.324 standards for video telephony terminals over circuit-switched channels. H.324-based terminals also can be implemented over GSM-based CS channels (HSCSD, ECSD).

  2. Specifications for PS mobile video telephony are based on the IETF SIP standard for video telephony over packet-switched channels.

Figure 21.2 summarizes the mapping between the mobile network channels and the standards for mobile video telephony defined in 3GPP.


Figure 21.2: Standards for mobile video telephony.

A more-detailed description of the standards for mobile video telephony for CS and PS networks is given in the following sections.

21.4.1 Circuit-Switched Mobile Video Telephony

H.324 terminals for 3GPP circuit-switched mobile video telephony are essentially ITU-T H.324 terminals with Annex C [6] and with the modifications specified by 3GPP [7] since Release '99. In 3GPP, these are called 3G-324M terminals.

The system architecture of a 3G-324M terminal is depicted in Figure 21.3. [8] The mandatory elements of this architecture are a wireless interface, the H.223 multiplexer with Annexes A and B, [9] and the H.245 system control protocol (version 3 or later). [10] 3G-324M terminals are specified to work at bit rates of at least 32 kbps.

Figure 21.3: System architecture of 3G-324M terminals.

We will give an overview of the basic building blocks of a 3G-324M terminal, considering also some implementation guidelines, as described in 3GPP TSGS-SA. [11] The reader interested in the differences between H.324 and 3G-324M terminals can find more information in References [12] and [13]. Here we will not emphasize these differences.

21.4.1.1 Media Elements

3G-324M terminals can support a wide set of media. They can be either continuous media (speech and video) or discrete media (real-time text). Among the former set, the following codecs can be supported in a mobile terminal:

  • AMR (Adaptive MultiRate) narrowband is the mandatory speech codec for 3G-324M terminals, [14] if speech is supported. Speech is encoded at 8 kHz sampling frequency and at eight different bit rates ranging from 4.75 to 12.20 kbps.

  • G.723.1 is the recommended speech codec. [15] It encodes speech at two bit rates, 5.3 and 6.3 kbps. The G.723.1 codec is needed if interoperation with GSTN (General Switched Telephone Network) terminals is a requirement. [16]

  • H.263 video Profile 0 Level 10 is the mandatory codec, if video is supported. [17]

  • MPEG-4 Visual is an optional codec that can be supported at Simple Profile Level 0. [18]

  • H.261 is another optional video codec [19] that can be supported by 3G-324M terminals.
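The AMR narrowband figures above can be turned into per-frame sizes with a little arithmetic. The following is a minimal sketch, assuming the standard 20 ms AMR frame (160 samples at 8 kHz); the six intermediate mode rates are the standard AMR narrowband set and are not listed in the text above, and the function name is illustrative only.

```python
# AMR narrowband modes in kbps: the eight rates between 4.75 and 12.20.
AMR_NB_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.20, 12.20]

# Assumption: one AMR speech frame covers 20 ms (160 samples at 8 kHz).
FRAME_MS = 20


def speech_bits_per_frame(rate_kbps: float) -> int:
    """Speech bits produced for one 20 ms frame at the given mode rate."""
    # kbps multiplied by milliseconds gives bits.
    return round(rate_kbps * FRAME_MS)


if __name__ == "__main__":
    for rate in AMR_NB_MODES_KBPS:
        print(f"{rate:5.2f} kbps: {speech_bits_per_frame(rate)} bits per frame")
```

These counts cover speech bits only; any framing or mode-indication overhead added by the multiplexer or payload format comes on top.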

The discrete media defined in 3GPP specifications of circuit-switched video telephony terminals are in the framework of the optional user data application:

  • T.120 [20] is a protocol that allows multipoint data conferencing for transfer of data, images, and sharing of whiteboard and applications.

  • T.140 [21] is a protocol that allows real-time text conversation between two 3G-324M terminals. Text sessions can be opened in a stand-alone fashion or simultaneously with speech, video, and other data applications. Further information about this capability is available in Reference [22].

21.4.1.2 System Control and Multiplexing

In this section a general description of the system control and the multiplexing is given. Figure 21.4 shows a more detailed view of the 3G-324M protocol stack.

Figure 21.4: 3G-324M protocol stack.

The control protocol H.245 [23] provides end-to-end signaling for the proper operation of a 3G-324M terminal, including capability exchange and the messages that open and fully describe the content of logical channels. Most of the control signaling occurs at the beginning and at the end of a call. The bandwidth needed for H.245 signaling is always allocated on demand by the H.223 multiplexer. [24] This ensures that most of the channel bandwidth is effectively used by the media.

H.324 Annex C [25] also introduces the Control Channel Segmentation and Reassembly Layer (CCSRL), which is used to split large control channel packets. The segmentation is required because successful transmission of large packets at high error rates may be difficult, and the connection setup may even fail without CCSRL.
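The idea behind CCSRL segmentation can be sketched in a few lines. This is an illustration under stated assumptions, not the Annex C procedure itself: each segment is assumed to carry a one-octet marker (0xFF on the last segment, 0x00 otherwise), and the maximum segment payload is a free parameter chosen by the caller. The function names are hypothetical.

```python
from typing import List

# Assumed marker values for this sketch: final segment versus any other segment.
LAST_SEGMENT = 0xFF
MORE_SEGMENTS = 0x00


def segment(message: bytes, max_payload: int) -> List[bytes]:
    """Split a control message into marked segments of at most max_payload octets."""
    if max_payload < 1:
        raise ValueError("max_payload must be at least 1")
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    segments = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        marker = LAST_SEGMENT if is_last else MORE_SEGMENTS
        segments.append(bytes([marker]) + chunk)
    return segments


def reassemble(segments: List[bytes]) -> bytes:
    """Concatenate segment payloads until the last-segment marker is seen."""
    message = bytearray()
    for seg in segments:
        message += seg[1:]
        if seg[0] == LAST_SEGMENT:
            return bytes(message)
    raise ValueError("last segment missing")


if __name__ == "__main__":
    parts = segment(b"a large H.245 control message", 8)
    print(len(parts), reassemble(parts))
```

Smaller segments each have a better chance of crossing an error-prone channel intact, which is the motivation given above for using CCSRL during connection setup.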

Control messages can make use of retransmission for providing guaranteed delivery. H.324 uses the (Numbered) Simple Retransmission Protocol, or (N)SRP, [26] for this functionality.

The multiplex protocol H.223 [27] multiplexes audio, video, data, and control streams into a single bit stream, and demultiplexes the received bit stream into separate bit streams. H.223 should support a speed of at least 32 kbps toward the wireless interface. However, lower bit rates are also possible, especially over GSM-based channels (HSCSD, ECSD). The multiplexer consists of an adaptation layer (AL), which exchanges information with the higher layers (i.e., audio/video codecs and system control), and a lower layer called the multiplex layer (MUX), which is responsible for transferring information received from the AL to the mobile multilink layer (if present) and the physical layer(s). The AL handles the appropriate error detection and correction, sequence numbering, and retransmission procedures for each information stream. Three different ALs are specified in the H.223 Recommendation, each targeted to a different type of data:

  1. The AL1 adaptation layer is designed primarily for the transfer of data or control information, which is relatively delay insensitive but requires full error correction. AL1 itself does not provide any error control or retransmission procedure; it relies on higher layers (i.e., (N)SRP) for this functionality. AL1 works in framed (AL1F) and unframed (AL1U) modes. The former is used for the transfer of control data, while the latter is used for user data transfer, such as chat data or other T.120- or T.140-enabled applications.

  2. The AL2 adaptation layer is intended primarily for digital audio, which is delay sensitive, but may be able to accept occasional errors with only minor degradation of performance. AL2 receives data from its higher layer (i.e., an audio codec) and transfers it to the MUX layer after adding an 8-bit CRC (Cyclic Redundancy Check) and optional 8-bit sequence numbers which can be used to detect missing or misdelivered data.

  3. The AL3 adaptation layer is designed for the transfer of digital video. It appends a 16-bit CRC to the data received from its higher layer (i.e., a video encoder), and it passes information to the MUX layer. AL3 includes optional provision for retransmission and sequence numbering by means of an 8- or 16-bit control field. 3GPP recommends encapsulating one MPEG-4 video packet into an AL3-SDU (Service Data Unit). To avoid the additional delays caused by possible retransmissions, video data can instead be transferred using AL2, which has a smaller packet overhead and does not allow retransmission procedures. [28]

The MUX layer is responsible for mixing the various logical channels from the sending ALs (e.g., data, audio, video, and control) into a single bit stream to be forwarded to the physical layer for transmission. All MUX layer packets are delimited using HDLC flags, and include an 8-bit header, which contains, among other data, a 3-bit CRC for error detection. The variable-length information field of each MUX packet can contain 0 or more octets from multiple (segmentable) logical channels. To guarantee error resilience and a low delay, MUX packets are recommended to be between 100 and 200 bytes (for speech data, this means encapsulating 1 to 3 speech frames into a MUX packet). [29]
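The AL2 framing described in item 2 (optional 8-bit sequence number, payload, 8-bit CRC) can be sketched as follows. Several details here are assumptions made for illustration and are not taken from the text above: the generator polynomial x^8 + x^2 + x + 1 (0x07), a zero initial value, most-significant-bit-first processing, the field order, and the choice to let the CRC cover the sequence number as well as the payload. The function names are hypothetical.

```python
from typing import Optional


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, zero initial value (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_al2_pdu(sdu: bytes, seq: Optional[int] = None) -> bytes:
    """Prefix an optional 8-bit sequence number and append an 8-bit CRC."""
    header = b"" if seq is None else bytes([seq & 0xFF])
    body = header + sdu
    return body + bytes([crc8(body)])


def check_al2_pdu(pdu: bytes) -> bool:
    """Return True when the trailing CRC octet matches the rest of the PDU."""
    return len(pdu) >= 1 and crc8(pdu[:-1]) == pdu[-1]


if __name__ == "__main__":
    pdu = build_al2_pdu(b"one AMR speech frame", seq=7)
    print(check_al2_pdu(pdu))
    damaged = bytes([pdu[0] ^ 0x01]) + pdu[1:]
    print(check_al2_pdu(damaged))
```

A receiver that finds a CRC mismatch can discard or conceal the audio frame, and a gap in the sequence numbers reveals a missing or misdelivered PDU, which matches the error-tolerant role the text assigns to AL2.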

To provide higher error resilience for data transmission over mobile networks, four different H.223 multiplexer levels are defined, [30] offering progressively increasing error robustness at the cost of progressively increasing overhead and complexity. The different levels are based on a different multiplexer packet structure:

  • H.223 Level 0 describes the basic functionality as defined in Recommendation H.223. All 3G-324M terminals should be able to interwork using this level.

  • H.223 Level 1 is described in Annex A of Recommendation H.223. The HDLC flag used to delimit multiplex packets in the MUX layer of H.223 is replaced with a longer flag, and HDLC zero-bit insertion (bit stuffing) is not used.

  • H.223 Level 2 is described in Annex B of Recommendation H.223. In addition to the features of H.223 Level 1, a 24-bit (optionally also 32-bit) header describing the multiplexer packet is used. The header includes error protection (using Extended Golay Codes) and packet length fields.

  • H.223 Level 3 is described in Annexes C and D of Recommendation H.223. The level includes the features of H.223 Level 2. Furthermore, additional error protection and other features are provided to increase the protection of the payload. For instance, H.223 Level 3 defines changes not only to the MUX layer, but also to the AL layer, so that the various ALs in Figure 21.4 are replaced with more robust ones that make use of Reed-Solomon codes.

Two 3G-324M terminals establish a connection at the highest level supported by both terminals. This also ensures interoperability with GSTN H.324 terminals. A dynamic level change procedure can be used to adjust error resilience when channel conditions vary during a connection. The levels can be used independently in the receiving and transmitting directions.
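The HDLC zero-bit insertion that Level 0 relies on, and that Level 1 drops in favor of a longer flag, works as sketched below. This is the generic HDLC technique (a 0 is inserted after every run of five consecutive 1s so that the flag pattern 01111110 never appears inside a packet); bit strings are used instead of packed bytes purely for readability, and the function names are illustrative.

```python
FLAG = "01111110"  # HDLC flag that delimits packets


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1s."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the 0 that follows every run of five consecutive 1s."""
    out = []
    ones = 0
    drop_next = False
    for bit in bits:
        if drop_next:
            # This bit is the stuffed 0 added by the sender.
            drop_next = False
            ones = 0
            continue
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                drop_next = True
        else:
            ones = 0
    return "".join(out)


def frame(bits: str) -> str:
    """Delimit a stuffed payload with opening and closing flags."""
    return FLAG + stuff(bits) + FLAG


if __name__ == "__main__":
    payload = "0111111011111111"
    print(stuff(payload))
    print(unstuff(stuff(payload)) == payload)
```

A single bit error can turn stuffed data into a false flag or destroy a real one, which is one reason the higher levels replace this scheme with longer flags and protected headers.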
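The rule "highest level supported by both terminals" amounts to taking the maximum of the common levels. The sketch below abstracts away the actual level setup procedure and simply models each terminal as a set of supported H.223 levels; the assumption that a GSTN H.324 terminal supports only Level 0 is made for the example, and the function name is hypothetical.

```python
from typing import Set


def negotiate_level(local: Set[int], remote: Set[int]) -> int:
    """Return the highest H.223 level present in both capability sets."""
    common = local & remote
    if not common:
        raise ValueError("terminals share no multiplexer level")
    return max(common)


if __name__ == "__main__":
    mobile_a = {0, 1, 2}
    mobile_b = {0, 1, 2, 3}
    gstn_terminal = {0}  # assumption: basic H.223 only
    print(negotiate_level(mobile_a, mobile_b))
    print(negotiate_level(mobile_a, gstn_terminal))
```

Because every 3G-324M terminal is expected to interwork at Level 0, the intersection is never empty between compliant terminals, and a call toward a Level 0 only terminal falls back to the basic multiplexer.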

Use of the optional Mobile Multilink Layer (MML) [31] was introduced in Release 4 of the 3GPP 3G-324M specifications. It allows data to be transferred along up to eight independent physical connections, each providing the same transmission rate, in order to yield a higher aggregate bit rate. The MML provides the split functionality toward the lower protocol stack layers (HSCSD, ECSD, or CS UTRAN mobile networks) and the aggregation functionality toward the upper protocol stack layers.

Call setup issues in circuit-switched networks and the capability of 3G-324M terminals for HTTP content downloading are not addressed here. The interested reader can find additional details in Curcio and coworkers [32] and in Annex I of ITU-T Recommendation H.324, [33] respectively.
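The split and aggregation roles of the MML can be illustrated with a simple round-robin striping scheme. This is only a conceptual sketch: the real MML framing is not described in the text above, and the block size, the round-robin order, and the function names are all assumptions made for the example.

```python
from typing import List

MAX_CHANNELS = 8  # the MML allows up to eight physical connections


def split(data: bytes, n_channels: int, block: int = 1) -> List[bytes]:
    """Distribute fixed-size blocks over the channels in round-robin order."""
    if not 1 <= n_channels <= MAX_CHANNELS:
        raise ValueError("between 1 and 8 channels are allowed")
    channels = [bytearray() for _ in range(n_channels)]
    for i in range(0, len(data), block):
        channels[(i // block) % n_channels] += data[i:i + block]
    return [bytes(c) for c in channels]


def aggregate(channels: List[bytes], block: int = 1) -> bytes:
    """Rebuild the original stream by reading the channels in the same order."""
    total = sum(len(c) for c in channels)
    offsets = [0] * len(channels)
    out = bytearray()
    turn = 0
    while len(out) < total:
        ch = turn % len(channels)
        out += channels[ch][offsets[ch]:offsets[ch] + block]
        offsets[ch] += block
        turn += 1
    return bytes(out)


if __name__ == "__main__":
    stream = bytes(range(20))
    parts = split(stream, n_channels=3, block=4)
    print([len(p) for p in parts])
    print(aggregate(parts, block=4) == stream)
```

With n equal-rate connections the aggregate rate is n times the rate of one connection, which is the benefit the text attributes to the MML.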

21.4.2 Packet-Switched Mobile Video Telephony

Mobile video telephony applications have been included in the framework of packet-switched conversational multimedia applications of 3GPP Release 5 specifications. A conversational multimedia application is any application that requires very low delays and error rates. For instance, a Voice over IP (VoIP) application or a one- or two-way multimedia application with the mentioned quality requirements belongs to this category.

Release 5 3GPP specifications for video telephony are tightly connected to the 3GPP network specification. In fact, the call control mechanism in the IP Multimedia Subsystem (IMS) of 3GPP Network Release 5 is based on the SIP protocol defined by IETF. This is the same protocol used for the control plane of mobile videophones, defined in the framework of packet-switched conversational multimedia applications in 3GPP. Figure 21.5 shows the protocol stack for PS mobile videophones. In the next sections, a brief description of the codecs and protocols depicted in Figure 21.5 will be given.

Figure 21.5: Protocol stack for PS conversational multimedia applications.

21.4.2.1 Media Elements

The codecs and payload formats used for mobile video telephony are described in the specification. [34] Media can be either continuous (speech and video) or discrete (real-time text). For interoperability reasons, 3GPP has ensured that the mandatory codecs for PS video telephony are the same codecs defined for CS video telephony (3G-324M). However, codecs other than the mandatory or recommended ones can be used, and these must be signaled and negotiated through SIP/SDP.

The codecs for continuous media are:

  • AMR narrowband is the mandatory speech codec, [35] if speech is supported in PS videophones. AMR speech is packetized using the payload format described in Sjoberg et al. [36]

  • AMR wideband is the mandatory speech codec [37] whenever wideband speech is supported in the terminal. AMR wideband speech is packetized according to the payload format in Sjoberg et al. [38]

  • H.263 baseline is the mandatory codec when video is supported. [39] H.263 video is encapsulated following the payload format defined in Bormann et al. [40]

  • H.263 Version 2 Interactive and Streaming Wireless Profile (Profile 3) Level 10 is an optional codec to be supported by the terminals. [41] It provides better coding efficiency and error resilience in a mobile environment, compared to the baseline H.263, because of the use of the video codec Annexes I, J, K, and T. The packetization algorithm is the same as that defined for the H.263 baseline. [42]

  • MPEG-4 Visual is an optional codec that can be supported at Simple Profile Level 0. [43] Encapsulation of MPEG-4 video is done according to the payload format defined in Kikuchi et al. [44]

Whenever discrete media are available in mobile videophone terminals, T.140 is the real-time text conversation standard to be optionally supported [45] for chat applications. Packetization of text data follows the formats defined in Hellstrom. [46]

The protocol used for the transport of packetized media data is the Real-Time Transport Protocol (RTP). [47] RTP provides real-time delivery of media data, including functionalities such as packet sequence numbers and time stamping. The latter allows intermedia synchronization in the receiving terminal. RTP runs on top of UDP and IPv4/v6.

RTP comes with its control protocol (RTCP), which allows QoS monitoring. Each endpoint sends quality reports to, and receives them from, the other endpoint. The quality reports carry information such as number of packets sent, number of bytes sent, fraction of packets lost, number of packets lost, and packet interarrival jitter. Further details about RTCP will be given in Section 21.5.
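The sequence number and time stamp mentioned above live in the 12-byte fixed RTP header. The sketch below packs and unpacks that header with Python's struct module, assuming no padding, no header extension, and no CSRC entries; the example values (payload type 34, the static RTP payload type for H.263, and the SSRC) are illustrative, and the function names are hypothetical.

```python
import struct
from typing import Dict


def pack_rtp_header(payload_type: int, seq: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6  # version = 2, P = 0, X = 0, CC = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(header: bytes) -> Dict[str, int]:
    """Decode the fields of the fixed RTP header."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "version": byte0 >> 6,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    header = pack_rtp_header(payload_type=34, seq=1, timestamp=3000, ssrc=0x1234ABCD)
    print(header.hex())
    print(unpack_rtp_header(header))
```

The receiver uses the sequence number to detect loss and reordering, and the time stamp to schedule playout and to align audio with video.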
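Two of the report fields listed above, fraction of packets lost and interarrival jitter, are computed with simple formulas in the RTP specification. The sketch below follows those formulas: the jitter estimate is smoothed with a gain of 1/16, and the loss fraction is expressed as an 8-bit fixed-point number. The function names and the sample values are illustrative.

```python
def update_jitter(jitter: float, transit_prev: int, transit_now: int) -> float:
    """One step of the RTP interarrival jitter estimator.

    A transit value is the arrival time minus the RTP time stamp of a packet,
    both expressed in time stamp units.
    """
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def fraction_lost(expected: int, received: int) -> int:
    """Loss fraction for one reporting interval as an 8-bit fixed-point value."""
    if expected <= 0:
        return 0
    lost = max(expected - received, 0)
    return (lost * 256) // expected


if __name__ == "__main__":
    jitter = 0.0
    transits = [100, 116, 108, 140]
    for prev, now in zip(transits, transits[1:]):
        jitter = update_jitter(jitter, prev, now)
    print(jitter)
    print(fraction_lost(expected=100, received=90))
```

Dividing the 8-bit loss value by 256 recovers the fraction, so a reported value of 25 corresponds to roughly 10 percent loss in the interval.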

21.4.2.2 System Control

The Session Initiation Protocol (SIP) defined in IETF [48] is an application layer control protocol for creating, modifying, and terminating sessions with one or more participants. SIP establishes the logical binding between the media streams of two video telephony terminals. As shown in Figure 21.5, SIP can run on top of TCP and UDP (other transport protocols also are allowed). However, UDP is assumed to be the preferred transport protocol in 3GPP IPv4- or IPv6-based networks. [49]

SIP makes use of the Session Description Protocol (SDP) [50] to describe the session properties. Among the parameters used to describe the session are IP addresses, ports, payload formats, types of media (audio, video, etc.), media codecs (H.263, AMR, etc.), and session bandwidth.
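A session description carrying the parameters listed above might look like the output of the following sketch. The address (taken from the documentation range 192.0.2.0/24), the ports, the dynamic payload type 97 for AMR, and the 64 kbps bandwidth are all assumptions chosen for the example; payload type 34 is the static RTP payload type for H.263. The helper names are hypothetical.

```python
from typing import Dict, List


def build_sdp(ip: str, audio_port: int, video_port: int, bandwidth_kbps: int) -> str:
    """Assemble a minimal SDP body with one AMR audio and one H.263 video stream."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",            # connection address
        f"b=AS:{bandwidth_kbps}",    # session bandwidth in kbps
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 97",
        "a=rtpmap:97 AMR/8000",
        f"m=video {video_port} RTP/AVP 34",
        "a=rtpmap:34 H263/90000",
    ]
    return "\r\n".join(lines) + "\r\n"


def parse_media(sdp: str) -> List[Dict[str, object]]:
    """Extract media type, port, and codec names from an SDP body."""
    media: List[Dict[str, object]] = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind, port, _proto, *_formats = line[2:].split()
            media.append({"type": kind, "port": int(port), "codecs": []})
        elif line.startswith("a=rtpmap:") and media:
            _payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            media[-1]["codecs"].append(encoding.split("/")[0])
    return media


if __name__ == "__main__":
    body = build_sdp("192.0.2.10", 49152, 49154, 64)
    print(body)
    print(parse_media(body))
```

In a call, a body like this travels inside the INVITE and the answering response, so that each side learns where to send its RTP streams and which codecs to use.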

A simple IETF SIP signaling example between two video telephony terminals is presented in Figure 21.6.

Figure 21.6: Call setup and release using SIP.

A SIP call setup is essentially a three-way handshake between caller and callee. The main legs are INVITE (to initiate a call), 200/OK (to communicate a definitive successful response), and ACK (to acknowledge the response). However, implementations can make use of provisional responses, such as 100/TRYING and 180/RINGING, when it is expected that a final response will take more than 200 ms. 100/TRYING indicates that the next-hop server has received the request and that some unspecified action is being taken on behalf of this call (for example, a database query). 180/RINGING indicates that the callee is trying to alert the user.

After the call has been established, the actual media transfer (speech and video) can take place. The release of the call is made by means of the BYE method, and the successful call release is communicated to the caller through a 200/OK message.
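The setup and release exchanges just described can be written down as plain message lists, which makes the distinction between the three-way handshake and the provisional responses explicit. This is a minimal sketch: the two-party model (everything on the answering side is labeled "callee", including the server that would send 100 Trying) and the helper names are simplifications made for the example.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, SIP method or status line)

SETUP: List[Message] = [
    ("caller", "INVITE"),
    ("callee", "100 Trying"),   # provisional
    ("callee", "180 Ringing"),  # provisional
    ("callee", "200 OK"),       # final, successful
    ("caller", "ACK"),
]

RELEASE: List[Message] = [
    ("caller", "BYE"),
    ("callee", "200 OK"),
]


def is_provisional(message: str) -> bool:
    """True for 1xx status lines such as '100 Trying' or '180 Ringing'."""
    code = message[:3]
    return code.isdigit() and code.startswith("1")


def handshake(flow: List[Message]) -> List[str]:
    """Keep only the legs that are not provisional responses."""
    return [message for _sender, message in flow if not is_provisional(message)]


if __name__ == "__main__":
    print(handshake(SETUP))
    print(len(SETUP), len(RELEASE))
```

Dropping the provisional responses leaves exactly INVITE, 200 OK, and ACK, the three legs of the handshake; with both provisional responses included, this setup uses five messages and the release uses two.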

Quality of service of signaling is an important issue when measuring the performance of terminals for mobile video telephony. In Section 21.5 of this chapter we will clarify the concepts of Post Dialing Delay (T1), Answer-Signal Delay (T2), and Call Release Delay (T3) shown in Figure 21.6. The next section addresses SIP signaling in 3GPP networks.

21.4.2.3 Call Control Issues

SIP-based mobile applications based on IETF signaling can be implemented in 3GPP Release '99 and 4 networks. In this case, only the mobile applications resident in the mobile terminals run the SIP protocol, while the network is not aware of it.

A further step has been made in 3GPP Release 5 specifications, where SIP has been selected to govern the core call-control mechanism of the whole IP multimedia subsystem. Here, both the network and the mobile terminal implement the SIP protocol and exchange SIP messages for establishing and releasing calls. This choice has been made to enable the transition toward all-IP mobile networks. The SIP protocol in 3GPP Release 5 networks is more complex than the IETF SIP, because of factors such as resource reservation and the increased number of network elements involved. For a deeper understanding of call control in 3GPP networks, the reader is referred to the 3GPP specifications. [51], [52], [53] Here we will give an example of SIP signaling for call setup and release between a mobile terminal and a 3GPP network (see Figures 21.7 and 21.8). [54]

Figure 21.7: SIP call setup in 3GPP networks.

Figure 21.8: SIP call release in 3GPP networks.

The mobile terminal (or UE, user equipment) initiates a call toward the mobile originated (MO) network. The UE sends the first INVITE (1) message to the P-CSCF (proxy-call session control function) that works as a call router toward other network elements and the destination mobile terminal. Before the 180/RINGING (19) message is received by the UE, the messages (11–17) are exchanged mainly to allow resource reservation in the network and PDP context activation between the UE and the network. PRACK messages [55] play the same role as ACK, but they apply to provisional responses (such as 183/SESSION PROGRESS or 180/RINGING) that cease to be retransmitted when PRACK is received (more details about reliability of SIP messages are available in Section 21.5).

In a 3GPP network, the total number of SIP messages exchanged by the UE for establishing a call is 12 (plus resource reservation), while a simple IETF call setup requires 5 SIP messages.

Call release signaling is shown in Figure 21.8. The number of SIP messages exchanged by the UE is 2 (the same number as in IETF SIP call release), plus the signaling required to release the PDP context resources. In this scenario, messages (2-3) can occur even before BYE (1) and in parallel with procedure 4 (remove resource reservation).

[6] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[7] 3GPP TSGS-SA, Codec for circuit switched multimedia telephony service, Modifications to H.324 (Release '99), TS 26.111, v.3.4.0 (2000-12).

[8] 3GPP TSGS-SA, Codec for circuit switched multimedia telephony service, General description (Release 4), TS 26.110, v.4.1.0 (2001-03).

[9] ITU-T, Multiplexing protocol for low bit rate multimedia communication, Recommendation H.223, July 2001.

[10] ITU-T, Control protocol for multimedia communication, Recommendation H.245 (Version 8), July 2001.

[11] 3GPP TSGS-SA, Codec(s) for circuit switched multimedia telephony service, Terminal implementor's guide (Release 4), TS 26.911, v.4.1.0 (2001-03).

[12] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[13] 3GPP TSGS-SA, Codec for circuit switched multimedia telephony service, Modifications to H.324 (Release '99), TS 26.111, v.3.4.0 (2000-12).

[14] 3GPP TSGS-SA, Mandatory speech codec speech processing functions, AMR speech codec, General description (Release 5), TS 26.071, v.5.0.0 (2002-06).

[15] ITU-T, Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, Recommendation G.723.1, March 1996.

[16] 3GPP TSGS-SA, Codec(s) for circuit switched multimedia telephony service, Terminal implementor's guide (Release 4), TS 26.911, v.4.1.0 (2001-03).

[17] ITU-T, Video coding for low bit rate communication, Recommendation H.263, February 1998.

[18] ISO/IEC, Information technology - Coding of audio-visual objects - Part 2: Visual, 14496-2, 2001.

[19] ITU-T, Video codec for audiovisual services at p x 64 kbit/s, Recommendation H.261, March 1993.

[20] ITU-T, Data protocols for multimedia conferencing, Recommendation T.120, July 1996.

[21] ITU-T, Protocol for multimedia application text conversation, Recommendation T.140, February 1998.

[22] 3GPP TSGS-SA, Global text telephony, Stage 1 (Release 5), TS 22.226, v.5.2.0 (2002-03).

[23] ITU-T, Control protocol for multimedia communication, Recommendation H.245 (Version 8), July 2001.

[24] 3GPP TSGS-SA, Codec for circuit switched multimedia telephony service, General description (Release 4), TS 26.110, v.4.1.0 (2001-03).

[25] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[26] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[27] ITU-T, Multiplexing protocol for low bit rate multimedia communication, Recommendation H.223, July 2001.

[28] 3GPP TSGS-SA, Codec(s) for circuit switched multimedia telephony service, Terminal implementor's guide (Release 4), TS 26.911, v.4.1.0 (2001-03).

[29] 3GPP TSGS-SA, Codec for circuit switched multimedia telephony service, Modifications to H.324 (Release '99), TS 26.111, v.3.4.0 (2000-12).

[30] ITU-T, Multiplexing protocol for low bit rate multimedia communication, Recommendation H.223, July 2001.

[31] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[32] Curcio, I.D.D., Lappalainen, V., and Mostafa, M.-E., QoS evaluation of 3G-324M mobile videophones over WCDMA networks, Comput. Networks, 37 (3–4), 425–445, 2001.

[33] ITU-T, Terminal for low bit-rate multimedia communication, Recommendation H.324, March 2002.

[34] 3GPP TSGS-SA, Packet switched conversational multimedia applications, Default codecs (Release 5), TS 26.235, v.5.1.0 (2002-03).

[35] 3GPP TSGS-SA, Mandatory speech codec speech processing functions, AMR speech codec, General description (Release 5), TS 26.071, v.5.0.0 (2002-06).

[36] Sjoberg, J. et al., RTP payload format and file storage format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) audio codecs, IETF RFC 3267, March 2002.

[37] ITU-T, Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB), Recommendation G.722.2, January 2002.

[38] Sjoberg, J. et al., RTP payload format and file storage format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) audio codecs, IETF RFC 3267, March 2002.

[39] ITU-T, Video coding for low bit rate communication, Recommendation H.263, February 1998.

[40] Bormann, C. et al., RTP Payload format for the 1998 version of ITU-T Recommendation H.263 (H.263+), IETF RFC 2429, October 1998.

[41] ITU-T, Video coding for low bit rate communication, Profiles and levels definition, Recommendation H.263 Annex X, April 2001.

[42] Bormann, C. et al., RTP Payload format for the 1998 version of ITU-T Recommendation H.263 (H.263+), IETF RFC 2429, October 1998.

[43] ISO/IEC, Information technology - Coding of audio-visual objects - Part 2: Visual, 14496-2, 2001.

[44] Kikuchi, Y. et al., RTP payload format for MPEG-4 Audio/Visual streams, IETF RFC 3016, November 2000.

[45] ITU-T, Protocol for multimedia application text conversation, Recommendation T.140, February 1998.

[46] Hellstrom, G., RTP Payload for Text Conversation, IETF RFC 2793, May 2000.

[47] Schulzrinne, H. et al., RTP: A Transport Protocol for Real-Time Applications, IETF RFC 1889, January 1996.

[48] Rosenberg, J. et al., SIP: Session Initiation Protocol, IETF RFC 3261, March 2002.

[49] 3GPP TSG CN, Signaling flows for the IP multimedia call control based on SIP and SDP, Stage 3 (Release 5), TS 24.228, v.5.3.0 (2002-12).

[50] Handley, M. and Jacobson, V., SDP: Session description protocol, IETF RFC 2327, April 1998.

[51] 3GPP TSG CN, Signaling flows for the IP multimedia call control based on SIP and SDP, Stage 3 (Release 5), TS 24.228, v.5.3.0 (2002-12).

[52] 3GPP TSG-SSA, IP Multimedia Subsystem (IMS), Stage 2 (Release 5), TS 23.228, v.5.7.0 (2002-12).

[53] 3GPP TSG CN, IP multimedia call control protocol based on SIP and SDP, Stage 3 (Release 5), TS 24.229, v.5.3.0 (2002-12).

[54] 3GPP TSG CN, Signaling flows for the IP multimedia call control based on SIP and SDP, Stage 3 (Release 5), TS 24.228, v.5.3.0 (2002-12).

[55] Rosenberg, J. and Schulzrinne, H., Reliability of Provisional Responses in the Session Initiation Protocol (SIP), IETF RFC 3262, June 2002.




Wireless Internet Handbook: Technologies, Standards, and Applications (Internet and Communications)
ISBN: 0849315026
EAN: 2147483647
Year: 2003
Pages: 239
