Igor D.D. Curcio 
Video telephony is not a new technology. It was proposed two decades ago for home use; however, it has not been as successful or as widely accepted as anticipated, whether for technical reasons, because of flawed marketing strategies (including pricing), or because users were unfamiliar with the technology.
Today, the wide use of Internet technology for searching, browsing, etc., has accustomed users to consuming more image and video data and, in general, to a multimedia scenario where audio, still images, video, text, and other data are presented together.
Mobile communications and devices are becoming more and more multimedia oriented. In addition, video and mobile network technologies are mature enough to be considered a single technology: mobile video technology.
In recent years, different standardization organizations, such as the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), IETF (Internet Engineering Task Force), and 3GPP (Third-Generation Partnership Project), have made enormous efforts to specify mobile multimedia network architectures, protocols, and codecs. Two main applications are enabled by these technology and research efforts: (1) mobile multimedia streaming, which has been described in Chapter 4 of this book; and (2) mobile video telephony. This chapter is about the state of the art in mobile video telephony.
A mobile video telephony application, or a conversational multimedia application as defined in 3GPP terminology, brings a new set of challenges:
The end-to-end delay requirements are very tight (compared to multimedia streaming).
Low-delay requirements restrict the range of techniques that can be used to provide good error resilience.
Mobile devices must have sufficient processing power to run speech and video encoders and decoders simultaneously, in order to process outgoing and incoming media flows.
In our framework, a mobile video telephony application includes, in addition to multiple bidirectional media (i.e., speech and video), the use cases where only one medium is used: the case where only speech is transmitted and received (Voice over IP), and the case where only video is transmitted and received (Video over IP).
This chapter is organized as follows: Section 21.2 describes the end-to-end system architecture for mobile video telephony systems. Section 21.3 briefly introduces the mobile networks for mobile video telephony. Section 21.4 introduces the current standards for mobile video telephony, based on ITU-T H.324 (for circuit-switched video telephony) and on IETF SIP (Session Initiation Protocol) (for packet-switched video telephony). Section 21.5 contains some performance and quality-of-service (QoS) considerations for implementations. Section 21.6 concludes this chapter.
The opinions expressed in this chapter are those of the author and not necessarily those of his employer.