
16.5. Multibyte Characters

In multibyte character sets, each character is coded as a sequence of one or more bytes (see "Wide Characters and Multibyte Characters" in Chapter 1). Unlike wide characters, each of which is represented by a single object of the type wchar_t, individual multibyte characters may be represented by different numbers of bytes. However, the number of bytes that represent a multibyte character, including any necessary state-shift sequences, is never more than the value of the macro MB_CUR_MAX, which is defined in the header stdlib.h. The value of MB_CUR_MAX depends on the current locale.
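As a rough illustration of the variable length of multibyte characters, the following sketch steps through a string one character at a time with mblen( ), never examining more than MB_CUR_MAX bytes per character. The function name mb_char_count is hypothetical, not part of the standard library, and the result depends on the locale in effect when it is called.

```c
#include <stdlib.h>   /* mblen(), MB_CUR_MAX */
#include <string.h>   /* strlen() */

/*
 * Hypothetical helper: count the multibyte characters in the
 * null-terminated string s, interpreted according to the current locale.
 * Returns the number of characters, or -1 if s contains an invalid or
 * incomplete multibyte sequence.
 */
long mb_char_count(const char *s)
{
    long count = 0;
    size_t remaining = strlen(s);

    mblen(NULL, 0);                 /* reset mblen()'s internal shift state */

    while (remaining > 0) {
        /* No character is longer than MB_CUR_MAX bytes. */
        size_t limit = remaining < (size_t)MB_CUR_MAX
                           ? remaining : (size_t)MB_CUR_MAX;
        int len = mblen(s, limit);

        if (len <= 0)               /* -1 signals an invalid sequence */
            return -1;

        s += len;
        remaining -= (size_t)len;
        ++count;
    }
    return count;
}
```

In the default "C" locale every byte counts as one character. A program would typically call setlocale( ) first, for example setlocale(LC_ALL, ""), so that the string is interpreted in the user's multibyte encoding.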

C provides standard functions to obtain the wide-character code, or wchar_t value, that corresponds to any given multibyte character, and to convert any wide character to its multibyte representation. Some multibyte encoding schemes are stateful; the interpretation of a given multibyte sequence may depend on its position with respect to control characters, called shift sequences, that are used in the multibyte stream or string. In such cases, the conversion of a multibyte character to a wide character, or the conversion of a multibyte string into a wide string, depends on the current shift state at the point where the first multibyte character is read. For the same reason, converting a wide character to a multibyte character, or a wide string to a multibyte string, may entail inserting appropriate shift sequences in the output.
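The string conversions described above might be wrapped as in the following sketch, which uses mbstowcs( ) and wcstombs( ) from stdlib.h. The helper names mbs_to_wcs_dup and wcs_to_mbs_dup are hypothetical, and the buffer sizes are conservative upper bounds rather than exact lengths. Both standard functions begin in the initial shift state.

```c
#include <stdlib.h>   /* mbstowcs(), wcstombs(), malloc(), free(), MB_CUR_MAX */
#include <string.h>   /* strlen() */
#include <wchar.h>    /* wchar_t, wcslen() */

/*
 * Hypothetical helper: convert a multibyte string to a newly allocated
 * wide string. Returns NULL on an invalid sequence or allocation failure.
 * The caller frees the result.
 */
wchar_t *mbs_to_wcs_dup(const char *mbs)
{
    /* Each wide character consumes at least one byte, so strlen(mbs)
       is an upper bound on the number of wide characters produced. */
    size_t max = strlen(mbs) + 1;
    wchar_t *wcs = malloc(max * sizeof *wcs);

    if (wcs == NULL)
        return NULL;
    if (mbstowcs(wcs, mbs, max) == (size_t)-1) {
        free(wcs);
        return NULL;
    }
    return wcs;
}

/*
 * Hypothetical helper: convert a wide string to a newly allocated
 * multibyte string. Returns NULL if a wide character has no multibyte
 * representation in the current locale, or on allocation failure.
 */
char *wcs_to_mbs_dup(const wchar_t *wcs)
{
    /* Each wide character, including the terminator and any shift
       sequences it requires, needs at most MB_CUR_MAX bytes. */
    size_t max = (wcslen(wcs) + 1) * MB_CUR_MAX;
    char *mbs = malloc(max);

    if (mbs == NULL)
        return NULL;
    if (wcstombs(mbs, wcs, max) == (size_t)-1) {
        free(mbs);
        return NULL;
    }
    return mbs;
}
```

Allocating for the worst case keeps the sketch within what the C standard guarantees for these two functions, at the cost of some unused memory.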

Conversions between wide and multibyte characters or strings may be necessary when you read characters from or write characters to a wide-oriented stream (see "Byte-Oriented and Wide-Oriented Streams" in Chapter 13).

Table 16-17 lists all of the standard library functions for handling multibyte characters.

Table 16-17. Multibyte character functions

Purpose                                                               Functions in stdlib.h  Functions in wchar.h
--------------------------------------------------------------------  ---------------------  --------------------
Find the length of a multibyte character                              mblen( )               mbrlen( )
Find the wide character corresponding to a given multibyte character  mbtowc( )              mbrtowc( )
Find the multibyte character corresponding to a given wide character  wctomb( )              wcrtomb( )
Convert a multibyte string into a wide string                         mbstowcs( )            mbsrtowcs( )
Convert a wide string into a multibyte string                         wcstombs( )            wcsrtombs( )
Convert between byte characters and wide characters                   -                      btowc( ), wctob( )
Test for the initial shift state                                      -                      mbsinit( )


The letter r in the names of functions declared in wchar.h stands for "restartable." Unlike their counterparts in stdlib.h, which have no r in their names, the restartable functions take an additional argument: a pointer to an object of the type mbstate_t that stores the shift state of the multibyte character or string argument.
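To suggest why an explicit state object is useful, the following sketch counts the characters in a multibyte byte stream that arrives in separate chunks. The function name count_chars_in_chunk is hypothetical. It passes the caller's mbstate_t object to mbrtowc( ), so a character that is split across two chunks can be completed on the next call. The sketch assumes that a null character occupies exactly one byte.

```c
#include <string.h>   /* memset() */
#include <wchar.h>    /* mbrtowc(), mbsinit(), mbstate_t */

/*
 * Hypothetical helper: count the characters completed within one chunk
 * of a multibyte byte stream. *state carries the conversion state from
 * one call to the next. Returns the count, or (size_t)-1 if the chunk
 * contains an invalid multibyte sequence.
 */
size_t count_chars_in_chunk(const char *buf, size_t len, mbstate_t *state)
{
    size_t count = 0;

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf, len, state);

        if (n == (size_t)-1)        /* invalid sequence */
            return (size_t)-1;
        if (n == (size_t)-2)        /* incomplete: the rest of the character */
            break;                  /* is expected in the next chunk         */
        if (n == 0)                 /* null character: assumed to be one byte */
            n = 1;

        buf += n;
        len -= n;
        ++count;
    }
    return count;
}
```

A caller might set up the state and feed the chunks like this:

```c
mbstate_t st;
memset(&st, 0, sizeof st);          /* zero-valued object: initial state */

size_t total = count_chars_in_chunk("abc", 3, &st)
             + count_chars_in_chunk("de", 2, &st);

if (!mbsinit(&st)) {
    /* The input ended in the middle of a character or shift sequence. */
}
```

The functions in stdlib.h keep their state in a hidden internal object instead, which makes it harder to interleave several conversions or to resume one later.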



C in a Nutshell (In a Nutshell (O'Reilly))
ISBN: 0596006977
Year: 2006
Pages: 473
