Section E.1 Introduction

  • Before Unicode, software developers were plagued by inconsistent character encodings (i.e., the numeric values assigned to characters). Most countries and organizations had their own encoding systems, which were incompatible with one another. A good example is the separate encoding systems used on the Windows and Macintosh platforms.
  • Computers process data by converting characters to numeric values. For instance, the character "a" is converted to a numeric value so that a computer can manipulate that piece of data.
  • Without Unicode, localization of global software requires significant modifications to the source code, which results in increased cost and delays in releasing the product.
  • Localization must be repeated for each new version of a product. By the time a software product is localized for a particular market, a newer version, which needs to be localized as well, is ready for distribution. As a result, it is cumbersome and costly to produce and distribute global software products in a market where there is no universal character-encoding standard.
  • The Unicode Consortium developed the Unicode Standard in response to the serious problems created by multiple character encodings and the use of those encodings.
  • The Unicode Standard facilitates the production and distribution of localized software. It outlines a specification for the consistent encoding of the world's characters and symbols.
  • Software products that handle text encoded in the Unicode Standard still need to be localized, but the localization process is simpler and more efficient because the numeric values need not be converted.
  • The Unicode Standard is designed to be universal, efficient, uniform and unambiguous.
  • A universal encoding system encompasses all commonly used characters; an efficient encoding system allows text files to be parsed easily; a uniform encoding system assigns fixed values to all characters; and an unambiguous encoding system ensures that any given value always represents the same character.
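As a small illustration of the points above, the following Python sketch shows a character being converted to its numeric value and the same raw byte being read under two pre-Unicode encodings. The choice of byte 0xE9 and of the codec names cp1252 (a Windows code page) and mac_roman (the classic Macintosh encoding) is an assumption made for the example; the text itself names only the two platforms.

```python
# Characters are stored as numbers: ord() gives the value, chr() reverses it.
value_of_a = ord("a")          # 97
round_trip = chr(value_of_a)   # "a"

# One raw byte, interpreted under two pre-Unicode encodings.
raw = bytes([0xE9])
as_windows = raw.decode("cp1252")       # Windows Western European code page
as_macintosh = raw.decode("mac_roman")  # classic Macintosh Roman encoding

print(value_of_a, round_trip)
print(repr(as_windows), repr(as_macintosh))
print("same character?", as_windows == as_macintosh)
```

The two decoded characters differ, which is the kind of incompatibility a single universal encoding is meant to remove.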

Section E.2 Unicode Transformation Formats

  • Unicode extends the limited ASCII character set to include all the major characters of the world.
  • Unicode makes use of three Unicode Transformation Formats (UTF): UTF-8, UTF-16 and UTF-32, each of which may be appropriate for use in different contexts.
  • UTF-8 data consists of 8-bit bytes (sequences of one, two, three or four bytes depending on the character being encoded) and is well suited for ASCII-based systems, where there is a predominance of one-byte characters (ASCII represents characters as one byte).
  • UTF-8 is a variable-width encoding form that is more compact for text involving mostly Latin characters and ASCII punctuation.
  • UTF-16 is the default encoding form of the Unicode Standard. It is a variable-width encoding form that uses 16-bit code units instead of bytes. Most characters are represented by a single unit, but some characters require surrogate pairs.
  • Surrogates are 16-bit integers in the range D800 through DFFF, which are used solely for the purpose of "escaping" into higher numbered characters.
  • Without surrogate pairs, the UTF-16 encoding form can encompass only about 65,000 characters (a single 16-bit unit has 65,536 possible values), but with surrogate pairs, this is expanded to include over a million characters.
  • UTF-32 is a 32-bit encoding form. The major advantage of the fixed-width encoding form is that it uniformly expresses all characters, so that they are easy to handle in arrays and so forth.
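The three encoding forms can be compared with a short Python sketch. It reports how many bytes a character occupies in each form and splits a code value above FFFF into a surrogate pair. The helper names encoded_sizes and surrogate_pair are hypothetical, and the sample characters are chosen only for illustration.

```python
def encoded_sizes(ch):
    """Return the byte counts of ch in UTF-8, UTF-16 and UTF-32."""
    # The "-be" codecs are used so that no byte-order mark is counted.
    return tuple(len(ch.encode(name))
                 for name in ("utf-8", "utf-16-be", "utf-32-be"))


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


for ch in ("a", "\u00e9", "\u20ac", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

print([f"{unit:04X}" for unit in surrogate_pair(0x1D11E)])
```

The two surrogate halves carry 10 bits each, so surrogate pairs add 1,024 × 1,024 = 1,048,576 code values to the 65,536 that a single 16-bit unit can express.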

Section E.3 Characters and Glyphs

  • The Unicode Standard consists of characters. A character is any written component that can be represented by a numeric value.
  • Characters are represented with glyphs (various shapes, fonts and sizes for displaying characters).
  • Code values are bit combinations that represent encoded characters. The Unicode notation for a code value is U+yyyy, in which U+ refers to the Unicode code values, as opposed to other hexadecimal values. The yyyy represents a four-digit hexadecimal number.
  • Currently, the Unicode Standard provides code values for 94,140 character representations.
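The U+yyyy notation can be produced mechanically from a character's numeric value. The following Python sketch assumes a hypothetical helper named code_value and uses the standard unicodedata module to look up each character's name. The glyph used to draw a character depends on the font and is not part of this data.

```python
import unicodedata


def code_value(ch):
    """Format a character's code value in U+yyyy notation."""
    # At least four hexadecimal digits, uppercase, zero-padded.
    return f"U+{ord(ch):04X}"


for ch in ("A", "\u20ac"):
    print(code_value(ch), unicodedata.name(ch))
```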

Section E.4 Advantages/Disadvantages of Unicode

  • An advantage of the Unicode Standard is its impact on the overall performance of the international economy. Applications that conform to an encoding standard can be processed easily by computers anywhere.
  • Another advantage of the Unicode Standard is its portability. Applications that use Unicode can be easily transferred to different operating systems, databases, Web browsers and so on. Most companies currently support, or are planning to support, Unicode.

Section E.5 Using Unicode

  • To obtain more information about the Unicode Standard and the Unicode Consortium, visit the Consortium's Web site, which contains a link to the code charts. The code charts contain the 16-bit code values for the currently encoded characters.
  • In C# source code, the escape sequence \uyyyy is used to represent a Unicode character, where yyyy represents the four-digit hexadecimal code value.
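Python string literals accept a \uyyyy escape of the same shape as the C# one, so the idea can be tried out in a short sketch. The example is written in Python rather than C#, and the helper name to_escapes is hypothetical. It handles only characters whose code values fit in four hexadecimal digits.

```python
# Each \uyyyy escape names one character by its hexadecimal code value.
greeting = "\u0048\u0065\u006C\u006C\u006F"


def to_escapes(text):
    """Rewrite text as a sequence of \\uyyyy escapes (code values up to FFFF)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("code value does not fit in four hex digits")
    return "".join(f"\\u{ord(ch):04X}" for ch in text)


print(greeting)
print(to_escapes("Hi"))
```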



    Visual C# How to Program
    Visual C# 2005 How to Program (2nd Edition)
    ISBN: 0131525239
    EAN: 2147483647
    Year: 2004
    Pages: 600