E.1. Introduction | Appendix E. Unicode®

Inconsistent character encodings (i.e., the numeric values associated with characters) cause serious problems in the development of global software products, because computers process information as numbers. For instance, the character "a" is converted to a numeric value so that a computer can manipulate that piece of data. Many countries and corporations have developed their own encoding systems, which are incompatible with one another. For example, the Microsoft Windows operating system assigns the value 0xC0 to the character "A with a grave accent"; the Apple Macintosh operating system assigns that same value to an upside-down question mark. When data written under one encoding is read under another, it is misrepresented and possibly corrupted.
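The 0xC0 conflict described above can be reproduced in a few lines. This is only a sketch: the text does not name the exact code pages, so the Python codecs `cp1252` (Windows Western European) and `mac_roman` (classic Mac OS Roman) are assumed here as stand-ins for the two platform encodings.

```python
# The single byte value discussed above.
raw = bytes([0xC0])

# Interpret the same byte under two legacy single-byte encodings.
# Assumption: cp1252 and mac_roman stand in for the Windows and
# Macintosh encodings that the text mentions without naming.
as_windows = raw.decode("cp1252")
as_mac = raw.decode("mac_roman")

print(as_windows)  # À (A with a grave accent)
print(as_mac)      # ¿ (upside-down question mark)
```

The byte never changes; only the table used to interpret it does, which is why a file moved between the two systems displays differently.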

In the absence of a widely implemented universal character-encoding standard, global software developers had to localize their products extensively before distribution. Localization includes the language translation and cultural adaptation of content. The process usually involves significant modifications to the source code (such as converting numeric values and revising the underlying assumptions made by programmers), which increases costs and delays the release of the software. For example, some English-speaking programmers might design global software products assuming that a single character can be represented by one byte. However, when those products are localized for Asian markets, the programmers' assumptions are no longer valid; thus, the majority, if not the entirety, of the code needs to be rewritten. Localization must be repeated for each new version of a product. By the time a software product is localized for a particular market, a newer version, which needs to be localized as well, may be ready for distribution. As a result, it is cumbersome and costly to produce and distribute global software products in a market where there is no universal character-encoding standard.
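The one-byte-per-character assumption mentioned above is easy to check. The following sketch compares the number of characters in a string with the number of bytes it occupies; the helper name `byte_length` and the sample strings are illustrative choices, and Shift JIS is assumed as one example of a legacy Asian encoding.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

english = "cat"
japanese = "日本語"  # three characters

# For ASCII text, character count and byte count agree.
print(len(english), byte_length(english, "ascii"))        # 3 3

# For Japanese text they do not: each character here takes
# two bytes in Shift JIS and three bytes in UTF-8.
print(len(japanese), byte_length(japanese, "shift_jis"))  # 3 6
print(len(japanese), byte_length(japanese, "utf-8"))      # 3 9
```

Code that sizes buffers or indexes into text by byte offsets, on the assumption that the two counts are equal, is the kind of code that has to be rewritten during localization.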

In response to this situation, the Unicode Standard, an encoding standard that facilitates the production and distribution of software, was created. The Unicode Standard outlines a specification for the consistent encoding of the world's characters and symbols. Software products that handle text encoded in the Unicode Standard still need to be localized, but the localization process is simpler and more efficient, because the numeric values need not be converted and the assumptions made by programmers about the character encoding are universal. The Unicode Standard is maintained by a nonprofit organization called the Unicode Consortium, whose members include Apple, IBM, Microsoft, Oracle, Sun Microsystems, Sybase and many others.
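As a small illustration of consistent encoding, the sketch below looks up the two characters from the earlier 0xC0 example in Unicode, where each has its own numeric value and official name regardless of platform. The helper name `describe` is hypothetical; `unicodedata` is Python's standard-library view of the Unicode character database.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a character's Unicode code point and official name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two characters that share byte 0xC0 in the legacy encodings
# receive distinct values under the Unicode Standard.
print(describe("\u00c0"))  # U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
print(describe("\u00bf"))  # U+00BF INVERTED QUESTION MARK
```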

When the Consortium envisioned and developed the Unicode Standard, it wanted an encoding system that was universal, efficient, uniform and unambiguous. A universal encoding system encompasses all commonly used characters. An efficient encoding system allows text files to be parsed easily. A uniform encoding system assigns fixed values to all characters. An unambiguous encoding system represents a given character in a consistent manner. These four properties are referred to as the Unicode Standard design basis.
