Characters and Strings


Strings, or pieces of text, can account for 50 percent or more of the objects created during the execution of a typical Java application. While Strings are objects, they are made up of sequences of individual characters. Java represents a character as a primitive type known as char. Since a char value is of a primitive type (like an int value), remember that you cannot send messages to it.

Characters

Java includes a char type that represents letters, digits, punctuation marks, diacriticals, and other special characters. Java bases its character set on a standard known as Unicode 4.0 for representation of its characters. The Unicode standard is designed to accommodate virtually all character variations in the major languages of the world. More information regarding the standard can be found at http://www.unicode.org.

Java uses two bytes to store each character. Two bytes is 16 bits, which means that a char can represent 2^16, or 65,536, distinct values. While that may seem like a lot, it's not enough to support everything in the Unicode standard. You probably won't need to concern yourself with supporting anything over the two-byte range, but if you do, Java allows you to work with characters as int values. An int is four bytes, so the billions of characters it can support should be sufficient until the Federation requires us to incorporate the Romulan alphabet.
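As a rough illustration of working with characters as int values, the sketch below uses the standard Character.charCount, Character.toChars, and String.codePointAt methods. The class name, the helper methods, and the choice of U+1D11E (the musical G clef) as a sample character are assumptions made for this example only.

```java
class CodePointSketch {
    // U+1D11E (musical symbol G clef) lies beyond the 16-bit char range.
    static final int G_CLEF = 0x1D11E;

    // Number of char values needed to store the given character.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Stores the character in a String, then reads it back as an int.
    static int roundTrip(int codePoint) {
        String text = new String(Character.toChars(codePoint));
        return text.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded('A'));            // 1
        System.out.println(charsNeeded(G_CLEF));         // 2
        System.out.println(roundTrip(G_CLEF) == G_CLEF); // true
    }
}
```

A character above the two-byte range occupies two char values inside a String, which is why the int-based methods are the safer way to read it back.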

You can represent character literals in Java in a few ways. The simplest form is to embed the actual character between single quotes (tics).

 char capitalA = 'A'; 

Language Tests

While I present most of the code in Agile Java as part of the ongoing student example, I show some Java syntactical details and variations as brief code snippets and assertions. The single-line assertions in this section on characters provide an example. I refer to these as language tests: you write them to learn the language. You might keep them to reinforce your understanding of the language for later use.

You can code these tests wherever you like. I typically code them as separate test methods in the current test class, then delete them once I have an understanding of the element I was working on.

You may choose to create a separate class to contain these "scratch" tests. Ultimately, you might even create a scratch package, suite, and/or project of such tests.

You might get some reuse out of storing these tests: Some of the language tests you build may end up being the basis for utility methods that encapsulate and simplify use of a language feature.


Characters are essentially numerics. Each character maps to a corresponding integer from 0 through 65,535. Here is a test snippet that shows how the character 'A' has a numeric value of 65 (its Unicode equivalent).

 assertEquals(65, capitalA); 
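Because a char is a number underneath, you can also do arithmetic with it. The following is a minimal sketch, not part of the student example; the class and method names are made up for illustration.

```java
class CharNumericSketch {
    // Widening a char to an int yields its Unicode value.
    static int valueOf(char c) {
        return c;
    }

    // Returns the character a given distance past start.
    // The cast is required because char + int produces an int.
    static char offset(char start, int distance) {
        return (char) (start + distance);
    }

    public static void main(String[] args) {
        System.out.println(valueOf('A'));               // 65
        System.out.println(offset('A', 2));             // C
        System.out.println((int) Character.MAX_VALUE);  // 65535
    }
}
```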

Not all characters can be directly entered via the keyboard. You can represent Unicode characters using the Unicode escape sequence, \u (lowercase only), followed by a 4-digit hex number.

 assertEquals('\u0041', capitalA); 
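A slightly larger sketch of the Unicode escape follows. The class name, the constants, and the toEscape helper are hypothetical and exist only for this example. Note that the compiler translates Unicode escapes very early, even inside comments, so the comments below avoid writing one out.

```java
class UnicodeEscapeSketch {
    // Hex 41 is decimal 65, the letter A.
    static final char CAPITAL_A = '\u0041';

    // Hex e9 is a lowercase e with an acute accent, absent from many keyboards.
    static final char E_ACUTE = '\u00e9';

    // Formats a char as the text of its 4-digit hex escape.
    // The doubled backslash keeps the compiler from reading an escape here.
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(CAPITAL_A == 'A');   // true
        System.out.println((int) E_ACUTE);      // 233
        System.out.println(toEscape('A'));
    }
}
```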

Additionally, you may represent characters as an octal (base 8) escape sequence of up to three digits.

 assertEquals('\101', capitalA); 

The highest possible character literal that you may represent as an octal sequence is '\377', which is equivalent to 255.
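The octal forms can be checked the same way. This is a small sketch with hypothetical names; Integer.toOctalString is the standard library method used to show the base 8 digits.

```java
class OctalEscapeSketch {
    // Octal 101 is 1*64 + 0*8 + 1 = 65, the letter A.
    static final char CAPITAL_A = '\101';

    // Octal 377 is 3*64 + 7*8 + 7 = 255, the upper limit for this form.
    static final char HIGHEST = '\377';

    // Returns the base 8 digits of a char's numeric value.
    static String toOctal(char c) {
        return Integer.toOctalString(c);
    }

    public static void main(String[] args) {
        System.out.println(CAPITAL_A == 'A');    // true
        System.out.println((int) HIGHEST);       // 255
        System.out.println(toOctal('A'));        // 101
    }
}
```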

Most older languages (for example, C) treat characters as single bytes. The most well-known standard for representing characters in a single-byte character set (SBCS), the American Standard Code for Information Interchange (ASCII), is defined by ANSI X3.4.[1] The first 128 characters of Unicode map directly to their ASCII correspondents.

[1] In fact, ASCII is only a true standard for seven of the single byte's eight bits. Characters from 0 through 127 are consistently represented, but there are several competing standards for characters 128 through 255.
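One way to convince yourself of the ASCII mapping is a language test along these lines. It is only a sketch: the class and method names are invented, and it relies on java.nio.charset.StandardCharsets, which was added in Java 7 and so is newer than the Java version this text targets.

```java
import java.nio.charset.StandardCharsets;

class AsciiMappingSketch {
    // True if each char from 0 through 127 encodes to exactly one
    // US-ASCII byte holding the same numeric value.
    static boolean first128MatchAscii() {
        for (char c = 0; c < 128; c++) {
            byte[] encoded = String.valueOf(c).getBytes(StandardCharsets.US_ASCII);
            if (encoded.length != 1 || encoded[0] != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(first128MatchAscii()); // true
    }
}
```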

Special Characters

Java defines several special characters that you can use for things such as output formatting. Java represents the special characters with an escape sequence that consists of the backslash character (\) followed by a mnemonic. The table below summarizes the char literals that represent these special characters.

Carriage return    '\r'
Line feed          '\n'
Tab                '\t'
Form feed          '\f'
Backspace          '\b'
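As a brief sketch of these literals in use for output formatting, the hypothetical tabbedRow method below separates two columns with a tab and ends the row with a line feed. The class name, method name, and sample data are assumptions for illustration.

```java
class SpecialCharSketch {
    // Builds one row of output: name, a tab, the score, then a line feed.
    static String tabbedRow(String name, int score) {
        return name + '\t' + score + '\n';
    }

    public static void main(String[] args) {
        System.out.print(tabbedRow("Ann", 90));
        System.out.print(tabbedRow("Bob", 85));

        // Like any other char, each special character has a numeric value.
        System.out.println((int) '\t'); // 9
        System.out.println((int) '\n'); // 10
        System.out.println((int) '\r'); // 13
    }
}
```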


Since the tic character and the backslash character have special meanings with respect to char literals, you must represent them with an escape sequence. You may also escape (i.e., prefix with the escape character \) the double quote character, but you are not required to do so.

Single quote    '\''
Backslash       '\\'
Double quote    '\"'
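Written out as declarations, these literals might look like the sketch below. The class and constant names are illustrative only. The last two constants show that the double quote yields the same value with or without the escape.

```java
class QuoteEscapeSketch {
    static final char SINGLE_QUOTE = '\'';         // escape required
    static final char BACKSLASH = '\\';            // escape required
    static final char DOUBLE_QUOTE_ESCAPED = '\"'; // escape optional
    static final char DOUBLE_QUOTE_PLAIN = '"';    // same character, unescaped

    public static void main(String[] args) {
        System.out.println((int) SINGLE_QUOTE);                          // 39
        System.out.println((int) BACKSLASH);                             // 92
        System.out.println(DOUBLE_QUOTE_ESCAPED == DOUBLE_QUOTE_PLAIN);  // true
    }
}
```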




Agile Java. Crafting Code with Test-Driven Development
ISBN: 0131482394
Year: 2003
Pages: 391
Authors: Jeff Langr
