Certification Objective Parsing,Tokenizing, and Formatting (Exam Objective 3.5)


Certification Objective —Parsing,Tokenizing, and Formatting (Exam Objective 3.5)

3.5 Write code that uses standard J2SE APIs in the java.util and java.util.regex packages to format or parse strings or streams. For strings, write code that uses the Pattern and Matcher classes and the String. split method. Recognize and use regular expression patterns for matching (limited to: .(dot), *(star), +(plus), ?, \d, \s, \w, [],() ). The use of *, + , and ? will be limited to greedy quantifiers, and the parenthesis operator will only be used as a grouping mechanism, not for capturing content during matching. For streams, write code using the Formatter and Scanner classes and the PrintWriter. format/printf methods. Recognize and use formatting parameters (limited to: %b, %c, %d, %f, %s) in format Strings.

We're going to start with yet another disclaimer: This small section isn't going to morph you from regex newbie to regex guru. In this section we'll cover three basic ideas:

  • Finding stuff You've got big heaps of text to look through. Maybe you're doing some screen scraping, maybe you're reading from a file. In any case, you need easy ways to find textual needles in textual haystacks. We'll use the java.regex.Pattern, java.regex.Matcher, and java.util.Scanner classes to help us find stuff.

  • Tokenizing stuff You've got a delimited file that you want to get useful data out of. You want to transform a piece of a text file that looks like: " 1500.00,343.77,123.4" into some individual float variables. We'll show you the basics of using the String.split() method and the java.util.Scanner class, to tokenize your data.

  • Formatting stuff You've got a report to create and you need to take a float variable with a value of 32500.000f and transform it into a String with a value of "$32,500.00". We'll introduce you to the java.util.Formatter class and to the printf() and format() methods.

A Search Tutorial

Whether you're looking for stuff or tokenizing stuff, a lot of the concepts are the same, so let's start with some basics. No matter what language you're using, sooner or later you'll probably be faced with the need to search through large amounts of textual data, looking for some specific stuff.

Regular expressions (regex for short) are a kind of language within a language, designed to help programmers with these searching tasks. Every language that provides regex capabilities uses one or more regex engines. regex engines search through textual data using instructions that are coded into expressions. A regex expression is like a very short program or script. When you invoke a regex engine, you'll pass it the chunk of textual data you want it to process (in Java this is usually a String or a stream), and you pass it the expression you want it to use to search through the data.

It's fair to think of regex as a language, and we will refer to it that way throughout this section. The regex language is used to create expressions, and as we work through this section, whenever we talk about expressions or expression syntax, we're talking about syntax for the regex "language." Oh, one more disclaimerwe know that you regex mavens out there can come up with better expressions than what we're about to present. Keep in mind that for the most part we're creating these expressions using only a portion of the total regex instruction set, thanks.

Simple Searches

For our first example, we'd like to search through the following source String

 abaaaba 

for all occurrences (or matches) of the expression

 ab 

In all of these discussions we'll assume that our data sources use zero-based indexes, so if we apply an index to our source string we get

 source: abaaaba index:  0123456 

We can see that we have two occurrences of the expression ab: one starting at position 0 and the second starting at position 4. If we sent the previous source data and expression to a regex engine, it would reply by telling us that it found matches at positions 0 and 4:

 import java.util.regex.*; class RegexSmall {   public static void main(String[] args) {     Pattern p = Pattern.compile("ab");      // the expression     Matcher m = p.matcher("abaaaba");       // the source     boolean b = false;     while(b = m.find()) {       System.out.print(m.start() + " ");     }   } } 

This produces

 0 4 

We're not going to explain this code right now. In a few pages we're going to show you a lot more regex code, but first we want to go over some more regex syntax. Once you understand a little more regex, the code samples will make a lot more sense. Here's a more complicated example of a source and an expression:

 source: abababa index:  0123456 expression: aba 

How many occurrences do we get in this case? Well, there is clearly an occurrence starting at position 0, and another starting at position 4. But how about starting at position 2? In general in the world of regex, the aba string that starts at position 2 will not be considered a valid occurrence. The first general regex search rule is

  • In general, a regex search runs from left to right, and once a source's character has been used in a match, it cannot be reused.

So in our previous example, the first match used positions 0, 1, and 2 to match the expression. (Another common term for this is that the first three characters of the source were consumed.) Because the character in position 2 was consumed in the first match, it couldn't be used again. So the engine moved on, and didn't find another occurrence of aba until it reached position 4. This is the typical way that a regex matching engine works. However, in a few pages, we'll look at an exception to the first rule we stated above.

So we've matched a couple of exact strings, but what would we do if we wanted to find something a little more dynamic? For instance, what if we wanted to find all of the occurrences of hex numbers or phone numbers or ZIP codes?

Searches Using Metacharacters

As luck would have it, regex has a powerful mechanism for dealing with the cases we described above. At the heart of this mechanism is the idea of a metacharacter. As an easy example, let's say that we want to search through some source data looking for all occurrences of numeric digits. In regex, the following expression is used to look for numeric digits:

 \d 

If we change the previous program to apply the expression \d to the following source string

 source: a12c3e456f index:  0123456789 

regex will tell us that it found digits at positions 1, 2, 4, 6, 7, and 8. (If you want to try this at home, you'll need to "escape" the compile method's "\d" argument by making it "\\d", more on this a little later.)

Regex provides a rich set of metacharacters that you can find described in the API documentation for java.util.regex.Pattern. We won't discuss them all here, but we will describe the ones you'll need for the exam:

\d

A digit

\s

A whitespace character

\w

A word character (letters, digits, or "_" (underscore))

So for example, given

 source: "a 1 56 _Z" index:   012345678 pattern: \w 

regex will return positions 0, 2, 4, 5, 7, and 8. The only characters in this source that don't match the definition of a word character are the whitespaces. (Note: In this example we enclosed the source data in quotes to clearly indicate that there was no whitespace at either end.)

You can also specify sets of characters to search for using square brackets and ranges of characters to search for using square brackets and a dash:

[abc]

Searches only for a's, b's or c's

[a–f]

Searches only for a, b, c, d, e, or f characters

In addition, you can search across several ranges at once. The following expression is looking for occurrences of the letters a - f or A - F, it's NOT looking for an fA combination:

[a-fA-F]

Searches for the first six letters of the alphabet, both cases.

So for instance,

 source: "cafeBABE" index:   01234567 pattern: [a-cA-C] 

returns positions 0, 1,4, 5, 6.

On the Job 

In addition to the capabilities described for the exam, you can also apply the following attributes to sets and ranges within square brackets: "^" to negate the characters specified, nested brackets to create a union of sets, and "&&" to specify the intersection of sets. While these constructs are not on the exam, they are quite useful, and good examples can be found in the API for the java.util.regex.Pattern class.

Searches Using Quantifiers

Let's say that we want to create a regex pattern to search for hexadecimal literals. As a first step, let's solve the problem for one-digit hexadecimal numbers:

 0 [xX] [0-9a--fA-F] 

The preceding expression could be stated: "Find a set of characters in which the first character is a "0", the second character is either an "x" or an "x", and the third character is either a digit from "0" to "9", a letter from "a" to "f" or an uppercase letter from "A" to "F"". Using the preceding expression, and the following data,

 source:  "12 0x 0x12 0Xf 0xg" index:    012345678901234567 

regex would return 6 and 11. (Note: 0x and 0xg are not valid hex numbers.)

As a second step, let's think about an easier problem. What if we just wanted regex to find occurrences of integers? Integers can be one or more digits long, so it would be great if we could say "one or more" in an expression. There is a set of regex constructs called quantifiers that let us specify concepts such as "one or more." In fact, the quantifier that represents "one or more" is the "+" character. We'll see the others shortly.

The other issue this raises is that when we're searching for something whose length is variable, getting only a starting position as a return value has limited value. So, in addition to returning starting positions, another bit of information that a regex engine can return is the entire match or group that it finds. We're going to change the way we talk about what regex returns by specifying each return on its own line, remembering that now for each return we're going to get back the starting position AND then the group:

 source: "1 a12 234b" pattern: \d+ 

You can read this expression as saying: "Find one or more digits in a row." This expression produces the regex output

 0 1 3 12 6 234 

You can read this as "At position 0 there's an integer with a value of 1, then at position 3 there's an integer with a value of 12, then at position 6 there's an integer with a value of 234." Returning now to our hexadecimal problem, the last thing we need to know is how to specify the use of a quantifier for only part of an expression. In this case we must have exactly one occurrence of ox or ox but we can have from one to many occurrences of the hex "digits" that follow. The following expression adds parentheses to limit the "+" quantifier to only the hex digits:

 0[xX] ([0–9a-fA-F]) + 

The parentheses and "+" augment the previous find-the-hex expression by saying in effect: "Once we've found our Ox or OX, you can find from one to many occurrences of hex digits." Notice that we put the "+" quantifier at the end of the expression. It's useful to think of quantifiers as always quantifying the part of the expression that precedes them.

The other two quantifiers we're going to look at are

*

Zero or more occurrences

?

Zero or one occurrence

Let's say you have a text file containing a comma-delimited list of all the file names in a directory that contains several very important projects. (BTW, this isn't how we'd arrange our directories: ) You want to create a list of all the files whose names start with proj1. You might discover .txt files, .Java files, .pdf files, who knows? What kind of regex expression could we create to find these various proj1 files? First let's take a look at what a part of this text might look like:

 ..."proj3.txt,proj1sched.pdf,proj1,proj2,proj1.java"... 

To solve this problem we're going to use the regex ^ (carat) operator, which we mentioned earlier. The regex ^ operator isn't on the exam, but it will help us create a fairly clean solution to our problem. The ^ is the negation symbol in regex. For instance, if you want to find anything but a's, b's, or c's in a file you could say

 [^abc] 

So, armed with the ^ operator and the * (zero or more) quantifier we can create the following:

 proj1([^,])* 

If we apply this expression to just the portion of the text file we listed above, regex returns

 10 proj1sched.pdf 25 proj1 37 proj1.java 

The key part of this expression is the "give me zero or more characters that aren't a comma."

The last quantifier example we'll look at is the ? (zero or one) quantifier. Let's say that our job this time is to search a text file and find anything that might be a local, 7-digit phone number. We're going to say, arbitrarily, that if we find either seven digits in a row, or three digits followed by a dash or a space followed by 4 digits, that we have a candidate. Here are examples of "valid" phone numbers:

 1234567 123 4567 123–4567 

The key to creating this expression is to see that we need "zero or one instance of either a space or a dash" in the middle of our digits:

 \d\d\d([-\s])?\d\d\d\d 

The Predefined Dot

In addition to the \s, \d, and \w metacharacters that we discussed, you also have to understand the "." (dot) metacharacter. When you see this character in a regex expression, it means "any character can serve here." For instance, the following source and pattern

 source: "ac abc a c" pattern: a.c 

will produce the output

 3 abc 7 a c 

The "." was able to match both the "b" and the " " in the source data.

Greedy Quantifiers

When you use the *, +, and ? quantifiers, you can fine tune them a bit to produce behavior that's known as "greedy," "reluctant," or "possessive." Although you need to understand only the greedy quantifier for the exam, we're also going to discuss the reluctant quantifier to serve as a basis for comparison. First the syntax:

  • ? is greedy, ?? is reluctant, for zero or once

  • * is greedy, *? is reluctant, for zero or more

  • + is greedy, +? is reluctant, for one or more

What happens when we have the following source and pattern?

 source:  yyxxxyxx pattern:  .*xx 

First off, we're doing something a bit different here by looking for characters that prefix the static (xx) portion of the expression. We think we're saying something like: "Find sets of characters that ends with xx". Before we tell what happens, we at least want you to consider that there are two plausible resultscan you find them? Remember we said earlier that in general, regex engines worked from left to right, and consumed characters as they went. So, working from left to right, we might predict that the engine would search the first 4 characters (0–3), find xx starting in position 2, and have its first match. Then it would proceed and find the second xx starting in position 6. This would lead us to a result like this:

 0 yyxx 4 xyxx 

A plausible second argument is that since we asked for a set of characters that ends with xx we might get a result like this:

 0 yyxxxyxx 

The way to think about this is to consider the name greedy. In order for the second answer to be correct, the regex engine would have to look (greedily) at the entire source data before it could determine that there was an xx at the end. So in fact, the second result is the correct result because in the original example we used the greedy quantifier *. The result that finds two different sets can be generated by using the reluctant quantifier *?. Let's review:

 source:  yyxxxyxx pattern:  .*xx 

is using the greedy quantifier * and produces

 0 yyxxxyxx 

If we change the pattern to

 source:  yyxxxyxx pattern:  .*?xx 

we're now using the reluctant qualifier *?, and we get the following:

 0 yyxx 4 xyxx 

The greedy quantifier does in fact read the entire source data, and then it works backwards (from the right) until it finds the rightmost match. At that point, it includes everything from earlier in the source data up to and including the data that is part of the rightmost match.

On the Job 

There are a lot more aspects to regex quantifiers than we've discussed here, but we've covered more than enough for the exam. Sun has several tutorials that will help you learn more about quantfiers, and turn you into the go-to person at your job.

When Metacharacters and Strings Collide

So far we've been talking about regex from a theoretical perspective. Before we can put regex to work we have to discuss one more gotcha. When it's time to implement regex in our code, it will be quite common that our source data and/or our expressions will be stored in Strings. The problem is that metacharacters and Strings don't mix too well. For instance. let's say we just want to do a simple regex pattern that looks for digits. We might try something like

 String pattern = "\d";    // compiler error! 

This line of code won't compile! The compiler sees the \ and thinks, "Ok, here comes an escape sequence, maybe it'll be a new line!" But no, next comes the d and the compiler says "I've never heard of the \d escape sequence." The way to satisfy the compiler is to add another backslash in front of the \d

 String pattern = "\\d";    // a compilable metacharacter 

The first backslash tells the compiler that whatever comes next should be taken literally, not as an escape sequence. How about the dot (.) metacharacter? If we want a dot in our expression to be used as a metacharacter, then no problem, but what if we're reading some source data that happens to use dots as delimiters? Here's another way to look at our options:

 String p = ".";   // regex sees this as the "." metacharacter String p = "\.";  // the compiler sees this as an illegal                   // Java escape sequence String p = "\\."; // the compiler is happy, and regex sees a                   // dot, not a metacharacter 

A similar problem can occur when you hand metacharacters to a Java program via command-line arguments. If we want to pass the \d metacharacter into our Java program, our JVM does the right thing if we say

 % java DoRegex "\d" 

But your JVM might not. If you have problems running the following examples, you might try adding a backslash (i.e. \\d) to your command-line metacharacters. Don't worry, you won't see any command-line metacharacters on the exam!

on the Job 

The Java language defines several escape sequences, including

  • \n = linefeed (which you might see on the exam)

  • \b = backspace

  • \t = tab

And others, which you can find in the Java Language Specification. Other than perhaps seeing a \n inside a String, you won't have to worry about Java's escape sequences on the exam.

At this point we've learned enough of the regex language to start using it in our Java programs. We'll start by looking at using regex expressions to find stuff, and then we'll move to the closely related topic of tokenizing stuff.

Locating Data via Pattern Matching

Once you know a little regex, using the java.util.regex.Pattern (Pattern) and java.util.regex.Matcher (Matcher) classes is pretty straightforward. The Pattern class is used to hold a representation of a regex expression, so that it can be used and reused by instances of the Matcher class. The Matcher class is used to invoke the regex engine with the intention of performing match operations. The following program shows Pattern and Matcher in action, and it's not a bad way for you to do your own regex experiments:

 import java.util.regex.*; class Regex {   public static void main(String [] args) {     Pattern p = Pattern.compile(args[0]);     Matcher m = p.matcher(args[1] );     boolean b = false;     System.out.println("Pattern is " + m.pattern());     while(b = m.find()) {       System.out.println(m.start() + " " + m.group());     }   } } 

This program uses the first command-line argument (args[0]) to represent the regex expression you want to use, and it uses the second argument (args[l]) to represent the source data you want to search. Here's a test run:

 % java Regex "\d\w" "ab4 56_7ab" 

Produces the output

 Pattern is \d\w 4 56 7 7a 

(Remember, if you want this expression to be represented in a String, you'd use \\d\\w). Because you'll often have special characters or whitespace as part of your arguments, you'll probably want to get in the habit of always enclosing your argument in quotes. Let's take a look at this code in more detail. First off, notice that we aren't using new to create a Pattern; if you check the API, you'll find no constructors are listed. You'll use the overloaded, static compile() method (that takes string expression) to create an instance of Pattern. For the exam, all you'll need to know to create a Matcher, is to use the Pattern.matcher() method (that takes String sourceData).

The important method in this program is the find() method. This is the method that actually cranks up the regex engine and does some searching. The find() method returns true if it gets a match, and remembers the start position of the match. If find() returns true, you can call the start() method to get the starting position of the match, and you can call the group() method to get the string that represents the actual bit of source data that was matched.

On the Job 

A common reason to use regex is to perform search and replace operations. Although replace operations are not on the exam you should know that the Matcher class provides several methods that perform search and replace operations. See the appendReplacement(), appendTail(), and replaceAll() methods in the Matcher API for more details.

The Matcher class allows you to look at subsets of your source data by using a concept called regions. In real life, regions can greatly improve performance, but you won't need to know anything about them for the exam.

Searching Using the Scanner Class Although the java.util.Scanner class is primarily intended for tokenizing data (which we'll cover next), it can also be used to find stuff, just like the Pattern and Matcher classes. While Scanner doesn't provide location information or search and replace functionality, you can use it to apply regex expressions to source data to tell you how many instances of an expression exist in a given piece of source data. The following program uses the first command-line argument as a regex expression, then asks for input using System.in. It outputs a message every time a match is found:

 import java.util.*; class ScanIn   public static void main(String[] args) {     System.out.print("input: ");     System.out.flush();     try {       Scanner s = new Scanner(System.in);       String token;       do {         token = s.findlnLine(args[0]);         System.out.println("found " + token);       } while (token != null);     } catch (Exception e) { System.out.println("scan exc"); }   } } 

The invocation and input

 java ScanIn "\d\d" input: 1b2c335f456 

produce the following:

 found 33 found 45 found null 

Tokenizing

Tokenizing is the process of taking big pieces of source data, breaking them into little pieces, and storing the little pieces in variables. Probably the most common tokenizing situation is reading a delimited file in order to get the contents of the file moved into useful places like objects, arrays or collections. We'll look at two classes in the API that provide tokenizing capabilities: String (using the split() method) and Scanner, which has many methods that are useful for tokenizing.

Tokens and Delimiters

When we talk about tokenizing, we're talking about data that starts out composed of two things: tokens and delimiters. Tokens are the actual pieces of data, and delimiters are the expressions that separate the tokens from each other. When most people think of delimiters, they think of single characters, like commas or backslashes or maybe a single whitespace. These are indeed very common delimiters, but strictly speaking, delimiters can be much more dynamic. In fact, as we hinted at a few sentences ago, delimiters can be anything that qualifies as a regex expression. Let's take a single piece of source data and tokenize it using a couple of different delimiters:

 source: "ab,cd5b,6x,z4" 

If we say that our delimiter is a comma, then our four tokens would be

 ab cd5b 6x z4 

If we use the same source, but declare our delimiter to be \d, we get three tokens:

 ab,cd b, X,Z 

In general, when we tokenize source data, the delimiters themselves are discarded, and all that we arc left with arc the tokens. So in the second example, we defined digits to be delimiters, so the 5, 6, and 4 do not appear in the final tokens.

Tokenizing with String.split()

The String class's split() method takes a regex expression as its argument, and returns a String array populated with the tokens produced by the split (or tokenizing) process. This is a handy way to tokenize relatively small pieces of data. The following program uses args[0] to hold a source string, and args[1] to hold the regex pattern to use as a delimiter:

 import java.util.*; class SplitTest {   public static void main(String[] args) {     String[] tokens = args[0] .split(args[1] ) ;     System.out.println("count " + tokens.length);     for(String s : tokens)       System.out.println(">" + s + "<");   } } 

Everything happens all at once when the split() method is invoked. The source string is split into pieces, and the pieces are all loaded into the tokens String array. All the code after that is just there to verify what the split operation generated. The following invocation

 % java SplitTest "ab5 ccc 45 @" "\d" 

produces

 count 4 >ab< > ccc < >< > @< 

(Note: Remember that to represent"\" in a string you may need to use the escape sequence " \ \". Because of this, and depending on your OS, your second argument might have to be " \ \d" or even " \ \ \ \d".)

We put the tokens inside "> <" characters to show whitespace. Notice that every digit was used as a delimiter, and that contiguous digits created an empty token.

One drawback to using the String.split() method is that often you'll want to look at tokens as they are produced, and possibly quit a tokenization operation early when you've created the tokens you need. For instance, you might be searching a large file for a phone number. If the phone number occurs early in the file, you'd like to quit the tokenization process as soon as you've got your number. The Scanner class provides a rich API for doing just such on-the-fly tokenization operations.

image from book
exam watch

Because system.cut.println() is so heavily used on the exam, you might see examples of escape sequences tucked in with questions on most any topic, including regex. Remember that if you need to create a String that contains a double quote " or a backslash \ you need to add an escape character first:

 System.out.println("\" \\"); 

This prints

 " \ 

So, what if you need to search for periods (.) in your source data? If you just put a period in the regex expression, you get the "any character" behavior. So, what if you try \. ? Now the Java compiler thinks you're trying to create an escape sequence that doesn't exist. The correct syntax is

 String s = "ab.cde.fg"; String[] tokens = s.split("\\."); 

image from book

Tokenizing with Scanner

The java.util.Scanner class is the Cadillac of tokenizing. When you need to do some serious tokenizing, look no further than Scanner—this beauty has it all. In addition to the basic tokenizing capabilities provided by string.split(), the Scanner class offers the following features:

  • Scanners can be constructed using files, streams, or Strings as a source.

  • Tokenizing is performed within a loop so that you can exit the process at any point.

  • Tokens can be converted to their appropriate primitive types automatically.

Let's look at a program that demonstrates several of Scanner's methods and capabilities. Scanner's default delimiter is whitespace, which this program uses. The program makes two Scanner objects: s1 is iterated over with the more generic next() method, which returns every token as a String, while s2 is analyzed with several of the specialized nextXxx() methods (where Xxx is a primitive type):

 import java.util.Scanner; class ScanNext {   public static void main(String [] args) {     boolean b2, b;     int i;     String s, hits = " ";     Scanner s1 = new Scanner(args[0]);     Scanner s2 = new Scanner(args[0]);     while(b = sl.hasNext()) {       s = s1.next();  hits += "s";     }     while(b = s2.hasNext()) {       if (s2.hasNextInt()) {         i = s2.nextlnt();  hits += "i";       } else if (s2.hasNextBoolean()) {         b2 = s2.nextBoolean(); hits += "b";       } else {         s2.next();  hits += "s2";       }     }     System.out.printl.n("hits " + hits);   } } 

If this program is invoked with

 java ScanNext "1 true 34 hi" 

it produces

 hits ssssibis2 

Of course we're not doing anything with the tokens once we've got them, but you can see that s2's tokens are converted to their respective primitives. A key point here is that the methods named hasNextXxx() test the value of the next token but do not actually get the token, nor do they move to the next token in the source data. The nextXxx() methods all perform two functions: they get the next token, and then they move to the next token.

The Scanner class has nextXxx() (for instance nextLong()) and hasNextXxx() (for instance hasNextDouble()) methods for every primitive type except char. In addition, the Scanner class has a useDelimiter() method that allows you to set the delimiter to be any valid regex expression.

Formatting with printf() and format()

What fun would accounts receivable reports be if the decimal points didn't line up? Where would you be if you couldn't put negative numbers inside of parentheses? Burning questions like these caused the exam creation team to include formatting as a part of the Java 5 exam. The format() and printf() methods were added to java.io.PrintStream in Java 5. These two methods behave exactly the same way, so anything we say about one of these methods applies to both of them. (The rumor is that Sun added printf() just to make old C programmers happy.)

Behind the scenes, the format() method uses the java.util.Formatter class to do the heavy formatting work. You can use the Formatter class directly if you choose, but for the exam all you have to know is the basic syntax of the arguments you pass to the format() method. The documentation for these formatting arguments can be found in the Formatter API. We're going to take the "nickel tour" of the formatting String syntax, which will be more than enough to allow you do to a lot of basic formatting work, AND ace all the formatting questions on the exam.

Let's start by paraphrasing the API documentation for format strings (for more complete, way-past-what-you-need-for-the-exam coverage, check out the java.util.Formatter API):

 printf("format string", argument(s)); 

The format string can contain both normal string literal information that isn't associated with any arguments, and argument-specific formatting data. The clue to determining whether you're looking at formatting data, is that formatting data will always start with a percent sign (%). Let's look at an example, and don't panic, we'll cover everything that comes after the % next:

 System.out.printf("%2$d + %1$d", 123, 456); 

This produces

 456 + 123 

Let's look at what just happened. Inside the double quotes there is a format string, then a +, and then a second format string. Notice that we mixed literals in with the format strings. Now let's dive in a little deeper and look at the construction of format strings:

 %[arg_index$] [flags] [width] [.precision]conversion char 

The values within [ ] are optional. In other words, the only required elements of a format string are the % and a conversion character. In the example above the only optional values we used were for argument indexing. The 2$ represents the second argument, and the 1$ represents the first argument. (Notice that there's no problem switching the order of arguments.) The d after the arguments is a conversion character (more or less the type of the argument). Here's a rundown of the format string elements you'll need to know for the exam:

  • arg_index An integer followed directly by a $, this indicates which argument should be printed in this position.

  • flags While many flags are available, for the exam you'll need to know:

    • "-" Left justify this argument

    • " + " Include a sign (+ or -) with this argument

    • "0" Pad this argument with zeroes

    • "," Use locale-specific grouping separators (i.e., the comma in 123,456)

    • "(" Enclose negative numbers in parentheses

  • width This value indicates the minimum number of characters to print. (If you want nice even columns, you'll use this value extensively.)

  • precision For the exam you'll only need this when formatting a floating-point number, and in the case of floating point numbers, precision indicates the number of digits to print after the decimal point.

  • conversion The type of argument you'll be formatting. You'll need to know:

    • b boolean

    • c char

    • d integer

    • f floating point

    • s string

Let's see some of these formatting strings in action:

 int i1 = -123; int i2 = 12345; System.out.printf(">%1$(7d< \n", i1); System.out.printf(">%0,7d< \n", i2); System.out.format(">%+-7d< \n", i2); System.out.printf(">%2$b + %1$5d< \n", i1, false); 

This produces:

 >  (123)< >012,345< >+12345 < >false +  -123< 

(We added the > and < literals to help show how minimum widths, and zero padding and alignments work.) Finally, it's important to remember that if you have a mismatch between the type specified in your conversion character and your argument, you'll get a runtime exception:

 System.out.format("%d", 12.3); 

This produces something like

 Exception in thread "main" java.util.IllegalFormatConversionEx- ception: d != java.lang.Double 




SCJP Sun Certified Programmer for Java 5 Study Guide Exam 310-055
SCJP Sun Certified Programmer for Java 5 Study Guide (Exam 310-055) (Certification Press)
ISBN: 0072253606
EAN: 2147483647
Year: 2006
Pages: 131

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net