10.5. Printf-Style Formatting

One of the most anticipated new features in Java 5.0 is printf-style string formatting. printf-style formatting utilizes special format strings embedded into text to tell the formatting engine where to place arguments and to give detailed specification about conversions, layout, and alignment. The printf formatting methods also make use of variable-length argument lists, which makes working with them much easier. Here is a quick example of printf-formatted output:

     System.out.printf( "My name is %s and I am %d years old\n", name, age );

As we mentioned, printf formatting draws its name from the C language printf( ) function, so if you've done any C programming, this will look familiar. Java has extended the concept, adding some additional type safety and convenience features. Although Java has had some text formatting capabilities in the past (we'll discuss the java.text package and MessageFormat later), printf formatting was not really feasible until variable-length argument lists and autoboxing of primitive types were added in Java 5.0.

10.5.1. Formatter

The primary new tool in our text formatting arsenal is the java.util.Formatter class and its format( ) method. Several convenience methods hide the Formatter object from you, so you may never need to create a Formatter directly. First, the static String.format( ) method can be used to format a String with arguments (like the C language sprintf( ) function):

     String message =
        String.format("My name is %s and I am %d years old.", name, age );

Next, the java.io.PrintStream and java.io.PrintWriter classes, which are used for writing text to streams, have their own format( ) method. We discuss streams in Chapter 12, but this simply means that you can use this same printf-style formatting for writing strings to any kind of stream, whether it be to System.out standard console output, to a file, or to a network connection.

In addition to the format( ) method, PrintStream and PrintWriter also have a version of the format method that is actually called printf( ). The printf( ) method is identical to and, in fact, simply delegates to the format( ) method. It's there solely as a shout out to the C programmers and ex-C programmers in the audience.
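
As a minimal sketch of the three entry points just described, the following formats the same message with an explicit Formatter, with String.format( ), and with PrintStream.printf( ). The class name FormatEntryPoints, its method names, and the sample values are illustrative assumptions, not part of the Java API:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Formatter;

class FormatEntryPoints {
    static final String TEMPLATE = "My name is %s and I am %d years old.";

    // Use a Formatter directly; it appends its output to the StringBuilder.
    static String viaFormatter(String name, int age) {
        StringBuilder sb = new StringBuilder();
        try (Formatter formatter = new Formatter(sb)) {
            formatter.format(TEMPLATE, name, age);
        }
        return sb.toString();
    }

    // The static convenience method creates the Formatter for us.
    static String viaStringFormat(String name, int age) {
        return String.format(TEMPLATE, name, age);
    }

    // printf( ) on a PrintStream; here the stream writes to an in-memory buffer.
    static String viaPrintStream(String name, int age) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer);
        out.printf(TEMPLATE, name, age);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaFormatter("Pat", 30));
        System.out.println(viaStringFormat("Pat", 30));
        System.out.println(viaPrintStream("Pat", 30));
        // Each line prints: My name is Pat and I am 30 years old.
    }
}
```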

10.5.2. The Format String

The syntax of the format string is compact and a bit cryptic at first, but not bad once you get used to it. The simplest format string is just a percent sign (%) followed by a conversion character. For example, the following text has two embedded format strings:

     "My name is %s and I am %d years old."

The first conversion character is s, the most general format, which represents a string value, and the second is d, which represents an integer value. There are about a dozen basic conversion characters corresponding to different types and primitives and there are a couple of dozen more specifically used for formatting dates and times. We cover the basics here and return to date and time formatting in Chapter 11.

At first glance, some of the conversion characters may not seem to do much. For example, the %s general string conversion in our example above would actually have handled the job of displaying the numeric age argument just as well as %d. However, these specialized conversion characters accomplish three things. First, they add a level of type safety. By specifying %d, we ensure that only an integer type is formatted at that location. If we make a mistake in the arguments, we get a runtime IllegalFormatConversionException instead of garbage in our string. Second, the format method is Locale-sensitive and capable of displaying numbers, percentages, dates, and times in many different languages, just by specifying a Locale as an argument. By telling the Formatter the type of argument with type-specific conversion characters, printf can take into account language-specific localizations. Third, additional flags and fields can be used to govern layout with different meanings for different types of arguments. For example, with floating-point numbers, you can specify a precision in the format string.
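
The first two points can be illustrated with a short sketch. The class ConversionChecks and its helper methods are hypothetical names chosen for this example; the sketch assumes the standard locale data shipped with the JDK for Locale.US and Locale.GERMANY:

```java
import java.util.IllegalFormatConversionException;
import java.util.Locale;

class ConversionChecks {
    // Returns true if the %d conversion rejects the argument at runtime.
    static boolean rejectsAsInteger(Object arg) {
        try {
            String.format("%d", arg);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    // Formats a number with grouping separators and two decimals
    // according to the conventions of the given Locale.
    static String grouped(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(rejectsAsInteger("forty"));             // true
        System.out.println(rejectsAsInteger(40));                  // false
        System.out.println(grouped(Locale.US, 1234567.891));       // 1,234,567.89
        System.out.println(grouped(Locale.GERMANY, 1234567.891));  // 1.234.567,89
    }
}
```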

The general layout of the embedded format string is as follows:

     %[argument_index$][flags][width][.precision]conversion_type

Following the literal % are a number of optional items before the conversion type character. We'll discuss these as they come up, but here's the rundown. The argument index can be used to reorder or reuse individual arguments in the variable-length argument list by referring to them by number. The flags field holds one or more special flag characters governing the format. The width and precision fields control the size of the output for text and the number of digits displayed for floating-point numbers.
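
To make the layout concrete, here is a small sketch that uses every optional field at once. The class name FormatSpecLayout and the sample values are assumptions made for illustration; Locale.ROOT is passed only to keep the decimal separator predictable:

```java
import java.util.Locale;

class FormatSpecLayout {
    static String demo() {
        // %2$-8s   : argument 2, '-' flag (left-justify), width 8, string conversion
        // %1$08.3f : argument 1, '0' flag (zero-pad), width 8, precision 3, float conversion
        return String.format(Locale.ROOT, "%2$-8s|%1$08.3f|", 3.14159, "pi");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // pi      |0003.142|
    }
}
```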

10.5.3. String Conversions

The conversion character s represents the general string conversion type. Ultimately, all of the conversion types produce a String; what we mean is that the general string conversion takes the easy route to turning its argument into a string. Normally, this simply means calling toString( ) on the object. Since all of the arguments in the variable argument list are passed as Objects, any primitives are autoboxed and are represented by the results of calling toString( ) on their wrapper classes, which generally return the value as you'd expect. If the argument is null, the result is the String "null".

More interesting are objects that implement the java.util.Formattable interface. For these, the argument's formatTo( ) method is invoked, passing it the Formatter along with the flags, width, and precision information and allowing it to write the text to be used. In this way, objects can control their own printf string representation, just as an object can with toString( ).
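
A minimal sketch of a Formattable class might look like the following. The Ticker class is a hypothetical example; how it chooses to honor the width, precision, and flags is one possible design, since the Formatter leaves those decisions to the object:

```java
import java.util.Formattable;
import java.util.FormattableFlags;
import java.util.Formatter;
import java.util.Locale;

class Ticker implements Formattable {
    private final String symbol;

    Ticker(String symbol) {
        this.symbol = symbol;
    }

    @Override
    public void formatTo(Formatter formatter, int flags, int width, int precision) {
        String text = symbol;
        // precision is -1 when absent; otherwise treat it as a maximum length
        if (precision >= 0 && text.length() > precision) {
            text = text.substring(0, precision);
        }
        // %S sets the UPPERCASE flag; it is up to us to honor it
        if ((flags & FormattableFlags.UPPERCASE) != 0) {
            text = text.toUpperCase(Locale.ROOT);
        }
        // width is -1 when absent, so the loop is skipped in that case
        boolean leftJustify = (flags & FormattableFlags.LEFT_JUSTIFY) != 0;
        StringBuilder out = new StringBuilder(text);
        while (out.length() < width) {
            if (leftJustify) {
                out.append(' ');
            } else {
                out.insert(0, ' ');
            }
        }
        formatter.format("%s", out);
    }

    public static void main(String[] args) {
        Ticker t = new Ticker("hal");
        System.out.println(String.format("%s", t));      // hal
        System.out.println(String.format("%S", t));      // HAL
        System.out.println(String.format("[%-6s]", t));  // [hal   ]
        System.out.println(String.format("[%6.2s]", t)); // [    ha]
    }
}
```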

10.5.3.1 Width, precision, and justification

For simple text arguments, you can think of the width and precision as a minimum and maximum number of characters to be output. As we'll see later, for floating-point numeric types, the precision changes meaning slightly and controls the number of digits displayed after the decimal point. We can see the effect on a simple string here:

     System.out.printf("String is '%5s'\n", "A");
     // String is '    A'
     System.out.printf("String is '%.5s'\n", "Happy Birthday!");
     // String is 'Happy'

In the first case, we specified a width of five characters, resulting in spaces being added to pad our argument. In the second example, we used the literal . followed by the precision value of 5 characters to limit the length of the string displayed, so our "Happy Birthday" string is truncated after the first five characters.

When our string was padded, it was right-justified (leading spaces added). You can control this with the flag character literal minus (-). Reversing our example:

     System.out.printf("String is '%-5s'\n", "A");
     // String is 'A    '

And, of course, we can combine all three, specifying a justification flag and a minimum and maximum width. Here is an example that prints words of varying lengths in two columns:

     String [] words =
        new String [] { "abalone", "ape", "antidisestablishmentarianism" };
     System.out.printf( "%-10s %s\n", "Word", "Length" );
     for ( String word : words )
        System.out.printf( "%-10.10s %s\n", word, word.length(  ) );

     // output
     Word       Length
     abalone    7
     ape        3
     antidisest 28

10.5.3.2 Uppercase

The s conversion's big brother S indicates that the output of the conversion should be forced to uppercase. Several other primitive and numeric conversion characters follow this pattern as we'll see later. For example:

     String word = "abalone";
     System.out.printf("The lucky word is: %S\n", word );
     // The lucky word is: ABALONE

10.5.3.3 Numbered arguments

You can refer to an arbitrary argument by number from a format string using the %n$ notation. For example, the following uses the single argument three times:

     System.out.printf( "A %1$s is a %1$s is a %1$S...\n", "rose" );
     // A rose is a rose is a ROSE...

Numbered arguments are useful for two reasons. The first, shown here, is simply for reusing the same argument in different places and with different conversions. The usefulness of this becomes more apparent when we look at Date and Time formatting in Chapter 11, where we may refer to the same item half a dozen times to get individual fields. The second advantage is that numbered arguments give the message the flexibility to reorder the arguments. This is important when you're using formatting strings to lay out a message for internationalization or customization purposes.

     log.format("Error %d : %s\n", errNo, errMsg );
     // Error 42 : Low Power
     log.format("%2$s (Error %1$d)\n", errNo, errMsg );
     // Low Power (Error 42)

10.5.4. Primitive and Numeric Conversions

Table 10-3 shows character and Boolean conversion characters.

Table 10-3. Character and Boolean conversion characters
Conversion	Type	Description	Example output
`c`	Character	Formats the result as a Unicode character	`a`
`b`, `B`	Boolean	Formats result as Boolean	`true`, `FALSE`

The c conversion character produces a Unicode character:

     System.out.printf("The first letter is: %c\n", 'a' );

The b and B conversion characters output the Boolean value of their arguments. If the argument is null, the output is false. Strangely, if the argument is of a type other than Boolean, the output is true. B is identical to b but forces the output to uppercase.

     System.out.printf( "The door is open: %b\n", ( door.status(  ) == OPEN ) );

As for String types, a width value can be specified on c and b conversions to pad the result to a minimum length.
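
The slightly surprising rules for b are easy to check with a short sketch. The class CharBooleanDemo and its describe( ) helper are hypothetical names used only for this illustration:

```java
class CharBooleanDemo {
    // Applies the %b conversion to an arbitrary argument, including null.
    static String describe(Object arg) {
        return String.format("%b", arg);
    }

    public static void main(String[] args) {
        System.out.println(describe(null));              // false
        System.out.println(describe("anything"));        // true (non-Boolean, non-null)
        System.out.println(describe(Boolean.FALSE));     // false
        System.out.println(String.format("%B", true));   // TRUE
        System.out.println(String.format("%c", 'a'));    // a
        System.out.println(String.format("%c", 65));     // A (an int is treated as a code point)
        System.out.println(String.format("[%5c]", 'a')); // [    a]
        System.out.println(String.format("[%-7b]", true)); // [true   ]
    }
}
```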

Table 10-4 summarizes integer type conversion characters.

Table 10-4. Integer type conversion characters
Conversion	Type	Description	Example output
`d`	Integer	Formats the result as an integer.	`999`
`x`, `X`	Integer	Formats result as hexadecimal.	`FF`, `0XCAFE`
`o`	Integer	Formats result as octal integer.	`10`, `010`
`h`, `H`	Integer or object	Formats the `hashCode( )` value of the argument as a hexadecimal number (for an `Integer`, this is the integer's own value), or "null" for a null value.	`7a71e498`

The d, x, and o conversion characters handle the integer type values byte, short, int, and long. (The d apparently stands for decimal, which makes little sense in this context.) The h conversion is an oddity probably intended for debugging. Several important flags give additional control over the formatting of these numeric types. See the section "Flags" for details.

A width value can be specified on these conversions to pad the result. Precision values are not allowed on integer conversions.
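
The following sketch exercises the integer conversions and confirms that a precision is rejected. The class IntegerConversionDemo and its helpers are illustrative names, not library APIs:

```java
import java.util.IllegalFormatPrecisionException;
import java.util.Locale;

class IntegerConversionDemo {
    // Shows one value in decimal, lowercase hex, uppercase hex, and octal.
    static String bases(int value) {
        return String.format(Locale.ROOT, "%d %x %X %o", value, value, value, value);
    }

    // %h formats the argument's hashCode( ) in hexadecimal.
    static String hashHex(Object obj) {
        return String.format("%h", obj);
    }

    // Returns true if a precision on an integer conversion is refused.
    static boolean precisionRejected() {
        try {
            String.format("%.2d", 5);
            return false;
        } catch (IllegalFormatPrecisionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bases(255));                    // 255 ff FF 377
        System.out.println(hashHex("hi"));                 // d01
        System.out.println(String.format("[%5d]", 42));    // [   42]
        System.out.println(precisionRejected());           // true
    }
}
```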

Table 10-5 lists floating-point type conversion characters.

Table 10-5. Floating-point type conversion characters
Conversion	Type	Description	Example output
`f`	Floating point	Formats result as decimal number	`3.14`
`e`, `E`	Floating point	Formats result in scientific notation	`3.000000e+08`
`g`, `G`	Floating point	Formats result in either decimal or scientific notation depending on value and precision.	`3.14000`, `1.00000e-14`
`a`, `A`	Floating point	Formats result as hexadecimal floating point number with significand and exponent.	`0x1.fep7`

The f conversion character is the primary floating-point conversion character. e and g conversions allow for values to be formatted in scientific notation. a complements the ability in Java 5.0 to assign floating-point values using hexadecimal significand and exponent notation, allowing bit-for-bit floating-point values to be displayed without ambiguity.
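
As a brief sketch of the a conversion and its relationship to hexadecimal floating-point literals (the class name HexFloatDemo is an assumption for this example):

```java
class HexFloatDemo {
    // Formats a double in hexadecimal significand-and-exponent notation.
    static String hex(double value) {
        return String.format("%a", value);
    }

    public static void main(String[] args) {
        double literal = 0x1.fep7;            // hexadecimal floating-point literal
        System.out.println(literal);          // 255.0
        System.out.println(hex(255.0));       // 0x1.fep7
        System.out.println(hex(1.0));         // 0x1.0p0

        // The hex form round-trips exactly, even for values such as 0.1
        // that have no exact decimal representation.
        System.out.println(Double.parseDouble(hex(0.1)) == 0.1); // true
    }
}
```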

As always, a width value may be used to pad results to a minimum length. For the f and e conversions, the precision value, as its name suggests, controls the number of digits displayed after the decimal point; for g, it sets the total number of significant digits. The value is rounded as necessary. If no precision value is specified, it defaults to six digits:

     System.out.printf("float is %f\n",   1.23456789); // float is 1.234568
     System.out.printf("float is %.3f\n", 1.23456789); // float is 1.235
     System.out.printf("float is %.1f\n", 1.23456789); // float is 1.2
     System.out.printf("float is %.0f\n", 1.23456789); // float is 1

The g conversion character determines whether to use decimal or scientific notation. First, the value is rounded to the specified precision. If the result is less than 10^-4 (less than .0001) or greater than or equal to 10^precision (10 to the power of the precision value), it is displayed in scientific notation. Otherwise, decimal notation is displayed.
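
A small sketch shows the g conversion switching between the two notations. The class GeneralFloatDemo and its helpers are hypothetical; Locale.ROOT is used only so that the decimal separator is predictable:

```java
import java.util.Locale;

class GeneralFloatDemo {
    // %g with the default precision of six significant digits.
    static String g(double value) {
        return String.format(Locale.ROOT, "%g", value);
    }

    // %g with an explicit precision of three significant digits.
    static String g3(double value) {
        return String.format(Locale.ROOT, "%.3g", value);
    }

    public static void main(String[] args) {
        System.out.println(g(3.14));       // 3.14000      (decimal notation)
        System.out.println(g(0.00001));    // 1.00000e-05  (below 10^-4)
        System.out.println(g(1234567.0));  // 1.23457e+06  (at or above 10^6)
        System.out.println(g3(1234.0));    // 1.23e+03     (at or above 10^3)
    }
}
```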

10.5.5. Flags

Table 10-6 summarizes supported flags to use in format strings.

Table 10-6. Flags for format strings
Flag	Arg types	Description	Example output
`-`	Any	Left-justifies result (pad space on the right)	`'foo '`
`+`	Numeric	Prefixes a + sign on positive results	`+1`
`' '`	Numeric	Prefixes a space on positive results (aligning them with negative values)	`' 1'`
`0`	Numeric	Pads number with leading zeros to accommodate width requirement	`000001`
`,`	Numeric	Formats numbers with commas or other Locale-specific grouping characters	`1,234,567`
`(`	Numeric	Encloses negative numbers in parentheses (an accounting convention for negative amounts)	`(42.50)`
`#`	x,X,o	Uses an alternate form for octal and hexadecimal output	`0xcafe`, `010`

As mentioned earlier, the - flag can be used to left-justify formatted output. The remaining flags affect the display of numeric types as described.

The # alternate form flag can be used to print octal and hexadecimal values with their standard prefixes: 0x for hexadecimal or 0 for octal. Note that the X conversion uppercases the prefix along with the digits:

     System.out.printf("%1$X, %1$#X\n", 0xCAFE ); // CAFE, 0XCAFE
     System.out.printf("%1$o, %1$#o\n", 8 ); // 10, 010
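
The flags in Table 10-6 can be tried out with a sketch like the following. The class FlagDemo and its fmt( ) helper are illustrative names; Locale.US is passed explicitly so that the grouping and decimal characters match the table:

```java
import java.util.Locale;

class FlagDemo {
    // Formats a single argument with the given format specifier.
    static String fmt(String spec, Object arg) {
        return String.format(Locale.US, spec, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("[%-5s]", "foo"));   // [foo  ]
        System.out.println(fmt("%+d", 1));          // +1
        System.out.println(fmt("[% d]", 1));        // [ 1]
        System.out.println(fmt("%06d", 1));         // 000001
        System.out.println(fmt("%,d", 1234567));    // 1,234,567
        System.out.println(fmt("%(.2f", -42.5));    // (42.50)
        System.out.println(fmt("%#x", 0xCAFE));     // 0xcafe
        System.out.println(fmt("%#o", 8));          // 010
    }
}
```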

10.5.6. Miscellaneous

Table 10-7 lists the remaining formatting items.

Table 10-7. Miscellaneous formatting items
Conversion	Description
`%`	Produces a literal % character (Unicode `\u0025`)
`n`	Produces the platform-specific line separator (e.g., newline or carriage-return, newline)
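
As a short sketch of both items (the class LiteralPercentDemo and its progress( ) method are hypothetical names), note that %n yields whatever System.lineSeparator( ) returns on the current platform:

```java
class LiteralPercentDemo {
    // %% emits a literal percent sign; %n emits the platform line separator.
    static String progress(int percent) {
        return String.format("%d%% complete%n", percent);
    }

    public static void main(String[] args) {
        System.out.print(progress(75)); // 75% complete
    }
}
```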
