When you read a sentence, your mind breaks the sentence into tokens: individual words and punctuation marks, each of which conveys meaning to you. Compilers also perform tokenization. They break statements into individual pieces such as keywords, identifiers, operators and other elements of a programming language. In this section, we study Java's StringTokenizer class (from package java.util), which breaks a string into its component tokens. Tokens are separated from one another by delimiters, typically whitespace characters such as space, tab, newline and carriage return. Other characters can also be used as delimiters. The application in Fig. 29.18 demonstrates class StringTokenizer.
Figure 29.18. StringTokenizer object used to tokenize strings.
```java
 1  // Fig. 29.18: TokenTest.java
 2  // StringTokenizer class.
 3  import java.util.Scanner;
 4  import java.util.StringTokenizer;
 5
 6  public class TokenTest
 7  {
 8     // execute application
 9     public static void main( String args[] )
10     {
11        // get sentence
12        Scanner scanner = new Scanner( System.in );
13        System.out.println( "Enter a sentence and press Enter" );
14        String sentence = scanner.nextLine();
15
16        // process user sentence
17        StringTokenizer tokens = new StringTokenizer( sentence );
18        System.out.printf( "Number of elements: %d%nThe tokens are:%n",
19           tokens.countTokens() );
20
21        while ( tokens.hasMoreTokens() )
22           System.out.println( tokens.nextToken() );
23     } // end main
24  } // end class TokenTest
```
When the user presses the Enter key, the input sentence is stored in String variable sentence. Line 17 creates a StringTokenizer for sentence. This constructor takes the string to tokenize and uses the default delimiter string " \t\n\r\f", which consists of a space, a tab, a newline, a carriage return and a form feed. Class StringTokenizer has two other constructors. In the version that takes two String arguments, the second String is the delimiter string; every character in it is treated as a delimiter. In the version that takes three arguments, the second String is the delimiter string and the third argument, a boolean, determines whether the delimiters are also returned as tokens. When that argument is true, each delimiter is returned as a one-character token. This is useful if you need to know which delimiter separated each pair of tokens.
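The two- and three-argument constructors are easiest to compare side by side on the same input. The following is a minimal sketch; the class name DelimiterDemo, the helper tokensOf and the sample string are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterDemo
{
   // hypothetical helper: collect every token a tokenizer produces, in order
   public static List< String > tokensOf( StringTokenizer tokenizer )
   {
      List< String > result = new ArrayList< String >();

      while ( tokenizer.hasMoreTokens() )
         result.add( tokenizer.nextToken() );

      return result;
   } // end method tokensOf

   public static void main( String args[] )
   {
      String data = "red,green;;blue";

      // two-argument constructor: ',' and ';' are both delimiters;
      // the adjacent ";;" does not produce an empty token
      System.out.println( tokensOf( new StringTokenizer( data, ",;" ) ) );

      // three-argument constructor with true: each delimiter character
      // is also returned, as a one-character token
      System.out.println( tokensOf( new StringTokenizer( data, ",;", true ) ) );
   } // end main
} // end class DelimiterDemo
```

With the two-argument constructor the tokens are red, green and blue. With the three-argument constructor the delimiters appear between them as well, so the second list is red, ",", green, ";", ";", blue.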
Line 19 uses StringTokenizer method countTokens to determine the number of tokens in the string to be tokenized. The condition of the while statement at lines 21-22 uses StringTokenizer method hasMoreTokens to determine whether there are more tokens in the string being tokenized. If so, line 22 prints the next token, which is obtained by calling StringTokenizer method nextToken; that method returns a String. Because each token is output with println, subsequent tokens appear on separate lines.
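countTokens does not advance the tokenizer; it reports how many more times nextToken can be called from the current position. The sketch below, which assumes a hypothetical class CountTokensDemo with a hypothetical helper remainingAfter, shows the count shrinking as tokens are consumed.

```java
import java.util.StringTokenizer;

public class CountTokensDemo
{
   // hypothetical helper: tokens still available after consuming `consumed` of them
   public static int remainingAfter( String text, int consumed )
   {
      StringTokenizer tokens = new StringTokenizer( text );

      for ( int i = 0; i < consumed; i++ )
         tokens.nextToken(); // each call advances past one token

      return tokens.countTokens(); // counts without advancing the position
   } // end method remainingAfter

   public static void main( String args[] )
   {
      StringTokenizer tokens =
         new StringTokenizer( "This is a sentence with seven tokens" );

      while ( tokens.hasMoreTokens() )
      {
         int left = tokens.countTokens(); // position is not advanced
         String next = tokens.nextToken(); // position is advanced
         System.out.printf( "%d left, next token: %s%n", left, next );
      } // end while
   } // end main
} // end class CountTokensDemo
```

Calling countTokens and nextToken in separate statements keeps it obvious that the count is taken before the token is consumed, so the first line reports 7 and the last reports 1.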
If you would like to change the delimiter string while tokenizing a string, you may do so by specifying a new delimiter string in a nextToken call as follows:
tokens.nextToken( newDelimiterString );
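After this call, the new delimiter string remains in effect for subsequent calls to nextToken, hasMoreTokens and countTokens. This feature is not demonstrated in Fig. 29.18.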
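Here is a small sketch of nextToken( newDelimiterString ) in use. The class ChangeDelimiterDemo, its method tokenize and the sample text are hypothetical. The sketch assumes the input contains at least two tokens; otherwise nextToken throws a NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ChangeDelimiterDemo
{
   // hypothetical method: assumes text holds at least two tokens
   public static List< String > tokenize( String text )
   {
      List< String > result = new ArrayList< String >();
      StringTokenizer tokens = new StringTokenizer( text ); // whitespace only

      // first token: punctuation stays attached because only whitespace delimits
      result.add( tokens.nextToken() );

      // switch delimiters: commas and periods now delimit as well; the space
      // is kept in the set so the separator after the first token is skipped
      result.add( tokens.nextToken( " ,." ) );

      // the new delimiter string stays in effect for the remaining calls
      while ( tokens.hasMoreTokens() )
         result.add( tokens.nextToken() );

      return result;
   } // end method tokenize

   public static void main( String args[] )
   {
      for ( String token : tokenize( "Hello, world. Bye" ) )
         System.out.println( token );
   } // end main
} // end class ChangeDelimiterDemo
```

For "Hello, world. Bye" the tokens are "Hello,", "world" and "Bye". nextToken leaves the current position at the delimiter that ended the previous token. For that reason, the new delimiter string here still contains the space. If it did not, that space would become the first character of the next token.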